How to Build a Smart Memory System for OpenClaw
Cut your API costs by stopping OpenClaw from sending your entire context window on every single message. Copy-paste prompts, zero config guesswork.
Table of Contents
Prefer video? Watch the complete step-by-step walkthrough above.
Why OpenClaw Gets Expensive
Every time you send a message in OpenClaw, it includes your entire context window in the API call — your system prompt, every tool result, every message in your session history, every file it has loaded. All of it, every time, whether it is relevant to your question or not.
This is called the context window problem. A fresh session might cost you a few hundred tokens per turn. An active session with loaded files and a few hours of history? Thousands of tokens per turn. Multiply that across a full day of work and you are paying for the same context over and over again.
The core issue: OpenClaw has no built-in system that saves what matters and discards what does not before the context resets. So everything accumulates until it hits the limit, then gets wiped — and whatever was not saved is gone.
The fix is a smart memory architecture: a set of rules and config changes that teach your agent what to save before a reset, what is worth keeping long-term, how to keep loaded files lean, and how to maintain that system week over week without it drifting back into bloat.
A 3-Workstream Memory System
The system is split into three independent workstreams, each handling one layer of the problem. Each workstream has one prompt. It reads your current config, shows you exactly what will change, and waits. Nothing is applied until you reply implement. If your agent proposes something different from the plan, review it carefully before agreeing.
Total prompts across 3 workstreams
Config changes applied without your review first
How to Use These Prompts
Each prompt is fully standalone — you do not need to run them in order within the same session. You can paste any prompt into a fresh OpenClaw session on any day and it will auto-detect your setup from scratch.
⚠️ Important — Back Up Your Config First
Before making any changes to your OpenClaw configuration, it's a good idea to create a backup. Open your terminal and run cp /root/.openclaw/openclaw.json /root/.openclaw/openclaw.json.bak — this creates a copy of your config file called openclaw.json.bak in the same directory. If something goes wrong and you need to restore the original, simply run cp /root/.openclaw/openclaw.json.bak /root/.openclaw/openclaw.json and your config will be back to the way it was before your changes.
Recommended Order for First-Time Setup
Paste P1, P2, and P3 into your agent one at a time. Each will read your current config and show you exactly what it plans to change — nothing is applied yet. Review the plan, and if you agree, reply implement. If the agent proposes something different from the plan, review it carefully before approving. Once all three are done, re-run any that need tuning.
Save-Before-Reset
By default, OpenClaw has no memory flush — when the context window fills up and compaction fires, everything in the session is wiped. Whatever was not manually saved is gone.
This workstream installs a memory flush: a silent background turn that fires automatically when your context gets close to the limit. Think of it like a low-fuel warning that triggers a quick save before the tank empties. Two numbers control it:
How the Timing Works
- reserveTokensFloor: 20,000 — a hard reserve at the bottom of the context window. The agent always has this headroom to finish its reply, no matter what.
- softThresholdTokens: 4,000 — the early warning trigger. The flush fires 4,000 tokens above the floor.
In a 200,000 token window: 200,000 − 20,000 − 4,000 = 176,000 — the flush fires when 176,000 tokens are used, leaving 24,000 tokens of safe working space.
The prompt text inside the flush is engineered to be short and unambiguous — a binary yes/no filter on every piece of context. No vague rules, no room for interpretation. Fewer tokens spent deciding, more signal saved.
Important: If your agent's workspaceAccess is set to ro or none, the flush will silently fail even if configured correctly. P1 checks this automatically and flags it as a blocker before proceeding.
Step 1 — Read & Plan (shows you what will change, nothing applied yet)
WORKSTREAM A — Save-Before-Reset (PLAN ONLY, DO NOT IMPLEMENT)
Goal:
Assess current state and produce an exact planned config change for agents.defaults.compaction. Do not apply config, do not restart, do not update anything.
STEP 1 — DETECT SCOPE
Auto-detect whether this OpenClaw instance is single-agent or multi-agent.
Definitions:
- multi-agent = more than one configured agent (including current/main)
- single-agent = only one configured agent (current/main)
Discovery order (strict):
1) agents_list
2) if unavailable/insufficient, read config
3) if still unavailable, output UNKNOWN and explain why
Do not assume agent names.
STEP 2 — READ CURRENT CONFIG STATE
Read openclaw.json and inspect:
- agents.defaults.compaction.reserveTokensFloor
- agents.defaults.compaction.memoryFlush.enabled
- agents.defaults.compaction.memoryFlush.softThresholdTokens
- agents.defaults.compaction.memoryFlush.systemPrompt
- agents.defaults.compaction.memoryFlush.prompt
Also check workspaceAccess for agents in scope.
- If workspaceAccess is "ro" or "none", STOP and flag BLOCKER.
- Do not continue unless workspace access is confirmed writable/effectively writable.
Reference docs ONLY if needed for unclear keys:
https://docs.openclaw.ai/concepts/memory
STEP 3 — PLAN EXACT TARGET CONFIG (NO IMPLEMENTATION)
Target values (use exactly, no edits):
- agents.defaults.compaction.reserveTokensFloor = 20000
- agents.defaults.compaction.memoryFlush.enabled = true
- agents.defaults.compaction.memoryFlush.softThresholdTokens = 4000
- agents.defaults.compaction.memoryFlush.systemPrompt = "Context nearing compaction. Curate memory now. Save only items passing at least one test: useful in 30+ days, changes future execution, or is a confirmed decision/preference/project state/commitment/constraint. Skip greetings, chat, noise, speculation, duplicates, secrets/PII. Merge/update existing entries instead of appending. Nothing qualifies = NO_REPLY."
- agents.defaults.compaction.memoryFlush.prompt = "Run memory curation before compaction. MEMORY.md: long-term truths only. memory/YYYY-MM-DD.md: today operational notes only. Apply the three-test filter from system instructions. Before writing: read target file, merge/update overlaps, never append duplicates. Nothing qualifies = NO_REPLY."
If keys differ, show exact BEFORE vs AFTER.
If keys are missing, show as "not set" in BEFORE and exact target in AFTER.
Do not invent alternative values.
REQUIRED OUTPUT FORMAT
Return exactly these sections in order:
1) Detected setup:
2) Agents in scope:
3) workspaceAccess status:
4) Existing config found:
5) Config diff:
- BEFORE:
- AFTER:
6) Risk notes
TRANSPORT-SAFE OUTPUT RULES (MANDATORY)
- Keep response concise and reliable for chat delivery.
- No full resulting JSON block.
- Report only required keys and exact BEFORE/AFTER values for those keys.
- Keep formatting simple and compact.
When I reply "implement", apply the pruning config changes exactly as shown and verify they worked.
Low-Token Controls
Think of every message you send to your agent like a delivery truck making a trip. The truck has to carry everything in the context window on every single trip — not just your new question, but every file the agent has open, every instruction it was given, every result from every tool it ran earlier in the session. The heavier the truck, the more it costs.
This workstream attacks the weight from two directions. The first is cleaning up old cargo that never gets unloaded. The second is making sure the permanent cargo the truck always carries is as light as possible. Together they reduce what gets sent to the API on every turn — which directly reduces your bill.
Problem 1 — Old Tool Results That Pile Up During a Session
Every time your agent reads a file, runs a command, or searches the web, the result gets added to the context window and stays there for the rest of the session. Even after the agent has extracted the one line it needed from a 500-line file, all 500 lines keep riding along on every subsequent API call.
Session pruning fixes this. It is a background process that runs automatically before each API call and quietly removes old tool results that are no longer needed — the ones where the agent already got what it needed and moved on. It only touches these stale tool outputs; it never deletes your messages or the agent's replies, and it never touches the transcript saved on disk.
One important detail: session pruning is Anthropic-only. It does not work if you are routing through a different provider. P2 will check whether it is already running and flag this clearly. If it is not enabled, turning it on is the first recommended step — it is a zero-config win that can meaningfully cut per-turn costs in long sessions.
Problem 2 — Always-Loaded Files That Grow Over Time
Some files ride in the truck on every trip, no matter what — your agent's instruction file, MEMORY.md, system prompts, and any workspace files set to auto-load. These are the permanent cargo.
Over weeks of use, these files grow. A few extra paragraphs in your instruction file. A MEMORY.md that never gets pruned. Notes that were useful once and never removed. Each extra line is a small tax, but it gets paid on every single API call, forever.
File size awareness helps here. P2 reads what is actually loaded right now and reports each file's token cost so you know where the weight is. It recommends a max size target per file but does not edit the files — trimming file content is a manual step you do yourself once you know which files are worth pruning. There is also a setting called bootstrapMaxChars (default: 20,000 characters) that hard-caps how much workspace content can be injected per message. Tuning it down is one of the fastest ways to cut baseline token usage.
Step 1 — Read & Plan (shows you what will change, nothing applied yet)
WORKSTREAM B - Low-Token Controls: plan only, do not implement yet.
# STEP 1 - DETECT SCOPE:
Auto-detect whether this OpenClaw instance is single-agent or multi-agent. Multi-agent means more than one configured agent (including main/current). Single-agent means only the main/current agent is configured. Do not assume agent names. Use this discovery order:
1. agents_list
2. if unavailable, read config
3. if still unavailable, write UNKNOWN and explain why
# STEP 2 - READ CURRENT CONFIG STATE:
Read the current OpenClaw configuration before designing anything. Check for any native file size limits, token caps, or trim settings. Check if OpenClaw already has controls for always-loaded file sizes. Check session pruning config - session pruning trims old tool results in-memory before each LLM call and may already be running for free token savings. If any of these exist, show what they are and whether we just need to enable or tune them. Only design new controls for what does not already exist.
# STEP 3 - RUN DIAGNOSTIC:
Check which workspace files are injected into the system prompt and their sizes. Report the top contributors by token estimate. You can read files directly or check the session's systemPromptReport if available.
# STEP 4 - DESIGN LOW-TOKEN CONTROLS:
Add low-token controls based on what the diagnostic revealed.
# Requirements:
- If session pruning is not enabled, include enabling it as a zero-effort first step.
- For always-loaded files that are oversized, recommend a max size target per file.
- Prefer summaries and delta updates over long dumps.
- Do not propose file edits here - pruning is a config change, not a file trim. File content changes are a separate decision the user makes manually.
# Output format - bullet lists only, no markdown tables:
- Detected setup: single-agent OR multi-agent
- Agents in scope:
- Session pruning: current state (enabled/disabled/unknown) and recommended action (enable/tune/no change needed)
- Always-loaded files found:
- FILE: (name)
- CURRENT SIZE: (lines or token estimate)
- RECOMMENDED MAX: (target size, or "no change needed")
Config diff for any pruning settings to change:
- BEFORE: (exact current value or "not set")
- AFTER: (exact proposed value)
When I reply "implement", apply the pruning config changes exactly as shown and verify they worked.
Weekly Maintenance
Memory systems drift. Without regular pruning, MEMORY.md grows, duplicates slip back in, and the token savings you worked for quietly erode over time. This workstream is a simple weekly cleanup prompt — paste it in, let the agent do the work, and it tells you what it did when it's done.
What It Does
Merges duplicate entries in MEMORY.md, archives anything that hasn't been relevant in 30+ days, keeps daily notes in their dated files rather than bleeding into MEMORY.md, and applies a quality gate — if an entry isn't useful for future execution, it doesn't stay. The agent replies with a short summary and the word DONE when finished.
Paste into OpenClaw — Runs Directly, No Plan Step Needed
Do two things:
1. Run a memory cleanup now using these rules:
• Merge duplicates in MEMORY.md
• Archive anything not relevant for 30+ days
• Keep daily notes in memory/YYYY-MM-DD.md (no bleed into MEMORY.md)
• If not useful for future execution, do not save it
2. Set this as an automated weekly task with the same rules.
Do not implement yet.
When I reply “implement”, run it now, create the weekly automation, and verify both succeeded.
What You Just Built
Three prompts. Three config changes. A memory system that now works the way OpenClaw should have worked out of the box.
Workstream A means your agent no longer loses everything when the context resets — it saves what matters before the window fills, automatically, every time. Workstream B means you stopped paying for the same dead weight on every API call. Workstream C means the system stays lean over time instead of slowly drifting back into bloat.
None of it required custom code, a new tool, or rebuilding your setup from scratch. Just config and prompts — the kind of thing that should take an afternoon and then disappear into the background.
If this helped, share it with someone paying too much for their OpenClaw sessions. More tutorials at komputermechanic.com.