How to Build a Smart Memory System for OpenClaw

Cut your API costs by shrinking what OpenClaw sends with every single message. Copy-paste prompts, zero config guesswork.

Usage & License

Use the prompts and downloadable templates on this page in your own personal and commercial projects. What's not allowed: reselling, repackaging, redistributing, or republishing this material — on YouTube, Gumroad, paid courses, newsletters, or anywhere else. Send people to komputermechanic.com instead. Violations get reported and taken down.

Table of Contents

01. The Problem: Why OpenClaw Gets Expensive
02. The Solution: A 3-Workstream Memory System
03. How to Use These Prompts
04. Workstream A: Save-Before-Reset
05. Workstream B: Low-Token Controls
06. Workstream C: Weekly Maintenance

THE PROBLEM

Why OpenClaw Gets Expensive

Every time you send a message in OpenClaw, it includes your entire context window in the API call — your system prompt, every tool result, every message in your session history, every file it has loaded. All of it, every time, whether it is relevant to your question or not.

This is called the context window problem. A fresh session might cost you a few hundred tokens per turn. An active session with loaded files and a few hours of history? Thousands of tokens per turn. Multiply that across a full day of work and you are paying for the same context over and over again.
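To see how quickly this adds up, here is a rough back-of-the-envelope sketch. The numbers (500 tokens for a fresh turn, 400 extra tokens of history per turn, 50 turns) are illustrative assumptions, not measurements from OpenClaw, and the function name is made up for this example.

```python
def cumulative_input_tokens(base_tokens: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens sent across a session when the full context is resent every turn.

    Turn k (counting from 0) sends base_tokens + k * growth_per_turn tokens.
    """
    return sum(base_tokens + k * growth_per_turn for k in range(turns))


# Illustrative numbers only: a context that never grows vs. one that gains 400 tokens per turn.
flat = cumulative_input_tokens(base_tokens=500, growth_per_turn=0, turns=50)
growing = cumulative_input_tokens(base_tokens=500, growth_per_turn=400, turns=50)
print(flat)     # 25000
print(growing)  # 515000
```

Under these assumptions the growing session sends roughly twenty times as many input tokens for the same 50 turns, which is the effect the rest of this tutorial tries to limit.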

The core issue: OpenClaw has no built-in system that saves what matters and discards what does not before the context resets. So everything accumulates until it hits the limit, then gets wiped — and whatever was not saved is gone.

The fix is a smart memory architecture: a set of rules and config changes that teach your agent what to save before a reset, what is worth keeping long-term, how to keep loaded files lean, and how to maintain that system week over week without it drifting back into bloat.

THE SOLUTION

A 3-Workstream Memory System

The system is split into three independent workstreams, each handling one layer of the problem. Each workstream has one prompt. P1 and P2 read your current config, show you exactly what will change, and wait; nothing is applied until you reply implement. P3 is a cleanup pass that runs directly and reports what it did. If your agent proposes something different from the plan, review it carefully before agreeing.

3 prompts in total, one per workstream.

0 config changes applied without your review first.

P1: Workstream A, Save-Before-Reset. Plans the memory flush config. Detects scope, reads current state, shows exact diff.

P2: Workstream B, Low-Token Controls. Runs a context diagnostic, identifies the biggest token contributors, plans trims.

P3: Workstream C, Weekly Maintenance. Merges duplicates, archives stale entries, keeps daily notes separate, applies quality gate. Replies DONE when finished.

BEFORE YOU START

How to Use These Prompts

Each prompt is fully standalone — you do not need to run them in order within the same session. You can paste any prompt into a fresh OpenClaw session on any day and it will auto-detect your setup from scratch.

⚠️ Important — Back Up Your Config First

Before making any changes to your OpenClaw configuration, it's a good idea to create a backup. Open your terminal and run cp /root/.openclaw/openclaw.json /root/.openclaw/openclaw.json.bak — this creates a copy of your config file called openclaw.json.bak in the same directory. If something goes wrong and you need to restore the original, simply run cp /root/.openclaw/openclaw.json.bak /root/.openclaw/openclaw.json and your config will be back to the way it was before your changes.
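If you would rather script the backup than type the cp commands by hand, the same two steps can be sketched in Python. The function names backup_config and restore_config are made up for this example; the only path assumed is the one given above.

```python
import shutil
from pathlib import Path


def backup_config(config_path: str) -> Path:
    """Copy the config to '<name>.bak' in the same directory and return the backup path."""
    src = Path(config_path)
    dst = src.with_name(src.name + ".bak")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps and permissions
    return dst


def restore_config(config_path: str) -> Path:
    """Overwrite the config with its '.bak' copy and return the config path."""
    dst = Path(config_path)
    src = dst.with_name(dst.name + ".bak")
    shutil.copy2(src, dst)
    return dst
```

Calling backup_config("/root/.openclaw/openclaw.json") before you start, and restore_config with the same path if something goes wrong, mirrors the two cp commands.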

Recommended Order for First-Time Setup

Paste P1, P2, and P3 into your agent one at a time. P1 and P2 will read your current config and show you exactly what they plan to change; nothing is applied yet. Review each plan, and if you agree, reply implement. If the agent proposes something different from the plan, review it carefully before approving. P3 runs its cleanup directly and replies DONE when finished. Once all three are done, re-run any that need tuning.

WORKSTREAM A

Save-Before-Reset

By default, OpenClaw has no memory flush — when the context window fills up and compaction fires, everything in the session is wiped. Whatever was not manually saved is gone.

This workstream installs a memory flush: a silent background turn that fires automatically when your context gets close to the limit. Think of it like a low-fuel warning that triggers a quick save before the tank empties. Two numbers control it:

How the Timing Works

reserveTokensFloor: 20,000 — a hard reserve at the bottom of the context window. The agent always has this headroom to finish its reply, no matter what.
softThresholdTokens: 4,000 (the early warning trigger). The flush fires once usage comes within 4,000 tokens of the reserved floor, so the save happens before the reserve is touched.

In a 200,000 token window: 200,000 − 20,000 − 4,000 = 176,000 — the flush fires when 176,000 tokens are used, leaving 24,000 tokens of safe working space.
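The same arithmetic as a small sketch, in case your context window is not 200,000 tokens. The function name is made up for this example; only the two setting names and the subtraction come from the explanation above.

```python
def flush_trigger(context_window: int, reserve_tokens_floor: int, soft_threshold_tokens: int) -> int:
    """Token usage at which the memory flush fires, per the formula above."""
    return context_window - reserve_tokens_floor - soft_threshold_tokens


trigger = flush_trigger(200_000, 20_000, 4_000)
print(trigger)            # 176000
print(200_000 - trigger)  # 24000 tokens left when the flush fires
```

Swap in your own model's window size to see where the flush would land with the same two settings.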

The prompt text inside the flush is engineered to be short and unambiguous — a binary yes/no filter on every piece of context. No vague rules, no room for interpretation. Fewer tokens spent deciding, more signal saved.

Important: If your agent's workspaceAccess is set to ro or none, the flush will silently fail even if configured correctly. P1 checks this automatically and flags it as a blocker before proceeding.
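To make the blocker concrete, here is a minimal sketch of the kind of check involved. It assumes your config has already been loaded into a Python dict and simply searches for any workspaceAccess key, wherever it sits. The nesting in the sample config is hypothetical, and this is not how P1 itself is implemented.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted_path, value) for every occurrence of `key` in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}" if path else k
            if k == key:
                yield child, v
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")


def flush_blockers(config: dict) -> list[str]:
    """Return the paths where workspaceAccess is 'ro' or 'none', which would block the flush."""
    return [p for p, v in find_key(config, "workspaceAccess") if v in ("ro", "none")]


# Hypothetical nesting, for illustration only.
sample = {"agents": {"defaults": {"sandbox": {"workspaceAccess": "ro"}}}}
print(flush_blockers(sample))  # ['agents.defaults.sandbox.workspaceAccess']
```

An empty list means nothing in the config would stop the flush from writing to the workspace.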

Step 1 — Read & Plan (shows you what will change, nothing applied yet)

P1 — Save-Before-Reset: Plan


WORKSTREAM B

Low-Token Controls

Think of every message you send to your agent like a delivery truck making a trip. The truck has to carry everything in the context window on every single trip — not just your new question, but every file the agent has open, every instruction it was given, every result from every tool it ran earlier in the session. The heavier the truck, the more it costs.

This workstream attacks the weight from two directions. The first is cleaning up old cargo that never gets unloaded. The second is making sure the permanent cargo the truck always carries is as light as possible. Together they reduce what gets sent to the API on every turn — which directly reduces your bill.

Problem 1 — Old Tool Results That Pile Up During a Session

Every time your agent reads a file, runs a command, or searches the web, the result gets added to the context window and stays there for the rest of the session. Even after the agent has extracted the one line it needed from a 500-line file, all 500 lines keep riding along on every subsequent API call.

Session pruning fixes this. It is a background process that runs automatically before each API call and quietly removes old tool results that are no longer needed — the ones where the agent already got what it needed and moved on. It only touches these stale tool outputs; it never deletes your messages or the agent's replies, and it never touches the transcript saved on disk.
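The idea can be pictured with a toy sketch. This is not OpenClaw's implementation: the message format, the keep_last parameter, and the placeholder text are all assumptions made for illustration. It only shows the principle that old tool outputs are dropped from what gets sent, while user and assistant messages and the original history stay untouched.

```python
def prune_tool_results(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Return a pruned copy of the context; the input list (the 'transcript') is not modified."""
    tool_indexes = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # Everything except the most recent `keep_last` tool results counts as stale.
    stale = set(tool_indexes[:-keep_last]) if keep_last > 0 else set(tool_indexes)
    pruned = []
    for i, m in enumerate(messages):
        if i in stale:
            pruned.append({"role": "tool", "content": "[old tool result removed]"})
        else:
            pruned.append(dict(m))
    return pruned
```

In this sketch, a 500-line file read early in the session would shrink to a one-line placeholder on later turns, while your messages and the agent's replies pass through unchanged.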

One important detail: session pruning is Anthropic-only. It does not work if you are routing through a different provider. P2 will check whether it is already running and flag this clearly. If it is not enabled, turning it on is the first recommended step. It is a low-effort win that can meaningfully cut per-turn costs in long sessions.

Problem 2 — Always-Loaded Files That Grow Over Time

Some files ride in the truck on every trip, no matter what — your agent's instruction file, MEMORY.md, system prompts, and any workspace files set to auto-load. These are the permanent cargo.

Over weeks of use, these files grow. A few extra paragraphs in your instruction file. A MEMORY.md that never gets pruned. Notes that were useful once and never removed. Each extra line is a small tax, but it gets paid on every single API call, forever.

File size awareness helps here. P2 reads what is actually loaded right now and reports each file's token cost so you know where the weight is. It recommends a max size target per file but does not edit the files — trimming file content is a manual step you do yourself once you know which files are worth pruning. There is also a setting called bootstrapMaxChars (default: 20,000 characters) that hard-caps how much workspace content can be injected per message. Tuning it down is one of the fastest ways to cut baseline token usage.
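If you want a rough idea of where the weight is before running P2, a quick estimate like the one below can help. It uses the common rule of thumb of about 4 characters per token, which is only an approximation and varies by model and content. The file name instructions.md and the 6,000-character target are hypothetical values chosen for the example.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def size_report(files: dict[str, str], target_chars: int) -> list[tuple[str, int, bool]]:
    """Return (name, estimated_tokens, over_target) rows, heaviest file first."""
    rows = [(name, estimate_tokens(text), len(text) > target_chars) for name, text in files.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in contents; in practice you would read the real files from your workspace.
files = {"MEMORY.md": "b" * 1_000, "instructions.md": "a" * 8_000}
for name, tokens, over in size_report(files, target_chars=6_000):
    print(name, tokens, "over target" if over else "ok")
```

Because these files are sent on every turn, the estimate for each one is a cost you pay per message, not once per session.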

Step 1 — Read & Plan (shows you what will change, nothing applied yet)

P2 — Low-Token Controls: Plan


WORKSTREAM C

Weekly Maintenance

Memory systems drift. Without regular pruning, MEMORY.md grows, duplicates slip back in, and the token savings you worked for quietly erode over time. This workstream is a simple weekly cleanup prompt — paste it in, let the agent do the work, and it tells you what it did when it's done.

What It Does

Merges duplicate entries in MEMORY.md, archives anything that hasn't been relevant in 30+ days, keeps daily notes in their dated files rather than bleeding into MEMORY.md, and applies a quality gate — if an entry isn't useful for future execution, it doesn't stay. The agent replies with a short summary and the word DONE when finished.
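The merge-and-archive logic can be sketched as follows. The entry structure (a text field plus a last_relevant date) is hypothetical, since MEMORY.md itself is free-form, and in practice the agent makes these calls from the prompt rather than from code. The sketch also leaves out the quality gate, which is a judgment call.

```python
from datetime import date, timedelta


def weekly_cleanup(entries: list[dict], today: date, stale_days: int = 30) -> tuple[list[dict], list[dict]]:
    """Split entries into (keep, archive), dropping duplicates of earlier entries."""
    cutoff = today - timedelta(days=stale_days)
    seen: set[str] = set()
    keep: list[dict] = []
    archive: list[dict] = []
    for entry in entries:
        key = " ".join(entry["text"].lower().split())  # normalize case and whitespace
        if key in seen:
            continue  # duplicate: the first occurrence wins
        seen.add(key)
        # Entries last relevant 30 or more days ago go to the archive.
        (archive if entry["last_relevant"] <= cutoff else keep).append(entry)
    return keep, archive
```

Keeping the first occurrence of a duplicate is a simplification; a more careful merge might keep whichever copy was relevant most recently.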

Paste into OpenClaw — Runs Directly, No Plan Step Needed

P3 — Weekly Maintenance


YOU'RE DONE

What You Just Built

Three prompts. Three workstreams. A memory system that now works the way OpenClaw should have worked out of the box.

Workstream A means your agent no longer loses everything when the context resets — it saves what matters before the window fills, automatically, every time. Workstream B means you stopped paying for the same dead weight on every API call. Workstream C means the system stays lean over time instead of slowly drifting back into bloat.

None of it required custom code, a new tool, or rebuilding your setup from scratch. Just config and prompts — the kind of thing that should take an afternoon and then disappear into the background.

If this helped, share it with someone paying too much for their OpenClaw sessions. More tutorials at komputermechanic.com.