
Claude Code's Memory System, Explained

Claude Code’s Memory system is the core infrastructure that lets the agent truly “know you.” Unlike traditional conversation history, it is a cross-session, structured, persistent memory mechanism — the agent not only remembers what you said, but also who you are, your preferences, your project context, and even your feedback on how it works.

1. The 5-layer memory architecture

Memory is not a single store. It is a hierarchy of five layers, each with its own lifecycle, write mechanism, and purpose.

The five layers, from session-scoped to global:

  • CLAUDE.md: user-authored, always loaded.
  • Session memory: feature-gated.
  • Conversation history: zero cost within a single session.
  • Team memory: cross-session / global.
  • Auto memory: the focus of this note. This is where CC learns about the user across conversations: role, preferences, project context, and pointers to external systems. Memories are stored as local .md files with YAML frontmatter, in four types (user / feedback / project / ref), under ~/.claude/projects/<slug>/memory/.

From top to bottom, the first three layers (CLAUDE.md, session memory, conversation history) all have lifecycles bounded by a single session or actively managed by the user. The truly interesting layer is at the bottom: automatic memory — a layer that Claude Code learns and manages autonomously across multiple conversations.

2. Four memory types

Automatic memory doesn’t record everything in one undifferentiated lump. It strictly distinguishes four types, each with its own triggering conditions and purpose. These four types are essentially labels for the agent at retrieval time, helping it quickly judge whether a memory is relevant to the current task.

The four types, essentially labels that support agent retrieval:

  • user: who you are (role, domain, preferences).
  • feedback: how you want CC to work (corrections and confirmations).
  • project: what code and Git cannot see (deadlines, decisions).
  • ref: pointers to external systems (concerns outside the codebase).

MEMORY.md (index file):

  - [User Profile](user_profile.md) — Backend engineer, 5 years of Python experience
  - [Testing Rules](feedback_testing.md) — No mocking the database in integration tests
  - [Auth Rewrite](project_auth.md) — Driven by compliance, deadline 2026-04-15
  - [Bug Tracking](reference_linear.md) — Pipeline bugs tracked in Linear INGEST project

feedback_testing.md (single memory file):

  ---
  name: Testing Rules
  description: Integration tests must use real database connections; no mocking
  type: feedback
  ---

The storage format is extremely simple: one .md file per memory, with a YAML frontmatter (name, description, type), plus a MEMORY.md index file as a table of contents. This design is both friendly for the agent to read and write, and easy for humans to view and edit directly.
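As a minimal sketch of reading that format (the field names follow the example above; the parsing helper itself is hypothetical, and a real implementation would likely use a YAML library):

```python
def parse_memory(text: str) -> dict:
    """Split a memory file into its frontmatter fields plus the body text."""
    # The file starts with a "---" fence, so splitting on "---" twice yields:
    # empty prefix, frontmatter block, body.
    _, frontmatter, body = text.split("---", 2)
    fields = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    fields["body"] = body.strip()
    return fields

sample = """---
name: Testing Rules
description: Integration tests must use real database connections; no mocking
type: feedback
---
Details the agent recorded go here.
"""

memory = parse_memory(sample)
print(memory["type"])   # → feedback
print(memory["name"])   # → Testing Rules
```

Because the format is just frontmatter plus Markdown, a human can open the same file in any editor and fix a stale description by hand.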

3. How are memories written?

Writing memories happens in three stages: real-time extraction, periodic consolidation, and a conservative deletion check.

Phase 1, per-turn extraction: a background agent scans the last N messages, receiving the existing memories so it can avoid duplicates. For anything judged worth remembering, it creates a new memory file (or updates an existing one) and writes an entry to the MEMORY.md index, a table of contents with a one-line summary for later retrieval. Everything else is skipped.

Key design decisions:

  • Per-turn extraction is incremental — the background agent only looks at the most recent few messages, never re-reading the entire conversation.
  • Periodic consolidation is performed by a separate autoDream sub-agent with its own context, so it doesn’t interfere with the main conversation.
  • Deletion is conservative — better to keep possibly stale memories than to risk deleting useful information.
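The file-plus-index write step can be sketched as follows. The directory name and helper are hypothetical stand-ins (the article's real path is ~/.claude/projects/<slug>/memory/); only the file layout comes from the article:

```python
from pathlib import Path

MEMORY_DIR = Path("memory_demo")  # stand-in for ~/.claude/projects/<slug>/memory/

def write_memory(name: str, description: str, mem_type: str, filename: str) -> None:
    """Persist one memory file, then record it in the MEMORY.md index."""
    MEMORY_DIR.mkdir(exist_ok=True)
    (MEMORY_DIR / filename).write_text(
        f"---\nname: {name}\ndescription: {description}\ntype: {mem_type}\n---\n"
    )
    # The index is a table of contents: one line per memory with a one-line
    # summary, which is all the retrieval filter will later see.
    with open(MEMORY_DIR / "MEMORY.md", "a") as index:
        index.write(f"- [{name}]({filename}) — {description}\n")

write_memory(
    "Testing Rules",
    "Integration tests must use real database connections; no mocking",
    "feedback",
    "feedback_testing.md",
)
print((MEMORY_DIR / "MEMORY.md").read_text())
```

Note that the index line carries the description verbatim; writing a vague description here degrades every future retrieval.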

4. How is Memory retrieved?

Retrieval is the most elegant part of the Memory system. The core problem: a project may have hundreds of memories, but each conversation has a limited context window — how do you pick the most relevant ones?

  • MEMORY.md (the index file, capped at 200 lines / 25 KB) is always loaded into the system prompt; the individual memory files are not.
  • A non-blocking Sonnet relevance filter then decides which files to load. Even when the main model is Opus, filtering is done by Sonnet:
    ① Scan the frontmatter of every memory file (up to 200 files, newest first).
    ② Format a list: [type] filename (timestamp): description.
    ③ Send the list plus the user query to Sonnet.
    ④ Sonnet returns the top 5 most relevant filenames; when unsure, it picks nothing.
    ⑤ Only those 5 files are loaded into context. Already-shown files are excluded so new memories get a slot.
  • The description field is therefore critical: it is the only thing Sonnet sees when judging relevance. Older memories are loaded with a staleness warning attached.

A few subtle design choices stand out:

  • Sonnet does the filtering, not the main model — even if you’re using Opus, memory filtering is still done by Sonnet. This achieves separation of concerns: the main model focuses on reasoning, the filter model focuses on relevance judgment.
  • It only sees description, not content — during filtering, Sonnet can only see the description field in the frontmatter, not the memory’s full content. This is why the quality of your descriptions is critical.
  • Staleness warnings are framework-injected — they don’t rely on the agent’s self-discipline; the system automatically attaches warnings when loading memories.
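The filtering loop above can be sketched like this. The Sonnet call is replaced by a trivial keyword-overlap stub (clearly not the real relevance model); the 200-file scan, the description-only view, the top-5 cap, and the already-shown exclusion are from the article, while all names here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    mem_type: str
    filename: str
    timestamp: str
    description: str   # the only field the filter model ever sees

def format_candidates(entries: list[MemoryEntry]) -> list[str]:
    """Step ②: one line per memory, [type] filename (timestamp): description."""
    return [f"[{e.mem_type}] {e.filename} ({e.timestamp}): {e.description}"
            for e in entries[:200]]          # step ①: at most 200 files, newest first

def pick_relevant(entries, query, already_shown, top_k=5):
    """Steps ③–⑤, with keyword overlap standing in for the Sonnet call."""
    query_words = set(query.lower().split())
    scored = []
    for e in entries:
        if e.filename in already_shown:      # excluded so new memories get a slot
            continue
        overlap = len(query_words & set(e.description.lower().split()))
        if overlap:                          # when unsure, pick nothing
            scored.append((overlap, e.filename))
    return [f for _, f in sorted(scored, reverse=True)[:top_k]]

entries = [
    MemoryEntry("feedback", "feedback_testing.md", "2026-01-10",
                "Integration tests must use real database connections; no mocking"),
    MemoryEntry("project", "project_auth.md", "2026-01-08",
                "Auth rewrite driven by compliance, deadline 2026-04-15"),
]
print(pick_relevant(entries, "fix the failing integration tests", already_shown=set()))
# → ['feedback_testing.md']
```

Even in this toy version, the dependence on description quality is visible: the filter never opens a file, so a memory whose description omits its key terms is simply invisible.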

5. How is Memory security guaranteed?

Letting an AI agent autonomously read and write the local file system is an unavoidable security question. Claude Code’s Memory system uses three layers of defense:

  • Layer 1, global lockdown: the storage path can only be changed globally, never per project, so a malicious repo cannot hijack it.
  • Layer 2, path validation: escaping paths are blocked. Climbing up with ..? Blocked. Pointing at root? Blocked.
  • Layer 3, sandbox allowlist: the agent runs in a sandbox, and every write is checked against an allowlist; anything outside it is rejected.

Together, these layers block every attempt to "sneak a write through the memory feature."

The core principle: don’t trust the model’s self-discipline. Security isn’t enforced by “please don’t do bad things” written into a prompt — it’s enforced by hard constraints at the code level. Paths are locked down, permissions are checked, sandboxes isolate execution — every layer is a code-level guarantee, not reliant on the model’s “understanding” or “cooperation.”
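The path checks in layers 2 and 3 boil down to resolving a candidate path and refusing anything outside the locked-down root. A sketch of the idea, not CC's actual implementation (the root path and function name here are hypothetical):

```python
from pathlib import Path

# Hypothetical locked-down memory root (layer 1: only changeable globally).
MEMORY_ROOT = Path("/home/user/.claude/projects/my-slug/memory")

def validate_write(candidate: str) -> Path:
    """Reject any write path that escapes the memory root."""
    resolved = (MEMORY_ROOT / candidate).resolve()
    # resolve() collapses ".." components, so escape attempts surface as
    # paths outside the allowlisted root and are rejected in code, not
    # by prompt-level pleading.
    if not resolved.is_relative_to(MEMORY_ROOT.resolve()):
        raise PermissionError(f"write outside memory dir blocked: {resolved}")
    return resolved

validate_write("feedback_testing.md")          # fine: stays inside the root
try:
    validate_write("../../../../etc/passwd")   # climbing up with ..? Blocked.
except PermissionError as err:
    print(err)
```

`Path.is_relative_to` requires Python 3.9+; the same check on older versions is usually written with `os.path.commonpath`. Either way, the guarantee lives in code the model cannot talk its way around.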

Summary

The model is powerful, but the harness does not trust it to manage its own memory unsupervised. Every step is constrained:

  • Write: format enforced. YAML metadata and four fixed types; the model cannot pick the format.
  • Retrieve: a separate model filters. Sonnet does the filtering; the main model never touches it.
  • Delete: no automatic trigger. Deletion is judged only during consolidation and never happens silently; memory only grows.
  • Stale: framework-injected warnings. Dates are mandatory and the agent must verify before relying on old memories; they carry an "expiry date."

The Memory system embodies a core tenet of Claude Code’s architectural philosophy: the model is powerful, but the harness does not trust it to manage its own memory unsupervised. Every operation — write, retrieve, delete, stale handling — has an independent constraint mechanism. This isn’t a denial of the model’s capabilities; it’s engineering pragmatism: until agents are truly reliable, a safety net at the framework level is necessary.

From a user’s perspective, the Memory system turns Claude Code from “an assistant that starts from zero every time” into “a collaborator that knows you.” It remembers your coding style, your testing preferences, your project context, and even the things you’d rather it not do. As conversations accumulate, this personalization becomes more and more precise — perhaps the most underrated feature in today’s AI coding tools.
