
Claude Code's Memory System, Explained

Claude Code’s Memory system is the core infrastructure that lets the agent truly “know you.” Unlike traditional conversation history, it is a cross-session, structured, persistent memory mechanism — the agent not only remembers what you said, but also who you are, your preferences, your project context, and even your feedback on how it works.

1. The 5-layer memory architecture

Memory is not a single store. It is a hierarchy of five layers, each with its own lifecycle, write mechanism, and purpose.

The five layers, from session-scoped to global:

  • CLAUDE.md: user-authored, always loaded.
  • Session memory: feature-gated.
  • Conversation history: zero cost within a single session.
  • Team memory: cross-session / global.
  • Auto memory: the focus of this note. This is where CC learns about the user across conversations: role, preferences, project context, and pointers to external systems. Memories are stored as local .md files with YAML frontmatter, in four types (user / feedback / project / ref), under ~/.claude/projects/<slug>/memory/.

From top to bottom, the first three layers (CLAUDE.md, session memory, conversation history) all have lifecycles bounded by a single session or actively managed by the user. The truly interesting layer is at the bottom: automatic memory — a layer that Claude Code learns and manages autonomously across multiple conversations.

2. Four memory types

Automatic memory doesn’t record everything in one undifferentiated lump. It strictly distinguishes four types, each with its own triggering conditions and purpose. These four types are essentially labels for the agent at retrieval time, helping it quickly judge whether a memory is relevant to the current task.

The four types, essentially labels that support agent retrieval:

  • user: who you are (role, domain, preferences).
  • feedback: how you want CC to work (corrections and confirmations).
  • project: what code and Git cannot see (deadlines, decisions).
  • ref: pointers to external systems (concerns outside the codebase).

MEMORY.md (index file):

  - [User Profile](user_profile.md) — Backend engineer, 5 years of Python experience
  - [Testing Rules](feedback_testing.md) — No mocking the database in integration tests
  - [Auth Rewrite](project_auth.md) — Driven by compliance, deadline 2026-04-15
  - [Bug Tracking](reference_linear.md) — Pipeline bugs tracked in Linear INGEST project

feedback_testing.md (single memory file):

  ---
  name: Testing Rules
  description: Integration tests must use real database connections; no mocking
  type: feedback
  ---

The storage format is extremely simple: one .md file per memory, with a YAML frontmatter (name, description, type), plus a MEMORY.md index file as a table of contents. This design is both friendly for the agent to read and write, and easy for humans to view and edit directly.
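As a minimal sketch of reading that format (the field names follow the example above; the parsing helper itself is hypothetical, and a real implementation would likely use a YAML library):

```python
def parse_memory(text: str) -> dict:
    """Split a memory file into its frontmatter fields plus the body text."""
    # The file starts with a "---" fence, so splitting on "---" twice yields:
    # empty prefix, frontmatter block, body.
    _, frontmatter, body = text.split("---", 2)
    fields = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    fields["body"] = body.strip()
    return fields

sample = """---
name: Testing Rules
description: Integration tests must use real database connections; no mocking
type: feedback
---
Details the agent recorded go here.
"""

memory = parse_memory(sample)
print(memory["type"])   # → feedback
print(memory["name"])   # → Testing Rules
```

Because the format is just frontmatter plus Markdown, a human can open the same file in any editor and fix a stale description by hand.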

3. How are memories written?

Writing memories happens in three stages: real-time extraction, periodic consolidation, and a conservative deletion check.

Phase 1, per-turn extraction: a background agent scans the last N messages, receiving the existing memories so it can avoid duplicates. For anything judged worth remembering, it creates a new memory file (or updates an existing one) and writes an entry to the MEMORY.md index, a table of contents with a one-line summary for later retrieval. Everything else is skipped.

Key design decisions:

  • Per-turn extraction is incremental — the background agent only looks at the most recent few messages, never re-reading the entire conversation.
  • Periodic consolidation is performed by a separate autoDream sub-agent with its own context, so it doesn’t interfere with the main conversation.
  • Deletion is conservative — better to keep possibly stale memories than to risk deleting useful information.
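The file-plus-index write step can be sketched as follows. The directory name and helper are hypothetical stand-ins (the article's real path is ~/.claude/projects/<slug>/memory/); only the file layout comes from the article:

```python
from pathlib import Path

MEMORY_DIR = Path("memory_demo")  # stand-in for ~/.claude/projects/<slug>/memory/

def write_memory(name: str, description: str, mem_type: str, filename: str) -> None:
    """Persist one memory file, then record it in the MEMORY.md index."""
    MEMORY_DIR.mkdir(exist_ok=True)
    (MEMORY_DIR / filename).write_text(
        f"---\nname: {name}\ndescription: {description}\ntype: {mem_type}\n---\n"
    )
    # The index is a table of contents: one line per memory with a one-line
    # summary, which is all the retrieval filter will later see.
    with open(MEMORY_DIR / "MEMORY.md", "a") as index:
        index.write(f"- [{name}]({filename}) — {description}\n")

write_memory(
    "Testing Rules",
    "Integration tests must use real database connections; no mocking",
    "feedback",
    "feedback_testing.md",
)
print((MEMORY_DIR / "MEMORY.md").read_text())
```

Note that the index line carries the description verbatim; writing a vague description here degrades every future retrieval.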

4. How is Memory retrieved?

Retrieval is the most elegant part of the Memory system. The core problem: a project may have hundreds of memories, but each conversation has a limited context window — how do you pick the most relevant ones?

  • MEMORY.md (the index file, capped at 200 lines / 25 KB) is always loaded into the system prompt; the individual memory files are not.
  • A non-blocking Sonnet relevance filter then decides which files to load. Even when the main model is Opus, filtering is done by Sonnet:
    ① Scan the frontmatter of every memory file (up to 200 files, newest first).
    ② Format a list: [type] filename (timestamp): description.
    ③ Send the list plus the user query to Sonnet.
    ④ Sonnet returns the top 5 most relevant filenames; when unsure, it picks nothing.
    ⑤ Only those 5 files are loaded into context. Already-shown files are excluded so new memories get a slot.
  • The description field is therefore critical: it is the only thing Sonnet sees when judging relevance. Older memories are loaded with a staleness warning attached.

A few subtle design choices stand out:

  • Sonnet does the filtering, not the main model — even if you’re using Opus, memory filtering is still done by Sonnet. This achieves separation of concerns: the main model focuses on reasoning, the filter model focuses on relevance judgment.
  • It only sees description, not content — during filtering, Sonnet can only see the description field in the frontmatter, not the memory’s full content. This is why the quality of your descriptions is critical.
  • Staleness warnings are framework-injected — they don’t rely on the agent’s self-discipline; the system automatically attaches warnings when loading memories.
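The filtering loop above can be sketched like this. The Sonnet call is replaced by a trivial keyword-overlap stub (clearly not the real relevance model); the 200-file scan, the description-only view, the top-5 cap, and the already-shown exclusion are from the article, while all names here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    mem_type: str
    filename: str
    timestamp: str
    description: str   # the only field the filter model ever sees

def format_candidates(entries: list[MemoryEntry]) -> list[str]:
    """Step ②: one line per memory, [type] filename (timestamp): description."""
    return [f"[{e.mem_type}] {e.filename} ({e.timestamp}): {e.description}"
            for e in entries[:200]]          # step ①: at most 200 files, newest first

def pick_relevant(entries, query, already_shown, top_k=5):
    """Steps ③–⑤, with keyword overlap standing in for the Sonnet call."""
    query_words = set(query.lower().split())
    scored = []
    for e in entries:
        if e.filename in already_shown:      # excluded so new memories get a slot
            continue
        overlap = len(query_words & set(e.description.lower().split()))
        if overlap:                          # when unsure, pick nothing
            scored.append((overlap, e.filename))
    return [f for _, f in sorted(scored, reverse=True)[:top_k]]

entries = [
    MemoryEntry("feedback", "feedback_testing.md", "2026-01-10",
                "Integration tests must use real database connections; no mocking"),
    MemoryEntry("project", "project_auth.md", "2026-01-08",
                "Auth rewrite driven by compliance, deadline 2026-04-15"),
]
print(pick_relevant(entries, "fix the failing integration tests", already_shown=set()))
# → ['feedback_testing.md']
```

Even in this toy version, the dependence on description quality is visible: the filter never opens a file, so a memory whose description omits its key terms is simply invisible.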

5. How is Memory security guaranteed?

Letting an AI agent autonomously read and write the local file system is an unavoidable security question. Claude Code’s Memory system uses three layers of defense:

  • Layer 1, global lockdown: the storage path can only be changed globally, never per project, so a malicious repo cannot hijack it.
  • Layer 2, path validation: escaping paths are blocked. Climbing up with ..? Blocked. Pointing at root? Blocked.
  • Layer 3, sandbox allowlist: the agent runs in a sandbox, and every write is checked against an allowlist; anything outside it is rejected.

Together, these layers block every attempt to "sneak a write through the memory feature."

The core principle: don’t trust the model’s self-discipline. Security isn’t enforced by “please don’t do bad things” written into a prompt — it’s enforced by hard constraints at the code level. Paths are locked down, permissions are checked, sandboxes isolate execution — every layer is a code-level guarantee, not reliant on the model’s “understanding” or “cooperation.”
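The path checks in layers 2 and 3 boil down to resolving a candidate path and refusing anything outside the locked-down root. A sketch of the idea, not CC's actual implementation (the root path and function name here are hypothetical):

```python
from pathlib import Path

# Hypothetical locked-down memory root (layer 1: only changeable globally).
MEMORY_ROOT = Path("/home/user/.claude/projects/my-slug/memory")

def validate_write(candidate: str) -> Path:
    """Reject any write path that escapes the memory root."""
    resolved = (MEMORY_ROOT / candidate).resolve()
    # resolve() collapses ".." components, so escape attempts surface as
    # paths outside the allowlisted root and are rejected in code, not
    # by prompt-level pleading.
    if not resolved.is_relative_to(MEMORY_ROOT.resolve()):
        raise PermissionError(f"write outside memory dir blocked: {resolved}")
    return resolved

validate_write("feedback_testing.md")          # fine: stays inside the root
try:
    validate_write("../../../../etc/passwd")   # climbing up with ..? Blocked.
except PermissionError as err:
    print(err)
```

`Path.is_relative_to` requires Python 3.9+; the same check on older versions is usually written with `os.path.commonpath`. Either way, the guarantee lives in code the model cannot talk its way around.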

Summary

The model is powerful, but the harness does not trust it to manage its own memory unsupervised. Every step is constrained:

  • Write: format enforced. YAML metadata and four fixed types; the model cannot pick the format.
  • Retrieve: a separate model filters. Sonnet does the filtering; the main model never touches it.
  • Delete: no automatic trigger. Deletion is judged only during consolidation and never happens silently; memory only grows.
  • Stale: framework-injected warnings. Dates are mandatory and the agent must verify before relying on old memories; they carry an "expiry date."

The Memory system embodies a core tenet of Claude Code’s architectural philosophy: the model is powerful, but the harness does not trust it to manage its own memory unsupervised. Every operation — write, retrieve, delete, stale handling — has an independent constraint mechanism. This isn’t a denial of the model’s capabilities; it’s engineering pragmatism: until agents are truly reliable, a safety net at the framework level is necessary.

From a user’s perspective, the Memory system turns Claude Code from “an assistant that starts from zero every time” into “a collaborator that knows you.” It remembers your coding style, your testing preferences, your project context, and even the things you’d rather it not do. As conversations accumulate, this personalization becomes more and more precise — perhaps the most underrated feature in today’s AI coding tools.
