Every time you close a conversation with an AI chatbot and open a new one, you are essentially talking to someone who has never met you. That is because large language models are inherently stateless — they have no built-in mechanism to carry information from one session to the next. Every reply is generated from scratch based only on what is currently visible in the context window.
So how do AI assistants like Claude, ChatGPT, or open-source agents like OpenClaw appear to “remember” you across sessions? The answer is agent memory — a set of architectural patterns layered on top of a stateless model to simulate persistence. Damian Galarza, a Senior Software Engineer at August Health who builds clinical software for senior living operators, recently released an excellent video breakdown of how these memory systems actually work under the hood. This post unpacks his key insights and adds healthcare security context throughout.
The Statelessness Problem
LLMs operate with a finite context window — a fixed amount of text they can “see” at any given time. Once a conversation exceeds that window, older messages are dropped. There is no hard drive, no database, and no persistent state baked into the model itself. The model does not “know” anything about you beyond what has been placed into that window for the current request.

This creates a fundamental tension: users expect continuity (“You told me last week that...”) but the underlying technology provides none. Agent memory systems exist to bridge this gap, and they come in two broad categories (a minimal sketch of what statelessness looks like in practice follows the list):
- Session memory: Information retained within a single conversation. This is the message history you see accumulating as you chat.
- Long-term memory: Information that persists across sessions — your preferences, past decisions, facts about you — stored externally and injected back into future conversations.
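A minimal sketch of what statelessness means at the API level, assuming a hypothetical `call_llm(messages)` wrapper (the wrapper and message contents are illustrative, not any specific provider's SDK): the model only ever sees the message list passed on a single request, so any continuity has to be reconstructed by the application.

```python
# Hypothetical wrapper around a chat-completion API; a real implementation would
# call whatever SDK the provider exposes. The important part is the shape: the
# model sees only the messages passed in this one request.
def call_llm(messages: list[dict]) -> str:
    raise NotImplementedError("stand-in for a real chat-completion call")

# Session memory: within one conversation, the caller re-sends the growing history.
history = [{"role": "user", "content": "My compliance deadline is March 15th."}]
# reply = call_llm(history)                       # the model sees the deadline
# history.append({"role": "assistant", "content": reply})

# A brand-new session starts from an empty slate; unless the application injects
# long-term memory back in, the model cannot recall the earlier fact.
new_session = [{"role": "user", "content": "When is my compliance deadline?"}]
# call_llm(new_session)                           # no injected memory, no recall
```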
Compaction: Managing the Finite Window
Because context windows are finite, conversations must be compacted — condensed down to preserve key information while discarding the noise. Galarza outlines three primary strategies for triggering compaction:
- Count-based: Triggered when the conversation reaches a token or turn count threshold. Simple and predictable, but blind to the importance of what is being compressed.
- Time-based: Triggered after a period of user inactivity. Useful for background processing when the user steps away.
- Event-based (semantic): Triggered when a task or topic concludes. This is the most intelligent approach because it aligns compaction with natural conversation boundaries rather than arbitrary limits.
The compaction process itself typically uses the LLM to summarize the conversation into a condensed representation — essentially asking the model, “What are the most important facts and decisions from this conversation so far?” The summary replaces the full history in the context window, freeing up space for new interaction.
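A minimal sketch of count-based compaction, assuming a rough word-count proxy for tokens and a caller-supplied `summarize_fn` that wraps an LLM call (both are illustrative, not part of any specific framework):

```python
from typing import Callable

def estimate_tokens(messages: list[dict]) -> int:
    # Crude word-count proxy; real systems would use the model's tokenizer.
    return sum(len(m["content"].split()) for m in messages)

def maybe_compact(
    messages: list[dict],
    summarize_fn: Callable[[list[dict]], str],
    max_tokens: int = 3000,
    keep_recent: int = 6,
) -> list[dict]:
    """Count-based trigger: when the history exceeds the budget, replace the
    older messages with an LLM-written summary and keep only the recent turns."""
    if estimate_tokens(messages) <= max_tokens:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize_fn(older)  # e.g. "What are the key facts and decisions so far?"
    condensed = {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
    return [condensed] + recent
```

Time-based and event-based triggers would swap the token check for an inactivity timer or an end-of-task signal; the summarize-and-replace step stays the same.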
Google’s Memory Framework: Three Types of Memory
Google’s 2025 whitepaper on Context Engineering: Sessions & Memory (part of their 5-Day AI Agents Intensive series by Kimberly Milam and Antonio Gulli) provides a useful taxonomy for categorizing what agents remember, borrowing directly from cognitive science:

| Memory Type | What It Stores | Healthcare Example |
|---|---|---|
| Episodic | What happened in past conversations — specific events, decisions, interactions | “Last Tuesday, the user asked about HIPAA audit logging for their new EHR integration.” |
| Semantic | Facts, preferences, and general knowledge about the user | “User is a security analyst at a 500-bed hospital system. Prefers NIST frameworks over ISO.” |
| Procedural | Workflows, routines, and learned processes for completing tasks | “When generating a risk assessment, always start with NIST AI RMF categories, then map to organizational controls.” |
This three-part taxonomy is valuable because it mirrors how humans organize knowledge. It also exposes the design decisions that memory system builders must make: what gets remembered (extraction), how duplicates are handled (consolidation), and how outdated information is replaced (overwriting).
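One way to make the taxonomy concrete is to tag each stored memory with its type. This is a hedged sketch of a generic record, not any particular product's schema; the example entries echo the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Literal

MemoryType = Literal["episodic", "semantic", "procedural"]

@dataclass
class MemoryRecord:
    type: MemoryType      # which of the three categories the entry falls into
    content: str          # the remembered event, fact, or procedure
    created_at: datetime  # when the agent learned it

now = datetime.now(timezone.utc)
records = [
    MemoryRecord("episodic", "User asked about HIPAA audit logging for a new EHR integration.", now),
    MemoryRecord("semantic", "User is a security analyst; prefers NIST frameworks over ISO.", now),
    MemoryRecord("procedural", "Risk assessments start from NIST AI RMF categories, then map to controls.", now),
]
```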
Principles of Effective Memory
Galarza emphasizes that a good memory system is not just a data dump. It requires intentional design around three core operations (a minimal sketch follows the list):
- Extraction — Filtering What Matters. Not everything in a conversation is worth remembering. A good memory system identifies the signal (user preferences, key decisions, facts) and discards the noise (small talk, repeated context). The extraction step is arguably the most important piece of the entire pipeline — if you extract the wrong things, everything downstream suffers.
- Consolidation — Collapsing Redundancy. Over time, an agent may learn the same fact multiple times through different conversations. “User works at Memorial Hospital” should not appear fifteen times in memory. Consolidation merges redundant entries into a single, authoritative record.
- Overwriting — Updating Stale Information. People change jobs, preferences evolve, projects wrap up. A memory system must be able to update or retire facts that are no longer current. Without this, an agent will confidently act on outdated information — a particularly dangerous failure mode in healthcare contexts where roles, access levels, and compliance requirements change frequently.
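A minimal sketch of the consolidation and overwriting steps, assuming facts are keyed by subject and attribute so that a newer value for the same key replaces the stale one (the key scheme and store are illustrative):

```python
from datetime import datetime, timezone

# Long-term store keyed by (subject, attribute); the newest value wins.
memory: dict[tuple[str, str], dict] = {}

def remember(subject: str, attribute: str, value: str) -> None:
    """Consolidation and overwriting: repeated facts collapse into a single entry,
    and a changed value replaces the stale one instead of accumulating."""
    key = (subject, attribute)
    existing = memory.get(key)
    if existing and existing["value"] == value:
        return  # duplicate fact: consolidation means there is nothing new to add
    memory[key] = {"value": value, "updated_at": datetime.now(timezone.utc)}

remember("user", "employer", "Memorial Hospital")
remember("user", "employer", "Memorial Hospital")   # consolidated: still one entry
remember("user", "employer", "St. Luke's Health")   # overwrite: the fact has changed
```

Extraction is not shown because it is usually an LLM call that pulls candidate facts out of the transcript; the sketch covers only the bookkeeping that happens once a fact has been extracted.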
OpenClaw: Memory in Practice
To ground these concepts in reality, Galarza walks through the memory architecture of OpenClaw — an open-source personal AI assistant created by Peter Steinberger that has gained significant traction (currently 145,000+ GitHub stars). What makes OpenClaw’s approach interesting from an educational standpoint is its simplicity: memory is stored as plain Markdown files on the user’s local machine.

OpenClaw’s memory has three core components:
- MEMORY.md — A curated file containing semantic memory: stable facts, user preferences, and durable context. Think of this as the agent’s long-term knowledge base about you. It is loaded at the start of every session.
- Daily Logs (memory/YYYY-MM-DD.md) — Append-only files representing episodic memory, organized by date. Each day gets its own log capturing what happened during interactions. The agent loads today’s and yesterday’s logs at session start for recent context.
- Session Snapshots — Another form of episodic memory. When a new session begins, the system captures the last 15 meaningful messages from the previous session as a snapshot, providing continuity between conversations.
Four Memory Mechanisms: When to Read and Write
The components above define where memory lives. The following four mechanisms define when memory is read and written — the operational triggers that keep the system current:
- Bootstrap Loading. At the start of every new session, the system injects the MEMORY.md file and recent daily logs into the context window. This gives the agent immediate access to everything it “knows” about the user before the first message is even processed.
- Pre-Compaction Flush. Before the context window is compacted (summarized), the system instructs the LLM to save any important information from the current conversation to the daily log. This acts as a checkpoint — ensuring that nothing critical is lost during compression.
- Session Snapshots. When a new session starts, the system automatically saves a snapshot of the previous conversation’s last meaningful messages. This provides a bridge between sessions without relying on the user to manually summarize what happened.
- User-Initiated Memory. When a user explicitly asks the agent to remember something (“Remember that my compliance deadline is March 15th”), the agent writes directly to semantic memory (MEMORY.md) or the daily log. This gives users direct control over what persists.
The elegance of this architecture is that it requires no complex infrastructure — no vector databases, no specialized memory services. It is just Markdown files plus clear instructions on what to remember, where to store it, and when to write it.
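Here is a hedged sketch of the two simplest mechanisms, bootstrap loading and user-initiated memory, assuming the Markdown layout described above (MEMORY.md plus memory/YYYY-MM-DD.md); the file names follow that description, but the code is illustrative rather than OpenClaw's actual implementation.

```python
from datetime import date, timedelta
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")   # semantic memory: stable facts and preferences
MEMORY_DIR = Path("memory")       # episodic memory: one append-only log per day

def bootstrap_context() -> str:
    """Bootstrap loading: gather semantic memory plus the last two daily logs so the
    agent starts the session already primed with what it knows about the user."""
    parts = []
    if MEMORY_FILE.exists():
        parts.append(MEMORY_FILE.read_text())
    for day in (date.today(), date.today() - timedelta(days=1)):
        log = MEMORY_DIR / f"{day.isoformat()}.md"
        if log.exists():
            parts.append(log.read_text())
    return "\n\n".join(parts)

def remember_today(note: str) -> None:
    """User-initiated memory: append an explicit 'remember this' item to today's log."""
    MEMORY_DIR.mkdir(exist_ok=True)
    log = MEMORY_DIR / f"{date.today().isoformat()}.md"
    with log.open("a") as f:
        f.write(f"- {note}\n")

# remember_today("Compliance deadline is March 15th.")
# system_context = bootstrap_context()  # injected ahead of the first user message
```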
The Broader Landscape: From Simple to Sophisticated
OpenClaw represents one end of the memory architecture spectrum — intentionally simple and transparent. On the other end, more sophisticated systems are emerging. Tools like Mem0 (Y Combinator-backed) extract atomic facts from conversations using LLM calls, then compare new facts against existing memories to decide whether to add, update, or delete. Graphiti (by Zep AI) goes further, building temporal knowledge graphs where every fact carries four timestamps: when it became true, when it stopped being true, when the system learned it, and when the system recorded it as expired (a minimal sketch of such a record appears below).

The December 2025 survey paper “Memory in the Age of AI Agents” from arXiv provides a comprehensive taxonomy, organizing agent memory across three dimensions: Forms (what carries memory — tokens, parameters, or hidden states), Functions (why agents need memory — factual, experiential, or working), and Dynamics (how memory evolves — formation, evolution, and retrieval). This growing body of research signals that agent memory is rapidly maturing from an afterthought into a core engineering discipline.
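To make the temporal knowledge-graph idea concrete, here is a hedged sketch of a fact carrying the four timestamps described above; the field names are illustrative and are not Graphiti's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TemporalFact:
    statement: str                   # e.g. "Dr. Smith works in the oncology department"
    valid_from: datetime             # when the fact became true in the world
    valid_until: Optional[datetime]  # when it stopped being true (None = still true)
    recorded_at: datetime            # when the system learned the fact
    expired_at: Optional[datetime]   # when the system marked it as no longer current

def is_current(fact: TemporalFact) -> bool:
    # Usable only if it has neither lapsed in the world nor been expired by the system.
    return fact.valid_until is None and fact.expired_at is None
```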
Practitioner Notes
Agent memory is not just a user experience feature — it is a data governance surface that healthcare security teams need to understand and address. Here is why this matters for our industry:
Memory Is PHI Storage by Another Name
If a clinical AI agent remembers that “Patient Jones has a history of substance use disorder” or “Dr. Smith’s schedule includes oncology follow-ups on Thursdays,” that memory store now contains Protected Health Information under HIPAA. It does not matter whether it lives in a vector database, a knowledge graph, or a Markdown file on a local workstation. If it can be linked to an individual, it is PHI, and the full weight of HIPAA’s Security Rule applies: access controls, encryption at rest and in transit, audit logging, and disposal requirements.
The “Overwrite” Problem Is a Safety Problem
Galarza highlights memory overwriting as a core principle — updating stale information when things change. In healthcare, this has direct safety implications. If an agent’s memory still reflects a patient’s previous medication regimen after it has been changed, or still associates a clinician with a department they have transferred out of, that outdated information could influence downstream recommendations or access decisions. Memory staleness is not just an inconvenience — in clinical contexts, it is a patient safety concern.

Memory Poisoning Is an Attack Vector
As we covered in our Indirect Prompt Injection post, agent memory systems are vulnerable to manipulation. If an attacker can influence what gets written to an agent’s memory — through a poisoned document, a crafted email, or a manipulated conversation — that malicious “memory” persists across sessions and can shape the agent’s behavior over time. Lakera’s Agent Breaker scenarios include “MindfulChat,” which demonstrates exactly this: a single poisoned memory entry that shapes agent behavior across sessions. For healthcare agents with access to clinical systems, this is a serious concern.

Practical Recommendations
- Inventory your memory surfaces. If your organization uses AI assistants, identify where memory data is stored and classify it under your existing data governance framework.
- Apply HIPAA controls to memory stores. Encryption, access controls, and audit logging apply whether the data lives in an EHR database or a Markdown file.
- Define memory retention policies. Align agent memory retention with your organization’s records retention schedule. Implement automated expiration where possible.
- Monitor for memory integrity. Consider what happens if memory entries are tampered with. Treat memory stores with the same change detection and integrity monitoring you apply to other sensitive data stores (a minimal sketch follows this list).
- Prefer transparent architectures. OpenClaw’s Markdown-based approach, while simple, has a notable advantage for compliance: the memory is human-readable, auditable, and version-controllable via Git. When evaluating AI agent platforms, prioritize solutions where memory contents can be inspected, modified, and deleted by authorized personnel.
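As a minimal sketch of the integrity-monitoring recommendation, assuming Markdown memory files like OpenClaw's (the baseline file name is illustrative), hashing each memory file and comparing against a stored baseline is enough to flag unexpected changes:

```python
import hashlib
import json
from pathlib import Path

BASELINE = Path("memory_baseline.json")  # illustrative location for trusted hashes

def hash_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def snapshot(memory_paths: list[Path]) -> None:
    """Record the current hash of every memory file as the trusted baseline."""
    BASELINE.write_text(json.dumps({str(p): hash_file(p) for p in memory_paths}, indent=2))

def detect_changes(memory_paths: list[Path]) -> list[str]:
    """Return the memory files whose contents no longer match the baseline."""
    baseline = json.loads(BASELINE.read_text()) if BASELINE.exists() else {}
    return [str(p) for p in memory_paths if baseline.get(str(p)) != hash_file(p)]

# snapshot([Path("MEMORY.md")])
# tampered = detect_changes([Path("MEMORY.md")])
```

Hash comparison only tells you that something changed; legitimate agent writes also change hashes, so in practice this would be paired with audit logging of expected writes so that unexpected modifications stand out.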
Bottom Line
Agent memory is what transforms a stateless chatbot into a persistent assistant that appears to know you. As Galarza’s video makes clear, effective memory does not require complex infrastructure — it requires clear decisions about what to remember, where to store it, and when to write it. For healthcare organizations, these are not just engineering decisions. They are governance decisions, compliance decisions, and in some cases, patient safety decisions.

The technology is maturing quickly. Google’s framework, the growing research literature, and open-source implementations like OpenClaw are establishing patterns that will increasingly show up in the commercial tools healthcare organizations evaluate and deploy. Understanding how agent memory works today puts security teams in a stronger position to set policies and requirements before these systems are already embedded in clinical workflows.
Primary Source
Memory Frameworks & Research
- Google: Context Engineering: Sessions & Memory (5-Day AI Agents Intensive, Day 3)
- Memory in the Age of AI Agents: A Survey (arXiv, Dec 2025)
- Leonie Monigatti: Making Sense of Memory in AI Agents
- Paul Iusztin: How Does Memory for AI Agents Work?
OpenClaw Memory Architecture
- OpenClaw: Memory Documentation
- DigitalOcean: What is OpenClaw? Your Open-Source AI Assistant
- OpenClaw Memory Architecture Explained
- Craig Fisher: How I Gave My OpenClaw Assistant a Memory That Actually Works
Memory Security Considerations
- Northeastern University: Why the OpenClaw AI Assistant is a 'Privacy Nightmare'
- Lakera: Indirect Prompt Injection — The Hidden Threat Breaking Modern AI Systems
- OWASP Top 10 for LLM Applications 2025