
February 28, 2026|Nevo

How Nevo's Memory Architecture Works

Most AI tools forget everything the moment you close the tab.

You tell ChatGPT your name on Monday. By Wednesday, it asks again. You spend an hour explaining your project architecture to Claude. Next session, blank slate. The industry's most powerful language models have the long-term memory of a goldfish — and that is the single biggest obstacle standing between today's AI agents and the ones that actually become indispensable.

AI agent memory is the system that allows an autonomous agent to retain, organize, and recall information across sessions — not just within a single conversation, but permanently. It is what separates a stateless chatbot from an agent that genuinely knows you, your projects, and your preferences over time.

Nevo's memory system was designed from scratch to solve this problem. Not with a bigger context window. Not with a vector database bolted onto the side. With a brain-inspired pipeline that mirrors how biological memory actually works — from raw sensory input to consolidated long-term knowledge.

Here is exactly how it works.

Why Memory Matters More Than Model Size

The AI industry is fixated on context windows. 128K tokens. 200K tokens. A million tokens. The assumption is that if you can fit more text into a single prompt, you have solved the memory problem.

You have not.

A large context window is working memory — it is the equivalent of holding more items on your desk at once. It does not help you remember what happened last week, what your preferences are, or what lessons you learned three projects ago. And it costs you on every single API call. Inject 100K tokens of "memory" into every prompt and you are burning money on context that is 90% irrelevant to the current task.

Real memory is selective. It extracts what matters, discards what does not, and retrieves only what is relevant to the moment. That is what Nevo does.

The Three-Stage Pipeline

Nevo's memory architecture follows the same three-stage model that neuroscience uses to describe human memory formation. This is not a metaphor — it is a direct structural analog.

Stage 1: Sensory Buffer

Every interaction Nevo has generates raw data. Tool calls, file modifications, bash commands, conversations, errors, decisions. During a single session, Nevo might execute 160+ tool calls across 40+ unique files. All of this flows into the sensory buffer — the action journal.

The action journal captures everything in real time. Every tool invocation, every file path, every error message. It is comprehensive and unfiltered, like sensory input hitting the brain before any processing occurs.

But raw data is not memory. You do not remember every photon that hit your retina today. The sensory buffer is a staging area, not a destination.
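As a rough illustration of the sensory buffer, here is a minimal sketch of an append-only journal, assuming one JSON object per line. The `JournalEntry` class, the `record` helper, and the field names (`ts`, `tool`, `target`, `status`, `detail`) are hypothetical and are not Nevo's actual schema.

```python
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import TextIO


@dataclass
class JournalEntry:
    """One raw, unfiltered event in the action journal (hypothetical schema)."""
    ts: str           # ISO 8601 UTC timestamp
    tool: str         # e.g. "bash", "edit", "read"
    target: str       # file path or command the tool acted on
    status: str       # "ok" or "error"
    detail: str = ""  # error message or short note, if any


def record(stream: TextIO, tool: str, target: str,
           status: str = "ok", detail: str = "") -> JournalEntry:
    """Append one event to the journal as a single JSON line."""
    entry = JournalEntry(
        ts=datetime.now(timezone.utc).isoformat(),
        tool=tool, target=target, status=status, detail=detail,
    )
    stream.write(json.dumps(asdict(entry)) + "\n")
    return entry


# An in-memory stream stands in for the journal file in this sketch.
journal = io.StringIO()
record(journal, "edit", "memory/extract.py")
record(journal, "bash", "pytest -q", status="error", detail="1 failed")
lines = journal.getvalue().splitlines()
```

Nothing is filtered at this stage: every call becomes one line, and deciding what matters is left to the encoding stage.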

Stage 2: Hippocampal Encoding

When a session ends, the encoding pipeline activates. This is where raw experience becomes structured memory, through two parallel processes.

Session narratives are the episodic memory layer. An LLM reads the action journal and git log from the session and produces a structured narrative covering intent (what was the user trying to accomplish), decisions (what choices were made and why), outcomes (what got done, what remains), and lessons (what worked, what broke, what was learned). A session that involved 163 tool uses and 141 file modifications gets compressed into a 300-500 word narrative that captures everything that matters and nothing that does not. That is a 50:1 compression ratio with over 90% information retention for recall purposes.

Fact extraction is the semantic memory layer. A Mem0-inspired extraction system reads conversations and documents, pulling out discrete, atomic facts. Each fact gets categorized (preference, system_config, decision, lesson, event, and six other types, eleven in all), timestamped, hashed for deduplication, and assigned a TTL based on its category. Owner preferences are permanent. System configurations last six months. Events expire after 30 days. Casual mentions get two weeks.

The extraction prompt enforces strict rules: extract only what was actually said, make each fact atomic and specific, include file paths and version numbers, prefer the owner's words over the assistant's. No inference. No embellishment. Just signal.
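The bookkeeping applied to each extracted fact (timestamp, hash, TTL) can be sketched as follows. This is an illustration under stated assumptions, not Nevo's code: the `make_fact` and `content_hash` helpers are hypothetical, the hash is assumed to be a truncated SHA-256 of normalized text, and "six months" is approximated as 180 days.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTL in days per category; None means permanent. Only four of the
# categories are shown, and 180 days is an approximation of "six months".
TTL_DAYS: dict[str, Optional[int]] = {
    "preference": None,
    "system_config": 180,
    "event": 30,
    "casual": 14,
}


def content_hash(text: str) -> str:
    """Assumed scheme: lowercase, collapse whitespace, SHA-256, first 12 hex chars."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def make_fact(text: str, category: str, now: datetime) -> dict:
    """Attach category, timestamps, expiry, and hash to one atomic fact."""
    ttl = TTL_DAYS[category]
    expires = None if ttl is None else (now + timedelta(days=ttl)).isoformat()
    return {
        "fact": text,
        "category": category,
        "created_at": now.isoformat(),
        "expires_at": expires,
        "access_count": 0,
        "hash": content_hash(text),
    }


now = datetime(2026, 2, 19, 2, 54, 40, tzinfo=timezone.utc)
pref = make_fact("Owner demands thorough deep research before implementation",
                 "preference", now)
# The event text below is a made-up example, not a real stored fact.
event = make_fact("Memory pipeline refactor session completed", "event", now)
```

The permanent preference ends up with `expires_at` set to `None`, while the event expires 30 days after creation.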

Stage 3: Neocortical Consolidation

This is where short-term memories become long-term knowledge. It runs daily at 3 a.m. as a macOS LaunchAgent, the AI equivalent of sleep consolidation.

The consolidation pipeline executes a 10-step sequence:

1. Extract facts from unprocessed session transcripts
2. Extract facts from new session narratives (hash-tracked to prevent re-processing)
3. Run graduated consolidation: sessions older than 7 days merge into weekly digests, weeklies older than 30 days merge into monthly summaries
4. Update the QMD semantic search index with new memory files
5. Generate an updated fact summary document
6. Update MEMORY.md blocks from the fact store (incrementing access counts)
7. Prune expired facts (daily for zero-access-count facts; full sweep on Sundays)
8. Collect a daily fitness snapshot
9. Log the consolidation event
10. Display statistics
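The graduated consolidation step can be sketched as a bucketing function. This is a simplification under stated assumptions: it assigns each session directly to a daily, weekly, or monthly bucket by the session's age, whereas the pipeline described above merges sessions into weeklies first and weeklies into monthlies later. The function names and bucket labels are hypothetical.

```python
from collections import defaultdict
from datetime import date


def consolidation_bucket(session_day: date, today: date) -> str:
    """Return the digest a session narrative belongs to, based on its age.

    Thresholds follow the article: older than 30 days goes to a monthly
    summary, older than 7 days to a weekly digest, anything newer stays
    as an individual daily narrative.
    """
    age = (today - session_day).days
    if age > 30:
        return f"monthly/{session_day:%Y-%m}"
    if age > 7:
        iso = session_day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"weekly/{iso[0]}-W{iso[1]:02d}"
    return f"daily/{session_day.isoformat()}"


def group_sessions(days: list[date], today: date) -> dict[str, list[date]]:
    """Group session dates by the digest they should be merged into."""
    buckets: dict[str, list[date]] = defaultdict(list)
    for day in days:
        buckets[consolidation_bucket(day, today)].append(day)
    return dict(buckets)


today = date(2026, 2, 28)
sessions = [date(2026, 2, 27), date(2026, 2, 18), date(2026, 2, 17), date(2026, 1, 10)]
buckets = group_sessions(sessions, today)
```

In this example the two mid-February sessions share one weekly bucket, and the January session falls into a monthly bucket.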

The graduated consolidation is directly inspired by how biological memory compresses over time. You remember yesterday in detail. Last week in broad strokes. Last month as a handful of key events. The further back you go, the more compressed the representation — but the important facts persist at full fidelity because they were extracted into the permanent fact store during encoding.

The Fact Store: 777 Facts and Counting

The fact store is the heart of Nevo's long-term memory. It is a JSONL file where each line is a discrete fact with rich metadata (pretty-printed here for readability):

{
  "id": "b01da0ff",
  "fact": "Owner demands thorough deep research before implementation",
  "category": "preference",
  "source": "user",
  "source_date": "2026-02-18",
  "created_at": "2026-02-19T02:54:40Z",
  "expires_at": null,
  "access_count": 343,
  "hash": "9a546b1ff3cd"
}

Every fact has a content hash for deduplication. When a new fact is extracted that overlaps with an existing one, the system detects the collision and either updates the existing fact or discards the duplicate. Contradiction detection catches cases where a new fact directly conflicts with a stored one — newest wins, and the old fact is marked as superseded.
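A minimal sketch of that insert path, under loud assumptions: the article does not say how contradictions are detected, so this example uses a hypothetical `subject` key, treating two live facts about the same subject with different text as a conflict. The `upsert` helper and the hashing scheme are also illustrative only.

```python
import hashlib


def content_hash(text: str) -> str:
    """Assumed scheme: lowercase, collapse whitespace, SHA-256, first 12 hex chars."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def upsert(store: list[dict], fact: str, subject: str, created_at: str) -> str:
    """Insert a fact and report what happened: "duplicate", "superseded", or "added"."""
    new_hash = content_hash(fact)

    # Deduplication: an identical live fact means the new one is discarded.
    for existing in store:
        if existing["hash"] == new_hash and not existing["superseded"]:
            return "duplicate"

    # Contradiction (hypothetical rule): same subject, different content.
    # Newest wins, so every older live fact on that subject is marked superseded.
    outcome = "added"
    for existing in store:
        if existing["subject"] == subject and not existing["superseded"]:
            existing["superseded"] = True
            outcome = "superseded"

    store.append({
        "fact": fact,
        "subject": subject,
        "created_at": created_at,
        "hash": new_hash,
        "superseded": False,
    })
    return outcome


# Example facts are made up for illustration.
store: list[dict] = []
first = upsert(store, "Project uses Python 3.11", "python_version", "2026-02-10")
again = upsert(store, "project uses  python 3.11", "python_version", "2026-02-11")
newer = upsert(store, "Project uses Python 3.12", "python_version", "2026-02-18")
```

Superseded facts stay in the store in this sketch, which keeps a record of what used to be true.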

The TTL system prevents unbounded growth. Each of the eleven category types has a default lifespan; the table below lists ten of them:

Category	TTL	Rationale
preference	Permanent	Owner preferences rarely change
identity	Permanent	Personal details are stable
lesson	Permanent	Tactical knowledge should never expire
system_config	6 months	Until explicitly updated
project	3 months	Active work context
decision	2 months	Choices and reasoning
event	30 days	Historical record
blocker	14 days	Active impediments
casual	14 days	Low-importance mentions
task	7 days	Active to-do items

Access counts track how often each fact is retrieved. High-access facts are clearly important. Zero-access facts that have expired are pruned first — they were stored, never needed, and have outlived their TTL. This is the memory equivalent of forgetting: not random loss, but principled garbage collection.
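The pruning rule can be sketched in a few lines: on most days only expired facts that were never accessed are removed, and on Sundays every expired fact goes. The `prune` function and the sample records are illustrative assumptions; only the rule itself comes from the description above.

```python
from datetime import datetime, timezone


def prune(facts: list[dict], now: datetime) -> list[dict]:
    """Return the facts that survive pruning at time `now`."""
    full_sweep = now.weekday() == 6  # Monday is 0, Sunday is 6
    kept = []
    for fact in facts:
        expires_at = fact["expires_at"]
        expired = False
        if expires_at is not None:
            # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
            expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
            expired = expiry <= now
        if expired and (full_sweep or fact["access_count"] == 0):
            continue  # forgotten
        kept.append(fact)
    return kept


facts = [
    {"id": "a", "expires_at": None, "access_count": 0},                    # permanent
    {"id": "b", "expires_at": "2026-02-01T00:00:00Z", "access_count": 0},  # expired, unused
    {"id": "c", "expires_at": "2026-02-01T00:00:00Z", "access_count": 5},  # expired, used
    {"id": "d", "expires_at": "2026-06-01T00:00:00Z", "access_count": 0},  # not yet expired
]

saturday = datetime(2026, 2, 28, 3, 0, tzinfo=timezone.utc)
sunday = datetime(2026, 3, 1, 3, 0, tzinfo=timezone.utc)
after_daily = [f["id"] for f in prune(facts, saturday)]
after_sweep = [f["id"] for f in prune(facts, sunday)]
```

Fact `c` survives the Saturday run because it has been accessed, then is removed by the Sunday sweep.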

Memory Health Monitoring: 20 Self-Checks

A memory system is only as good as its reliability. Nevo runs a 20-check health diagnostic that validates the entire pipeline. The checks fall into five groups:

Structural checks: Memory directory exists and is writable, MEMORY.md has correct block structure, facts.jsonl is non-empty
Freshness checks: Latest fact created within 48 hours, last consolidation within 26 hours, at least one session narrative generated today
Infrastructure checks: QMD index is non-empty, extraction script syntax is valid, API key is available, all LaunchAgent plists are loaded
Quality checks: No expired facts lingering, no "session" category noise in the fact store, no stale locks held longer than 5 minutes
Sync checks: Project MEMORY.md and Claude Code auto-memory match, QMD chunk-to-file ratio is reasonable

Each check returns pass, warn, or fail. The overall result is healthy (exit 0), warning (exit 1), or critical (exit 2). Results are logged to both a JSON report and an events journal, creating an audit trail of memory system health over time.
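One plausible way to roll individual checks up into an exit code is "worst status wins". The article does not state the aggregation rule, so treat that as an assumption; the check names and the `overall` helper are likewise hypothetical.

```python
from datetime import datetime, timedelta, timezone
from enum import IntEnum


class Status(IntEnum):
    PASS = 0
    WARN = 1
    FAIL = 2


LABELS = {Status.PASS: "healthy", Status.WARN: "warning", Status.FAIL: "critical"}


def check_consolidation_freshness(last_run: datetime, now: datetime) -> Status:
    """Fail if the last consolidation is older than the 26-hour window."""
    return Status.PASS if now - last_run <= timedelta(hours=26) else Status.FAIL


def overall(results: dict[str, Status]) -> tuple[str, int]:
    """Assumed rule: the overall result is the worst individual status."""
    worst = max(results.values(), default=Status.PASS)
    return LABELS[worst], int(worst)


now = datetime(2026, 2, 28, 9, 0, tzinfo=timezone.utc)
results = {
    "facts_nonempty": Status.PASS,
    "consolidation_fresh": check_consolidation_freshness(now - timedelta(hours=6), now),
    "no_stale_locks": Status.WARN,
}
label, exit_code = overall(results)
```

A real script would finish with `sys.exit(exit_code)` so that a watchdog can react to the process exit status.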

The cron watchdog daemon monitors these checks continuously. If the memory system degrades, Nevo knows about it before anyone asks.

QMD: Semantic Search Over 569 Documents

Raw storage is half the problem. The other half is retrieval — finding the right memory at the right time without injecting everything into every prompt.

Nevo uses QMD (Quick Markdown Documents) for local semantic search across its entire knowledge base. QMD maintains an index of 569 documents across 7 collections, including project root files, skills, developer docs, workspace configuration, and memory files. It supports both BM25 keyword search and GGUF-based vector embeddings for semantic similarity.
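To make the keyword half of that concrete, here is a small, self-contained BM25 scorer. It is a textbook sketch, not QMD's implementation: tokenization is a plain whitespace split, `k1` and `b` use common default values, and the sample documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the BM25 ranking function."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates with k1 and is normalized by document length.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "owner prefers conventional commits for commit messages",
    "consolidation runs daily and prunes expired facts",
    "qmd index covers skills and developer docs",
]
scores = bm25_scores("expired facts", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Keyword scoring like this finds exact terms such as file paths and version numbers, while vector embeddings cover paraphrases that share no words with the query.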

The embedding model runs entirely locally — a 300M parameter model that processes queries at 100+ per second on Apple Silicon with zero API cost. When Nevo needs to recall something, it searches QMD instead of reading files directly. The token savings are dramatic: searching QMD and retrieving 2-3 relevant results costs roughly 5,000 tokens. Reading the equivalent files directly would cost 100,000 tokens. That is a 95% reduction.

This is the equivalent of having an indexed library versus carrying every book you own in your backpack. Both give you access to the same information. One of them scales.

How This Compares to ChatGPT and Claude Memory

ChatGPT's memory feature stores a flat list of facts extracted from conversations. It is limited, cannot be structured, and has no TTL system, no confidence scoring, and no consolidation pipeline. It is a notepad, not a memory system.

Claude's project knowledge and session memory work within the context window. They are useful for single-project continuity but do not persist across contexts, cannot be searched semantically, and have no mechanism for graduated consolidation.

Both approaches treat memory as an afterthought — a feature added to a chat interface. Nevo treats memory as infrastructure — a multi-stage pipeline with extraction, encoding, consolidation, health monitoring, and semantic retrieval, all running autonomously.

The practical difference: ChatGPT remembers that you prefer Python. Nevo remembers that you prefer Python, that the project uses Python 3.12 with specific dependencies, that the last three sessions involved refactoring the memory pipeline, that a particular API quirk caused a failure on February 18th and the workaround was documented, and that your preferred commit message style uses conventional commits with a specific format. And it retrieves only the facts relevant to what you are doing right now.

What Comes Next

The current architecture is functional and battle-tested, but there are clear evolution paths. Graph-based fact storage would enable relationship queries between entities. DSPy prompt optimization is accumulating traces that will eventually tune the extraction and narrative prompts for higher precision. And as the fact store grows, more sophisticated retrieval strategies — like importance-weighted decay and access-pattern analysis — will keep the system lean without losing signal.

Memory is not a feature. For an AI agent system, it is the foundation. Without it, every session starts from zero. With it, every session builds on everything that came before.

That is the difference between a tool you use and an agent that knows you. To experience Nevo's memory system firsthand, see the Nevo App installation guide.

Frequently Asked Questions

What is AI agent memory and why does it matter?

AI agent memory is a system that allows an autonomous AI agent to retain, organize, and recall information across multiple sessions. It matters because without persistent memory, an AI agent cannot learn from past interactions, remember user preferences, or build on previous work. Every session starts from scratch, forcing users to repeat context and losing the compounding value of long-term collaboration.

How is Nevo's memory different from ChatGPT's memory feature?

ChatGPT stores a flat list of facts with no structure, no expiration, no confidence scoring, and no consolidation pipeline. Nevo uses a three-stage brain-inspired pipeline (sensory buffer, hippocampal encoding, neocortical consolidation) with 777+ categorized facts, TTL-based expiration, deduplication, graduated session compression, semantic search over 569+ documents, and a 20-check health monitoring system. The difference is between a notepad and a full memory architecture.

Does Nevo's memory system use a vector database?

Nevo uses QMD (Quick Markdown Documents) for semantic search, which combines BM25 keyword matching with GGUF-based local vector embeddings. The embedding model runs entirely on-device with zero API cost, processing 100+ queries per second on Apple Silicon. This provides the retrieval benefits of a vector database without external dependencies or per-query charges.

How does Nevo prevent its memory from growing unbounded?

Three mechanisms work together. First, every fact has a category-based TTL (time-to-live) ranging from 7 days for tasks to permanent for core preferences. Second, daily consolidation prunes expired zero-access facts, and weekly full sweeps catch the rest. Third, graduated consolidation compresses session narratives into weekly and then monthly digests, keeping the active memory layer lean while archiving raw data for audit purposes.