AI Agent Components: Memory, Reasoning, Tools, and Planning
Strip away the branding, the marketing, the breathless press releases, and every AI agent on the planet reduces to five components: memory, reasoning, tool use, planning, and observation. These are the load-bearing walls. Everything else is furniture.
The difference between an agent that ships production code and one that hallucinates itself into a corner is not which large language model sits at the center. It is how well these five components are designed and how gracefully they handle failure. A frontier model with no memory is a genius with amnesia. A mid-tier model with excellent planning and tool integration will often outperform it on multi-step tasks.
AI agent components are the architectural building blocks -- memory, reasoning, tool use, planning, and observation -- that together enable an AI system to perceive its environment, make decisions, take action, and improve over time.
This post breaks down each component, explains why it matters, and shows the difference between a naive implementation and one that actually works in production. If you are new to AI agents, start with our foundational guide, What Are AI Agents? For how these components combine into different architectures, see Types of AI Agents.
Component 1: Observation
Observation is where the agent loop begins. Before an agent can reason, plan, or act, it has to perceive the current state of its environment. In cognitive science, this is called grounding -- connecting internal representations to external reality. An agent that reasons without grounding is just dreaming.
What Observation Looks Like in Practice
For a coding agent, observation means reading source files, parsing compiler output, scanning test results, checking git history, inspecting directory structures, and understanding the shape of a codebase before touching it. For a business agent, it might mean reading emails, monitoring dashboards, or watching for specific events in a data stream.
The quality of observation determines the quality of everything downstream. An agent that reads a file superficially -- skimming the first 50 lines instead of understanding the full module -- will produce reasoning contaminated by incomplete context. An agent that reads thoroughly but injects 200K tokens of raw file content into its prompt will exhaust its context window before it gets to do anything useful.
The Observation Trade-Off
Every observation consumes tokens, and tokens are the fundamental currency of agent computation. Read too little and you miss critical context. Read too much and you drown in noise. The best agent systems solve this with selective observation -- retrieving only what is relevant rather than dumping everything into the prompt.
Nevo handles this with QMD, a local semantic search system indexing over 500 documents. When Nevo needs to recall its own architecture or a past decision, it searches the index and retrieves 2-3 relevant results for roughly 5,000 tokens. Reading those files directly would cost roughly 100,000 tokens. That reduction of about 95% is the difference between an agent that sustains complex multi-step tasks and one that runs out of context window halfway through.
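To make the budget idea concrete, here is a minimal sketch of selective observation: take ranked search hits and keep only as many as fit a token budget. It is not QMD's implementation. The SearchHit type, the file paths, and the four-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    path: str     # hypothetical document path
    text: str
    score: float  # relevance score from some search index (assumed to exist)


def estimate_tokens(text: str) -> int:
    # Crude heuristic of roughly 4 characters per token; not a real tokenizer.
    return max(1, len(text) // 4)


def select_within_budget(hits: list[SearchHit], budget: int) -> list[SearchHit]:
    """Keep the highest-scoring hits whose combined size fits the token budget."""
    selected: list[SearchHit] = []
    used = 0
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        cost = estimate_tokens(hit.text)
        if used + cost <= budget:
            selected.append(hit)
            used += cost
    return selected


hits = [
    SearchHit("docs/architecture.md", "a" * 8000, 0.91),
    SearchHit("docs/decisions.md", "b" * 8000, 0.84),
    SearchHit("docs/changelog.md", "c" * 40000, 0.40),
]
chosen = select_within_budget(hits, budget=5000)
print([h.path for h in chosen])
```

With a 5,000-token budget the two relevant documents fit and the large, low-scoring changelog is left out, which is the behavior the trade-off calls for.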
Component 2: Memory
If observation is the agent's eyes, memory is its continuity. Without memory, every session starts from zero. The agent has no idea what it did yesterday, what you prefer, what failed last time, or what lessons it learned. You are working with a different stranger every time you open a new conversation.
AI agent memory is the system that allows an autonomous agent to retain, organize, and recall information across sessions, enabling it to learn from past interactions, remember context, and build on previous work.
Memory is the single component that separates a tool from a partner. And it is the component most agent systems get wrong.
Types of Memory
Agent memory comes in three fundamental forms, mirroring the types identified by cognitive science.
Working memory is what the agent holds in its context window right now. It is the current conversation, the files it has read, the tool outputs it has received. Every LLM has this -- it is the context window itself. The limitation is capacity: even a 200K-token window cannot hold a month of work history. Working memory is volatile and expensive to fill.
Episodic memory records specific experiences. What happened during a particular session. What decisions were made and why. What errors occurred and how they were resolved. Episodic memory lets an agent answer the question "What did I do last Tuesday?" -- and more importantly, "What went wrong and how did I fix it?"
Semantic memory stores distilled facts and knowledge. Not the full narrative of a session, but the atomic takeaways: the user prefers conventional commit messages, the project uses Python 3.12, this specific API has a rate limit quirk that requires exponential backoff. Semantic memory is compact, structured, and searchable.
Naive vs. Production Memory
The naive approach to agent memory is a flat list of facts stuffed into the system prompt. ChatGPT's memory feature works roughly this way. It catches basic preferences ("I like Python") but collapses under complexity. There is no structure, no expiration, no mechanism for resolving contradictions, and no way to search semantically.
A production memory system needs several things the flat-list approach lacks:
Extraction. Raw conversations are noisy. A production system extracts discrete, atomic facts, categorizes them, timestamps them, and hashes them for deduplication so the system does not store "user prefers dark mode" seventeen times.
Consolidation. Yesterday's session exists in full detail. Last week's sessions compress into summaries. Last month's compress further. This graduated consolidation mirrors biological memory -- the important facts persist at full fidelity in semantic memory while narratives compress over time.
Retrieval. Storing facts is half the problem. Finding the right fact at the right time is the other half. Hybrid semantic search -- combining keyword matching with vector embeddings -- lets an agent retrieve relevant memories without paying the token cost of loading everything.
Expiration. Memory without garbage collection grows unbounded. A production system assigns TTLs based on fact category: preferences are permanent, configurations last months, casual mentions expire in weeks.
Nevo's memory architecture implements all four of these through a brain-inspired three-stage pipeline: sensory buffer (raw session data), hippocampal encoding (fact extraction and session narratives), and neocortical consolidation (graduated compression, pruning, and long-term storage). The daily consolidation runs autonomously at 3am -- the AI equivalent of sleep-time memory processing.
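Two of those requirements, deduplicated extraction and category-based expiration, are small enough to sketch. The following is an illustration under stated assumptions, not Nevo's pipeline: the FactStore class, the category names, and the 90-day and 14-day TTLs are invented for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed TTLs per category; None means the fact never expires.
TTL_BY_CATEGORY: dict[str, Optional[timedelta]] = {
    "preference": None,
    "configuration": timedelta(days=90),
    "casual": timedelta(days=14),
}


@dataclass
class Fact:
    text: str
    category: str
    created_at: datetime


def fact_key(text: str) -> str:
    """Hash a normalized form so trivially different copies collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_expired(fact: Fact, now: datetime) -> bool:
    # Unknown categories have no TTL entry and are kept indefinitely.
    ttl = TTL_BY_CATEGORY.get(fact.category)
    return ttl is not None and now - fact.created_at > ttl


class FactStore:
    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}

    def add(self, text: str, category: str, now: datetime) -> bool:
        """Store a fact unless an equivalent one exists. Returns True if stored."""
        key = fact_key(text)
        if key in self._facts:
            return False
        self._facts[key] = Fact(text, category, now)
        return True

    def purge_expired(self, now: datetime) -> int:
        """Drop facts older than their category TTL. Returns the number removed."""
        expired = [k for k, f in self._facts.items() if is_expired(f, now)]
        for key in expired:
            del self._facts[key]
        return len(expired)

    def texts(self) -> list[str]:
        return [fact.text for fact in self._facts.values()]


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
store = FactStore()
store.add("User prefers dark mode", "preference", start)
store.add("user  prefers DARK mode", "preference", start)  # duplicate, ignored
store.add("Mentioned a conference next week", "casual", start)
removed = store.purge_expired(start + timedelta(days=30))
print(removed, store.texts())
```

Exact-match hashing only catches copies that normalize to the same string. Catching paraphrases ("likes dark themes") would need embedding similarity or a model-based comparison on top.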
Why Most Agents Still Have No Memory
Building real memory is hard. It requires infrastructure that runs outside the LLM itself -- extraction pipelines, storage backends, embedding models, consolidation jobs, health checks. The model forgets everything when the session ends. Memory has to be built around the model, not inside it.
This is why the largest context window in the world does not solve the memory problem. A million-token context window is a bigger desk. It is not a filing cabinet, an archive, or a library.
Component 3: Reasoning
Reasoning is the cognitive core -- the component that takes observations and memories and produces decisions. It is where the LLM earns its keep. Given what the agent sees and knows, what should it do next?
AI agent reasoning is the process by which an agent evaluates its current state, considers possible actions, and selects the approach most likely to achieve its goals.
Chain-of-Thought Reasoning
The most common reasoning pattern in modern agents is chain-of-thought (CoT): the model works through a problem step by step, making each intermediate conclusion explicit before reaching a final answer. Each step is meant to follow from the previous one, and because the steps are written out, the reasoning can be inspected and audited.
Chain-of-thought works well for linear problems but breaks down when the problem space branches -- when there are multiple valid approaches, when trade-offs need evaluation, when the first path attempted might be wrong.
Tree-of-Thought and Branching Reasoning
Tree-of-thought (ToT) reasoning addresses this by exploring multiple solution paths, evaluating each branch, and selecting the most promising one. Few agent systems implement full tree-of-thought due to the computational cost, but the principle influences good agent design: before committing to an approach, evaluate alternatives. Nevo codifies this as an operating rule: "Assess before acting. Choose the right path, not the first path."
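The difference between following one path and keeping alternatives alive shows up even in a toy search. The sketch below uses a simple beam search over numbers as a stand-in for branching reasoning. In a real agent the candidates would be model-generated thoughts and the score would come from a model or heuristic evaluation; the arithmetic puzzle here is purely an assumption for illustration.

```python
def beam_search(start: int, target: int, width: int, depth: int) -> list[int]:
    """Explore paths built from two moves (+3 or *2), keeping the `width` best.

    width=1 behaves like committing to the first promising path;
    width>1 keeps alternative branches alive for later evaluation.
    """
    beam = [[start]]  # each entry is a path of intermediate values
    for _ in range(depth):
        candidates = []
        for path in beam:
            value = path[-1]
            for next_value in (value + 3, value * 2):
                candidates.append(path + [next_value])
        # Score each branch by its distance from the target (lower is better).
        candidates.sort(key=lambda p: abs(target - p[-1]))
        beam = candidates[:width]
        if beam[0][-1] == target:
            return beam[0]
    return beam[0]


print(beam_search(start=1, target=10, width=2, depth=4))  # finds the target
print(beam_search(start=1, target=10, width=1, depth=4))  # overshoots it
```

With a width of two, the search keeps the branch through 7 that looked slightly worse at first and reaches 10. With a width of one it commits to 8, overshoots, and never recovers.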
Structured Reasoning Pipelines
The most reliable reasoning systems do not rely on the model's internal chain-of-thought alone. They structure reasoning into explicit stages with external verification at each step.
Nevo's 8-stage quality pipeline is reasoning made structural. Every piece of code flows through: WRITE, TYPECHECK, TEST, LINT, CRITIQUE, REFINE, ESCALATE, ARBITER. Each stage is a distinct reasoning step, often handled by a different model optimized for that specific judgment. Type-checking does not require a frontier model's reasoning capacity. Architectural critique does.
This pipeline approach transforms reasoning from a single-shot inference ("write good code") into a multi-stage verification process where errors are caught at the stage closest to their origin. A type error caught at stage 2 costs nothing. The same error caught by a human reviewer after deployment costs hours.
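A staged pipeline with early exit can be sketched in a few lines. This is a two-stage illustration, not Nevo's 8-stage pipeline: the stage names, the syntax_check and unit_check functions, and the add() candidate are assumptions chosen so the example runs on the standard library alone.

```python
from typing import Callable, NamedTuple


class StageResult(NamedTuple):
    stage: str
    ok: bool
    detail: str


def syntax_check(source: str) -> StageResult:
    """Cheapest gate: does the candidate even parse?"""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return StageResult("SYNTAX", False, str(exc))
    return StageResult("SYNTAX", True, "parsed")


def unit_check(source: str) -> StageResult:
    """Run the candidate and check one expected behavior of add()."""
    namespace: dict = {}
    exec(source, namespace)  # fine for a sketch; sandbox real candidates
    add = namespace.get("add")
    ok = callable(add) and add(2, 3) == 5
    return StageResult("TEST", ok, "passed" if ok else "failed")


def run_pipeline(
    source: str, stages: list[Callable[[str], StageResult]]
) -> list[StageResult]:
    """Run stages in order and stop at the first failure."""
    results: list[StageResult] = []
    for stage in stages:
        result = stage(source)
        results.append(result)
        if not result.ok:
            break  # later, more expensive stages never run
    return results


STAGES = [syntax_check, unit_check]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b"

for candidate in (good, buggy, broken):
    print([(r.stage, r.ok) for r in run_pipeline(candidate, STAGES)])
```

The ordering is the design choice that matters: cheap, deterministic checks run first, so an expensive stage such as a model-based critique only ever sees candidates that already parse and pass tests.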
The pattern generalizes beyond code. Any domain where quality matters -- legal documents, financial analysis, medical reasoning -- benefits from breaking reasoning into explicit, verifiable stages rather than trusting a single model to get everything right in one pass.
Self-Critique and Reflection
The most capable agents can evaluate their own reasoning. After producing an output, they ask: "Is this correct? Did I miss anything? Is there a better approach?" This metacognitive loop catches errors that single-pass reasoning misses.
Nevo's code-critic agent evaluates code against a structured rubric covering correctness, readability, performance, security, and architectural coherence. If the critique finds issues, the code cycles back through refinement. If refinement cannot resolve them, the problem escalates to a human reviewer. This is reasoning about reasoning -- one of the strongest differentiators between agents that produce reliable output and those that produce plausible-looking output that falls apart under scrutiny.
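The critique, refine, escalate cycle has a simple control-flow skeleton. In the sketch below the critic and refiner are trivial string checks standing in for model calls; the rubric (no TODO markers, no eval) and the three-round limit are assumptions for illustration only.

```python
from typing import Callable


def critique(draft: str) -> list[str]:
    """Toy rubric: return a list of issues, empty if the draft passes."""
    issues = []
    if "TODO" in draft:
        issues.append("contains TODO marker")
    if "eval(" in draft:
        issues.append("uses eval on input")
    return issues


def refine(draft: str, issues: list[str]) -> str:
    """Toy refiner: it can strip TODO lines but cannot fix the eval issue."""
    if "contains TODO marker" in issues:
        kept = [line for line in draft.splitlines() if "TODO" not in line]
        return "\n".join(kept)
    return draft


def critique_and_refine(
    draft: str,
    critic: Callable[[str], list[str]],
    refiner: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, str, list[str]]:
    """Return (final_draft, status, open_issues); status is accepted or escalated."""
    for _ in range(max_rounds):
        issues = critic(draft)
        if not issues:
            return draft, "accepted", []
        draft = refiner(draft, issues)
    issues = critic(draft)
    status = "accepted" if not issues else "escalated"
    return draft, status, issues


print(critique_and_refine("x = 1\n# TODO remove\n", critique, refine))
print(critique_and_refine("y = eval(user_input)", critique, refine))
```

The first draft is fixed in one round and accepted. The second cannot be repaired by the refiner, so after the round limit it is handed off with its open issues attached rather than silently shipped.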
Component 4: Planning
Planning is reasoning projected forward in time. Where reasoning asks "what should I do next?", planning asks "what is the full sequence of steps required to reach this goal, and in what order should they execute?"
AI agent planning is the process of decomposing a high-level goal into an ordered sequence of sub-tasks, managing dependencies between them, tracking progress, and adapting the plan when circumstances change.
Task Decomposition
The foundational planning skill is breaking a large goal into smaller, actionable steps. "Build a blog for nevo.systems" is not an actionable task. It is a project. An agent that tries to execute it as a single step will produce incoherent output. An agent that decomposes it into stories -- set up the blog infrastructure, create the content model, write the first pillar post, implement the SEO configuration, design the template -- can execute each story independently and verify it before moving on.
Nevo formalizes this through the PRD (Product Requirements Document) framework. Any task touching three or more components gets decomposed into a structured PRD with granular stories, each sized to fit within a single context window. Each story has defined acceptance criteria, file scope, and dependency relationships. The PRD is a living document -- updated after each story completes, committed to git, so that progress survives anything.
Dependency Management
Not all sub-tasks are independent. Some require the output of others. A test cannot run before the code it tests is written. A deployment cannot happen before the build passes. Planning must account for these dependencies.
Simple agents execute everything sequentially, which is safe but slow. Sophisticated agents identify which tasks are independent and execute them in parallel, only serializing where dependencies demand it. Nevo dispatches up to four parallel sub-agents using isolated git worktrees, then merges results back with a sequential rebase. Stories without dependency conflicts run simultaneously. Stories with shared file scope run sequentially. The result is faster execution without the chaos of uncoordinated parallel changes.
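Grouping stories into parallel batches is a topological-sort problem, and Python's standard library covers it with graphlib. The sketch below is an assumption-laden illustration: the story names are borrowed from the blog example earlier in this post, the dependency edges are invented, and it models only declared dependencies, not shared file scope or git worktrees.

```python
from graphlib import TopologicalSorter


def parallel_batches(
    dependencies: dict[str, set[str]], max_parallel: int = 4
) -> list[list[str]]:
    """Group tasks into batches; every task in a batch can run concurrently.

    `dependencies` maps each task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        # Respect the cap on concurrent workers by splitting large batches.
        for start in range(0, len(ready), max_parallel):
            batches.append(ready[start:start + max_parallel])
        sorter.done(*ready)
    return batches


stories = {
    "infrastructure": set(),
    "content_model": {"infrastructure"},
    "seo_config": {"infrastructure"},
    "template": {"content_model"},
    "pillar_post": {"content_model", "template"},
}
for batch in parallel_batches(stories):
    print(batch)
```

Here content_model and seo_config land in the same batch because neither depends on the other, while everything else serializes behind its prerequisites.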
Goal Tracking and Adaptation
A plan is not static. Steps fail. Requirements change. New information invalidates earlier assumptions. A planning component must track progress and adapt when the plan needs to change.
The simplest adaptation is retry with adjustment. More sophisticated adaptation involves replanning -- recognizing that the current approach is fundamentally flawed and restructuring the remaining steps rather than grinding through a broken plan.
Nevo implements this with an escalation threshold: after three failed approaches to the same problem, stop and escalate. Summarize what was tried, why each attempt failed, and propose the most promising remaining option. Persistence without reflection is just stubbornness.
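An escalation threshold like this amounts to a bounded loop that keeps a record of what failed. The sketch below is one possible shape, not Nevo's implementation: the approach names are hypothetical, and it assumes the list of approaches is already ordered from most to least promising.

```python
from typing import Callable


def solve_with_escalation(
    approaches: list[tuple[str, Callable[[], str]]], max_attempts: int = 3
) -> dict:
    """Try approaches in order; after max_attempts failures, stop and report."""
    failures: list[str] = []
    for name, attempt in approaches[:max_attempts]:
        try:
            return {"status": "solved", "result": attempt(), "failures": failures}
        except Exception as exc:  # record why this approach failed
            failures.append(f"{name}: {exc}")
    untried = [name for name, _ in approaches[max_attempts:]]
    return {
        "status": "escalated",
        "failures": failures,
        # Assumes the list is ordered by promise, so the first untried wins.
        "proposal": untried[0] if untried else None,
    }


def failing(reason: str) -> Callable[[], str]:
    """Build an approach that always fails with the given reason."""
    def attempt() -> str:
        raise RuntimeError(reason)
    return attempt


report = solve_with_escalation([
    ("pin_dependency", failing("version conflict")),
    ("clear_cache", failing("same error after rebuild")),
    ("patch_config", failing("config is read-only")),
    ("rebuild_image", lambda: "built"),
])
print(report)
```

The fourth approach is never run. Instead the report carries the three recorded failures and names rebuild_image as the proposal for whoever picks up the escalation.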
Hierarchical Planning
Complex projects require planning at multiple levels of abstraction. A high-level plan breaks into phases, phases into stories, stories into implementation steps. The best agent systems maintain this hierarchy explicitly -- PRDs capture the high-level plan, story definitions capture mid-level execution, and the agent's chain-of-thought handles step-by-step implementation. This separation of concerns keeps planning manageable at every level while ensuring coherence across the whole project.
Component 5: Tool Use
An agent without tools is a brain in a jar. It can think, but it cannot do. Tool use is the component that bridges reasoning and reality -- the mechanism by which an agent's decisions become actions in the world.
AI agent tool use is the ability of an agent to invoke external functions, APIs, and services to take actions that extend beyond text generation -- reading files, executing code, querying databases, calling web services, and interacting with software systems.
Function Calling
The most basic form of tool use is function calling: the model outputs structured data (typically JSON) that specifies which function to call and with what arguments. The runtime executes the function and returns the result to the model. The model then reasons about the result and decides whether to call another function or produce a final response.
Every major LLM provider supports this pattern. The model receives a schema describing available functions, their parameters, and their return types. When a function call is the appropriate next action, the model emits the call specification instead of prose. Function calling works well for simple, well-defined operations -- API queries, database lookups, record retrieval -- where the model knows exactly which function to use and what arguments to pass.
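On the runtime side, function calling comes down to parsing the model's structured output, looking up the named function, and returning the result as data. The sketch below assumes a generic call format with "name" and "arguments" keys; real providers each define their own envelope, and the lookup_record tool and its schema are hypothetical.

```python
import json

# In-memory stand-in for a real data source.
RECORDS = {"A17": {"status": "shipped"}}


def lookup_record(record_id: str) -> dict:
    return RECORDS[record_id]


# Registry of callable tools, keyed by the name the model will emit.
TOOLS = {"lookup_record": lookup_record}

# What the model would be shown: a JSON-Schema-style description per tool.
TOOL_SCHEMAS = [
    {
        "name": "lookup_record",
        "description": "Fetch a record by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"record_id": {"type": "string"}},
            "required": ["record_id"],
        },
    }
]


def dispatch(call_json: str) -> str:
    """Execute one model-emitted call and return a JSON result for the model."""
    try:
        call = json.loads(call_json)
        func = TOOLS[call["name"]]
        result = func(**call.get("arguments", {}))
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Errors go back to the model as data so it can react to them.
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})
    return json.dumps({"ok": True, "result": result})


print(dispatch('{"name": "lookup_record", "arguments": {"record_id": "A17"}}'))
print(dispatch('{"name": "delete_everything", "arguments": {}}'))
```

Returning failures as structured results instead of raising lets the model see that a call failed and choose a different action on its next turn.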
The Model Context Protocol (MCP)
For more complex tool ecosystems, function calling alone becomes unwieldy. Each tool requires its own schema definition, its own integration code, its own error handling. Adding a new tool means modifying the agent's core configuration.
The Model Context Protocol (MCP) solves this with a standardized interface between AI models and external tools. MCP defines a client-server architecture where tool providers expose capabilities through a consistent protocol, and agents consume them without needing tool-specific integration code. Adding a new MCP server gives the agent instant access to all the tools that server exposes.
This is not hypothetical. Nevo connects to MCP servers for browser automation, file management, semantic search, and other capabilities. Each server exposes its tools through MCP. Nevo discovers available tools at startup, understands their schemas, and invokes them as needed during task execution. No custom integration code per tool. No schema duplication.
The practical impact: when Nevo needs a new capability -- say, Cloudflare DNS management or Shopify theme editing -- adding an MCP server or a specialized agent with the right tools is a configuration change, not an architecture change. This is how tool ecosystems scale.
Skills: Tool Use With Expertise
Raw tool access is necessary but not sufficient. Knowing that a hammer exists does not make you a carpenter. AI agent skills bridge this gap by packaging tool use with domain expertise -- not just what tools to use, but how and when to use them, in what sequence, with what guardrails.
A skill combines a procedure, tool references, quality criteria, and domain knowledge. Nevo maintains 36 skills covering everything from code quality assessment to PRD-driven development to autonomous research. When a task matches a skill's domain, the skill dramatically improves output quality and consistency compared to ad-hoc tool use.
Tool Use Error Handling
The most underappreciated aspect of tool use is what happens when tools fail. APIs return errors. Files do not exist. Commands time out. An agent that treats tool failures as terminal cannot survive in production.
Robust tool use requires retry logic with backoff, fallback strategies, error classification (transient vs. permanent), and structured escalation when automated recovery fails. Nevo's error-to-rule pipeline goes further: when a tool failure occurs, the incident monitor detects the pattern, the incident analyst identifies the root cause, and a new rule is generated and applied automatically. Tool failures become permanent improvements -- the same class of error should not recur.
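The first two ingredients, error classification and retry with exponential backoff, fit in a short sketch. The TransientError and PermanentError classes, the delay values, and the flaky_lookup tool are assumptions for illustration; a real agent would map actual error codes and exceptions onto these two classes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Worth retrying: timeouts, rate limits, dropped connections."""


class PermanentError(Exception):
    """Not worth retrying: missing files, bad credentials."""


def call_with_retry(
    tool: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with exponential backoff; let the rest propagate."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return tool()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of retries: hand the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky_lookup() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("timeout")
    return "ok"


delays: list[float] = []
result = call_with_retry(flaky_lookup, sleep=delays.append)  # record, don't wait
print(result, delays)
```

Only TransientError is caught, so a PermanentError surfaces on the first attempt without burning retries. Passing the sleep function in as a parameter keeps the example fast and makes the backoff schedule easy to inspect.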
How Components Interact
These five components do not operate in isolation. They form a continuous loop:
- Observation gathers the current state of the environment
- Memory provides relevant context from past experience
- Reasoning evaluates the situation and considers options
- Planning decomposes the goal into actionable steps
- Tool use executes the next step
- The cycle repeats: observe the result, recall relevant context, reason about what to do next, adjust the plan, act again
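The loop above can be reduced to a toy that still shows all five roles. Everything in this sketch is an assumption made for illustration: the environment is a two-flag dictionary, the plan is fixed, and the "reasoning" is a one-line rule where a real agent would call a model.

```python
from typing import Callable

Environment = dict[str, bool]


def write_code(env: Environment) -> bool:
    env["code_written"] = True
    return True


def run_tests(env: Environment) -> bool:
    env["tests_pass"] = env["code_written"]  # tests pass only if code exists
    return env["tests_pass"]


TOOLS: dict[str, Callable[[Environment], bool]] = {
    "write_code": write_code,
    "run_tests": run_tests,
}
PLAN = ["write_code", "run_tests"]  # planning: ordered steps toward the goal


def run_agent(env: Environment, max_cycles: int = 5) -> list[tuple[str, bool]]:
    memory: list[tuple[str, bool]] = []  # record of (action, succeeded)
    for _ in range(max_cycles):
        state = dict(env)                        # observation: snapshot the world
        if state["tests_pass"]:
            break                                # goal reached
        done = {a for a, ok in memory if ok}     # memory: what already worked
        remaining = [s for s in PLAN if s not in done]  # adjust the plan
        if not remaining:
            break
        action = remaining[0]                    # reasoning: choose the next step
        ok = TOOLS[action](env)                  # tool use: act on the world
        memory.append((action, ok))
    return memory


env: Environment = {"code_written": False, "tests_pass": False}
history = run_agent(env)
print(history, env["tests_pass"])
```

Each pass through the loop re-observes the environment before acting, which is what lets the agent notice that the goal has been met and stop.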
The quality of the whole system is determined by the weakest component. Excellent reasoning with poor memory means repeating mistakes. Excellent planning with poor observation means plans built on incomplete information. Excellent tool use with poor reasoning means executing the wrong actions efficiently.
This is why evaluating agent systems by their underlying LLM alone misses the point. The model is one component -- reasoning. The other four components are engineered systems built around the model. The best agents are the ones where all five components are strong and tightly integrated.
The Component Gap: Where Most Agents Fall Short
Understanding these components also reveals where the industry's current gaps are.
Memory remains primitive. Most agent frameworks offer no persistent memory at all. The few that do implement flat fact lists with no consolidation, no TTL, and no semantic retrieval.
Planning is implicit. Most agents plan through chain-of-thought rather than explicit task decomposition with dependency management. This works for simple tasks but fails when a project involves many interdependent sub-tasks.
Tool ecosystems are fragmented. Despite MCP's growing adoption, most agents still rely on hand-coded function definitions rather than standardized protocols.
Observation is brute-force. Agents read everything or read nothing. Selective observation with token budget management requires dedicated infrastructure most frameworks lack.
Reasoning lacks verification. Single-pass reasoning with no structured quality gate is still the norm. Multi-stage approaches like Nevo's 8-stage quality pipeline are expensive to build but dramatically more reliable.
These gaps define the frontier of agent development in 2026. The systems that close them will be meaningfully more capable than the current generation.
Frequently Asked Questions
What are the core components of an AI agent?
The five core components of an AI agent are observation (perceiving the environment), memory (retaining information across sessions), reasoning (evaluating options and making decisions), planning (decomposing goals into ordered sub-tasks), and tool use (executing actions through external functions, APIs, and services). Together, these components enable the perceive-reason-act-learn loop that defines autonomous agent behavior.
What is AI agent memory and why does it matter?
AI agent memory is the system that allows an autonomous agent to retain, organize, and recall information across sessions. It matters because without persistent memory, every interaction starts from zero -- the agent cannot learn from past mistakes, remember user preferences, or build on previous work. Memory comes in three forms: working memory (current context window), episodic memory (records of past sessions), and semantic memory (distilled facts and knowledge).
How does AI agent planning work?
AI agent planning works by decomposing high-level goals into ordered sequences of sub-tasks, managing dependencies between them, tracking progress, and adapting when circumstances change. Effective planning involves task decomposition (breaking large goals into actionable stories), dependency management (identifying which tasks can run in parallel vs. sequentially), goal tracking (monitoring progress and detecting when replanning is needed), and hierarchical abstraction (maintaining plans at multiple levels of detail).
What is the difference between AI agent reasoning and planning?
Reasoning asks "what should I do next?" based on current observations and memory. Planning asks "what is the full sequence of steps required to reach this goal?" Reasoning is present-focused and evaluative -- it considers options and selects actions. Planning is future-focused and structural -- it builds sequences, manages dependencies, and tracks progress toward objectives. In practice, the two components work together: planning creates the roadmap, reasoning navigates each step.
How do AI agents use tools?
AI agents use tools through function calling (outputting structured specifications for external functions), the Model Context Protocol (MCP, a standardized client-server interface for tool discovery and invocation), and skills (packaged procedures that combine tool use with domain expertise). Tools bridge the gap between an agent's reasoning and the real world -- without them, the agent can think but cannot act. Production-grade agents also implement error handling, retry logic, and fallback strategies for when tools fail.
What makes one AI agent better than another?
The quality of an AI agent is determined by the strength of its weakest component. A frontier LLM with no memory repeats mistakes. Excellent planning with poor observation builds on incomplete data. The best agents -- like Nevo -- invest equally in all five components: selective observation via semantic search, brain-inspired persistent memory, multi-stage reasoning pipelines, explicit task decomposition with parallel execution, and standardized tool ecosystems with automated error recovery.