Build Your First AI Agent in Python: Step-by-Step Tutorial

Most "AI agent" tutorials are glorified chatbot wrappers. You call an API, print the response, and the tutorial declares victory. That is not an agent. That is a function call with extra steps.

A real AI agent accepts a goal, reasons about how to accomplish it, uses tools to interact with the world, remembers what it has done, and keeps going until the goal is achieved or it determines the goal is impossible. The difference between a chatbot and an agent is autonomy -- the ability to take multiple actions in sequence without a human pressing "enter" between each one.

This tutorial builds a real one. By the end, you will have a working Python agent that takes a natural language goal, plans its approach, calls tools (web search, file operations, a calculator), maintains memory across turns, and runs autonomously until the job is done. No frameworks. No abstractions hiding the interesting parts. Just Python and the Anthropic API.

An AI agent is a software system that uses a large language model as its reasoning engine, combined with tools, memory, and a control loop, to autonomously accomplish goals that require multiple steps of perception, planning, and action.

This is a simplified version of what systems like Nevo do under the hood -- stripped down to the essentials so you can understand every moving part. For foundational context on how agents work conceptually, start with "What Are AI Agents?" For a deep dive into the individual components we are about to build, see "AI Agent Components: Memory, Reasoning, Tools, and Planning."


What You Will Build

A command-line AI agent with five capabilities:

  1. Goal acceptance -- Takes a natural language objective and works toward it
  2. Reasoning loop -- Plans, acts, observes, and iterates autonomously
  3. Tool use -- Searches the web, reads and writes files, performs calculations
  4. Memory -- Maintains conversation history with automatic summarization
  5. Autonomous execution -- Runs until the goal is complete, no human in the loop

The entire agent is under 400 lines of Python. No external agent frameworks required.


Prerequisites

Before starting, you need:

  • Python 3.10+ -- Required for modern type hints and match statements
  • An Anthropic API key -- Sign up at console.anthropic.com and generate a key
  • pip -- Python's package manager (ships with Python)
  • Basic Python knowledge -- Functions, classes, dictionaries, and loops

You do not need prior experience with LLMs, the Anthropic API, or agent architectures. This tutorial explains everything from scratch.


Step 1: Project Setup

Create a project directory and install dependencies:

mkdir my-ai-agent && cd my-ai-agent
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install anthropic requests

Two dependencies:

  • anthropic -- The official Anthropic Python SDK for calling Claude
  • requests -- For the web search tool (HTTP requests to a search API)

Set your API key as an environment variable:

export ANTHROPIC_API_KEY="your-api-key-here"

On Windows, use set ANTHROPIC_API_KEY=your-api-key-here in Command Prompt, or $env:ANTHROPIC_API_KEY = "your-api-key-here" in PowerShell.
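
A missing key otherwise surfaces only when the first API call fails, so it can help to check for it at startup. The following is a minimal sketch; `require_api_key` is a hypothetical helper name, not part of the Anthropic SDK.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the API key, or raise a clear error if it is not set.

    `env` defaults to the real environment; passing a plain dict makes
    the check easy to exercise without touching your shell.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in this shell "
            "before running the agent."
        )
    return key
```

Calling `require_api_key()` at the top of your entry point turns a confusing authentication failure into a one-line instruction.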

Create the main file:

touch agent.py

Step 2: Define Your Tools

Tools are what give an agent the ability to act on the world. Without tools, an LLM can only generate text. With tools, it can search the internet, read files, perform calculations, send emails, query databases -- anything you can express as a Python function.

In the Anthropic API, tools are defined as JSON schemas that describe the function name, its purpose, and the shape of its inputs. When Claude decides to use a tool, it returns a structured tool_use block with the tool name and arguments. Your code then executes the actual function and passes the result back.
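
To make that round trip concrete, here is a sketch of the two messages involved, written as plain dicts so it runs without a network call. The id value is a made-up placeholder and `build_tool_result_turn` is a hypothetical helper; the field names (`tool_use`, `tool_result`, `tool_use_id`) are the ones the rest of this tutorial uses.

```python
# An assistant turn in which Claude asks for one tool call.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "I will compute that."},
        {
            "type": "tool_use",
            "id": "toolu_example_123",  # placeholder id for illustration
            "name": "calculator",
            "input": {"expression": "2 ** 10"},
        },
    ],
}


def build_tool_result_turn(assistant_message: dict, results_by_id: dict) -> dict:
    """Build the user turn that answers every tool_use block in a message."""
    blocks = []
    for block in assistant_message["content"]:
        if block["type"] == "tool_use":
            blocks.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # must echo the id Claude sent
                "content": results_by_id[block["id"]],
            })
    return {"role": "user", "content": blocks}


user_turn = build_tool_result_turn(assistant_turn, {"toolu_example_123": "1024"})
print(user_turn)
```

The key detail is the id: each result is matched to its request by `tool_use_id`, not by position.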

Here are three tools that cover a useful range of capabilities:

# agent.py

import anthropic
import json
import os
import requests
import math
from datetime import datetime

# --- Tool Definitions (JSON Schema for Claude) ---

TOOLS = [
    {
        "name": "web_search",
        "description": (
            "Search the web for current information. Use this when you need "
            "facts, data, or information that might be beyond your training data. "
            "Returns a summary of the top search results."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query to look up"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "file_operation",
        "description": (
            "Read from or write to files on disk. Use 'read' to examine file "
            "contents, 'write' to create or overwrite a file, and 'append' to "
            "add content to an existing file."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["read", "write", "append"],
                    "description": "The file operation to perform"
                },
                "path": {
                    "type": "string",
                    "description": "The file path to operate on"
                },
                "content": {
                    "type": "string",
                    "description": "Content to write (required for write/append)"
                }
            },
            "required": ["operation", "path"]
        }
    },
    {
        "name": "calculator",
        "description": (
            "Evaluate a mathematical expression. Supports arithmetic, "
            "exponents, square roots, trigonometry, and common math functions. "
            "Use this instead of doing mental math."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": (
                        "The math expression to evaluate, e.g. '2 ** 10', "
                        "'math.sqrt(144)', 'math.sin(math.pi / 4)'"
                    )
                }
            },
            "required": ["expression"]
        }
    }
]

Each tool schema has three parts: a name the model uses to call it, a description that helps the model decide when to use it, and an input_schema that defines the expected arguments. Good descriptions matter -- they are the model's only documentation for understanding what the tool does and when to reach for it.
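
Because the schema is ordinary data, your own code can also use it to sanity-check a tool call before running it. The sketch below covers only `required` fields and `enum` values; `check_tool_input` is a hypothetical helper, and a full JSON Schema validator would cover far more.

```python
FILE_SCHEMA = {
    "type": "object",
    "properties": {
        "operation": {"type": "string", "enum": ["read", "write", "append"]},
        "path": {"type": "string"},
        "content": {"type": "string"},
    },
    "required": ["operation", "path"],
}


def check_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the input looks OK."""
    problems = []
    # Every required field must be present.
    for name in schema.get("required", []):
        if name not in tool_input:
            problems.append(f"missing required field: {name}")
    # Fields that declare an enum must use one of the listed values.
    for name, spec in schema.get("properties", {}).items():
        allowed = spec.get("enum")
        if allowed is not None and name in tool_input:
            if tool_input[name] not in allowed:
                problems.append(f"{name} must be one of {allowed}")
    return problems


print(check_tool_input(FILE_SCHEMA, {"operation": "read"}))
print(check_tool_input(FILE_SCHEMA, {"operation": "delete", "path": "notes.txt"}))
```

Returning the problems as text fits the pattern used throughout this tutorial: the message can be handed back to the model so it can correct its own call.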


Step 3: Implement Tool Execution

The schemas above tell Claude what tools exist. Now you need the actual Python functions that run when Claude calls them. This is the bridge between the model's intent and real-world side effects.

# --- Tool Implementation ---

def execute_web_search(query: str) -> str:
    """Search the web using a simple search API."""
    try:
        # Using DuckDuckGo's Instant Answer API (free, no key needed).
        # It returns topic summaries rather than full web results, so
        # many queries come back empty.
        response = requests.get(
            "https://api.duckduckgo.com/",
            params={"q": query, "format": "json", "no_html": 1},
            timeout=10
        )
        data = response.json()

        results = []

        # Abstract/instant answer
        if data.get("Abstract"):
            results.append(f"Summary: {data['Abstract']}")

        # Related topics
        for topic in data.get("RelatedTopics", [])[:5]:
            if isinstance(topic, dict) and "Text" in topic:
                results.append(f"- {topic['Text']}")

        if not results:
            return f"No results found for: {query}"

        return "\n".join(results)

    except Exception as e:
        return f"Search failed: {str(e)}"


def execute_file_operation(operation: str, path: str, content: str = "") -> str:
    """Perform file read, write, or append operations."""
    try:
        match operation:
            case "read":
                if not os.path.exists(path):
                    return f"Error: File not found: {path}"
                with open(path, "r") as f:
                    file_content = f.read()
                if len(file_content) > 10000:
                    return file_content[:10000] + "\n... [truncated]"
                return file_content

            case "write":
                os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
                with open(path, "w") as f:
                    f.write(content)
                return f"Successfully wrote {len(content)} characters to {path}"

            case "append":
                os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
                with open(path, "a") as f:
                    f.write(content)
                return f"Successfully appended {len(content)} characters to {path}"

            case _:
                return f"Unknown operation: {operation}"

    except Exception as e:
        return f"File operation failed: {str(e)}"


def execute_calculator(expression: str) -> str:
    """Evaluate a mathematical expression in a restricted namespace.

    Only math functions are exposed and __builtins__ is emptied, which
    blocks casual misuse. It is NOT a real sandbox: attribute chains
    such as ().__class__ can still reach arbitrary objects.

    NOTE: In a production system, replace this with a proper expression
    parser (like `simpleeval` or `asteval`) instead of using eval.
    """
    safe_names = {
        "math": math,
        "abs": abs,
        "round": round,
        "min": min,
        "max": max,
        "sum": sum,
        "pow": pow,
        "int": int,
        "float": float,
    }

    try:
        # Evaluate with empty __builtins__ and only the names above in scope
        result = eval(expression, {"__builtins__": {}}, safe_names)  # noqa: S307
        return str(result)
    except Exception as e:
        return f"Calculation error: {str(e)}"


def run_tool(name: str, tool_input: dict) -> str:
    """Route a tool call to the correct implementation."""
    match name:
        case "web_search":
            return execute_web_search(tool_input["query"])
        case "file_operation":
            return execute_file_operation(
                tool_input["operation"],
                tool_input["path"],
                tool_input.get("content", "")
            )
        case "calculator":
            return execute_calculator(tool_input["expression"])
        case _:
            return f"Unknown tool: {name}"

A few things to notice:

Error handling everywhere. Every tool wraps its logic in try/except and returns error messages as strings rather than raising exceptions. When a tool fails, the agent needs to know what went wrong so it can try a different approach. An unhandled exception kills the loop. A descriptive error message lets the agent adapt.
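
The same idea can be applied one level up, where the tool_result block is built. The Messages API accepts an optional `is_error` flag on a tool_result, which tells the model explicitly that the call failed. The sketch below assumes a hypothetical wrapper named `safe_tool_result`; it is an alternative to the per-tool try/except above, not what this tutorial's agent does.

```python
def safe_tool_result(tool_use_id: str, func, *args) -> dict:
    """Run a tool function and package the outcome as a tool_result block."""
    try:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": str(func(*args)),
        }
    except Exception as exc:
        # Report the failure as data instead of letting it kill the loop.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }


ok = safe_tool_result("toolu_example_1", lambda a, b: a / b, 6, 3)
failed = safe_tool_result("toolu_example_2", lambda a, b: a / b, 1, 0)
print(ok)
print(failed)
```

Either approach works; what matters is that the model always receives a result it can read and react to.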

The calculator uses a restricted environment. The __builtins__ dict is emptied and only math functions are exposed. This raises the bar but does not make eval safe: an expression such as ().__class__.__base__.__subclasses__() can still walk from a literal to arbitrary classes, and something like 9 ** 9 ** 9 can hang the process. Treat it as a demo convenience. In a production system, use a proper expression parser like simpleeval or asteval instead of eval.
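
If you would rather avoid eval without adding a dependency, one option is to walk the expression's syntax tree and allow only known node types. This is a sketch of that technique using the standard library's `ast` module, not the tutorial's implementation; the allowed function list, the exponent cap of 1000, and the name `safe_calculate` are all illustrative assumptions.

```python
import ast
import math
import operator

_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {
    "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan,
    "log": math.log, "exp": math.exp, "abs": abs, "round": round,
}
_CONSTS = {"pi": math.pi, "e": math.e}


def _name_of(node):
    """Return 'sqrt' for both `sqrt` and `math.sqrt`; None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
            and node.value.id == "math"):
        return node.attr
    return None


def _eval(node):
    """Evaluate one syntax-tree node, rejecting anything not explicitly allowed."""
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > 1000:
            raise ValueError("exponent too large")
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Call) and not node.keywords:
        func = _FUNCS.get(_name_of(node.func))
        if func is not None:
            return func(*[_eval(arg) for arg in node.args])
    if _name_of(node) in _CONSTS:
        return _CONSTS[_name_of(node)]
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate arithmetic without eval; errors come back as strings."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except Exception as exc:
        return f"Calculation error: {exc}"


print(safe_calculate("2 ** 10"))
print(safe_calculate("math.sqrt(144)"))
print(safe_calculate("().__class__"))
```

Because nothing is evaluated unless it appears in one of the allow-lists, attribute tricks and arbitrary function calls are rejected rather than executed.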

File reads are truncated. If a file is larger than 10,000 characters, only the first portion is returned. This prevents a single file read from consuming the model's entire context window.


Step 4: Build the Memory System

An agent without memory is a genius with amnesia. It does the same work twice, forgets its own conclusions, and loses track of long-running tasks. Memory is what turns a sequence of API calls into a coherent reasoning process.

For this tutorial, memory has two layers:

  1. Conversation history -- The full message log exchanged between your code and Claude
  2. Summaries -- Compressed versions of older conversation turns, so the context window does not overflow

# --- Memory System ---

class AgentMemory:
    """Manages conversation history with automatic summarization."""

    def __init__(self, max_history: int = 50, summary_threshold: int = 30):
        self.messages: list[dict] = []
        self.summaries: list[str] = []
        self.max_history = max_history
        self.summary_threshold = summary_threshold

    def add_message(self, role: str, content) -> None:
        """Add a message to conversation history."""
        self.messages.append({"role": role, "content": content})

        # Trigger summarization when history gets long
        if len(self.messages) > self.summary_threshold:
            self._summarize_old_messages()

    def _summarize_old_messages(self) -> None:
        """Compress older messages into a summary to free context space."""
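        # Keep the most recent messages intact
        keep_count = 10

        def field(block, name, default=None):
            # User tool results are dicts; assistant content from the SDK
            # is a list of block objects, so support both.
            if isinstance(block, dict):
                return block.get(name, default)
            return getattr(block, name, default)

        def has_tool_result(msg) -> bool:
            content = msg["content"]
            return isinstance(content, list) and any(
                field(block, "type") == "tool_result" for block in content
            )

        # Never begin the retained window on a tool_result turn: its
        # matching tool_use would be summarized away, and the API rejects
        # a tool_result that does not follow its tool_use.
        split = max(len(self.messages) - keep_count, 0)
        while split > 0 and has_tool_result(self.messages[split]):
            split -= 1
        if split == 0:
            return

        old_messages = self.messages[:split]
        recent_messages = self.messages[split:]

        # Build a text summary of old messages
        summary_parts = []
        for msg in old_messages:
            role = msg["role"]
            content = msg["content"]
            if isinstance(content, str):
                summary_parts.append(f"[{role}]: {content[:200]}")
            elif isinstance(content, list):
                # Handle structured content (tool_use, tool_result blocks)
                for block in content:
                    block_type = field(block, "type")
                    if block_type == "tool_use":
                        summary_parts.append(
                            f"[{role}]: Called tool '{field(block, 'name')}'"
                        )
                    elif block_type == "tool_result":
                        result_text = field(block, "content", "")
                        if isinstance(result_text, str):
                            summary_parts.append(
                                f"[{role}]: Tool returned: "
                                f"{result_text[:100]}"
                            )
                    elif block_type == "text":
                        summary_parts.append(
                            f"[{role}]: {field(block, 'text', '')[:200]}"
                        )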

        summary = (
            "CONVERSATION HISTORY SUMMARY:\n"
            + "\n".join(summary_parts[-20:])  # Keep last 20 entries
        )

        self.summaries.append(summary)
        self.messages = recent_messages

        print(
            f"  [Memory] Summarized {len(old_messages)} old messages. "
            f"{len(self.messages)} messages retained."
        )

    def get_messages(self) -> list[dict]:
        """Return messages with summaries prepended as context."""
        if not self.summaries:
            return self.messages.copy()

        # Inject summaries as a leading user message (the system prompt
        # itself is passed separately and left unchanged)
        combined_summary = "\n---\n".join(self.summaries)
        summary_message = {
            "role": "user",
            "content": (
                f"[CONTEXT FROM EARLIER IN THIS SESSION]\n{combined_summary}\n"
                "[END CONTEXT]"
            )
        }

        result = [summary_message]

        # Ensure the next message is not also a user message
        # (API requires alternating roles)
        if self.messages and self.messages[0]["role"] == "user":
            result.append({
                "role": "assistant",
                "content": "Understood. I have the context from earlier. Continuing."
            })

        result.extend(self.messages)
        return result

This memory system does something important: it compresses older messages into summaries before they overflow the context window. Without this, a long-running agent would eventually hit the token limit and crash. With it, the agent can run for dozens of turns while keeping the most recent and most relevant context intact.
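
Counting messages is only a proxy for context usage: one large tool result can outweigh thirty short turns. A variation is to trigger summarization from a size estimate instead. The sketch below assumes roughly four characters per token, which is a crude rule of thumb rather than real tokenizer output, and both function names and the 50,000-token budget are hypothetical.

```python
def estimate_tokens(messages: list[dict], chars_per_token: float = 4.0) -> int:
    """Very rough token estimate from character counts (heuristic, not exact)."""
    total_chars = 0
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            total_chars += len(content)
        else:
            # Structured content: fall back to the length of its repr.
            total_chars += len(repr(content))
    return int(total_chars / chars_per_token)


def should_summarize(messages: list[dict], token_budget: int = 50_000) -> bool:
    """True once the estimated size of the history exceeds the budget."""
    return estimate_tokens(messages) > token_budget


history = [{"role": "user", "content": "a" * 400}]
print(estimate_tokens(history))
print(should_summarize(history, token_budget=50))
```

A check like this could replace the `len(self.messages) > self.summary_threshold` test in `add_message`, leaving the rest of the class unchanged.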

In production systems like Nevo, memory goes much further -- extracting discrete facts, storing them in searchable databases, consolidating across sessions with brain-inspired pipelines, and retrieving relevant context via semantic search. But the principle is the same: retain what matters, compress what does not, and never let the context window be the bottleneck.

For a deeper look at how production memory systems work, see "How AI Agents Work."


Step 5: Build the Agent Loop

This is the core of the entire system. The agent loop is the cycle that makes an AI agent autonomous: receive a goal, reason about what to do, act (call a tool or produce output), observe the result, and repeat until done.

The loop has three possible outcomes on each iteration:

  1. Tool use -- Claude decides to call a tool. Your code executes the tool and feeds the result back. The loop continues.
  2. Final answer -- Claude produces a text response without calling any tools. The goal is considered complete.
  3. Maximum iterations reached -- A safety limit that prevents infinite loops.

# --- The Agent ---

class Agent:
    """An autonomous AI agent with tools, memory, and a reasoning loop."""

    def __init__(self, model: str = "claude-sonnet-4-6", max_iterations: int = 20):
        self.client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var
        self.model = model
        self.max_iterations = max_iterations
        self.memory = AgentMemory()

        self.system_prompt = """You are an autonomous AI agent. You receive goals \
from the user and work to accomplish them step by step.

IMPORTANT RULES:
1. Break complex goals into smaller steps. Think about what you need to do \
before doing it.
2. Use tools when you need information or need to take action. Do not guess \
when a tool can give you the real answer.
3. After each tool result, assess your progress. Are you closer to the goal? \
Do you need more information? Should you try a different approach?
4. When the goal is fully accomplished, provide a clear final summary of what \
you did and what the result is.
5. If you determine a goal is impossible or requires capabilities you lack, \
say so clearly rather than looping endlessly.
6. Always explain your reasoning before taking an action. This helps the user \
understand your approach.

You have access to these tools:
- web_search: Look up current information on the internet
- file_operation: Read, write, or append to files on disk
- calculator: Evaluate mathematical expressions safely

You are goal-directed. Every action should move you closer to completing the \
user's request."""

    def run(self, goal: str) -> str:
        """Execute the agent loop until the goal is achieved."""
        print(f"\n{'='*60}")
        print(f"  AGENT GOAL: {goal}")
        print(f"{'='*60}\n")

        # Add the user's goal to memory
        self.memory.add_message("user", goal)

        for iteration in range(1, self.max_iterations + 1):
            print(f"--- Iteration {iteration}/{self.max_iterations} ---")

            # Call Claude with full conversation history and tools
            response = run_with_retries(
                lambda: self.client.messages.create(
                    model=self.model,
                    max_tokens=4096,
                    system=self.system_prompt,
                    tools=TOOLS,
                    messages=self.memory.get_messages()
                )
            )

            # Process the response
            assistant_content = response.content
            self.memory.add_message("assistant", assistant_content)

            # Check what Claude returned
            tool_calls = []
            text_parts = []

            for block in assistant_content:
                if block.type == "tool_use":
                    tool_calls.append(block)
                elif block.type == "text":
                    text_parts.append(block.text)

            # Print any reasoning text
            for text in text_parts:
                print(f"\n  Agent: {text}\n")

            # If no tool calls, the agent is done
            if not tool_calls:
                final_answer = "\n".join(text_parts) if text_parts else ""
                print(f"\n{'='*60}")
                print("  GOAL COMPLETE")
                print(f"{'='*60}\n")
                return final_answer

            # Execute each tool call and collect results
            tool_results = []
            for tool_call in tool_calls:
                print(f"  Tool: {tool_call.name}({json.dumps(tool_call.input)})")

                result = run_tool(tool_call.name, tool_call.input)
                print(f"  Result: {result[:200]}{'...' if len(result) > 200 else ''}")

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": result
                })

            # Feed tool results back to Claude
            self.memory.add_message("user", tool_results)

        # If we hit max iterations, return what we have
        print(f"\n  [Agent] Reached maximum iterations ({self.max_iterations})")
        return "Maximum iterations reached. The goal may be partially complete."

Let us walk through the critical parts.

The system prompt tells Claude it is an autonomous agent, not a chatbot. It sets expectations: break goals into steps, use tools instead of guessing, assess progress after each action, and stop when done. The system prompt is arguably the most important piece of an agent's architecture. A vague or chatbot-oriented system prompt produces vague, chatbot-like behavior.

The main loop calls client.messages.create on each iteration with the full conversation history and tool definitions. Claude returns content blocks that can be either text (reasoning, final answers) or tool_use (structured requests to call specific tools with specific arguments).

Tool execution is synchronous. When Claude asks to use a tool, your code immediately runs the Python function, captures the result, and sends it back as a tool_result message. Claude then sees the result and decides what to do next -- use another tool, or produce a final answer.

The stop condition is implicit. When Claude produces a response with only text and no tool calls, the loop ends. The model itself decides when the goal is done. This is the autonomy -- the model is not just answering a question, it is managing a workflow and deciding when that workflow is complete.

The iteration limit is a critical safety measure. Without it, a confused or overly ambitious model could loop indefinitely, burning API credits and never converging. Twenty iterations is a reasonable default for most tasks.
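
To see the control flow in isolation, here is a stripped-down sketch of the same loop driven by a scripted stand-in for the model, so it runs offline. `scripted_model`, `run_loop`, and the `SimpleNamespace` blocks are illustrative stand-ins for the API call and the SDK's content blocks, not real SDK objects.

```python
from types import SimpleNamespace


def scripted_model(script):
    """Return a stand-in for the API call that replays canned responses."""
    replies = iter(script)
    return lambda messages: next(replies)


def run_loop(call_model, execute, goal: str, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_iterations):
        blocks = call_model(messages)
        messages.append({"role": "assistant", "content": blocks})
        tool_calls = [b for b in blocks if b.type == "tool_use"]
        if not tool_calls:
            # Outcome 2: text only, so the model has decided it is done.
            return "\n".join(b.text for b in blocks if b.type == "text")
        # Outcome 1: run each requested tool and feed the results back.
        results = [
            {
                "type": "tool_result",
                "tool_use_id": call.id,
                "content": execute(call.name, call.input),
            }
            for call in tool_calls
        ]
        messages.append({"role": "user", "content": results})
    # Outcome 3: the safety limit stopped the loop.
    return "Maximum iterations reached."


script = [
    [SimpleNamespace(type="tool_use", id="t1", name="calculator",
                     input={"expression": "6 * 7"})],
    [SimpleNamespace(type="text", text="The answer is 42.")],
]
answer = run_loop(scripted_model(script), lambda name, args: "42", "What is 6 * 7?")
print(answer)  # The answer is 42.
```

Swapping the scripted model for a real API call and the lambda for `run_tool` gives you the shape of `Agent.run`, minus memory and logging.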


Step 6: Add Error Handling and Robustness

A production agent needs to handle real-world failures gracefully. API rate limits, network errors, malformed tool outputs -- these are not edge cases, they are Tuesday. Here is a wrapper that adds retry logic and structured error handling:

# --- Error Handling ---

import time


def run_with_retries(
    func,
    max_retries: int = 3,
    base_delay: float = 1.0
):
    """Execute a function with exponential backoff retry logic."""
    for attempt in range(max_retries):
        try:
            return func()
        except anthropic.RateLimitError:
            delay = base_delay * (2 ** attempt)
            print(f"  [Retry] Rate limited. Waiting {delay}s...")
            time.sleep(delay)
        except anthropic.APIConnectionError:
            delay = base_delay * (2 ** attempt)
            print(f"  [Retry] Connection error. Waiting {delay}s...")
            time.sleep(delay)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                delay = base_delay * (2 ** attempt)
                print(f"  [Retry] Server error ({e.status_code}). Waiting {delay}s...")
                time.sleep(delay)
            else:
                raise  # Client errors (4xx) should not be retried

    raise RuntimeError(f"Failed after {max_retries} retries")

This handles the three most common API failure modes:

  • Rate limits (429) -- Wait and retry with exponential backoff
  • Connection errors -- Network issues are usually transient, so retry
  • Server errors (5xx) -- Anthropic's servers occasionally return 500s under load

Client errors (400, 401, 403) are not retried because they indicate a problem with your request, not a transient failure. Retrying a bad request just burns time.
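
With the defaults above, the waits double on each attempt: 1 second, then 2, then 4. The sketch below computes that schedule and adds optional jitter, a common refinement that keeps many clients from retrying in lockstep. The `backoff_delays` helper and its `jitter` parameter are hypothetical additions, not part of `run_with_retries`.

```python
import random


def backoff_delays(
    max_retries: int = 3,
    base_delay: float = 1.0,
    jitter: float = 0.0,
    rng=random.random,
) -> list[float]:
    """Return the wait before each retry: base_delay * 2**attempt, plus jitter.

    `jitter` is the maximum extra fraction of each delay to add at random;
    0.0 reproduces the fixed schedule used by run_with_retries.
    """
    delays = []
    for attempt in range(max_retries):
        delay = base_delay * (2 ** attempt)
        delay += delay * jitter * rng()
        delays.append(delay)
    return delays


print(backoff_delays())  # [1.0, 2.0, 4.0]
```

Passing `jitter=0.5` would stretch each wait by a random amount of up to 50 percent.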


Step 7: Wire Up the Entry Point

The final piece: a command-line interface that lets you run the agent interactively, one goal at a time.

# --- Entry Point ---

def main():
    """Run the agent in interactive mode."""
    agent = Agent()

    print("\n  AI Agent Ready")
    print("  Type a goal and press Enter. The agent will work autonomously.")
    print("  Type 'quit' or 'exit' to stop.\n")

    while True:
        try:
            goal = input("  Goal > ").strip()

            if not goal:
                continue
            if goal.lower() in ("quit", "exit"):
                print("  Shutting down.")
                break

            result = agent.run(goal)
            print(f"\n  Final Result:\n  {result}\n")

        except KeyboardInterrupt:
            print("\n  Interrupted. Shutting down.")
            break


if __name__ == "__main__":
    main()

Running Your Agent

Start the agent:

python agent.py

You will see a prompt. Type a goal and watch the agent work.

Example 1: Research Task

Goal > What is the current population of Tokyo and how does it compare to New York City? Save the comparison to a file called cities.md

The agent will:

  1. Search the web for Tokyo's population
  2. Search the web for New York City's population
  3. Compare the numbers using the calculator
  4. Write the comparison to cities.md
  5. Confirm the file was written and present a summary

Example 2: Math and File Operations

Goal > Calculate the compound interest on $10,000 at 7% annual rate over 30 years with monthly compounding, then save the results to investment.txt

The agent will:

  1. Use the calculator to compute the compound interest formula
  2. Format the results
  3. Write them to investment.txt
  4. Report the final amount
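
For reference, the calculation behind this goal is the standard compound interest formula, A = P * (1 + r/n) ** (n * t). The sketch below evaluates it directly so you can check the agent's answer; `compound_amount` is a hypothetical helper, and the agent itself would more likely pass a one-line expression to the calculator tool.

```python
def compound_amount(
    principal: float,
    annual_rate: float,
    years: int,
    periods_per_year: int = 12,
) -> float:
    """Future value with periodic compounding: P * (1 + r/n) ** (n * t)."""
    return principal * (1 + annual_rate / periods_per_year) ** (periods_per_year * years)


amount = compound_amount(10_000, 0.07, 30)
interest = amount - 10_000
print(f"Final amount: ${amount:,.2f}")
print(f"Interest earned: ${interest:,.2f}")
```

With these inputs the final amount comes to a little over $81,000, so an answer far from that suggests the agent used the wrong compounding period.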

Example 3: Multi-Step Research

Goal > Research the top 5 programming languages by popularity in 2026, list their key strengths, and save the analysis to languages.md

The agent will:

  1. Search for current programming language rankings
  2. Search for individual language strengths if needed
  3. Compile the analysis
  4. Write it to a structured markdown file
  5. Present a summary

Each of these goals triggers the agent loop for multiple iterations. The agent reasons about what it needs, decides which tool to use, processes the result, and decides what to do next -- all without human intervention.


The Complete Agent

Here is the full agent.py for reference. Every piece has already been explained above -- this is the assembled version you can copy and run:

#!/usr/bin/env python3
"""A complete AI agent with tools, memory, and autonomous reasoning."""

import anthropic
import json
import math
import os
import time
from datetime import datetime

import requests

# --- Tool Definitions ---

TOOLS = [
    {
        "name": "web_search",
        "description": (
            "Search the web for current information. Use this when you need "
            "facts, data, or information that might be beyond your training data. "
            "Returns a summary of the top search results."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query to look up"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "file_operation",
        "description": (
            "Read from or write to files on disk. Use 'read' to examine file "
            "contents, 'write' to create or overwrite a file, and 'append' to "
            "add content to an existing file."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["read", "write", "append"],
                    "description": "The file operation to perform"
                },
                "path": {
                    "type": "string",
                    "description": "The file path to operate on"
                },
                "content": {
                    "type": "string",
                    "description": "Content to write (required for write/append)"
                }
            },
            "required": ["operation", "path"]
        }
    },
    {
        "name": "calculator",
        "description": (
            "Evaluate a mathematical expression. Supports arithmetic, "
            "exponents, square roots, trigonometry, and common math functions. "
            "Use this instead of doing mental math."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": (
                        "The math expression to evaluate, e.g. '2 ** 10', "
                        "'math.sqrt(144)', 'math.sin(math.pi / 4)'"
                    )
                }
            },
            "required": ["expression"]
        }
    }
]


# --- Tool Implementations ---

def execute_web_search(query: str) -> str:
    """Search the web using DuckDuckGo's instant answer API."""
    try:
        response = requests.get(
            "https://api.duckduckgo.com/",
            params={"q": query, "format": "json", "no_html": 1},
            timeout=10
        )
        data = response.json()

        results = []
        if data.get("Abstract"):
            results.append(f"Summary: {data['Abstract']}")

        for topic in data.get("RelatedTopics", [])[:5]:
            if isinstance(topic, dict) and "Text" in topic:
                results.append(f"- {topic['Text']}")

        if not results:
            return f"No results found for: {query}"

        return "\n".join(results)
    except Exception as e:
        return f"Search failed: {str(e)}"


def execute_file_operation(operation: str, path: str, content: str = "") -> str:
    """Perform file read, write, or append operations."""
    try:
        match operation:
            case "read":
                if not os.path.exists(path):
                    return f"Error: File not found: {path}"
                with open(path, "r") as f:
                    file_content = f.read()
                if len(file_content) > 10000:
                    return file_content[:10000] + "\n... [truncated]"
                return file_content
            case "write":
                os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
                with open(path, "w") as f:
                    f.write(content)
                return f"Successfully wrote {len(content)} characters to {path}"
            case "append":
                os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
                with open(path, "a") as f:
                    f.write(content)
                return f"Successfully appended {len(content)} characters to {path}"
            case _:
                return f"Unknown operation: {operation}"
    except Exception as e:
        return f"File operation failed: {str(e)}"


def execute_calculator(expression: str) -> str:
    """Evaluate a mathematical expression in a restricted namespace.

    Only math functions are exposed and __builtins__ is emptied. This is
    not a real sandbox; in production, replace eval with a proper
    expression parser like simpleeval or asteval.
    """
    safe_names = {
        "math": math,
        "abs": abs,
        "round": round,
        "min": min,
        "max": max,
        "sum": sum,
        "pow": pow,
        "int": int,
        "float": float,
    }
    try:
        result = eval(expression, {"__builtins__": {}}, safe_names)  # noqa: S307
        return str(result)
    except Exception as e:
        return f"Calculation error: {str(e)}"


def run_tool(name: str, tool_input: dict) -> str:
    """Route a tool call to the correct implementation."""
    match name:
        case "web_search":
            return execute_web_search(tool_input["query"])
        case "file_operation":
            return execute_file_operation(
                tool_input["operation"],
                tool_input["path"],
                tool_input.get("content", "")
            )
        case "calculator":
            return execute_calculator(tool_input["expression"])
        case _:
            return f"Unknown tool: {name}"


# --- Retry Logic ---

def run_with_retries(func, max_retries: int = 3, base_delay: float = 1.0):
    """Execute a function with exponential backoff retry logic."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return func()
        except anthropic.RateLimitError as e:
            last_error, reason = e, "Rate limited"
        except anthropic.APIConnectionError as e:
            last_error, reason = e, "Connection error"
        except anthropic.APIStatusError as e:
            if e.status_code < 500:
                raise  # 4xx errors will not succeed on retry
            last_error, reason = e, f"Server error ({e.status_code})"
        # Back off only if another attempt remains; sleeping after the
        # final failure would just delay the error.
        if attempt < max_retries - 1:
            delay = base_delay * (2 ** attempt)
            print(f"  [Retry] {reason}. Waiting {delay}s...")
            time.sleep(delay)
    # Chain the last API error so the root cause stays in the traceback.
    raise RuntimeError(f"Failed after {max_retries} attempts") from last_error


# --- Memory ---

class AgentMemory:
    """Manages conversation history with automatic summarization."""

    def __init__(self, max_history: int = 50, summary_threshold: int = 30):
        self.messages: list[dict] = []
        self.summaries: list[str] = []
        self.max_history = max_history
        self.summary_threshold = summary_threshold

    def add_message(self, role: str, content) -> None:
        """Add a message to conversation history."""
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.summary_threshold:
            self._summarize_old_messages()

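    def _summarize_old_messages(self) -> None:
        """Compress older messages into a summary to free context space."""

        def field(block, key, default=""):
            # Tool results are stored as dicts; assistant blocks are SDK objects.
            if isinstance(block, dict):
                return block.get(key, default)
            return getattr(block, key, default)

        def has_tool_result(msg) -> bool:
            content = msg["content"]
            return isinstance(content, list) and any(
                field(block, "type") == "tool_result" for block in content
            )

        # Keep roughly the last 10 messages, but never start the retained
        # window on a tool_result: the API rejects a tool_result whose
        # matching tool_use is missing from the preceding assistant message.
        split = len(self.messages) - 10
        while split > 0 and has_tool_result(self.messages[split]):
            split -= 1
        old_messages = self.messages[:split]
        recent_messages = self.messages[split:]

        summary_parts = []
        for msg in old_messages:
            role = msg["role"]
            content = msg["content"]
            if isinstance(content, str):
                summary_parts.append(f"[{role}]: {content[:200]}")
            elif isinstance(content, list):
                for block in content:
                    block_type = field(block, "type")
                    if block_type == "tool_use":
                        summary_parts.append(
                            f"[{role}]: Called tool '{field(block, 'name')}'"
                        )
                    elif block_type == "tool_result":
                        result_text = field(block, "content")
                        if isinstance(result_text, str):
                            summary_parts.append(
                                f"[{role}]: Tool returned: "
                                f"{result_text[:100]}"
                            )
                    elif block_type == "text":
                        summary_parts.append(
                            f"[{role}]: {field(block, 'text')[:200]}"
                        )

        summary = (
            "CONVERSATION HISTORY SUMMARY:\n"
            + "\n".join(summary_parts[-20:])
        )
        self.summaries.append(summary)
        self.messages = recent_messages
        print(
            f"  [Memory] Summarized {len(old_messages)} old messages. "
            f"{len(self.messages)} messages retained."
        )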

    def get_messages(self) -> list[dict]:
        """Return messages with summaries prepended as context."""
        if not self.summaries:
            return self.messages.copy()

        combined_summary = "\n---\n".join(self.summaries)
        summary_message = {
            "role": "user",
            "content": (
                f"[CONTEXT FROM EARLIER IN THIS SESSION]\n{combined_summary}\n"
                "[END CONTEXT]"
            )
        }
        result = [summary_message]
        if self.messages and self.messages[0]["role"] == "user":
            result.append({
                "role": "assistant",
                "content": "Understood. I have the context from earlier. Continuing."
            })
        result.extend(self.messages)
        return result


# --- Agent ---

class Agent:
    """An autonomous AI agent with tools, memory, and a reasoning loop."""

    def __init__(self, model: str = "claude-sonnet-4-6", max_iterations: int = 20):
        self.client = anthropic.Anthropic()
        self.model = model
        self.max_iterations = max_iterations
        self.memory = AgentMemory()
        self.system_prompt = """You are an autonomous AI agent. You receive goals \
from the user and work to accomplish them step by step.

IMPORTANT RULES:
1. Break complex goals into smaller steps. Think about what you need to do \
before doing it.
2. Use tools when you need information or need to take action. Do not guess \
when a tool can give you the real answer.
3. After each tool result, assess your progress. Are you closer to the goal? \
Do you need more information? Should you try a different approach?
4. When the goal is fully accomplished, provide a clear final summary of what \
you did and what the result is.
5. If you determine a goal is impossible or requires capabilities you lack, \
say so clearly rather than looping endlessly.
6. Always explain your reasoning before taking an action. This helps the user \
understand your approach.

You have access to these tools:
- web_search: Look up current information on the internet
- file_operation: Read, write, or append to files on disk
- calculator: Evaluate mathematical expressions safely

You are goal-directed. Every action should move you closer to completing the \
user's request."""

    def run(self, goal: str) -> str:
        """Execute the agent loop until the goal is achieved."""
        print(f"\n{'='*60}")
        print(f"  AGENT GOAL: {goal}")
        print(f"{'='*60}\n")

        self.memory.add_message("user", goal)

        for iteration in range(1, self.max_iterations + 1):
            print(f"--- Iteration {iteration}/{self.max_iterations} ---")

            response = run_with_retries(
                lambda: self.client.messages.create(
                    model=self.model,
                    max_tokens=4096,
                    system=self.system_prompt,
                    tools=TOOLS,
                    messages=self.memory.get_messages()
                )
            )

            assistant_content = response.content
            self.memory.add_message("assistant", assistant_content)

            tool_calls = []
            text_parts = []

            for block in assistant_content:
                if block.type == "tool_use":
                    tool_calls.append(block)
                elif block.type == "text":
                    text_parts.append(block.text)

            for text in text_parts:
                print(f"\n  Agent: {text}\n")

            if not tool_calls:
                final_answer = "\n".join(text_parts) if text_parts else ""
                print(f"\n{'='*60}")
                print("  GOAL COMPLETE")
                print(f"{'='*60}\n")
                return final_answer

            tool_results = []
            for tool_call in tool_calls:
                print(
                    f"  Tool: {tool_call.name}"
                    f"({json.dumps(tool_call.input)})"
                )
                result = run_tool(tool_call.name, tool_call.input)
                print(
                    f"  Result: {result[:200]}"
                    f"{'...' if len(result) > 200 else ''}"
                )
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": result
                })

            self.memory.add_message("user", tool_results)

        print(f"\n  [Agent] Reached max iterations ({self.max_iterations})")
        return "Maximum iterations reached. The goal may be partially complete."


# --- Entry Point ---

def main():
    """Run the agent in interactive mode."""
    agent = Agent()

    print("\n  AI Agent Ready")
    print("  Type a goal and press Enter. The agent will work autonomously.")
    print("  Type 'quit' or 'exit' to stop.\n")

    while True:
        try:
            goal = input("  Goal > ").strip()
            if not goal:
                continue
            if goal.lower() in ("quit", "exit"):
                print("  Shutting down.")
                break
            result = agent.run(goal)
            print(f"\n  Final Result:\n  {result}\n")
        except KeyboardInterrupt:
            print("\n  Interrupted. Shutting down.")
            break


if __name__ == "__main__":
    main()

What You Just Built vs. Production Agents

This tutorial agent is real. It works. It reasons, uses tools, maintains memory, and runs autonomously. But the gap between this and a production agent system like Nevo is the gap between a go-kart and a Formula 1 car. They both have engines and wheels. The engineering depth is different.

Here is what separates a tutorial agent from a production system:

Multi-Agent Orchestration

This tutorial uses a single agent. Production systems coordinate multiple specialized agents -- one for code writing, one for code review, one for testing, one for deployment. Nevo runs 14 specialized sub-agents, each with its own system prompt, tool set, and area of expertise. The main agent orchestrates; it does not do the work itself.

Quality Gates

This agent trusts whatever Claude produces. A production agent verifies. Nevo passes every piece of code through an 8-stage quality pipeline whose stages include typecheck, test, lint, critique, refinement, escalation, and arbitration. If the code is not right after three iterations, fresh reviewers are brought in. The agent does not ship unverified work.

Persistent Memory

This agent's memory lasts for one session. When you restart the script, everything is gone. Production agents persist knowledge across sessions using databases, embeddings, and retrieval systems. Nevo's memory architecture processes raw sessions into facts, compresses them over time, and retrieves relevant context via semantic search -- mimicking how biological memory works.

Error-to-Rule Learning

When this agent makes a mistake, the mistake is forgotten. In a production system, every unique error triggers an analysis pipeline that traces the root cause and generates a preventive rule. The error becomes a permanent improvement to the system's operating instructions. The same mistake never happens twice.

Model Routing

This agent uses one model for everything. Production systems route tasks to different models based on complexity -- cheap fast models for simple work, expensive powerful models for hard problems. You do not need Opus-tier intelligence to format a string.
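A routing layer can start as a single function that maps a rough complexity estimate to a model name. The sketch below is only an illustration under stated assumptions: the tier names, the keyword heuristic, and the `estimate_tier` and `choose_model` helpers are hypothetical, and the "small" and "large" model identifiers are placeholders to replace with models your account actually offers.

```python
# Hypothetical routing table: tier -> model identifier.
# Only "claude-sonnet-4-6" comes from this tutorial; the others are placeholders.
MODEL_TIERS = {
    "simple": "small-fast-model",
    "standard": "claude-sonnet-4-6",
    "hard": "large-powerful-model",
}

# Assumed heuristic: these words hint that a task needs deeper reasoning.
HARD_HINTS = {"architecture", "debug", "prove", "refactor", "design"}


def estimate_tier(task: str) -> str:
    """Classify a task as simple, standard, or hard with a crude heuristic."""
    words = task.lower().split()
    if HARD_HINTS & set(words):
        return "hard"
    if len(words) <= 12:
        return "simple"
    return "standard"


def choose_model(task: str) -> str:
    """Return the model identifier for the task's estimated tier."""
    return MODEL_TIERS[estimate_tier(task)]
```

With something like this in place, the agent could pass `choose_model(goal)` as the `model` argument instead of a fixed default. A real router would likely use a cheap classifier call or past cost data rather than keywords.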

Tool Ecosystem

Three tools is a starting point. Production agents connect to dozens of services through protocols like MCP (Model Context Protocol) -- databases, APIs, code execution environments, messaging platforms, monitoring systems. The more tools an agent has, the more problems it can solve autonomously.

For a comprehensive guide on building toward production, see How to Build an AI Agent System from Scratch.


Where to Go Next

You have a working agent. Here are the natural next steps, roughly in order of impact:

Add more tools. The agent's usefulness scales directly with its tool set. Add a tool for running shell commands, querying a database, calling a specific API, or sending notifications. Each new tool unlocks a new category of goals the agent can accomplish autonomously.

Add persistent storage. Replace the in-memory history with a SQLite database or JSON file. Let the agent remember things across restarts. Start simple: save the full conversation log to disk and reload it on startup.
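The "save the full conversation log to disk" idea can be sketched with the standard library alone. The helper names (`to_plain`, `save_history`, `load_history`) and the default file name are hypothetical. The sketch also assumes that assistant content blocks returned by the SDK expose a pydantic-style `model_dump()` method; if that does not hold for your SDK version, convert the blocks to dicts another way.

```python
import json
from pathlib import Path


def to_plain(content):
    """Convert message content into JSON-serializable data."""
    if isinstance(content, str):
        return content
    plain = []
    for block in content:
        if isinstance(block, dict):
            plain.append(block)
        elif hasattr(block, "model_dump"):
            # Assumption: SDK content blocks are pydantic models.
            plain.append(block.model_dump(exclude_none=True))
        else:
            plain.append({"type": "text", "text": str(block)})
    return plain


def save_history(messages: list[dict], path: str = "agent_history.json") -> None:
    """Write the conversation history to a JSON file."""
    data = [{"role": m["role"], "content": to_plain(m["content"])} for m in messages]
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_history(path: str = "agent_history.json") -> list[dict]:
    """Load a saved history, or return an empty list on first run."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))
```

One way to wire this in is to set `agent.memory.messages = load_history()` at startup and call `save_history(agent.memory.messages)` after each `agent.run(...)`. The summaries list would need the same treatment to survive a restart.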

Implement streaming. The current agent waits for complete responses before printing anything. The Anthropic SDK supports streaming, which lets you see the agent's reasoning in real time as it generates tokens. This makes the experience dramatically more interactive.
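A minimal streaming sketch, assuming the Anthropic Python SDK's `messages.stream(...)` context manager with its `text_stream` iterator and `get_final_message()` method. The `stream_reply` helper is hypothetical and takes the client as an argument, so it slots in where the agent currently calls `self.client.messages.create(...)`.

```python
def stream_reply(client, model: str, messages: list[dict], **extra):
    """Print text as it is generated and return the final message.

    `extra` is forwarded to the API call, e.g. system=... or tools=...
    """
    with client.messages.stream(
        model=model,
        max_tokens=4096,
        messages=messages,
        **extra,
    ) as stream:
        for text in stream.text_stream:
            # Show each chunk immediately instead of waiting for the full reply.
            print(text, end="", flush=True)
        print()
        # The final message carries the complete content, including tool_use blocks.
        return stream.get_final_message()
```

Because the returned object has the same `content` list as a non-streamed response, the rest of the loop (collecting `tool_use` blocks, running tools, appending results) should not need to change.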

Add a planning step. Before entering the tool loop, have the agent generate an explicit plan -- a numbered list of steps it intends to take. Then execute the plan step by step, checking off items as they complete. This improves reliability on complex goals.
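The bookkeeping side of a planning step is easy to sketch without any API calls. Everything here is hypothetical scaffolding: `PLANNING_PROMPT` is one possible wording for the first request, `parse_plan` pulls numbered steps out of the model's reply, and `PlanChecklist` tracks which steps are done.

```python
import re

# Hypothetical prompt for the first turn, sent before tools are used.
PLANNING_PROMPT = (
    "Before using any tools, write a numbered plan (1., 2., 3., ...) "
    "for accomplishing this goal. One step per line.\n\nGoal: {goal}"
)


def parse_plan(text: str) -> list[str]:
    """Extract steps from lines that look like '1. do something'."""
    steps = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)[.)]\s+(.+)", line)
        if match:
            steps.append(match.group(2).strip())
    return steps


class PlanChecklist:
    """Track completion of plan steps and render them as a checklist."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.done = [False] * len(self.steps)

    def complete(self, index: int) -> None:
        self.done[index] = True

    def render(self) -> str:
        return "\n".join(
            f"[{'x' if done else ' '}] {number}. {step}"
            for number, (step, done) in enumerate(zip(self.steps, self.done), start=1)
        )
```

One option is to include `checklist.render()` in each follow-up message so the model can see what remains. How the agent decides that a step is complete is left open here.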

Connect to Claude Code. If you want to see what this architecture looks like at scale -- with 20 agents, 36 skills, quality pipelines, and self-improvement loops -- explore Claude Code and the systems built on top of it.


Frequently Asked Questions

What is an AI agent in Python?

An AI agent in Python is a program that uses a large language model as its reasoning engine, combined with tools (Python functions the model can call), memory (conversation history and knowledge persistence), and an autonomous control loop that iterates until a goal is achieved. It is distinct from a chatbot because it takes actions rather than only generating responses.

How much does it cost to run an AI agent?

Cost depends on the model and the complexity of the goal. Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens. A typical agent run of 5-10 iterations on a moderate task costs between $0.02 and $0.15. Long-running complex tasks with many tool calls can cost $0.50 to $2.00. You can reduce costs by using cheaper models for simple subtasks.
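The arithmetic behind these figures is simple enough to sketch. The prices are the ones quoted above. The `estimate_cost` helper and the token counts in the example are illustrative assumptions, not measurements.

```python
# Prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Assumed example: 8 iterations averaging 3,000 input and 400 output tokens.
run_cost = estimate_cost(input_tokens=8 * 3_000, output_tokens=8 * 400)
print(f"Estimated run cost: ${run_cost:.2f}")  # 0.072 + 0.048 = $0.12
```

Because the agent resends the conversation history on every iteration, input tokens tend to grow as a run goes on, so later iterations cost more than early ones. For real numbers, you could sum the token counts the API reports on each response (assumed here to be `response.usage.input_tokens` and `response.usage.output_tokens`).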

Can I use a different LLM instead of Claude?

Yes. The agent architecture is model-agnostic. The tool use pattern (define tools as schemas, receive tool_use blocks, return tool_result blocks) is supported by OpenAI, Google Gemini, and other providers with similar but slightly different APIs. The core loop logic remains the same -- only the API client and message format change.

How do I add new tools to the agent?

Adding a tool requires two changes: add a JSON schema to the TOOLS list describing the tool's name, purpose, and input parameters, then add a Python function that implements the tool's logic and wire it into the run_tool router. The model automatically discovers new tools from the schema.

Is this agent safe to run?

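This tutorial agent has limited capabilities by design, but it is not sandboxed. The file tool is limited to read, write, and append, yet it accepts any path, so the agent can overwrite any file your user account can write to. The calculator removes builtins and exposes only math functions, which blocks casual misuse, but a restricted eval() is not a true sandbox and can be bypassed by a crafted expression. The web search is read-only. For production use, replace eval() with a proper expression parser and add authentication, rate limiting, file path restrictions, and audit logging. Never give an agent write access to directories containing sensitive data without proper safeguards.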
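One way to tighten the calculator is to stop calling eval() and instead walk the parsed expression, allowing only a whitelist of node types. This is a minimal sketch rather than a hardened implementation: `safe_calculate` and its helper tables are hypothetical names, the function set is deliberately small, and functions are called as `sqrt(16)` rather than `math.sqrt(16)` because attribute access is rejected.

```python
import ast
import math
import operator

_BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCTIONS = {"sqrt": math.sqrt, "abs": abs, "round": round, "min": min, "max": max}
_CONSTANTS = {"pi": math.pi, "e": math.e}


def _evaluate(node):
    """Recursively evaluate a whitelisted AST node; reject everything else."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    if isinstance(node, ast.Name) and node.id in _CONSTANTS:
        return _CONSTANTS[node.id]
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCTIONS and not node.keywords):
        return _FUNCTIONS[node.func.id](*[_evaluate(arg) for arg in node.args])
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate arithmetic without eval(); errors come back as strings."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except Exception as e:
        return f"Calculation error: {e}"
```

Attribute access, subscripts, lambdas, and unknown names all fall through to the ValueError, so an expression like `().__class__` is refused instead of evaluated. This still does not limit resource use: an expression such as `9 ** 9 ** 9` can consume a lot of CPU and memory, so a production version would also cap operand sizes or run with a timeout.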

How does this compare to using LangChain or CrewAI?

Frameworks like LangChain and CrewAI provide pre-built abstractions for agent loops, tool management, and memory. This tutorial builds the same components from scratch so you understand how they work. The trade-off: frameworks save development time but add dependencies, complexity, and opinions about architecture. Building from scratch gives you full control and complete understanding. Many production agent systems, including Nevo, use no external agent framework at all.


This tutorial shows the simplified version of what powers Nevo, a self-improving AI agent system built from scratch. To understand the full architecture -- including multi-agent orchestration, quality pipelines, error-to-rule learning, and brain-inspired memory -- explore our complete guide to AI agents.