What Are AI Agent Skills? The Complete Guide to Teaching Agents How to Work
An AI agent skill is a reusable, markdown-based instruction set that teaches an AI agent how to perform a specific task -- including the procedures, decision criteria, tool usage patterns, and quality checks that turn a general-purpose model into a domain specialist.
If you have used an AI agent, you know it can reason, plan, and use tools. What most people do not realize is that the gap between a mediocre agent and an exceptional one is not the model -- it is the skills loaded into that model's context. Skills are the procedural knowledge layer that determines whether an agent approaches a task like an intern or like a senior engineer with ten years of domain experience.
This guide covers everything developers need to know about AI agent skills: what they are, how they work, how they differ from plugins and MCP servers, what a skill file actually looks like, how skills get triggered, the major categories, and the frontier capability that changes everything -- agents that write their own skills.
Why Skills Matter
Large language models are general-purpose reasoners. They know a little about everything and a lot about common patterns in their training data. But "general-purpose" has a ceiling. Ask a raw LLM to implement a test-driven development workflow, and it will produce something plausible. Ask a skilled agent -- one loaded with a TDD skill containing the exact workflow steps, assertion patterns, edge case checklists, and framework-specific conventions -- and the output is categorically better.
The difference is not intelligence. It is knowledge. Specifically, it is procedural knowledge: the step-by-step understanding of how to do something well in a specific context.
An AI agent skill is a packaged unit of procedural knowledge that can be loaded into an agent's context on demand.
Skills solve three problems that raw models cannot:
- Consistency -- Without skills, an agent improvises its approach every time. The same task might be handled five different ways across five sessions. Skills enforce a specific, proven workflow.
- Specialization -- No model can hold deep expertise in every domain simultaneously. Skills provide domain-specific knowledge when the agent needs it, without consuming context on every other topic.
- Institutional memory -- When a team discovers the right way to handle a complex workflow, that knowledge typically lives in someone's head or in a wiki that nobody reads. Skills encode that knowledge in a format the agent actually uses, every time.
Skills vs. Plugins vs. MCP Servers
Three terms get conflated constantly: skills, plugins, and MCP servers. They serve fundamentally different purposes.
AI agent skills are behavioral knowledge. A skill tells an agent how to approach a task -- the workflow, the decision points, the quality criteria, the domain conventions. Skills are instruction sets written in markdown, loaded into the agent's context window. They shape the agent's reasoning and behavior.
AI agent plugins are capability packages. A plugin gives an agent the ability to do something it could not do before -- persist execution across sessions, manage parallel workspaces, route between multiple LLMs. Plugins extend the agent's runtime. To understand how plugins differ from skills in practice, see our guide to AI Agent Plugins. For a head-to-head breakdown, read AI agent skills vs plugins vs MCP.
MCP servers are tool connections. The Model Context Protocol provides a standardized interface for agents to interact with external services -- databases, APIs, file systems, search engines. MCP servers give the agent hands to interact with the world.
Here is the practical distinction:
| Layer | What It Provides | Format | Example |
|---|---|---|---|
| Skill | Procedural knowledge (how to) | Markdown instruction set | "How to conduct keyword research" |
| Plugin | Runtime capability (can do) | Code package | "Persist execution across sessions" |
| MCP server | External tool access (reach) | Standardized API bridge | "Query a local document database" |
A keyword research skill tells the agent how to analyze search volume, classify keyword tiers, map to content pillars, and produce a structured brief. An MCP server gives the agent access to a web search API. A plugin might let the agent run the research as a long-running background task. All three layers work together, but each contributes something distinct.
The most capable agent systems -- including the multi-agent architectures now emerging -- combine all three. Skills provide the brain. Plugins provide the body. MCP servers provide the tools.
Anatomy of a Skill File
An AI agent skill file is a structured markdown document with two required components: YAML frontmatter that defines metadata and triggers, and a markdown body that contains the actual instructions.
Here is the minimal viable structure:
```markdown
---
name: keyword-research
description: >
  Structured keyword research and content brief generation.
  Maps topics to a topical authority taxonomy, performs SERP
  gap analysis, assigns keyword tiers, and outputs a
  ready-to-write content brief. Triggers: "keyword research",
  "find keywords for", "content brief for", "/keyword-research".
---

# Keyword Research -- Structured Content Brief Generator

Given a topic or keyword, produce a structured content brief.

## Step 1: Map to Topical Pillar

Identify which pillar the topic belongs to...

## Step 2: Classify Keyword Tier

Assign every keyword to a tier...

## Step 3: SERP Analysis

Use WebSearch to research the current search landscape...
```
Frontmatter: The Triggering Mechanism
The YAML frontmatter has two required fields:
- name -- The skill's identifier. Must be unique across the skill library.
- description -- The most important field in the entire file. This is what the agent reads to decide whether to activate the skill. It must include both what the skill does and the specific triggers or contexts that should activate it.
The description serves as the skill's "when to use" instruction. The agent's runtime scans all skill descriptions against the current task, and skills whose descriptions match get loaded into context. A vague description means the skill never triggers. An overly broad description means it triggers when it should not, consuming context unnecessarily.
Good descriptions are specific about triggers:
```yaml
description: >
  Browser automation with persistent page state. Use when users
  ask to navigate websites, fill forms, take screenshots, extract
  web data, test web apps, or automate browser workflows. Trigger
  phrases include "go to [url]", "click on", "fill out the form",
  "take a screenshot", "scrape", "automate", "test the website".
```
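The following sketch shows how a runtime might read this structure. It splits a SKILL.md into its frontmatter fields and its body. It is a deliberately minimal toy, not any runtime's actual loader. The function name `parse_skill_file` is hypothetical, and the parser understands only plain `key: value` lines and `>` folded blocks. A real loader would use a full YAML library.

```python
def parse_skill_file(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, markdown body).

    Minimal by design: handles `key: value` lines and `key: >` folded
    blocks only, not the full YAML language.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        end = lines.index("---", 1)  # first closing delimiter after line 0
    except ValueError:
        raise ValueError("missing closing '---' frontmatter delimiter") from None

    fields: dict[str, str] = {}
    current_key = None
    for line in lines[1:end]:
        if line.startswith((" ", "\t")) and current_key:
            # Indented line: continuation of a folded (`>`) block, joined with spaces.
            fields[current_key] = (fields[current_key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            fields[current_key] = "" if value == ">" else value
    body = "\n".join(lines[end + 1:]).strip()

    missing = {"name", "description"} - fields.keys()
    if missing:
        raise ValueError(f"missing required field(s): {sorted(missing)}")
    return fields, body
```

Applied to the keyword-research example above, this would return the folded description as a single line. The body starts at the `# Keyword Research` heading.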
Body: The Procedural Knowledge
The markdown body is the skill's payload -- the instructions, workflows, checklists, decision trees, and examples that guide the agent through the task. This content is only loaded after the skill triggers, which is a critical design consideration.
Effective skill bodies share these characteristics:
- Imperative voice -- "Analyze the search results" not "The search results should be analyzed"
- Step-by-step structure -- Numbered steps, clear sequence, explicit decision points
- Concrete examples -- Show the expected output format, not just describe it
- Appropriate freedom -- Tight guardrails for fragile operations, loose guidance for creative work
Bundled Resources: The Supporting Cast
Beyond the core SKILL.md file, skills can include supporting directories:
```text
keyword-research/
  SKILL.md              # Required -- instructions
  references/           # Optional -- detailed docs loaded on demand
    serp-patterns.md
    keyword-tiers.md
  scripts/              # Optional -- executable code
    analyze-volume.py
  assets/               # Optional -- templates, images, output files
    brief-template.md
```
References are documentation loaded into context only when the agent determines it needs them. This follows a progressive disclosure pattern: the skill's metadata is always visible (around 100 words), the body loads when triggered (under 5,000 words), and references load only when specifically relevant (unlimited size).
Scripts are executable code for tasks that require deterministic reliability. Instead of the agent rewriting the same Python snippet every time it needs to rotate a PDF, the skill bundles a tested rotate_pdf.py that gets executed directly.
Assets are files used in the skill's output -- templates, icons, boilerplate code -- that the agent copies or modifies rather than generating from scratch.
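The following sketch inventories a skill folder laid out like the directory tree above. It collects only file names, never contents, which mirrors the idea that bundled resources stay out of context until needed. The function name `inventory_skill_dir` is hypothetical. Treating absent optional folders as empty is an assumption.

```python
from pathlib import Path

# Optional subdirectories from the layout above; contents are never read here.
OPTIONAL_DIRS = ("references", "scripts", "assets")


def inventory_skill_dir(skill_dir: Path) -> dict[str, list[str]]:
    """List a skill folder's bundled files by category without loading them."""
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")
    inventory: dict[str, list[str]] = {}
    for sub in OPTIONAL_DIRS:
        folder = skill_dir / sub
        if folder.is_dir():
            inventory[sub] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            inventory[sub] = []  # optional folder absent: not an error
    return inventory
```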
How Skills Get Triggered
An agent skill system uses three triggering mechanisms, and understanding all three is essential for designing skills that activate when they should.
Automatic Matching
The primary trigger. When a user sends a request, the agent's runtime compares the request against every skill's description field. Skills whose descriptions match the request's intent get loaded into context automatically.
This is why the description field matters so much. It is not documentation for humans -- it is a matching surface for the agent's skill selection logic. Include specific phrases, keywords, and contexts that should activate the skill.
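Exactly how descriptions are matched is runtime-specific. In Claude Code, for instance, the model itself typically reads the skill descriptions and judges relevance. The sketch below substitutes a deliberately crude word-overlap score to show why the wording of a description determines what it matches. Both the function `match_skills` and its `min_overlap` threshold are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; punctuation is discarded."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_skills(request: str, descriptions: dict[str, str], min_overlap: int = 2) -> list[str]:
    """Return skill names whose descriptions share enough words with the request."""
    request_words = tokenize(request)
    scored = []
    for name, description in descriptions.items():
        overlap = len(request_words & tokenize(description))
        if overlap >= min_overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically so results are deterministic.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Shared filler words such as "for" count toward this score. That is one reason a real runtime relies on semantic judgment rather than raw overlap.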
Slash Commands
Many skill systems support explicit invocation via slash commands. A user types /keyword-research and the skill loads immediately, regardless of matching logic. This is useful for skills that are not easy to trigger naturally through conversation -- utility skills, system administration skills, or skills with ambiguous trigger conditions.
Manual Invocation
An agent can deliberately load a skill during execution if it determines the skill would be helpful. For example, a coding agent working on a frontend task might load a dev-browser skill mid-workflow to visually verify its changes, even though the original request did not mention browser automation.
The best skill designs work across all three mechanisms. The description handles automatic matching for common cases, the slash command provides a reliable escape hatch, and the skill's name is descriptive enough that the agent recognizes when it should load the skill proactively.
Categories of AI Agent Skills
Agent skill systems tend to organize skills into three broad categories based on what kind of knowledge they encode.
Process Skills
Process skills encode methodologies -- how to approach a class of problems. They are workflow-level instructions that apply across many different specific tasks.
Examples:
- Test-driven development -- The full TDD cycle: write the test first, verify it fails, implement the minimum code to pass, refactor, verify again
- Compound engineering -- A development methodology where each unit of work makes subsequent work easier by feeding learnings back into the system: Plan (40%) -> Work (20%) -> Review (20%) -> Compound (20%)
- PRD-driven execution -- Decompose any project into a structured Product Requirements Document with dependency-ordered stories, acceptance criteria, and file scopes before writing a single line of code
- Code quality escalation -- What to do when the quality pipeline reaches iteration three and the critic still has findings: diagnostic report, research team, fresh review, arbitration
Process skills are the highest-leverage type because they apply broadly. A single TDD skill improves every coding task the agent performs.
Implementation Skills
Implementation skills encode domain-specific technical knowledge -- how to work with a particular technology, API, framework, or toolchain.
Examples:
- Dev-browser -- Browser automation with persistent page state, covering Chromium launch, profile management, element selection strategies, screenshot capture, and form interaction
- Image generation -- How to generate images using local models (Flux via mflux) and remote APIs, including prompt engineering patterns, model selection, and output format handling
- MCP builder -- How to construct a new Model Context Protocol server from scratch, including the server skeleton, tool registration, transport configuration, and testing procedures
Implementation skills are narrower but deeper. They contain the specific knowledge that makes the difference between an agent that fumbles through an API and one that uses it like an expert.
Domain Skills
Domain skills encode expertise in a non-technical field -- business processes, content strategy, industry knowledge, or organizational conventions.
Examples:
- Keyword research -- How to analyze search volume, classify keywords into tiers, perform SERP gap analysis, map content to topical pillars, and produce structured content briefs with GEO guidance
- Content writing -- Voice and tone guidelines, blog post structure, SEO integration points, internal linking strategy, and definitional sentence patterns for AI citation
- Token optimization -- How to analyze agent token usage, identify high-cost patterns, prioritize optimization candidates by frequency times cost, and generate token-efficient skills
Domain skills bridge the gap between what a model "knows" from training and what it needs to know to operate effectively in a specific context. They are particularly valuable because the knowledge they contain -- organizational conventions, industry-specific workflows, proprietary processes -- is exactly the kind of knowledge that is not well-represented in any model's training data.
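The token optimization skill's prioritization rule, frequency times cost, is easy to state in code. In this sketch the `Pattern` fields and the sample numbers are hypothetical. They are chosen only to illustrate the ranking.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    runs_per_week: int   # how often the workflow is improvised
    tokens_per_run: int  # average tokens each improvised run consumes


def rank_candidates(patterns: list[Pattern]) -> list[tuple[str, int]]:
    """Order optimization candidates by total token cost (frequency x cost)."""
    scored = [(p.name, p.runs_per_week * p.tokens_per_run) for p in patterns]
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = [
    Pattern("release-notes", runs_per_week=20, tokens_per_run=3_000),
    Pattern("schema-migration", runs_per_week=2, tokens_per_run=25_000),
    Pattern("lint-fix", runs_per_week=50, tokens_per_run=500),
]
for name, weekly_tokens in rank_candidates(candidates):
    print(f"{name}: {weekly_tokens:,} tokens/week")
```

With these numbers, the frequent, moderately priced `release-notes` workflow (60,000 tokens/week) outranks the rare but expensive `schema-migration` (50,000). Surfacing that kind of result is the point of multiplying frequency by cost.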
The Context Window as a Public Good
One design principle separates well-engineered skill systems from naive ones: treating the context window as a shared, finite resource.
An AI agent's context window is not unlimited. Every token consumed by a skill is a token unavailable for the actual task, for conversation history, for tool outputs, for other skills that might be relevant. Loading a 10,000-word skill into context when only a 200-word section is relevant is wasteful, and wasteful context usage degrades agent performance.
The progressive disclosure pattern addresses this directly:
- Level 1: Metadata only -- The skill's name and description (around 100 words) are always in context. This is the selection layer.
- Level 2: Skill body -- The main instructions (under 5,000 words) load only when the skill triggers. This is the knowledge layer.
- Level 3: Bundled resources -- Reference documents, scripts, and assets load only when the agent determines it needs them. This is the detail layer.
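The three levels can be modeled as a small context tracker. This sketch rests on simplifying assumptions:

- The `Skill` and `SkillContext` names are hypothetical.
- Skills are held in memory rather than read from disk.
- "Loading" is simply recorded in a list, so the disclosure order is visible.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str  # Level 1: always in context
    body: str         # Level 2: loaded only when the skill triggers
    references: dict[str, str] = field(default_factory=dict)  # Level 3: on demand


class SkillContext:
    """Tracks which parts of which skills have entered the context window."""

    def __init__(self, skills: list[Skill]):
        self.skills = {s.name: s for s in skills}
        self.loaded: list[str] = []

    def metadata(self) -> str:
        # Level 1: one short line per skill, always present.
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def trigger(self, name: str) -> str:
        # Level 2: only the triggered skill's body is added.
        self.loaded.append(f"{name}/SKILL.md")
        return self.skills[name].body

    def reference(self, name: str, ref: str) -> str:
        # Level 3: a single reference file, fetched only when needed.
        self.loaded.append(f"{name}/references/{ref}")
        return self.skills[name].references[ref]
```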
A system with 36 skills and level-1 metadata always loaded consumes roughly 3,600 words of context. The same 36 skills fully expanded would consume over 100,000 words -- an impossible context budget. Progressive disclosure makes large skill libraries practical.
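These figures follow from the per-level sizes given above. The quick check below uses only two inputs: 100 words of metadata per skill and the 5,000-word body ceiling. Real skills vary, so treat the results as bounds, not measurements.

```python
SKILL_COUNT = 36
METADATA_WORDS = 100        # Level 1, per skill (approximate)
BODY_CEILING_WORDS = 5_000  # Level 2 upper bound, per skill

always_loaded = SKILL_COUNT * METADATA_WORDS
fully_expanded_ceiling = SKILL_COUNT * (METADATA_WORDS + BODY_CEILING_WORDS)
# Average body size above which full expansion passes 100,000 words.
break_even_body = 100_000 / SKILL_COUNT - METADATA_WORDS

print(f"always loaded: {always_loaded:,} words")
print(f"fully expanded, worst case: {fully_expanded_ceiling:,} words")
print(f"bodies averaging over {break_even_body:,.0f} words exceed 100,000")
```

Full expansion crosses 100,000 words once bodies average roughly 2,700 words, about half the allowed ceiling.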
The practical implication for skill authors: keep the body lean. Start from the assumption that the model is already smart, and include only knowledge the model does not already have. Prefer concise examples over verbose explanations. If a skill approaches 500 lines, split the details into reference files.
Self-Writing Skills: Agents That Teach Themselves
This is where skill systems cross from useful infrastructure into something genuinely new.
A self-writing AI agent skill system is one where the agent itself identifies gaps in its own capabilities and authors new skills to fill those gaps -- autonomously, without human intervention.
The concept sounds abstract until you see the pipeline. In Nevo's implementation, self-writing works through a system called the Skill Forge:
DETECT --> EVALUATE --> GENERATE --> VALIDATE --> DEPLOY --> TRACK
Detection
Three sources trigger skill generation:
- Incident analysis -- When the error-to-rule pipeline identifies a root cause of "missing knowledge" or "process gap," it flags the incident as a skill candidate. A rule prevents the immediate recurrence, but a skill prevents the entire class of problem.
- Token optimization -- When the token monitor identifies a high-cost pattern (the same complex workflow being improvised from scratch every time), it flags the pattern as a skill creation candidate. A dedicated skill encodes the workflow once, saving tokens on every future execution.
- Human request -- A user simply asks the system to generate a skill for a specific purpose.
Evaluation
Not every gap warrants a skill. The system evaluates whether the right solution is a skill, a rule, a hook, or an agent configuration change:
- If the fix is a simple behavioral instruction (1-2 sentences), it becomes a rule
- If the fix requires enforcement at the tool level, it becomes a hook
- If the fix is agent-specific, it goes into that agent's configuration
- If the fix involves a reusable workflow or repeated high-cost pattern, it becomes a skill
Generation
The Skill Writer agent -- running on the most capable available model -- authors a complete SKILL.md following the same anatomy described earlier in this guide. It includes frontmatter with accurate trigger descriptions, a structured body with step-by-step instructions, and any necessary reference files or scripts.
Validation
Automated checks verify the generated skill meets quality standards: valid YAML frontmatter, required fields present, body under 500 lines, no duplicate functionality with existing skills, and scripts (if any) pass syntax checks.
Deployment and Tracking
Validated skills are placed in a generated/ directory where the agent runtime auto-discovers them. Each generated skill is tracked in an inventory with creation date, source trigger, and usage statistics. Skills that prove effective stay. Skills that cause problems or go unused get deactivated.
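A sketch of that inventory and its review pass follows. The field names and the review thresholds are hypothetical: a skill is deactivated after 30 days without use, or when more than 20% of its uses end in failure. The actual criteria are not specified here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InventoryEntry:
    name: str
    created: date
    source: str        # e.g. "incident", "token-optimization", "human-request"
    uses: int = 0
    failures: int = 0  # uses flagged as causing problems
    active: bool = True


def review_inventory(entries: list[InventoryEntry], today: date,
                     min_age_days: int = 30, max_failure_rate: float = 0.2) -> list[str]:
    """Deactivate unused or problematic skills; return the names deactivated in this pass."""
    deactivated = []
    for entry in entries:
        if not entry.active:
            continue
        unused = entry.uses == 0 and (today - entry.created).days >= min_age_days
        failing = entry.uses > 0 and entry.failures / entry.uses > max_failure_rate
        if unused or failing:
            entry.active = False
            deactivated.append(entry.name)
    return deactivated
```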
The result is a compound effect. Every session where the agent encounters a gap makes the agent permanently more capable for all future sessions. Over weeks and months, the skill library grows denser, the agent's coverage of edge cases expands, and the frequency of improvised (and therefore inconsistent) approaches shrinks.
Nevo currently operates with 36 skills across three scopes -- user-level, project-level, and generated. That number grows autonomously as the system runs.
Real-World Skill Examples
Abstract descriptions only go so far. Here are three skills from a production agent system, showing how different categories manifest in practice.
Keyword Research Skill (Domain)
This skill transforms a general-purpose LLM into a content strategist. It encodes a seven-step workflow: map the topic to a topical pillar, classify keywords by tier (head, category, long-tail, branded), analyze the SERP landscape, determine optimal content format, build an internal linking map, apply GEO guidance for AI citation optimization, and output a structured content brief.
Without this skill, an agent asked to "do keyword research for AI agent skills" would produce something generic. With it, the agent produces a brief that includes SERP gap analysis, keyword tier assignments, schema markup requirements, and definitional sentences designed for AI model extraction.
Code Quality Skill (Process)
This skill defines an 8-stage mandatory quality pipeline: write, typecheck, test, lint, critique, refine, escalate, and arbitrate. Each stage has defined agents, inputs, outputs, and escalation criteria. If iteration three of the critique-refine cycle still produces findings, the skill specifies a four-step escalation: diagnostic report, research team validation, fresh independent review, and final arbitration.
The skill also includes an escalation reference file (loaded only when escalation triggers) that details agent roles, report formats, and decision criteria for the escalation flow. This is progressive disclosure in action -- the escalation details consume zero context during normal quality pipeline runs.
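The critique-refine portion of that pipeline, with its escalation trigger, reduces to a bounded loop. In this minimal sketch, the critic and refiner are passed in as plain functions, as hypothetical stand-ins for the critic and refiner agents. Only the stages from critique onward are covered; write, typecheck, test, and lint are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable

ESCALATION_STEPS = (
    "diagnostic report",
    "research team validation",
    "fresh independent review",
    "final arbitration",
)


@dataclass
class LoopResult:
    code: str
    status: str  # "passed" or "escalate"
    iterations: int
    open_findings: list[str] = field(default_factory=list)
    next_steps: tuple[str, ...] = ()


def critique_refine(code: str,
                    critique: Callable[[str], list[str]],
                    refine: Callable[[str, list[str]], str],
                    max_iterations: int = 3) -> LoopResult:
    for iteration in range(1, max_iterations + 1):
        findings = critique(code)
        if not findings:
            return LoopResult(code, "passed", iteration)
        if iteration == max_iterations:
            # Findings remain on the final allowed iteration: hand off to escalation.
            return LoopResult(code, "escalate", iteration, findings, ESCALATION_STEPS)
        code = refine(code, findings)
    raise ValueError("max_iterations must be at least 1")
```

The escalation steps appear only in the returned result, so they cost nothing on the normal path. That parallels the separate escalation reference file.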
Dev-Browser Skill (Implementation)
This skill provides browser automation with persistent page state. It covers two operation modes (standalone Chromium with persistent profiles, and extension mode connecting to an existing browser), element selection strategies (AI snapshots for unknown layouts, direct selectors for known code), and incremental workflow patterns (prove out a workflow with small scripts, then write a single automated script for repetitive work).
The skill includes setup instructions, server launch commands, profile storage paths, and mode selection guidance -- implementation details that the model does not know from training and that vary by environment.
Building Your Own Skills
If you are building or configuring an agent system, here is the practical process for creating effective skills.
To write your first skill, see the hands-on AI agent skill writing guide for a complete walkthrough.
Step 1: Identify the repeated pattern. Look for tasks where the agent improvises a different approach each time, where the quality of output varies based on how the request was phrased, or where you find yourself giving the same corrective instructions across sessions.
Step 2: Document the ideal workflow. Write down the exact steps, decision points, quality criteria, and output format you want. Be specific. "Review the code" is too vague. "Run the type checker, execute the test suite, lint against project conventions, then have the code critic evaluate readability and correctness" is a skill.
Step 3: Set the right freedom level. Match instruction specificity to the task's fragility. A narrow bridge with cliffs needs guardrails (exact scripts, strict sequences). An open field allows many routes (heuristic guidance, flexible approaches). Most skills fall somewhere in between.
Step 4: Write the description for matching. This is the step most people underinvest in. Spend time listing every phrase, context, and trigger condition that should activate the skill. Test it by imagining different ways a user might request the task.
Step 5: Iterate based on real usage. Use the skill on actual tasks. Notice where the agent struggles or deviates. Update the skill. The first version is never the final version.
The Future of Agent Skill Systems
Skill systems are evolving in three directions simultaneously.
Skill sharing and marketplaces. Today, skills are mostly authored per-agent or per-organization. As the format standardizes, expect skill sharing to follow the pattern of package managers -- searchable repositories of community-contributed skills covering common workflows.
Cross-model portability. Skills written for one agent runtime (Claude Code, for example) do not automatically work in another. The industry is moving toward more standardized skill formats that work across different agent frameworks, similar to how Docker containers standardized application deployment.
Autonomous skill ecosystems. Self-writing systems today are single-agent loops -- one system generates skills for itself. The next step is multi-agent skill generation, where a swarm of agents collaboratively authors, tests, and refines skills based on collective operational experience. The agents do not just use skills; they actively maintain and evolve the skill library as a shared resource.
The trajectory is clear. Skills are transitioning from static configuration files into living, evolving knowledge bases that grow with the systems they serve. The agents that learn fastest will be the ones with the best skill infrastructure -- not because they have better models, but because they accumulate and compound procedural knowledge more effectively.
Frequently Asked Questions
What is an AI agent skill?
An AI agent skill is a reusable markdown-based instruction set that teaches an AI agent how to perform a specific task. Skills contain procedural knowledge -- step-by-step workflows, decision criteria, quality checks, and domain conventions -- packaged in a format that loads into the agent's context window on demand. Skills are what transform a general-purpose language model into a specialized agent capable of consistent, expert-level performance on specific tasks.
How are AI agent skills different from plugins?
Skills and plugins serve complementary but distinct purposes. An AI agent skill provides behavioral knowledge -- it tells the agent how to approach a task using workflows, checklists, and decision criteria. A plugin provides runtime capability -- it gives the agent the ability to do something new, like persist execution across sessions or manage parallel workspaces. Skills shape reasoning; plugins extend functionality. Most agent systems need both.
What is a Claude Code skill file?
A Claude Code skill file is a markdown document stored in the .claude/skills/ directory that Claude Code's runtime can discover and load automatically. It consists of YAML frontmatter (with name and description fields) and a markdown body containing instructions. Claude Code scans skill descriptions to determine relevance and loads matching skills into context when a task aligns with their trigger conditions. Skills can also be invoked explicitly via slash commands.
Can AI agents write their own skills?
Yes. Self-writing skill systems enable agents to identify gaps in their own capabilities and author new skills autonomously. The process typically follows a pipeline: detect the gap (through error analysis, token optimization, or direct request), evaluate whether a skill is the right solution, generate the skill using the agent's own authoring capabilities, validate against quality standards, deploy to the skill library, and track effectiveness over time. This creates a compound improvement effect where the agent grows permanently more capable with each session.
How many skills does an agent need?
There is no fixed number. The right quantity depends on the breadth of tasks the agent handles and how specialized the workflows need to be. A focused coding agent might need 10-15 skills covering development methodology, testing, deployment, and code review. A general-purpose agent handling code, content, research, and system administration might use 30-40 skills. The constraint is not the number of skills but the context budget -- skill metadata must fit within the context window, which makes progressive disclosure (loading full skill content only when triggered) essential for larger libraries.
What makes a good AI agent skill?
An effective AI agent skill has four qualities. First, a precise description that triggers only when relevant -- overly broad descriptions waste context by triggering on unrelated tasks, while vague or overly narrow ones miss valid use cases. Second, concise instructions that add knowledge the model does not already have -- no need to explain Python syntax to a model that already knows Python. Third, appropriate freedom levels -- tight guardrails for fragile operations, loose guidance for creative work. Fourth, progressive disclosure -- core workflow in the main file, detailed references split into supporting documents that load only when needed.