How to Write a Skill for an AI Agent [Complete Guide]
An AI agent skill is a markdown file that teaches an agent how to perform a specific task -- the workflow steps, decision criteria, quality checks, and domain conventions that transform a general-purpose model into a specialist. Writing a good skill is the single highest-leverage thing you can do to improve an AI agent's output.
This is not an abstract concept. A skill is a file. It has a specific structure, specific triggers, and specific rules for how instructions should be written. This tutorial walks you through building one from scratch, with real examples you can adapt to your own workflows.
If you are unfamiliar with the concept, start with our guide on what AI agent skills are and why they matter. If you are wondering how skills differ from plugins, see our comparison of skills and plugins. New to AI agents? Start with what AI agents are and how they work.
What Makes a Good Skill
Before writing a single line, understand what separates an effective skill from a useless one.
A good skill encodes knowledge the model does not already have. Large language models already know how to write Python, structure JSON, and compose emails. They do not know your company's deployment checklist, your team's code review criteria, your product's specific API quirks, or the exact steps to debug your infrastructure. That is what skills are for.
A good skill is concise. The context window is shared space. Every token your skill consumes is a token unavailable for the conversation, other skills, and the agent's reasoning. A 2,000-line skill that could have been 200 lines is not thorough -- it is wasteful.
A good skill triggers correctly. If the agent never activates your skill, it does not exist. If it activates when it should not, it pollutes context. The trigger mechanism is the most important design decision you will make.
Skill File Structure
Every skill lives in its own directory and requires one file: SKILL.md. Everything else is optional.
```
my-skill/
  SKILL.md            # Required -- metadata + instructions
  references/         # Optional -- detailed docs loaded on demand
    api-docs.md
    examples.md
  scripts/            # Optional -- executable code
    validate.sh
  assets/             # Optional -- templates, images, output files
    template.html
```
The SKILL.md file has two parts: YAML frontmatter and a markdown body. The frontmatter is always in context. The body is loaded only when the skill triggers.
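For orientation, here is a minimal but complete SKILL.md -- a hypothetical commit-message skill, invented purely to show the shape. The rest of this tutorial is about getting each of these parts right.
```markdown
---
name: commit-messages
description: >
  Write conventional commit messages for staged changes. Use when the
  user asks to commit, says "write a commit message", or finishes a
  change and wants it committed.
---

# Commit Messages

1. Run `git diff --staged` to see what changed.
2. Write an imperative subject line under 50 characters.
3. Use the format `type(scope): subject` with type feat, fix, docs, or chore.
```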
Step 1: Write the Frontmatter
The frontmatter is the skill's identity card. It contains exactly two fields.
```yaml
---
name: deployment-checklist
description: >
  Pre-deployment validation checklist for production releases.
  Verifies database migrations, environment variables, API
  compatibility, rollback plan, and monitoring configuration
  before any deploy. Use when deploying to staging or production,
  when a PR is labeled 'deploy', or when the user says "deploy",
  "release", "push to prod", or "ship it".
---
```
The name Field
A unique, lowercase, hyphenated identifier. This is how the skill is referenced in logs, configuration, and the agent's internal registry. Keep it descriptive but compact.
Good names: keyword-research, code-review, api-load-test, pdf-processing
Bad names: skill1, my-skill, helper, utils
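If you want to enforce the convention mechanically, a simple pattern works. Here is a sketch in Python -- the regex is an assumed formalization of the convention above, since frameworks may differ on the exact rules:
```python
import re

# Lowercase words separated by single hyphens -- an assumed
# formalization of the naming convention; check your framework's rules.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$")

for name in ["keyword-research", "api-load-test", "My_Skill", "-broken-"]:
    status = "ok" if NAME_PATTERN.match(name) else "invalid"
    print(f"{name}: {status}")
```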
The description Field
This is the most important text in the entire skill file. The agent reads all skill descriptions to decide which skills to activate for the current task. If your description is vague, the skill will not trigger. If it is too broad, it will trigger when irrelevant.
An effective description answers three questions:
- What does this skill do? One or two sentences explaining the capability.
- When should the agent use it? Specific contexts, task types, and conditions.
- What trigger phrases activate it? Exact words or patterns the user might say.
Here is a strong description:
```yaml
description: >
  Structured keyword research and content brief generation.
  Maps topics to a topical authority taxonomy, performs SERP
  gap analysis, assigns keyword tiers, and outputs a
  ready-to-write content brief. Use when the user asks for
  keyword research, content planning, SEO analysis, or says
  "find keywords for", "content brief for", "what should I
  write about", or "/keyword-research".
```
Here is a weak description:
```yaml
description: Helps with SEO stuff.
```
The weak version might match on some SEO-related queries, but the agent has no way to know what kind of SEO work the skill supports, what the output format is, or what specific triggers should activate it.
Step 2: Write the Body
The markdown body is where the procedural knowledge lives. This is what the agent reads after the skill triggers. Write it in imperative voice -- "Analyze the results", not "The results should be analyzed."
Choose Your Structure
There are two main approaches, and the right one depends on how much variation the task allows.
Rigid skills use numbered steps, explicit checklists, and specific commands. Use this structure when the task is fragile, when consistency matters more than creativity, or when skipping a step could cause failure.
Flexible skills use guidelines, decision frameworks, and heuristics. Use this structure when multiple valid approaches exist, when the task requires judgment, or when the agent needs room to adapt to context.
Most skills fall somewhere between these extremes. The key question: if the agent deviates from your instructions, how much damage could it cause? High damage potential means rigid. Low damage means flexible.
Rigid Skill Example: Database Migration
```markdown
# Database Migration Checklist

Run this checklist before applying any database migration to staging or production.

## Pre-Migration

- [ ] Back up the target database: `pg_dump -Fc $DATABASE_URL > backup_$(date +%Y%m%d_%H%M%S).dump`
- [ ] Verify backup integrity: `pg_restore --list backup_*.dump | head -20`
- [ ] Review migration SQL for destructive operations (DROP, TRUNCATE, ALTER ... DROP COLUMN)
- [ ] If destructive: confirm rollback migration exists and has been tested
- [ ] Check for long-running queries that may conflict: `SELECT pid, query, state FROM pg_stat_activity WHERE state != 'idle';`

## Migration Execution

1. Enable maintenance mode: `curl -X POST $API_URL/admin/maintenance --data '{"enabled": true}'`
2. Apply migration: `npx prisma migrate deploy`
3. Verify schema matches expected state: `npx prisma migrate status`
4. Run smoke tests against migrated database: `npm run test:smoke`
5. Disable maintenance mode: `curl -X POST $API_URL/admin/maintenance --data '{"enabled": false}'`

## Post-Migration

- [ ] Verify application health: `curl $API_URL/health`
- [ ] Check error rates in monitoring dashboard for 10 minutes
- [ ] If error rate > 1%: execute rollback immediately
- [ ] Delete backup only after 48 hours of stable operation
```
Every step is explicit. Every command is copy-pasteable. The agent follows the checklist sequentially with no room for improvisation. This is appropriate because a botched migration can destroy production data.
Flexible Skill Example: Code Review
```markdown
# Code Review

Review the submitted code changes for correctness, maintainability, and adherence to team conventions.

## Review Priorities

Evaluate in this order -- stop at the first category that has significant findings:

1. **Correctness** -- Does the code do what it claims? Are there logic errors, off-by-one bugs, unhandled edge cases, or race conditions?
2. **Security** -- Are there injection vulnerabilities, exposed secrets, improper auth checks, or unsafe deserialization?
3. **Architecture** -- Does the change fit the existing patterns? Does it introduce unnecessary coupling or violate separation of concerns?
4. **Performance** -- Are there O(n^2) loops on large datasets, missing indexes, unbounded queries, or memory leaks?
5. **Readability** -- Are names descriptive? Is the logic followable? Would a new team member understand this in six months?

## Feedback Format

For each finding, provide:

- **File and line** -- exact location
- **Severity** -- critical (blocks merge), warning (should fix), nit (optional improvement)
- **What** -- one sentence describing the issue
- **Why** -- one sentence explaining the impact
- **Fix** -- suggested code change or approach

## Conventions

- Prefer small, focused functions over long procedural blocks
- Tests are required for new public functions
- Error messages must be actionable -- say what went wrong and what to do about it
- No commented-out code in PRs -- delete it or extract it
```
This skill provides a framework, not a script. The agent decides how to apply the priorities based on the specific code it is reviewing. This flexibility is appropriate because code review requires judgment.
Step 3: Add Bundled Resources
For skills that need more detail than fits comfortably in SKILL.md (which should stay under 500 lines), split content into reference files.
When to Use References
Use references/ when the skill supports multiple variants, frameworks, or detailed documentation that the agent only needs sometimes:
```
api-integration/
  SKILL.md                # Core workflow + when to load which reference
  references/
    rest-patterns.md      # REST API integration patterns
    graphql-patterns.md   # GraphQL-specific patterns
    auth-flows.md         # OAuth, API keys, JWT details
```
In SKILL.md, reference them explicitly:
```markdown
## API Patterns

Select the appropriate pattern based on the API type:

- **REST APIs**: Read `references/rest-patterns.md` for endpoint structure,
  pagination handling, and error response mapping.
- **GraphQL APIs**: Read `references/graphql-patterns.md` for query
  construction, variable binding, and fragment composition.
- **Authentication**: Read `references/auth-flows.md` for OAuth 2.0,
  API key rotation, and JWT validation patterns.
```
The agent loads only the reference file relevant to the current task. If the user asks about a REST integration, only rest-patterns.md enters the context -- not the GraphQL or auth documentation.
When to Use Scripts
Use scripts/ for operations that need deterministic reliability or would otherwise be rewritten from scratch every session:
```
pdf-processing/
  SKILL.md
  scripts/
    rotate_pdf.py         # Rotate PDF pages by specified degrees
    extract_text.py       # Extract text from PDF with layout preservation
    merge_pdfs.py         # Merge multiple PDFs into one
```
Scripts are token-efficient because the agent can execute them without reading them into context. The SKILL.md describes when and how to use each script, and the agent runs them directly.
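For illustration, here is what the rotate_pdf.py script from the listing above might contain. This is a minimal sketch assuming the pypdf library (`pip install pypdf`), not a canonical implementation:
```python
#!/usr/bin/env python3
"""Rotate every page of a PDF by a given number of degrees.

Minimal sketch assuming the pypdf library is installed.
"""
import argparse

from pypdf import PdfReader, PdfWriter


def main() -> None:
    parser = argparse.ArgumentParser(description="Rotate PDF pages")
    parser.add_argument("input", help="path to the source PDF")
    parser.add_argument("output", help="path for the rotated PDF")
    parser.add_argument("--degrees", type=int, default=90,
                        help="clockwise rotation, a multiple of 90")
    args = parser.parse_args()

    writer = PdfWriter()
    for page in PdfReader(args.input).pages:
        page.rotate(args.degrees)  # pypdf rotates the page in place
        writer.add_page(page)

    with open(args.output, "wb") as fh:
        writer.write(fh)


if __name__ == "__main__":
    main()
```
The agent never has to read this file into context; the SKILL.md only needs one line telling it when to run `python scripts/rotate_pdf.py input.pdf output.pdf --degrees 90`.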
When to Use Assets
Use assets/ for files that become part of the output -- templates, images, boilerplate code:
```
project-scaffold/
  SKILL.md
  assets/
    template/             # Full project template directory
      package.json
      tsconfig.json
      src/
        index.ts
```
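The SKILL.md body then directs the agent to copy the template rather than regenerate boilerplate from scratch. A hypothetical instruction block might read:
```markdown
## Scaffold the Project

1. Copy `assets/template/` into the target directory.
2. Replace the name and description placeholders in `package.json`.
3. Run `npm install` and confirm `npm run build` succeeds before handing off.
```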
Step 4: Test Your Skill
A skill that has not been tested against real tasks is a guess, not a tool. Test in two passes -- first that the skill triggers correctly, then that the agent executes it correctly -- and iterate on what you find.
Trigger Testing
Ask the agent questions that should activate your skill and questions that should not. Verify the skill triggers when expected and stays dormant when irrelevant.
Should trigger:
- "Deploy the latest changes to staging"
- "Run the pre-deployment checklist"
- "Ship it to production"
Should not trigger:
- "How does our deployment pipeline work?" (informational, not procedural)
- "Write a deployment script" (creating new code, not following a checklist)
If the skill triggers incorrectly, refine the description. If it fails to trigger, add more specific trigger phrases and contexts.
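If your agent framework exposes which skills activated for a given prompt, this check can be automated. Here is a sketch using pytest, where `activated_skills` is a hypothetical stub you would wire to your own runtime -- no framework ships this function:
```python
import pytest

SKILL = "deployment-checklist"

SHOULD_TRIGGER = [
    "Deploy the latest changes to staging",
    "Run the pre-deployment checklist",
    "Ship it to production",
]

SHOULD_NOT_TRIGGER = [
    "How does our deployment pipeline work?",
    "Write a deployment script",
]


def activated_skills(prompt: str) -> set[str]:
    """Hypothetical stub -- replace with a real call into your agent
    runtime that returns the names of skills activated for a prompt."""
    raise NotImplementedError("wire this up to your agent framework")


@pytest.mark.parametrize("prompt", SHOULD_TRIGGER)
def test_skill_triggers(prompt):
    assert SKILL in activated_skills(prompt)


@pytest.mark.parametrize("prompt", SHOULD_NOT_TRIGGER)
def test_skill_stays_dormant(prompt):
    assert SKILL not in activated_skills(prompt)
```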
Execution Testing
Run the skill on a real task and evaluate:
- Does the agent follow the steps in order?
- Does it skip steps it should not skip?
- Does it add unnecessary steps?
- Is the output format correct?
- Does it use the bundled scripts and references appropriately?
Iteration
After testing, you will almost certainly need to adjust. Common fixes:
- Agent skips steps: Make the steps more explicit, add checklist markers
- Agent adds unnecessary steps: State "Do not add steps beyond this checklist" or scope the skill more tightly
- Agent misinterprets instructions: Replace ambiguous language with concrete examples
- Skill is too long: Move detailed content to `references/` files
- Skill triggers too broadly: Narrow the description's trigger conditions
Step 5: Advanced Patterns
Progressive Disclosure
For complex skills, use a three-level loading strategy:
- Level 1 -- Metadata (always in context, ~100 words): The frontmatter `name` and `description`. This is all the agent sees until the skill triggers.
- Level 2 -- SKILL.md body (loaded on trigger, under 5,000 words): Core workflow, step-by-step instructions, decision framework. This should be complete enough for the agent to start working.
- Level 3 -- References (loaded on demand, unlimited): Detailed documentation, API specs, extended examples. Loaded only when the agent determines it needs deeper information.
This design keeps the context window lean. A skill library of 30 skills consumes roughly 3,000 tokens of metadata at rest. Only the activated skill's body enters context, and reference files load only when the agent actually needs them.
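You can sanity-check that overhead against your own library. A rough sketch, assuming skills live under `.claude/skills/` (see the FAQ below), PyYAML is installed, and a crude ~1.3 tokens-per-word heuristic rather than a real tokenizer:
```python
"""Estimate the at-rest metadata overhead of a skill library.

Rough sketch: assumes each SKILL.md opens with YAML frontmatter; real
token counts depend on the model's tokenizer.
"""
from pathlib import Path

import yaml  # pip install pyyaml

TOKENS_PER_WORD = 1.3  # crude heuristic, not a tokenizer
SKILLS_DIR = Path(".claude/skills")  # adjust for your framework

total = 0.0
for skill_md in sorted(SKILLS_DIR.glob("*/SKILL.md")):
    # Frontmatter sits between the first two '---' markers.
    _, block, _ = skill_md.read_text(encoding="utf-8").split("---", 2)
    meta = yaml.safe_load(block)
    words = len(str(meta.get("name", "")).split()) + len(
        str(meta.get("description", "")).split()
    )
    tokens = words * TOKENS_PER_WORD
    total += tokens
    print(f"{meta.get('name', skill_md.parent.name)}: ~{tokens:.0f} tokens")

print(f"Total at-rest metadata overhead: ~{total:.0f} tokens")
```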
Conditional Logic
When a skill must handle different scenarios, use explicit decision trees:
```markdown
## Determine Test Strategy

Based on the change type, select the appropriate testing approach:

**If the change modifies a public API:**
1. Write integration tests for every affected endpoint
2. Run the full regression suite
3. Update API documentation

**If the change is internal refactoring:**
1. Verify existing tests still pass
2. Add unit tests for any new functions
3. Skip integration tests unless the refactoring touches I/O boundaries

**If the change is a dependency update:**
1. Run the full test suite
2. Check for breaking changes in the dependency's changelog
3. Verify no deprecated APIs are now in use
```
Explicit conditionals prevent the agent from guessing which path to follow.
Checklist-Driven Workflows
Checklists are the most reliable structure for skills that cannot afford mistakes. The agent treats each checkbox as a commitment -- it checks items off as it completes them, creating a verifiable audit trail.
```markdown
## Pre-Launch Checklist

- [ ] All environment variables set in production config
- [ ] SSL certificate valid for at least 30 days
- [ ] Database connection pool sized for expected load
- [ ] CDN cache rules configured for static assets
- [ ] Error tracking (Sentry/Datadog) connected and verified
- [ ] Health check endpoint returns 200
- [ ] Rollback procedure documented and tested
- [ ] On-call engineer notified
```
Real-World Skill Example
Here is a complete, production-ready skill for SEO content briefs. This is the kind of skill that saves hours per article by encoding a specific, repeatable workflow.
```markdown
---
name: seo-content-brief
description: >
  Generate a structured SEO content brief from a target keyword or topic.
  Performs keyword tier classification, SERP analysis, content gap
  identification, and produces a ready-to-write brief with heading
  structure, target word count, and internal linking plan. Use when
  the user asks for a content brief, keyword analysis, article outline,
  or says "write a brief for", "plan content about", "outline an
  article on", "/content-brief", or "/seo-brief".
---

# SEO Content Brief Generator

Given a target keyword or topic, produce a structured content brief.

## Step 1: Keyword Classification

Assign the target keyword to a tier:

| Tier | Monthly Search Volume | Keyword Difficulty | Content Length |
|------|----------------------|--------------------|----------------|
| 1 (Head) | 10,000+ | 70+ | 3,000-5,000 words |
| 2 (Body) | 1,000-9,999 | 40-69 | 2,000-3,000 words |
| 3 (Long-tail) | 100-999 | < 40 | 1,000-2,000 words |

Use WebSearch to estimate volume and difficulty if tools are available.

## Step 2: SERP Analysis

Search for the target keyword. For the top 5 ranking pages, extract:

- Title format and length
- H2/H3 heading structure
- Content length (approximate)
- Unique angles or subtopics covered
- Types of media used (images, videos, tables, infographics)

## Step 3: Content Gap Identification

Compare the top 5 results and identify:

- Subtopics all competitors cover (table stakes -- must include)
- Subtopics only 1-2 competitors cover (differentiation opportunity)
- Subtopics no competitor covers (first-mover opportunity)
- Questions from "People Also Ask" not addressed in existing content

## Step 4: Generate the Brief

Output format:

    # Content Brief: [Target Keyword]

    Target keyword: [keyword]
    Tier: [1/2/3]
    Target length: [word count range]
    Search intent: [informational/transactional/navigational]

    ## Suggested Title Options
    - [Title option 1]
    - [Title option 2]
    - [Title option 3]

    ## Heading Structure
    - H1: [Main title]
    - H2: [Section 1]
      - H3: [Subsection if needed]
    - H2: [Section 2]
    - ...

    ## Must-Cover Topics
    - [Topic from gap analysis]
    - [Topic from gap analysis]

    ## Differentiation Opportunities
    - [Unique angle 1]
    - [Unique angle 2]

    ## Internal Links
    - [Link to related existing content]

    ## FAQ Section (for featured snippet targeting)
    - [Question from PAA]
    - [Question from PAA]

## Step 5: Quality Check

- [ ] Brief targets a single primary keyword
- [ ] Heading structure follows logical hierarchy
- [ ] All competitor table-stakes topics included
- [ ] At least one differentiation angle identified
- [ ] Word count appropriate for keyword tier
- [ ] FAQ questions sourced from actual search data
```
This skill is 80 lines. It takes about three minutes to write. It saves roughly 45 minutes per content brief because the agent no longer needs to improvise the process each time.
Common Mistakes to Avoid
Over-explaining what the model already knows. Do not include instructions like "Write clear, grammatically correct English" or "Use proper indentation in code." The model already does this. Include only knowledge the model does not have.
Putting trigger information in the body. The body is loaded after the skill triggers. Trigger conditions in the body cannot influence whether the skill activates. All trigger information belongs in the frontmatter description.
Writing documentation instead of instructions. A skill is not a wiki article. It does not explain concepts -- it tells the agent what to do. Replace "Database migrations are changes to the schema that..." with "Back up the database before applying the migration."
Creating too many small skills. If two skills always trigger together, merge them. If a skill is under 20 lines, consider whether it is too granular to be useful. Context-switching between many tiny skills has overhead.
Ignoring the context budget. Every skill competes for context window space. A skill that consumes 10,000 tokens is taking resources from the conversation, the user's instructions, and the agent's reasoning. Use progressive disclosure to keep the active footprint small.
FAQ
How many skills can an AI agent load at once?
The practical limit depends on the model's context window. Only skill metadata (name + description, roughly 100 words each) stays in context permanently. Skill bodies load on activation. A library of 30-50 skills consumes approximately 3,000-5,000 tokens of metadata overhead. The activated skill's body should stay under 5,000 words. Most capable agents can work with 5-10 activated skills simultaneously without context pressure.
What is the difference between a rigid skill and a flexible skill?
A rigid skill prescribes exact steps, commands, and checklists. The agent follows the procedure literally. Use rigid skills for fragile operations where deviation causes failure -- deployments, migrations, compliance workflows. A flexible skill provides guidelines, priorities, and decision frameworks. The agent adapts the approach based on context. Use flexible skills for judgment-heavy tasks -- code review, content strategy, architectural decisions.
Can AI agents write their own skills?
Yes. Advanced agent systems include a self-improvement pipeline where the agent identifies recurring tasks, analyzes what worked and what failed, and generates new skills to encode the optimal approach. This is sometimes called a "Skill Forge" -- an automated system that detects capability gaps and produces new skill files without human intervention. The resulting skills go through validation before deployment.
Where do skill files live in a project?
In most agent frameworks, skills live in a designated directory within the project. In Claude Code, this is .claude/skills/. Each skill gets its own subdirectory containing SKILL.md and any bundled resources. The agent runtime scans this directory and indexes all skill metadata at session start.
Do skills work across different AI models?
Skills are model-agnostic in principle -- they are markdown files containing instructions, not code tied to a specific API. In practice, the triggering mechanism depends on the agent framework. A skill written for Claude Code's skill system uses YAML frontmatter triggers that Claude's runtime understands. Porting skills to another framework may require adapting the trigger format, but the procedural knowledge in the body transfers directly.
Start Building
You now have everything you need to write effective skills. The pattern is straightforward:
- Identify a task you or your team perform repeatedly
- Write down the exact steps, decision points, and quality checks
- Package it as a SKILL.md with clear frontmatter triggers
- Test it against real tasks
- Iterate based on results
The gap between a mediocre AI agent and an exceptional one is not the model. It is the skills. Start encoding your team's best practices today, and the agent gets better at every task you teach it.