What AI Agent Skills Are and How They Work

By IBM Technology

Summary

Topics Covered

AI Agents Know Facts But Lack Procedures
Progressive Disclosure Loads Skills Efficiently
Agent Architectures Mirror Human Memory
Skills Require Security Review Before Use

Full Transcript

What are AI agent skills and why have they become an open standard adopted by practically every major AI coding platform?

Well, because skills address a specific problem with agents.

Now AI agents, they're pretty good reasoners, and LLMs, or large language models, already know a lot of facts.

They can tell you about Kubernetes architecture or the history of SQL or the airspeed velocity of an unladen swallow. That part is covered, but they lack something.

They lack procedural knowledge.

The stuff that's specific to how work actually gets done, like let's say a 47-step workflow for generating a compliant financial report.

Yeah, that would be fun.

When an AI agent running a large language model encounters a task like generating this report, it basically has two options.

Either somebody needs to prompt it with every single step, all 47 of them, and they need to do that every time, or worse still, the agent is just going to take a guess at it.

Now a skill is how you actually add that procedural knowledge to the agent, and the format of a skill is almost comically simple.

It's simply a SKILL.md file.

That's a markdown file in a folder.

So let me draw out what a skill actually looks like.

So at the top of the SKILL.md file is some YAML front matter.

So let's have a look at what is defined in the front matter.

Well, at a minimum, there are two things.

So there is a name and there is a description.

These are the two mandatory fields.

Now the name identifies the skill, and the description tells the agent what this skill does and when it should be used.

Now this description is pretty important because it's the trigger condition that tells the agent exactly when this skill applies. So maybe the skill name is PDF Builder.

And the description here says something like, use this when the user asks to extract a PDF.

Now there are some other fields you can put into the front matter like author and version, but it's name and description that are mandatory.

Now below the front matter, we also have the body.

Now these are the actual instructions.

These are the step-by-step workflows, the rules, the examples of input and output, whatever the agent needs to know to do the job.

And it's just written in plain markdown.

And then the skill folder can contain some optional folders as well.

So you don't have to have these, but you can add them.

One of those optional folders is the scripts.

And this has executable JavaScript or Python or bash that the agent can actually run.

There's also a references directory that contains additional documentation that gets loaded if the agent determines it needs it.

And finally, the other optional directory is the assets directory that contains static resources like templates and data files.
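The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference parser for the format: the `parse_skill` helper, the `Skill` class, and the example file content are all hypothetical, and the front matter is read as simple `key: value` lines rather than with a full YAML parser.

```python
from dataclasses import dataclass

# Hypothetical example content; the field values are illustrative only.
EXAMPLE_SKILL_MD = """\
---
name: pdf-builder
description: Use this when the user asks to extract a PDF.
---
# Instructions

1. Locate the PDF the user referred to.
2. Extract the text and return it as markdown.
"""


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill file into its front matter fields and markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing front matter")

    # Find the line that closes the front matter block.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("front matter is not closed")

    # Assumption: flat "key: value" pairs only, no nested YAML.
    fields = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()

    # Name and description are the two mandatory fields.
    for required in ("name", "description"):
        if not fields.get(required):
            raise ValueError(f"missing required field: {required}")

    body = "\n".join(lines[end + 1:]).strip()
    return Skill(fields["name"], fields["description"], body)
```

Calling `parse_skill(EXAMPLE_SKILL_MD)` yields the name, the description that acts as the trigger condition, and the plain markdown instructions as three separate pieces.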

That's what an agent skill looks like, but agents can have lots of skills defined for them.

So what happens when there are like hundreds of these skills?

Loading all of them into the LLM context window at startup would blow through the token budget before anyone even gets to ask a question.

So skills use something called progressive disclosure.

And progressive disclosure works in three tiers.

So tier one is metadata only, and that applies at startup.

The agent loads just the name and description from each skill.

So that's just a handful of tokens per skill.

So even if there are a hundred skills installed, the overhead isn't gonna fill the context window.

And this is essentially akin to a skills table of contents.

Now tier two, this relates to the full instructions.

When the agent sees a request that matches a skill's description, it reads the complete SKILL.md body into context.

And this tells the agent what to do, the skill we are teaching it.

And that identification, the matching of a task's requirements to an available skill, is something that happens through the LLM's own reasoning.

The model decides when it can make use of the skill, which is why a good skill description is so important.

Then tier three, that's these optional folders here.

So these are the resources that map to the scripts and references and the assets folders.

And they only get loaded when a specific task actually needs them.

So the agent starts with a lightweight index of everything it can do.

That's the name and description.

It pulls in the detailed instructions when they're relevant, the body, based on matching the trigger condition.

And it grabs resources only at the point of need.
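The three tiers can be sketched as three separate read functions, each touching more of the skill folder than the last. This is an illustration under assumptions, not how any particular agent implements it: the function names are hypothetical, the front matter is read as simple `key: value` lines, and the step where the model decides which skill matches is left out because that happens in the LLM's reasoning rather than in code.

```python
import tempfile
from pathlib import Path


def read_front_matter(skill_md: Path) -> dict:
    """Read only the front matter lines, stopping before the body."""
    fields = {}
    with skill_md.open(encoding="utf-8") as f:
        if f.readline().strip() != "---":
            return fields
        for line in f:
            if line.strip() == "---":
                break
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
    return fields


def build_index(skills_root: Path) -> dict:
    """Tier 1: map each skill name to (description, folder) at startup."""
    index = {}
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_front_matter(skill_md)
        if "name" in meta and "description" in meta:
            index[meta["name"]] = (meta["description"], skill_md.parent)
    return index


def load_body(skill_dir: Path) -> str:
    """Tier 2: read the full instructions once a skill has been selected."""
    lines = (skill_dir / "SKILL.md").read_text(encoding="utf-8").splitlines()
    # The second '---' line closes the front matter; the body follows it.
    closing = [i for i, line in enumerate(lines) if line.strip() == "---"][1]
    return "\n".join(lines[closing + 1:]).strip()


def load_resource(skill_dir: Path, relative_path: str) -> str:
    """Tier 3: read one file from an optional folder at the point of need."""
    return (skill_dir / relative_path).read_text(encoding="utf-8")


def demo() -> tuple:
    """Build a throwaway skill folder and walk through all three tiers."""
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "pdf-builder"
        (skill_dir / "references").mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(
            "---\n"
            "name: pdf-builder\n"
            "description: Use this when the user asks to extract a PDF.\n"
            "---\n"
            "Step 1: read references/layout.md\n",
            encoding="utf-8",
        )
        (skill_dir / "references" / "layout.md").write_text(
            "Page layout notes.\n", encoding="utf-8"
        )

        index = build_index(Path(tmp))                 # tier 1
        description, folder = index["pdf-builder"]
        body = load_body(folder)                       # tier 2
        reference = load_resource(folder, "references/layout.md")  # tier 3
        return description, body, reference
```

Only `build_index` runs for every installed skill, so the startup cost scales with the size of the descriptions rather than the size of the instructions or resources.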

Now skills bring a type of knowledge to agents.

There are several ways to incorporate knowledge into an agent.

So let's just briefly compare them because they handle different things.

And the first one I just want to mention is MCP.

That's Model Context Protocol.

What sort of knowledge does MCP give you?

Gives you tool access.

It gives agents the ability to call out to external APIs and to interact with services.

MCP is about what the agent can reach, but it doesn't tell the agent when to reach for it or what to do once it has.

So that's MCP.

Another one is RAG, retrieval-augmented generation, and RAG handles factual knowledge, so it pulls in relevant chunks from our knowledge database at run time, which is pretty handy when the agent needs to look something up.

But RAG doesn't teach an agent how to do something.

It's reference material.

What about another one?

How about fine tuning?

What can that do for us?

Well, fine tuning bakes knowledge directly into the model's weights.

Now that's something that's permanent, but it's expensive.

And if the model changes, the fine-tuning has to be redone.

Now skills don't really do any of this, so what knowledge do skills bring to agents?

Well, skills handle, as I mentioned right up front, procedural knowledge.

It's how to do things, in what order, and with what judgment. And because they're just files, they can be version controlled, they can be easily updated, and you can easily move them between platforms.

Now, in practice, skills will often use some of these other forms of knowledge like, well, MCP for example.

So MCP provides the capability to invoke something externally and the skill provides the judgment for when and how to do that.

Now, one more thing to say about skills is that the SKILL.md format is an open standard. It's published at agentskills.io, that's an Apache 2.0 licensed project, and it has been adopted across a bunch of AI platforms like Claude Code and OpenAI Codex and many other tools.

So a skill built for one platform works on any platform that supports this spec.

Now there's a useful way to think about skills and it comes from cognitive science.

Now humans have distinct types of memory.

There's semantic memory, which are facts.

So Rome is the capital of Italy.

There's episodic memory, which are experiences.

So, uh, I went to Rome last summer.

Actually I did, and it was lovely.

Uh, and then there's procedural memory, which are skills like how to ride a scooter on the streets of Rome and live to tell the tale, which I also did barely.

Now agent architectures are starting to mirror this.

So semantic memory, that maps pretty closely to retrieval-augmented generation and knowledge bases.

Episodic memory?

Well, that really maps to conversational logs and interaction history.

And procedural memory?

Well, yep, that maps quite nicely to skill files.

Now, one thing that does need mentioning is that skills can include executable scripts with access to file systems and environment variables and API keys.

That's what makes them powerful, but it's also what makes trust so important.

Because when an agent runs one of these scripts, it's typically executing commands locally on your machine. And audits have found that publicly available skills frequently contain bad stuff like prompt injection, bad stuff like tool poisoning, bad stuff like hidden malware.

Basically the usual suspects for any open ecosystem.

So treat skill installation the way that any responsible team treats installing any software dependency, which is to say, review it and understand what it does before using it on your local machine.

So where does this leave us?
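One small, practical starting point for that review is to list exactly which executable files a skill ships, so nothing in the scripts folder goes unread. The sketch below is only an inventory aid under stated assumptions, not a security scanner: it detects nothing on its own, and the `inventory_scripts` and `demo` names are hypothetical. The SHA-256 digest is included so a reviewer can tell later whether a script has changed since it was last read.

```python
import hashlib
import tempfile
from pathlib import Path


def inventory_scripts(skill_dir: Path) -> list:
    """Return (relative path, SHA-256 hex digest) for each file in scripts/."""
    scripts_dir = skill_dir / "scripts"
    if not scripts_dir.is_dir():
        return []  # the scripts folder is optional
    entries = []
    for path in sorted(p for p in scripts_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append((path.relative_to(skill_dir).as_posix(), digest))
    return entries


def demo() -> list:
    """Build a throwaway skill folder with one script and inventory it."""
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "pdf-builder"
        (skill_dir / "scripts").mkdir(parents=True)
        (skill_dir / "scripts" / "extract.sh").write_bytes(b"echo extracting\n")
        return inventory_scripts(skill_dir)
```

The output is a reading list for a human reviewer. Whether a given script is safe still comes down to someone actually reading it.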

Well, skills are procedural memory for AI agents.

They're defined in a markdown file that lives in a folder that teaches an agent how to do a specific job.

Skills are conditionally triggered and they load efficiently through progressive disclosure and the format is an open standard.

So an agent that already knows the airspeed velocity of an unladen swallow, African and European, can now also learn how to perform any repeatable task you define for it.

So that's AI agent skills.

If you're using them, let me know in the comments.
