Building Great Agent Skills: The Missing Manual

By AI Engineer

Summary

Topics Covered

Skill hell strikes individuals and organizations alike
Skills trade context load for cognitive load
Hide branching material behind context pointers
Steer agents with leading words, not paragraphs
Hiding future steps forces deeper leg work

Full Transcript

Hello friends. I was dearly hoping to be able to come to the AI Engineer World's Fair, but family matters have intruded and I'm not able to make it. However, I will not be leaving you empty-handed. I'm going to give you the talk that I would have given in San Francisco. This talk is called The Missing Manual: How to Write Great Skills. And I think that the ability to distinguish good skills from bad skills is only getting more important.

As developers, we seem to be pretty talented at finding different forms of hell for us to go to. A few years ago, we had tutorial hell, which is where you would go into a bunch of tutorials trying to learn something, not be able to piece it together, and just get into this cycle you couldn't get out of. We had framework hell, where every 10 minutes there was a JavaScript framework being announced and you had to learn the hot new thing all the time. And now I think we have another version of hell, which is skill hell. Skill hell is where you have all of these skills freely available that you can download, contribute to, or figure out on your own, but you don't really know how the pieces all work together. You can't tell a good skill from a bad skill. And this means that people are trying to piece together these frameworks, trying everything that's out there all at once. And they don't get the results that the skills themselves promise.

This is true at an individual level, but it's also true at an organization level, too. Organizations have no understanding of how to build good skills, how to take their operating procedures and turn them into things that an agent can do. And if you don't do that, then it's hard to get the bounty that skills can offer. "Just one more skill, bro." That's kind of what we seem to be saying. And I feel a bit of guilt here, too, because we have Matt Pocock skills, which is my skills repo, which is one of the most popular engineering skill sets out there. And so I feel like I want to help the people who use my skills get out of skill hell.

So how do we do it? How do we get out of this? Well, what is actually missing here? In my opinion, the thing that we're missing is we don't know what makes a skill great. We can't yet look at a skill and go, okay, this skill is doing these good things and these bad things. There's no shared rubric, no framework for looking at a skill and making it better. And so that's what I'm going to give you in this talk. I'm going to give you a skill checklist: a checklist of things you can look at inside the skill to make sure that it's doing what it says it's doing, ways you can improve it, and ways you can write skills. This checklist looks like this.

We start with the trigger of the skill: how the skill is invoked and the decisions that you need to design there. Then the internal structure of the skill: how the skill is actually composed and laid out internally. Then number three is how do you actually steer using the skill? How do you get the skill to tell the agent what to do? And then four, how do you make the skill as small as possible? Because once we've got a working skill, we then need to basically minimize it: prune out all of the irrelevant stuff, prune out all of the no-ops.

There's one handy advantage of me not being in the room with you, which is you can immediately go and try this out, because I've encoded all of this into a new skill in my repo called writing great skills. So, if you've got an immediate use case for this, then just go to my skills repo. You know, just close this browser, get out of here, and go and use this skill to either improve your skills or write great new ones.

But let's start going through the checklist. We have number one, the trigger, the way the skill is invoked. And in order to talk about this, I'm actually going to do a bit of comparison here, which is that my skills are often compared to another set of extremely popular engineering skills called Superpowers. And I'm really often asked the question, how do your skills compare to Superpowers? What's the difference between them? To understand that, we need to understand the difference between user invoked and model invoked skills.

Anytime you have a skill, you can always invoke it manually. So the skill sits on your file system. The agent will just be able to pull up the skill and understand what's in there. And you can always do that by communicating that to the agent. It doesn't always look like this forward slash, depending on the harness, but you can always user invoke your skills.

Another way that skills can be invoked is by the agent itself. These are called model invocable skills or model invoked skills. Take the description: the description of the skill always ends up in the agent's context, and the agent can look at that and go, okay, based on that description, I'm going to invoke the skill, and I end up reading the skill.md file, which is where the meat of the skill is, into my context window. That's how you invoke a skill. That's what happens when a skill is invoked. So this description serves as a kind of context pointer. It sits in the agent's context, pointing to another file where the agent can go if it wants more context.

You don't need to put that context pointer into the agent's context. It can just be invisible to the agent. And that is what we call a user invocable skill. So some skills can only be invoked by the user because they don't have this context pointer. It's optional. For instance, we can see in my codebase design skill here, this is a model invocable skill. It has a description that ends up in the agent's context window. But if we look at my grill me skill instead, we can see it has disable-model-invocation: true. This means that this little description here will only show to the user. It won't be visible to the agent.
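As a minimal sketch of the distinction above, the snippet below reads a skill's frontmatter and reports whether its description would act as a context pointer in the agent's context. The two sample skills, the hand-rolled parser, and the exact hyphenated spelling of the `disable-model-invocation` key are assumptions for illustration; check your own harness for the field names it actually reads.

```python
# Illustrative only: the sample skills and the key spelling are assumptions.
CODEBASE_DESIGN = """---
name: codebase-design
description: Use when designing or restructuring modules in a codebase.
---
Steps and reference go here.
"""

GRILL_ME = """---
name: grill-me
description: Interview the user until the requirements are clear.
disable-model-invocation: true
---
Steps and reference go here.
"""


def parse_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse the simple `key: value` block between the two `---` lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_model_invocable(skill_md: str) -> bool:
    """True when the description is exposed to the agent as a context pointer."""
    fields = parse_frontmatter(skill_md)
    disabled = fields.get("disable-model-invocation", "false").lower() == "true"
    return "description" in fields and not disabled


def descriptions_in_context(skills: list[str]) -> list[str]:
    """Collect the descriptions that would sit in the agent's context."""
    return [
        parse_frontmatter(s)["description"] for s in skills if is_model_invocable(s)
    ]


if __name__ == "__main__":
    print(descriptions_in_context([CODEBASE_DESIGN, GRILL_ME]))
```

Only the first sample skill contributes a description here; the second stays available to the user but costs the agent nothing.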

So this then is tip number one: decide if your skill is user invoked or model invoked. Now you might think that model invoked skills are better, right? Because either the model can invoke it itself or the user can invoke it. It's more flexible. But every time you add a model invoked skill into your agent's environment, it increases what I'm going to call the context load on that agent. It adds a new description, which is costing you tokens on every request, but also adding a different thing for the agent to think about. So if you have 100 model invoked skills, that's going to be 100 descriptions inside the context for your agent.

So it seems to make sense then to either tamp down the number of model invoked skills or to just use all user invoked skills. But user invoked skills have a different load, which is that the more user invoked skills you have, the higher the cognitive load on the user. In other words, the more things the user needs to keep in their head, the more skill you require from the pilot. And so if we compare Matt Pocock skills to Superpowers, Superpowers is primarily model invoked skills. It gives the agent superpowers. Whereas with my skills, I much prefer to be in full control. That means I get to keep the context load on the agent as small as possible, but it does impose more of a cognitive load on me. So I need to understand the skills really deeply in order to get the most use out of them.

So why have I done this? Why did I prefer user invoked skills? Well, every time you have a model invoked skill, you get a cost in unpredictability, because every time you have a context pointer pointing from one resource to another, the model may just choose not to follow it. You know, even if it's absolutely perfect for the task, it may just choose not to invoke the skill. I much prefer removing that level of unpredictability and imposing a bit more cognitive load on the user. And what you get is that you're removing a class of problem from even being a problem, because this unpredictability leads people to need to eval their skills to make sure they're being called at the right time, which is really nasty, and it's a problem that I prefer to avoid.

But what I'm hoping to show you here is that model invoked skills and user invoked skills both have their own costs. So it's not an easy decision which one you choose.
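To make the two costs concrete, here is a rough sketch that scores a set of skills on both axes. The four-characters-per-token estimate and the idea of counting user invoked skills as a proxy for cognitive load are assumptions chosen for illustration, not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    model_invoked: bool


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: assume about four characters per token."""
    return max(1, len(text) // 4)


def context_load(skills: list[Skill]) -> int:
    """Approximate tokens added to every request by model invoked descriptions."""
    return sum(estimate_tokens(s.description) for s in skills if s.model_invoked)


def cognitive_load(skills: list[Skill]) -> int:
    """Number of skills the user has to remember to invoke by hand."""
    return sum(1 for s in skills if not s.model_invoked)


if __name__ == "__main__":
    # Hypothetical skill set: names and descriptions are made up.
    skills = [
        Skill("design", "Use when designing or restructuring modules.", True),
        Skill("review", "Use when reviewing a diff before merging.", True),
        Skill("grill", "Interview the user about requirements.", False),
    ]
    print("context load (tokens per request):", context_load(skills))
    print("cognitive load (skills to remember):", cognitive_load(skills))
```

Flipping a skill's `model_invoked` flag moves its cost from one total to the other, which is the trade the talk describes.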

So that then is the trigger: how the skill gets invoked. Now let's talk about the structure, the internal layout of the skill. I think of there as being two main units that you need to put into most skills. These two units are the steps and the reference. The steps are the step-by-step procedure that the skill is going to walk through. And the reference is any supporting information that helps it walk through those steps. You can have skills that have no steps and are only reference. And you can have skills that have no reference and only a set of simple steps to walk through. But if you start thinking of skills as composed of these two units, it really helps break them down a lot more.

If we look at an example, one of my skills, called to PRD, creates a product requirements document out of the current context window. It's got three steps in it. It finds the relevant context. It confirms the test seams with the user, so there's a little human-in-the-loop checkpoint there just to make sure we're not doing anything weird with the testing, which I find really important. And then it writes the product requirements document. To handle those three steps, we've got two bits of reference material. We've got a little bit of reference on what a test seam is. And then we've got a product requirements document template, just a literal markdown template which is used to write the PRD. So this is a great way to write a skill from scratch.
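The steps-plus-reference breakdown can be sketched as a small data structure. This is only an illustration of the idea: the class names, the rendered heading layout, and the placeholder reference bodies are assumptions, and the three steps are paraphrased from the description above rather than copied from the real skill.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    title: str
    body: str


@dataclass
class Skill:
    name: str
    steps: list[str] = field(default_factory=list)
    reference: list[Reference] = field(default_factory=list)

    def render(self) -> str:
        """Render the steps first, then each piece of reference material."""
        parts = [f"# {self.name}"]
        if self.steps:
            numbered = [f"{i}. {s}" for i, s in enumerate(self.steps, start=1)]
            parts.append("## Steps\n" + "\n".join(numbered))
        for ref in self.reference:
            parts.append(f"## Reference: {ref.title}\n{ref.body}")
        return "\n\n".join(parts) + "\n"


# Paraphrase of the skill described above; reference bodies are placeholders.
TO_PRD = Skill(
    name="to-prd",
    steps=[
        "Find the relevant context.",
        "Confirm the test seams with the user.",
        "Write the product requirements document.",
    ],
    reference=[
        Reference("What is a test seam", "(definition goes here)"),
        Reference("PRD template", "(markdown template goes here)"),
    ],
)

if __name__ == "__main__":
    print(TO_PRD.render())
```

A reference-only skill is just one with an empty `steps` list, and a steps-only skill is one with an empty `reference` list.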

You work out if you need some steps. Then you write those steps, and you work out what reference material those steps need, and you put it in a separate little spot in the skill which is for reference material. However, there's a really important constraint that we need to think about, which is tip number three: we want to make the main skill.md file as small as possible. Every skill is composed of its description, then a skill.md file, and then any reference material that branches off that. And if we make this skill.md file small, then we're saving in a bunch of different ways. Smaller skills are just easier to maintain and easier to audit, with fewer words to think about. And every time you shave off a word, that is a token shaved. That's multiple tokens shaved from your skill's cost. So I do believe that small skills are really important, both for maintainers and for users.

One really useful way you can make your skill smaller is by thinking about the different branches of the skill, the different ways the skill can be used. Because if you have reference material that's only used in one branch, then that's a candidate for being removed from the main skill.md. For instance, if we look at my to PRD skill here, we have two pieces of reference material: what is a test seam, and the PRD template. Well, we need the PRD template every single time because we are always creating a PRD. And we probably also need the what-is-a-test-seam information every time because we're always asking about the test seams. So in to PRD there's only one branch, and all the reference material belongs on that branch. So it probably also belongs in the skill.md file.

However, if we look at a different skill of mine, which is domain modeling, domain modeling does two things. It updates a local glossary called context.md, and then it also creates architectural decision records. In other words, it's doing two different things. Or it might actually choose to do neither of these, in which case it doesn't need the context.md template and it doesn't need the ADR template either. So in other words, domain modeling has two or maybe three branches. And this means that we don't need to include the ADR template or the context.md template in the main skill. They can be moved into separate zones. The way you do that is you have the skill.md file, then you put the template behind a context pointer, and you point that context pointer to a separate markdown file inside the skill's folder.

That context pointer literally just says: if you need the template, or if you need to update the context.md file, go to this file. And I call that an external reference. It's a reference that's external to the skill.md, which the agent can pull in very easily because it's bundled along with the skill. So this is a technique you can use for making the skill.md as small as possible, which has so many benefits: hide branching reference material behind context pointers. In other words, if you feel like your skill is going to be used in lots of different ways, then take the reference material that's relevant for those branches and hide it behind context pointers. So that is structure.
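The branching rule above can be sketched as a small decision function: material needed on every branch stays inline, and everything else moves behind a one-line context pointer. The branch names, file names, and pointer wording below are hypothetical, chosen only to mirror the two skills discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    title: str
    filename: str        # hypothetical file bundled in the skill's folder
    used_by: frozenset   # names of the branches that need this material


def split_references(refs, branches):
    """Return (inline, external); inline material is needed on every branch."""
    inline = [r for r in refs if r.used_by >= branches]
    external = [r for r in refs if not (r.used_by >= branches)]
    return inline, external


def context_pointer(ref):
    """One line in skill.md that points at an external reference file."""
    return f"If you need the {ref.title}, read {ref.filename}."


# Branch and file names below are made up for illustration.
DOMAIN_MODELING_BRANCHES = frozenset({"update-glossary", "write-adr", "neither"})
DOMAIN_MODELING_REFS = [
    Reference("context.md template", "context-template.md",
              frozenset({"update-glossary"})),
    Reference("ADR template", "adr-template.md", frozenset({"write-adr"})),
]

TO_PRD_BRANCHES = frozenset({"write-prd"})
TO_PRD_REFS = [
    Reference("test seam definition", "test-seams.md", TO_PRD_BRANCHES),
    Reference("PRD template", "prd-template.md", TO_PRD_BRANCHES),
]

if __name__ == "__main__":
    _, external = split_references(DOMAIN_MODELING_REFS, DOMAIN_MODELING_BRANCHES)
    for ref in external:
        print(context_pointer(ref))
```

With a single branch, as in the to PRD case, everything lands in the inline list and no pointers are generated.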

We need to think about making the skill.md super duper small. We need to think about the branches in our skill, moving material out behind context pointers. And we need to think about steps and reference, which are the two main units inside a skill.

Let's go next to steering, the actual ways we get the agent to do what we want it to do. And for me, steering comes down to one really cool technique, which is kind of the main thing I want you to get from this talk. This technique fixes this issue: the agent doesn't do what I want. In other words, I specify something in the skill, I think that I've been clear, and then it just doesn't do the thing. Now, I think the main reason this happens is because you're not using a technique called leading words. The idea of leading words, or Leitwort if you like literary theory, I suppose, is that there are certain words that pack a bunch of meaning into a very small space. These leading words are really powerful with agents because you put the leading word in the skill itself, in the text, and then the agent will repeat the leading word back to itself as part of its operations, as part of its thinking tokens, and as part of its output to you. And then because it's re-emphasizing that word, and that word hopefully describes what you want from the agent, that then goes and changes its behavior.

Let's make this more concrete with an example. So let's imagine that we have a problem, which is a classic problem with agents: they code layer by layer. In other words, if you give them a big tranche of work to do, they will generally code up all of the database layer, then all of the schemas, then all of the API endpoints, then all of the front end. They don't do the typical human thing, which is to seek feedback early on, get something small working, and then expand out from there. Now, we can try to encourage the agent to do that by just saying, you know, don't code layer by layer, make sure that you create a small slice first and then go from there. But what if instead we used a leading word? Say vertical slice is our leading word. We want to slice up the work into vertical slices instead of horizontal slices. Vertical slice is pretty well-known terminology in development. And so this will hopefully trigger the agent's priors, and it will understand what we mean. We don't have to have a two-word skill where it just says vertical slice. What we're doing is we're packing lots of meaning into a relatively short phrase that we then repeat throughout the skill.

The cool thing about this technique is you can know if it's worked, because you say vertical slice in your skill and then you'll notice in the reasoning traces that it's saying, "Okay, we're going to do this as a thin vertical slice." Then you should get better implementation plans.
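The check described above, looking for your leading words in the reasoning traces, can be roughed out as a simple phrase count. This sketch assumes you can get the trace as plain text; the sample skill text, the sample trace, and the second leading word are invented for illustration.

```python
import re


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive occurrences of a phrase, allowing plurals,
    hyphens, and line wraps between its words."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"[\s-]+".join(words)
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def leading_word_report(skill_text: str, trace: str, leading_words: list[str]) -> dict:
    """For each leading word, count its uses in the skill and in the trace."""
    return {
        w: {"in_skill": count_phrase(skill_text, w), "in_trace": count_phrase(trace, w)}
        for w in leading_words
    }


def unadopted(report: dict) -> list[str]:
    """Leading words the skill uses but the agent never repeated back."""
    return [w for w, c in report.items() if c["in_skill"] > 0 and c["in_trace"] == 0]


# Made-up sample data for illustration.
SKILL_TEXT = (
    "Plan the work as vertical slices. Ship each vertical slice end to end "
    "before starting the next. Keep a single source of truth for the schema."
)
TRACE = "Okay, we're going to do this as a thin vertical slice: one endpoint, one table."

if __name__ == "__main__":
    report = leading_word_report(
        SKILL_TEXT, TRACE, ["vertical slice", "single source of truth"]
    )
    print(report)
    print("not adopted:", unadopted(report))
```

A leading word that shows up in the skill but never in the trace is a candidate for repeating more consistently or swapping for a stronger phrase.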

Everyone I've explained this technique to sort of feels like, "Oh yeah, I've been doing that for a while. I've been using these little phrases to try to encourage the agent to do what I want." All I'm asking you now is to use those consistently within your skills and watch in the thinking traces as the agent adopts your way of doing it. So often, if the agent isn't doing what you want, you need to make your leading words more consistent, more powerful, and look for others, because, you know, English is a pretty wide API in terms of different functions you can call, different things you can experiment with. There are many leading word candidates out there, and agents are actually pretty good at helping you think of them.

Another little lever you can use with agents is for when the agent just doesn't do enough leg work. What I mean by this is that, okay, we're on a step, let's say, and maybe the step is to ask clarifying questions or to explore the codebase, and the agent just doesn't do enough of it. It doesn't put enough effort into that particular step. A real classic case of this, and something that I have found almost everywhere it exists, is plan mode. Because in plan mode, we have two steps. We have ask clarifying questions and then create a plan. And what I have found in every single implementation of plan mode I've tried is that ask clarifying questions just doesn't ever do enough leg work. The agent sees that its ultimate goal is to create a plan. And so it just does a small amount of leg work with ask clarifying questions, asks you a couple of things, and then eagerly creates the plan.

So what was my solution here? Instead of doing plan mode, I have a skill called grill with docs, which is kind of my ask clarifying questions phase. And I split the planning into its own skill. So grill with docs now is its own skill, where the agent only sees that part of the process. And then after grill with docs completes, we go and do to PRD. In other words, we have step one and step two, but the agent only sees one step at a time. So this is a really cool technique for increasing leg work on the step that you're on by hiding the future goal, hiding the future steps. It's not always necessary to split skills into individual steps, but in particular cases where you really want an extra chunk of leg work, there's no technique like it. It works very, very well.
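The difference between a plan-mode style prompt and split skills can be sketched as two ways of assembling the same phases. The phase names and instruction strings here are invented stand-ins, and how each prompt actually reaches an agent is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    skill_name: str
    instructions: str


def combined_prompt(phases: list[Phase]) -> str:
    """Plan-mode style: the agent sees every phase, including the final goal."""
    return "\n\n".join(
        f"Step {i}: {p.instructions}" for i, p in enumerate(phases, start=1)
    )


def isolated_prompts(phases: list[Phase]) -> list[tuple[str, str]]:
    """Split-skill style: each invocation sees only its own instructions."""
    return [(p.skill_name, p.instructions) for p in phases]


# Hypothetical wording; the real skills will read differently.
PHASES = [
    Phase("grill-with-docs",
          "Ask clarifying questions until every open decision is resolved."),
    Phase("to-prd",
          "Write the product requirements document from the current context."),
]

if __name__ == "__main__":
    first_name, first_prompt = isolated_prompts(PHASES)[0]
    # The questioning phase never mentions the document that comes next.
    print(first_name, "->", first_prompt)
```

In the isolated form the user decides when to move on, so the questioning phase has no visible finish line to rush toward.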

So, that is steering: using leading words to capture what you want in small reusable tokens, and then making sure that the agent is doing the right amount of leg work per step. So let's head now into pruning.

Now pruning really is just a quickfire set of failure modes, different things that you can get wrong. And the first is fairly obvious: we do not want massive skills. Massive skills are usually a symptom of something else going wrong, a symptom of one of these other failure modes. And the first one is pretty simple: don't repeat yourself. You need to make sure you're watching out for duplication. In general, I like every part of the skill to have a single source of truth. In other words, if you have a piece of reference material like the PRD template, let's say, or something even smaller like what is a test seam, you make sure that you don't repeat that in several places, or cover the same steps in multiple places. Just make sure each part has a single source of truth and you're not repeating yourself, even across reference material.

The next way that skills get big is via sediment. Sediment is just a classic thing when people are working on the same set of docs, which is that everyone starts contributing to a shared markdown file. People add their own stuff. They don't feel brave enough to delete or modify anyone else's. And so you just end up with this huge amount of sediment, with often irrelevant material for the skill, especially stuff that hasn't been laid out properly. With a skill with a lot of sediment, you really need to look at structure. That's the first thing you need to do. You need to make sure that the stuff that's been added is relevant for all branches. If it's not, then move it into the correct branches. Or if it's just totally irrelevant, remove it. Or maybe there's stuff in there that's totally stale, in which case you just need to kill it dead.

The next failure mode is really common when an agent writes your skills, which is no-ops: things inside the skill that appear to do something, but don't actually influence the agent's behavior inside the context of the skill. Let's imagine we have an implement skill, and we have an entire paragraph of the skill that tells the agent to write a long, detailed commit message. What would happen if you just deleted that paragraph? Well, the agent would probably still write a decent, long commit message. People ask me a lot how I get my skills so small, and it's just using these techniques: using deletion tests, making sure that I compact things into leading words, that I don't have anything irrelevant in there, and that I don't have any sediment.
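A deletion test can be prepared mechanically: generate one variant of the skill per paragraph, each with that paragraph removed, then run your usual task against every variant and compare the behavior to the baseline. The sketch below covers only the variant generation; the sample implement skill is made up, and running the agent and judging the outputs is left to you.

```python
from typing import Iterator


def paragraphs(skill_text: str) -> list[str]:
    """Split a skill into blank-line separated paragraphs."""
    return [p.strip() for p in skill_text.split("\n\n") if p.strip()]


def deletion_variants(skill_text: str) -> Iterator[tuple[str, str]]:
    """Yield (removed_paragraph, skill_without_it) for every paragraph."""
    parts = paragraphs(skill_text)
    for i, removed in enumerate(parts):
        remaining = parts[:i] + parts[i + 1:]
        yield removed, "\n\n".join(remaining) + "\n"


# Hypothetical three-paragraph implement skill.
IMPLEMENT_SKILL = (
    "Implement the task as a vertical slice.\n\n"
    "Run the tests before finishing.\n\n"
    "Write a long, detailed commit message.\n"
)

if __name__ == "__main__":
    for removed, variant in deletion_variants(IMPLEMENT_SKILL):
        # If the agent behaves the same with this variant, the removed
        # paragraph is a no-op candidate.
        print(f"without {removed!r}: {len(variant)} characters")
```

Any paragraph whose removal leaves the agent's behavior unchanged is a candidate for deletion from the real skill.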

And that finally brings us to the full sweep of things. Number one, we check the trigger. We make sure that it's firing at the right times. We check whether we're imposing context load or cognitive load. With structure, we think about branches. We think about structuring things into steps and reference, and we make sure that material that's only relevant for one branch is outside of the main skill.md. For steering, we're thinking about condensing text down into leading words and watching those leading words appear in the reasoning traces. And we're also thinking about leg work. Should we break this skill down further to increase its focus on the current phase by hiding the future phase from it? And with pruning, we're doing a final pruning pass over the entire skill, watching out for sediment, watching out for crud, and watching out especially for no-ops.

Now, the best way to get started with this framework is inside this skill, the writing great skills skill. You can check it out from Matt Pocock skills, download it, use it to improve your own skills, and maybe even use it to run over some community-authored skills so that you can check that the skills that you're actually pulling in are any good.

If you want to follow along with my stuff, then I have a newsletter up on aihero.dev. And my plans for the next few months are to release an AI coding crash course, which is an intro to a lot of the stuff I've been talking about and how you get off the ground working with engineering and AI. I hope that what I've given you is enough to help you escape from skill hell, or at least try to make the bitter journey out of there.

I'm so sorry not to be able to attend in person, but thanks for watching. I'll see you very
