Claude Code Skills Just Got Even Better
By Nate Herk | AI Automation
Summary
## Key takeaways
- **Skills Are Simple Text Recipes**: A skill is basically just a recipe, literally just text instructions like a prompt that an agent reads to get tasks right every single time, such as writing a LinkedIn post or internal comms in a company format. [00:10], [00:34]
- **Two Skill Types Explained**: Capability uplift skills teach Claude to do something better, like design websites with good fonts and layouts, while encoded preference skills enforce specific sequential workflows, like idea mining from YouTube comments and trends using parallel agents. [01:12], [02:03]
- **Skill Creator Enables Evals**: The new skill creator skill from Anthropic creates and modifies skills, measures performance, runs evals, benchmarks pass rates and tokens, and tunes triggers for better accuracy, catching regressions or spotting when skills are obsolete. [03:32], [04:46]
- **PDF Skill Eval Fixed Placement**: Anthropic ran an eval on a PDF filling skill that struggled to place text correctly; after improvement, text now accurately fills checkboxes and fields. [04:55], [05:05]
- **Live Built YouTube Roundup Skill**: Using skill creator, a YouTube weekly roundup skill was built from a vague prompt to analyze videos, comments, and views, generate a SWOT, and output a branded PDF report with accurate data after feedback and iteration. [08:44], [14:23]
- **Future: Natural Language Skills**: Over time, a natural language description of what the skill should do may be enough, with the model figuring out the rest, evolving from steps and rules to high-level specs. [07:18], [07:35]
Topics Covered
- Skills Are Just Text Recipes
- Capability Uplift vs Encoded Preference
- Skill Creator Automates Optimization
- Natural Language Skills Are Coming
Full Transcript
Claude skills just got 10 times easier to build and stronger to use. So, in today's video, I'm going to explain exactly why that is, and then I'm going to live build a completely new skill right here in front of you guys. So, real quick, what is a skill? It's basically just a recipe, so that when you ask your agent to make you, for example, a LinkedIn post, it will read the recipe and get it right every single time. And when I say recipes, I literally just mean text.
It's just text instructions. It's like a prompt. So if I go to customize, go to skills, and click on, let's say, the internal comms skill, this says: a set of resources to help me write all kinds of internal communication using the format that my company likes to use. And you can see this is the skill itself. It is literally just text that you could read, that an intern could read. Anybody could read and understand what's going on in the skill. And if you're using them in Claude Code, you can see I've got a ton of skills here. So, for example, let's look at my idea mining skill. This is the markdown file that explains to the agent what this skill actually does. And once again, it's all just text. So what did Anthropic actually do that made all these skills better? They updated their skill creator skill, which is literally a skill that teaches Claude how to build, test, measure, and refine skills, just make all the skills better and better. So let's actually cover why that matters and what happened. The first thing I need you to understand is that there are two different types of skills.
We have a capability uplift skill, which basically is a prompt. It teaches Claude how to do something better, for example, design websites with the front-end design skill, or create documents, or run Excel formulas. Things that maybe the default model by itself doesn't know super well, but with a prompt, it does a much better job. And then we also have encoded preference skills, which means that Claude already understands each of these pieces, but it needs to follow them in a specific order. So these are way more like actual workflows, like step-by-step automations. So, quick example. If you ask Claude without a front-end design skill to build you a website, it could do it, but it might just look very generic. It might look like AI slop, as they call it. But if you give it the exact same prompt, and this time you also let it use the front-end design skill, it's going to look much better, because that skill tells it stuff like good fonts, good color schemes, good background elements, good layouts. And that is a classic capability uplift skill. Now, here's an example of an encoded preference skill, which is the one we just saw in my Claude Code, which I call idea mining. This skill is a little bit more sequential, and there are different steps involved.
So, first it will look at my YouTube comments. It will look at, you know, some videos in my niche. It will also look at AI trends on X and the web. It will then spin up two different agents: a YouTube agent that analyzes this stuff and a research agent that analyzes this stuff. And these run in parallel.
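As a rough illustration of that fan-out, here is a minimal Python sketch. It is not how the idea mining skill is actually implemented (the skill itself is just text instructions); the function names `youtube_agent`, `research_agent`, and `fan_out`, and the sample inputs, are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two sub-agents. A real skill would hand
# these jobs to actual agents rather than plain Python functions.
def youtube_agent(sources):
    return {"agent": "youtube", "analyzed": sources["comments"] + sources["niche_videos"]}

def research_agent(sources):
    return {"agent": "research", "analyzed": sources["x_trends"] + sources["web_trends"]}

def fan_out(sources):
    """Run both sub-agents at the same time and collect their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(youtube_agent, sources),
            pool.submit(research_agent, sources),
        ]
        # result() blocks until that agent finishes; list order is preserved.
        return [future.result() for future in futures]

# Made-up inputs, only to show the shape of the data.
sources = {
    "comments": ["comment A"],
    "niche_videos": ["video B"],
    "x_trends": ["trend C"],
    "web_trends": ["trend D"],
}
outputs = fan_out(sources)
```

Whatever calls `fan_out` plays the role of the main agent: it receives both outputs once the parallel work is done.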
And then they both send their output back to the main agent, which will score and cross-reference. And then the main agent turns all that information into some video ideas for me, which is why I call it idea mining. So, what I could do is say, "Hey, Mr. AI agent, go look at my comments, go look at YouTube, go look at X, analyze that, and help me find some video ideas," and every time it would give me different answers and sort of do it differently. Or I can just say, "Hey, do some idea mining," and it will just call the skill, and every time I get an output that I like. The reason why this is important to understand is that capability uplift skills might fade over time. For example, with the front-end design skill, right now we're with Opus 4.6, right? What if Opus 5 drops and default Opus 5 is better at front-end design than Opus 5 with a front-end skill? At that point, you might just need to retire that skill completely. But encoded preference skills will probably stay pretty durable and accurate, because the process is usually very specific to you, which Opus 5 most likely won't be trained on. Okay, so those are the two different types of skills. Now, we can actually evaluate them. So, with this new skill creator skill, which is an official Anthropic skill, this is the one we're talking about. It's in the repo right here. And if I open up the actual SKILL.md, you can see this is what it does. It creates new skills. It can modify and improve existing skills.
It can measure skill performance. So use this when you want to create a skill from scratch, when you want to update or optimize one, when you want to run evals to test a skill, when you want to do benchmarks, or when you want to optimize a skill's description for better trigger accuracy. I'm going to talk about what each of these little elements means, but I just wanted to show you that this is the actual skill creator skill. It's basically just all of Anthropic's best practices on how to build better skills.
They've done things before like drop a 33-page PDF, which walks you through fundamentals, planning and design, testing and iteration, distribution and sharing, patterns and troubleshooting, all this kind of stuff. This is pretty thorough. So you could either take time and learn this, or you could just give your agent the skill creator skill, and all that information is already in there. What the evals do is let your agent actually evaluate the quality of your skill and then make improvements. So, let's say you have a skill for creating job descriptions.
What you could do is give your agent tons of examples of really good job descriptions that you want. Then it will look at your skill, test out some prompts, compare the outputs to your examples, and optimize your skill for you. As we've talked about in the past, the more you use a skill, the better it gets, because you're able to give feedback on what you like and what you don't. So, this basically shortcuts that process.
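To make the idea of an eval concrete, here is a minimal Python sketch of grading outputs against simple checks. This is an assumption about the general shape of an eval, not the skill creator's actual format; `EvalCase`, `run_evals`, `fake_agent`, and the three checks are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable

def run_evals(generate: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose output passes its check."""
    passed = sum(1 for case in cases if case.check(generate(case.prompt)))
    return passed / len(cases)

# Hypothetical stand-in for "the agent with the job description skill loaded".
def fake_agent(prompt: str) -> str:
    return f"Role: {prompt}\nResponsibilities: ...\nRequirements: ..."

cases = [
    EvalCase("Data Analyst", lambda out: "Responsibilities" in out),
    EvalCase("Office Manager", lambda out: "Requirements" in out),
    EvalCase("Sales Lead", lambda out: "Salary range" in out),  # fails for fake_agent
]
pass_rate = run_evals(fake_agent, cases)
```

A failing case like the third one is the useful part: it points at something the skill's instructions never asked for, which is what gets fixed on the next iteration.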
Here's a quick example that Anthropic actually ran with this eval. The skill for filling out PDFs was having trouble finding the right spot to put the text. But after they ran the evaluation on the skill and it was able to improve, you can see all the text is now accurately placed, whether that is a checkbox or just some sort of fill-in field. So there are two reasons that we need to use evals, and they sound kind of similar, but they're basically opposites. The first one is to catch regressions. Let's say we have a job description skill. As a model evolves, it might actually use the skill worse, because it's trained a little differently and it thinks a little differently. So this would basically be an early signal that you need to evolve your skill. The second one is to spot outgrowth. Once again, as models improve or evolve, they might be able to just do a better job without a skill at all. That's when you would run the evaluation and say, "Okay, wow, without the skill, it's actually better. I'm just going to go ahead and delete this, or maybe just archive it." And then we can also run benchmarks. So when a model updates, or when you make an iteration and change your skill, just run all the evals and run a benchmark, which will give you stuff like a pass rate, a time, and also how many tokens are being used.
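Here is one way the regression and outgrowth signals could be read off benchmark numbers, as a small Python sketch. The `BenchmarkResult` fields mirror the three metrics mentioned (pass rate, time, tokens), but the `verdict` function and its thresholds are hypothetical, not something the skill creator is documented to do.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    pass_rate: float  # fraction of evals passed, 0.0 to 1.0
    seconds: float    # total wall-clock time
    tokens: int       # total tokens used

def verdict(with_skill: BenchmarkResult,
            without_skill: BenchmarkResult,
            previous_with_skill: BenchmarkResult) -> str:
    """Turn benchmark numbers into one of three rough signals."""
    if without_skill.pass_rate >= with_skill.pass_rate:
        return "outgrown: the base model does as well without the skill"
    if with_skill.pass_rate < previous_with_skill.pass_rate:
        return "regression: the skill scores worse than it used to"
    return "keep: the skill still provides uplift"

# Illustrative, made-up numbers; not results from the video.
signal = verdict(
    with_skill=BenchmarkResult(pass_rate=0.9, seconds=40.0, tokens=12000),
    without_skill=BenchmarkResult(pass_rate=0.6, seconds=35.0, tokens=9000),
    previous_with_skill=BenchmarkResult(pass_rate=0.9, seconds=42.0, tokens=12500),
)
```

This version only compares pass rates; time and tokens are carried along so a stricter rule could also weigh cost.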
So here's an example where they said, "Benchmark the PDF skill with and without the skill loaded and show me side-by-side results so I can see the uplift." And we get all this information about these different evaluation metrics. We get the pass rate. We get the total time and the total tokens. So here you can clearly see that with the skill, you're getting much better results. And then the final piece is skill trigger tuning. Once you've got a project filled up with, let's just say, 10 or more skills, you might notice sometimes that you get false triggers or you get misfires.
Meaning you wanted it to use a skill and it used the wrong one, or you wanted it to use a skill and it just didn't use any at all. Luckily, you can also call skills with slash commands, but it's so much more convenient to just be able to speak in natural language and make sure that your agent understands you. So with trigger tuning, the skill creator will basically analyze your skill. It will test out different prompts that you might use to trigger that skill, and then it will edit the description so that the skill gets called more accurately. And this is an actual evaluation that they ran. You can see on the left-hand side and on the right-hand side we have the test score and the train score. The green and blue are basically the results after it has been analyzed and fixed with the trigger tuning. So you can see it's still not perfect, but it's so much better than where we were without this new skill. What I think is really cool, and how I want to end off this section before we get into a live demo, is where this is going. At the bottom, we have a quote from Anthropic themselves that says, "Over time, a natural language description of what the skill should do may be enough, with the model figuring out the rest." And I really think that this word "may" should actually have been "will." Basically, what this means is that today, when we're telling our agent to build skills for us, or maybe just giving it an SOP, we're giving it steps, rules, and format. But what's going to happen in the future is we're going to be able to just tell it in way more high-level natural language what we want, and it's going to be able to figure out all of that and get there with a spec, and basically just cut down the time that it takes for us to get a really good skill or, you know, a really good automation.
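Going back to trigger tuning for a moment, the train and test scores can be pictured with a small Python sketch. Everything here is an assumption for illustration: in reality the model decides whether a skill fires based on its description, and `keyword_trigger` is only a crude keyword stand-in for that decision; the example prompts are made up.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, bool]  # (user prompt, should the skill trigger?)

def trigger_accuracy(would_trigger: Callable[[str], bool],
                     examples: List[Example]) -> float:
    """Fraction of prompts where the trigger decision matches the label."""
    hits = sum(1 for prompt, expected in examples if would_trigger(prompt) == expected)
    return hits / len(examples)

def keyword_trigger(keywords: List[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in for the model: fire if any keyword appears."""
    def decide(prompt: str) -> bool:
        text = prompt.lower()
        return any(keyword in text for keyword in keywords)
    return decide

# Prompts used while editing the description (train) and held-out ones (test).
train = [
    ("do some idea mining", True),
    ("find me video ideas", True),
    ("fix this bug", False),
    ("summarize this PDF", False),
]
test = [
    ("what videos should I make next", True),
    ("write a LinkedIn post", False),
]

before = keyword_trigger(["idea mining"])
after = keyword_trigger(["idea mining", "video"])

scores = {
    "before": (trigger_accuracy(before, train), trigger_accuracy(before, test)),
    "after": (trigger_accuracy(after, train), trigger_accuracy(after, test)),
}
```

Keeping a held-out test set matters because a description can be tuned to fit the training prompts perfectly and still misfire on phrasings it has never seen.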
All right, so I am in my Herk 2 project, which is kind of just like my personal assistant in Claude Code, and I'm going to show you guys how we can actually get the skill installed. Whether you're in VS Code, which is where I am, or in the terminal or desktop app, whatever, you just need to do /plugins and click on manage plugins. Then if you go in here, you can see all of the official Anthropic ones, and you can just go ahead and search for skill-creator.
And right here, you can see the official one. Here's the GitHub. All you have to do is go ahead and click install. You can install this for just you, you can install it for your project, or you can install it locally. I'm just going to install it for the whole project.
So now you can see that's installed, and I'm just going to go ahead and restart Claude Code so that it actually takes effect. And just keep in mind, if you're in Claude Code, it may not show up right here in your actual .claude skills if you installed it in your project.
So you can just verify it and say, "Do you have the skill creator skill? What does it do?" And you can see right here that we do in fact have that. So I'm going to go ahead and switch to plan mode and see if it can build us a new skill: "I need you to create a skill called YouTube weekly roundup where, at the end of every week, you will look at the videos that I made that week. You'll analyze the comments, you'll analyze the views, engagement, things like that. And you'll give me a PDF report on all of the insights: strengths, weaknesses, threats, opportunities." So, that's all I'm going to send off. I kept this pretty vague intentionally to see what it's going to come back with and how it's going to be able to plan this out for us. And this is where the future's going, and this is what Anthropic is talking about. Because most people that are using skills right now are actually executives, managers, and operators. They're not engineers, which means we're really good at being able to explain what we want, the metrics we need to hit, and why we need that, but maybe not all of those technical nitty-gritty details. All right, so it came back and asked me some questions.
The first thing I said is I want it to just be the last 7 days, so it's a rolling 7-day window. It asked about the report sections that it came up with, and I said those look good. And for the PDF style, I told it to use the brand assets in my folder. So right over here I've got my brand guidelines, and this one is the actual logo for AIS. I'm telling it to use those, and hopefully it can throw all that on there and make it feel really branded. So it's going to keep going now with this plan.
All right, so at this point it came back with a plan. And keep in mind, I still haven't told it anything about tech stack or anything else. It's writing out everything that it's going to do. Normally I would read through this and potentially give it some tweaks, but I just want to see what this skill creator is able to do with a one-shot prompt. So I'm just going to go ahead and accept.
And look at this. In its to-do list, we can see that it creates all these things, but then the last step is to run the test and iterate with the skill creator eval process. So I'm excited to see what it does there. You can see that it created everything, and then it decided to test it to do a final iteration. Okay, so I was a little confused. I said, "Do you have an actual PDF file for me?" And it said, "Yes, it is in your projects folder." I was looking in the templates, where it created an HTML template, but apparently it actually rendered that as a PDF. So let me go to projects. We'll go to YouTube weekly roundup, and right here we have an actual PDF. This doesn't look great in here, and obviously it isn't showing as a PDF, but if I actually open it up from my files, it is a PDF. So, here we have the logo, we have weekly roundup, we have three videos published, and then we've got some stats on views, likes, and comments. I'm going to keep going down.
We have our executive summary. For this, it actually ran, I think, two weeks' worth of data just to test this out. And I will say, just by glancing at this, I don't think that this data is correct.
So, keep that in mind. Here we can see the per-video breakdown. Right now, we have nothing available in our SWOT analysis. And then we have competitor context, and there's nothing available here. So now it's time to give it some feedback and see what it can do. First of all, I'm going to clear out this context, because it used up 62%. I'm going to go back into plan mode and just give it some honest feedback: "All right, so the report looks great. Aesthetically, you did a good job on the design. However, the data is all wrong. There were a lot of missing elements. I need you to really look at how you're actually scraping this data from my YouTube channel, how you're actually searching through the comments and competitor videos, and make sure that there's actually data going into this report." And before I send this off, it's interesting, because you can see here it sent us some JSON data, which is the raw information that it was able to find from my YouTube channel.
And the thing is, this isn't super in-depth. So, I just don't think that it did a good enough job on the research element. And maybe this is exactly what we were talking about earlier, where at some point the AI is going to be able to understand that we want all of this granular data, but maybe right now it's our job to just explain that really clearly: "I want to see comments analysis. I want to see what's working for other people in the space. I want to see other trending videos in AI. And I want you to use all of that, and use your brain, to figure out the strengths my channel has, the weaknesses, the opportunities, and the threats. All of this information should be a pretty in-depth research report for me on my YouTube weekly roundup." So, while this is running, I thought that we should real quick look at what it actually did.
So, in my .claude, we've got my skills folder. If I go all the way down, we've got the YouTube weekly roundup, and this is the MD file. We've got the YAML up top with the name, the description, and disable model invocation set to false, which just basically means that Claude Code can call this based on a request. It doesn't have to be explicitly a slash command.
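For readers who have not opened a skill file, here is a hedged Python sketch of what that YAML header might look like and how the fields separate from the instructions. The field spellings (`name`, `description`, `disable-model-invocation`) are written the way I understand Claude Code expects them, but treat them as an assumption; the values and the `parse_front_matter` helper are hypothetical, and this is a simple line parser, not a full YAML reader.

```python
# Hypothetical skill file contents; the real file in the video is not shown in full.
SKILL_MD = """---
name: youtube-weekly-roundup
description: Weekly report on the channel's videos, comments, and engagement.
disable-model-invocation: false
---
Step-by-step instructions for the agent go here.
"""

def parse_front_matter(text):
    """Split a skill file into (metadata dict, instruction body)."""
    # The header sits between the first two '---' markers.
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

meta, body = parse_front_matter(SKILL_MD)

# With the flag set to false, the model may call the skill on its own,
# without an explicit slash command.
model_can_invoke = meta["disable-model-invocation"] == "false"
```

The description is the part that trigger tuning rewrites, since it is what the model reads when deciding whether a request matches the skill.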
And then an argument hint. So basically, when Claude Code decides to use this skill, it will send in maybe a hint so that the skill understands what video we're looking at or, you know, the topic. It's giving some context. It's giving some channel benchmarks, some optional focus, and then step-by-step instructions on what to actually do here. Now you can see it's calling a script called fetch YouTube data. If I were to look for that in here, I could go down to my scripts and see YouTube weekly roundup. And right here we've got some different things. We've got the prepare data. We've got the render report. And we also have a script that I already had in this project that it was able to find and use, so it didn't have to create a new one. This one is called fetch YouTube data. So the SKILL.md file here basically points to everything that the agent needs in order to do this accurately. Okay, so it's come back with another detailed plan.
And I'm going to go ahead and fire this off. And I love this. Once again, we've got all these to-dos, and then at the end, it says to audit with the skill creator. This is such a good example of why using a project more and using a skill more makes it stronger, because it's able to reuse some of the pieces that I already had in this project, like my YouTube analyzer agent and my YouTube data script. And of course it has all the context about my business and my YouTube channel in here already. So all of those changes have been made, and now all that's left to do is actually run the skill. So I just called the skill. You can see that it's reading it right here.
And now it's going to refresh channel data, use three agents in parallel, prepare the report, populate the data, and then render the PDF and show it to me. So, I'll check in with you guys when we get that output. All right, that finished up. We've got some quick hits: top competitor move, biggest opportunity.
Apparently, Jack Roberts is my biggest threat. If you see this, Jack, keep crushing it. Okay, so here is the report. These stats right off the jump look a little bit more accurate. I might want to tell it to make this logo a bit bigger, but it did what we asked. Seven videos published, and like I said, these stats look more accurate. We've got the executive summary here with some key takeaways: double down on dollar-outcome titles, make a dedicated Antigravity tutorial, fill the ChatGPT-to-Claude migration, watch Jack Roberts closely, and address VS Code versus Antigravity confusion. We've got the per-video breakdown. So now you can see the actual metrics from all the videos, including the one that I literally just dropped like an hour ago. All of these look like they're doing okay. This one might not be doing the best, and similar with this one. But I really like the way that this actually looks. The layout's pretty good. It is very clean and professional. For the SWOT analysis, we actually have it on the second page.
It still looks good. That's obviously just an easy spacing issue that we can fix. So, we have some strengths here. We have some weaknesses. We have threats. And we have our opportunities. Top comments and audience signals: "Selling shovels in a gold rush, bro. Well played, man." 26 likes. "Hi, Nate. 10 days into joining your plus community, I got my first potential client. It's all thanks to you and your community." Awesome. And you can see that we see other comments. We see what video they came from and how many likes. We're also getting video requests, we're getting pain points, and that really helps me stay in tune with what you guys are saying. Wow, it just keeps on going.
We've got competitor context. So all of these channels, all of these videos, all of these stats, and that comes along with some notable gaps. And then finally, we get what's trending in AI this week. So "What are skills," "The most powerful AI agent I've ever used," all of this stuff with the channel, the views, the views per day, and the topic. So this is amazing. And I was able to build this in 20 minutes. Now what I would do is just keep running it, and every time say, "Hey, I liked this. I didn't like this," and use the skill creator to make this better and better. So anyways, appreciate you guys making it to the end of the video. If you enjoyed, please leave a like. And now that you understand this concept of skills and how to make them really good, what you need to do next is build your own executive assistant that you can start to build tons and tons of skills into.
So if you want to see how you can do that, then check out this video right up here. I'll see you guys over there.