There Are 4 AI Skills in 2026. You're Using 1. The Last 3 Separate 10x Users From Everyone Else.
By AI News & Strategy Daily | Nate B Jones
Summary
Topics Covered
- Chat Prompting Now Obsolete
- Context Engineering Creates 10x Gap
- Four Prompting Disciplines Stack
- Specification Engineering Enables Autonomy
- AI Prompting Sharpens Human Leadership
Full Transcript
If you're prompting like it's last month, you're already too late. And I'm not just saying that for clickbait. If you haven't updated how you think about prompting since January 2026, you're already behind. Opus 4.6, Gemini 3.1 Pro, and GPT-5.3 Codex have all shipped in the past few weeks with autonomous agent capabilities that make the chat-based prompting most people practice functionally obsolete for serious work. These models don't just answer better. They work autonomously for a long time, for hours or days, against specs without really checking in. That changes what being good at prompting means on a fundamental level.

And it's time to revisit how we think about prompting as a result. Not because prompting stopped mattering (it actually matters more than ever), but because the word "prompting" is now hiding four completely different skill sets, and most people are only practicing one of them. The gap between the people who see all four and the people who don't is already 10x and widening.

In this piece, I'm going to lay out what those four skills are, why the distinction matters now, and exactly how to build the skills you're missing. This builds on my earlier work on intent engineering, but it goes well beyond it to lay out a full framework for how to think about prompting after February 2026. Intent engineering is just one layer in a larger stack. This is the full stack for prompting in the era of these new autonomous models.

First, what changed? The prompting skill that has mattered since 2024 has been conversational. You sit in a chat window, type a request, read the output, and iterate. You get better at phrasing things, you provide examples, you structure instructions. If you're good at that (and if you've been following this video series, you probably are), you've been building real skills. They work. You're faster than you were a year ago.

But that chat-based skill has a ceiling, and in early 2026 a lot of people are hitting it, because the models have stopped being chat partners and started being workers. Workers that run for a long time: I'm not kidding when I say days and sometimes weeks. The thing about a worker that runs for a long time is that everything you relied on in a conversation (your ability to catch mistakes in real time, to supply missing context when the model asks, to course-correct when things drift) must be encoded before the agent starts. Not during the conversation, but at the top. This is a fundamentally different skill. It's not a harder version of the same skill. It's actually different.
I've talked before about thinking of prompting, even in the chat window, as providing the relevant context for the LLM to give you an accurate response. But this goes well beyond that. If you're giving an agent a long-running task, which is where most of AI is going even if you're not a coder, you have to think not about how to build for a chat response, but about how to build for economically real work that the agent will do, and how to give the agent the relevant context for that work.
And this shift is happening very quickly. Between October 2025 and January 2026, in just three months, the longest autonomous Claude Code sessions nearly doubled, and they've doubled again since then. Agents now number in the hundreds and thousands in production systems at major companies, and this is just from publicly available information. Telus reports 13,000 custom AI solutions internally. Zapier reports over 800 agents internally. Look, whenever a company puts out a press release like this, you have to assume it feels behind and is releasing something to feel better about itself. The companies that are really serious about AI don't feel the need for press releases, and they have an order of magnitude more agents, or more. So this is not about a world that is coming. This is about a world that has landed.

But this still might not feel concrete enough, so let me make the gap concrete with a random Tuesday. Say two people sit down with the same model on a Tuesday morning in February 2026. Same subscription, same context window. The only difference is that one of them is using 2025 prompting skills and the other is using 2026 prompting skills.

The 2025 person types a request for a PowerPoint deck. They get back something that's about 80% correct. Maybe there are formatting issues, some font collisions, some styling problems. They spend about 40 minutes cleaning it up, but they're pretty happy, because the deck would have taken two or three hours by hand; AI might have saved them 50 or 60% of their time, maybe more. That's a 2025 prompting skill application. It would have been good in 2025.
Person B sits down with 2026 prompting skills. She spends 11 minutes writing a structured specification; she takes longer to prompt. Then she hands it off to the same chatbot, but she's thinking of it and using it as an autonomous agent. She goes to make coffee and comes back to a completed PowerPoint that hits every quality bar she defined up front. And she does this for five other decks before lunch. In other words, she's easily doing a week's worth of work in a morning. Same model, same Tuesday, 10x gap.

If you want to replicate this, you can run the experiment directly with Claude Opus 4.6 in Cowork, which is available on Windows and Mac, and see exactly how it plays out. This did not happen because person B is smarter or more technical. It's because she's practicing a different skill than person A, and person A doesn't know that kind of prompting skill exists.

I think it's worth paying attention to Shopify CEO Tobi Lütke here. Unlike most CEOs, Tobi is technical, and he doesn't just engage with AI from a LinkedIn perspective. He has a folder of prompts that he runs against every new model release, and he thinks deeply about how each release changes his workflow. He uses the term context engineering because he believes the fundamental skill we're all facing is the ability to state a problem with enough context that, without any additional information, the task becomes plausibly solvable. I think that's a really elegant description of what person B did in the example. Person B put all of the information the model needed to build a deck into one clearly defined task, and the model could just go to work.

This isn't about clever prompt tricks or magical words that make an AI produce better output. It's about a communication discipline. Can you state a problem so completely, with so much relevant surrounding information, that a capable system can solve it without going out and fetching more context? Can you make your request as self-contained as possible? This is a really big deal, because it demands a much higher bar for communication from us humans than we're used to.

Tobi called this out when he reflected on the impact of AI on his own leadership style. One thing he mentioned is that being forced to give AI complete context has made him a better communicator as a CEO. His emails are tighter, his memos are better, and his decision-making frameworks are stronger. Tobi has gone further than most people in thinking about the implications of context engineering. One of his most provocative assessments is that a lot of what people in big companies call politics is actually bad context engineering for humans. Good context engineering, he suggests, would surface disagreements about assumptions that are never stated explicitly but play out as politics and grudges in large companies. He says that happens because humans tend to be sloppy communicators who rely on shared context that doesn't actually exist. I think that's a really interesting thesis. One implication of deeply internalizing this February 2026 prompting lesson is that our human-to-human communication is likely to improve, and our organizations are likely to make cleaner decisions and communicate more cleanly, even between humans.

So I bet you're wondering: what are these four disciplines? What does it mean when prompting becomes multiple skills we need to learn? I don't want to hide the ball. Here is the framework I would use to describe what prompting should be in February 2026. I've tried to make it future-proof by looking at the direction agents are developing this year; these four disciplines are going to matter even as agents continue to scale.
This is a significant update on how I taught prompting before 2026. The way we prompted before 2026 is a helpful foundation, and you're not losing anything by having learned it, but it's not enough as agents get more capable. I think we're due for a reset.

Fundamentally, prompting is the broad skill of providing input to AI systems so that they can do useful work. Prompting has diverged into four distinct disciplines, but it isn't taught that way, so I'm laying it out that way here for the first time. Each discipline operates at a different altitude and time horizon, and you need to understand them all to prompt well. Just because they're not often taught this way doesn't mean good prompters don't know this intuitively. What I'm doing is taking the intuitive knowledge I see in excellent prompters and boiling it down into four disciplines you can practice and learn. They build on each other, and I'm presenting them in order. If you skip one, you recreate in your own prompting the kind of failures we tend to see at scale in the enterprise. I'll get at what that means, and you'll get the idea.

Discipline one is prompt craft. This is the original skill, the one I and many others have taught for the last year or two. It's synchronous, session-based, and individual. You sit in front of a chat window, write an instruction, evaluate the output, and iterate.
The skill here is knowing how to structure a query. I've talked about this a lot in the past, so I'll only rehash it briefly. You need clear instructions. You need relevant examples and counter-examples. You need appropriate guardrails. You need an explicit output format. And you should be very clear about how to resolve ambiguity and conflicts, so the model doesn't have to make it up on the fly. This is what Anthropic's prompt engineering documentation covers, and OpenAI and Google talk about it too.
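To make those elements concrete, here is a minimal Python sketch that assembles them into one prompt. It assumes a plain-text prompt with labeled sections; `PromptSpec` and `build_prompt` are hypothetical names, and the section layout is an illustration, not a format any vendor requires.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Prompt-craft elements kept as separate fields so none is forgotten."""
    instructions: str
    examples: list[str] = field(default_factory=list)
    counter_examples: list[str] = field(default_factory=list)
    guardrails: list[str] = field(default_factory=list)
    output_format: str = ""
    ambiguity_rules: list[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec) -> str:
    """Render the spec as one prompt string with a labeled section per element."""
    sections = [
        ("Instructions", [spec.instructions]),
        ("Examples", spec.examples),
        ("Counter-examples (do not produce output like these)", spec.counter_examples),
        ("Guardrails", spec.guardrails),
        ("Output format", [spec.output_format] if spec.output_format else []),
        ("If something is ambiguous or conflicts", spec.ambiguity_rules),
    ]
    parts = []
    for title, items in sections:
        if items:  # skip empty sections instead of emitting blank headers
            body = "\n".join(f"- {item}" for item in items)
            parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(PromptSpec(
        instructions="Summarize the attached quarterly report for the board.",
        guardrails=["Use only figures that appear in the report."],
        output_format="Five bullet points, each under 25 words.",
        ambiguity_rules=["If two figures conflict, use the audited one and flag the conflict."],
    )))
```

Keeping each element in its own field makes a missing piece (say, no counter-examples) visible before the prompt is ever sent.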
It's on a thousand blog posts and LinkedIn courses. Don't hear me saying prompt craft has become irrelevant. It's just become table stakes, the way touch-typing with ten fingers was once a professional differentiator and is now simply assumed. If you can't write a clear, well-structured prompt in 2026, you're the person in 1998 who couldn't send an email. Is it important to be able to do it? Yes. Is it going to differentiate you in the workforce? No, not really.

The key shift is that prompt craft was the whole game when AI interactions were synchronous and session-based. You wrote something, got something back, and refined it in real time. As the human interacting with the model, you were acting as the intent layer, the context layer, and the quality layer, so that longer tasks could get done. You did all the work of breaking those tasks down. That model of prompting broke the moment agents started running for hours without checking in.

Discipline two, which we've been talking about for a few months now, is context engineering. Anthropic published the foundational piece on it back in September 2025, but there are plenty of other good pieces out there, and I've written a fair bit on it myself. I define context engineering as the set of strategies for curating and maintaining the optimal set of tokens during an LLM task, and that's a fairly common definition, not just mine. LangChain's Harrison Chase was even blunter in a recent Sequoia Capital interview: he said everything is context engineering, and that it describes everything they'd done at LangChain before they knew the term existed. That framing is actually somewhat dangerous, because context engineering is only one of four levels, and people have taken it to mean everything. I'm trying to move us toward understanding context engineering as a specific skill: providing relevant tokens to the LLM for inference.

And yes, it is foundational and significant, and it's where the industry's attention is focused today. It is the shift from crafting a single instruction to curating the entire information environment an agent operates within: the system prompts, the tool definitions, the retrieved documents, the message history, the memory systems, the MCP connections. The prompt you write might be 200 tokens. The context window it lands in might be a million. Your 200 tokens are 0.02% of what the model sees. The other 99.98%, that's context engineering. This is the discipline that produces CLAUDE.md files, agent specifications, RAG pipeline designs, and memory architectures. It's the discipline that determines whether a coding agent understands your project's conventions, whether a research agent has access to the right documents, and whether a customer service agent can retrieve relevant account history.
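The 200-tokens-in-a-million arithmetic is easy to check. The sketch below tallies one hypothetical context window; the component names mirror the list above, but every token count is an assumed, illustrative number, not a measurement from a real agent.

```python
# Hypothetical token counts for one agent turn (illustrative, not measured).
context_window = {
    "system_prompt": 6_000,
    "tool_definitions": 14_000,
    "retrieved_documents": 600_000,
    "message_history": 250_000,
    "memory": 129_800,
    "user_prompt": 200,
}

total = sum(context_window.values())  # 1,000,000 tokens with these numbers
for name, tokens in context_window.items():
    print(f"{name:<20} {tokens:>9,} tokens {tokens / total:>8.2%}")

user_share = context_window["user_prompt"] / total
print(f"Your prompt: {user_share:.2%}; everything else: {1 - user_share:.2%}")
```

With these assumed numbers the last line reports 0.02% for the prompt and 99.98% for everything else, matching the figures in the transcript.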
Anthropic's engineering team identified the core challenge precisely: LLMs degrade as you give them more information. That's correct; they do. So the point is to include relevant tokens, because the issue is not that models can't hold the tokens. It's that retrieval quality drops as context grows.

The practical implication is that the people who are 10x more effective with AI than their peers are not writing 10x better prompts. They're building 10x better context infrastructure. Their agents start each session with the right project files, the right conventions, and the right constraints already loaded. The prompt itself can be relatively simple because the context does the heavy lifting. I've seen this for myself as I've built out my own context engineering scaffolding. If you're wondering how to do it for yourself, I'm putting a guide together with this video over on the Substack, and I think it'll be helpful.
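One way to read "include relevant tokens" is as a selection problem: rank candidate context by relevance and load only what fits a budget. A minimal sketch, assuming relevance scores already exist (from a retriever or a human tag) and using a crude four-characters-per-token estimate; `ContextItem` and `select_context` are hypothetical names, and a real system would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    relevance: float      # assumed to come from an upstream retriever or a human tag
    pinned: bool = False  # e.g. project conventions every session should see


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English text); an assumption, not exact.
    return max(1, len(text) // 4)


def select_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Consider pinned items first, then higher relevance; keep whatever fits the budget."""
    chosen, used = [], 0
    ordered = sorted(items, key=lambda it: (not it.pinned, -it.relevance))
    for item in ordered:
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen
```

The design choice here is that low-relevance material is simply left out rather than truncated, which keeps the window small instead of merely full.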
Discipline number three is one we don't talk about much, but I think it's where we're going as these agents take on much longer autonomous tasks. I wrote about it at length in a prior piece and did a video on it, so I'll be brief here and just place it in the stack. Context engineering tells agents what to know. Intent engineering tells agents what to want. It's the practice of encoding organizational purpose (your goals, your values, your trade-off hierarchies, your decision boundaries) into infrastructure that agents can act against.

Klarna's story is the proof case I talked about. Their AI agent resolved 2.3 million customer conversations in its first month, but it optimized for the wrong thing. It slashed resolution times, but it didn't optimize for customer satisfaction. As a result, Klarna got into big trouble, had to rehire a bunch of human agents, and is still dealing with the customer-trust aftermath.

So intent engineering sits above context engineering the way strategy sits above tactics. You can have perfect context and terrible intent alignment. But you cannot have good intent alignment without good context, because the agent needs information to act on the intent. So again, these disciplines are cumulative.
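As an illustration of encoding a trade-off hierarchy and decision boundaries, here is a minimal sketch loosely inspired by the customer-service example above (Klarna). The metric names, thresholds, and the `choose_action` helper are all hypothetical; the point is only that satisfaction is a hard floor and speed is a tiebreaker, rather than speed being the sole objective.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    action: str
    predicted_csat: float      # expected satisfaction on a 1-5 scale (hypothetical metric)
    predicted_minutes: float   # expected time to resolve (hypothetical metric)
    refund_amount: float = 0.0


@dataclass
class Intent:
    min_csat: float = 4.0                   # hard floor: never trade satisfaction below this for speed
    max_refund_without_human: float = 100.0  # decision boundary: larger refunds are not the agent's call


def choose_action(candidates: list[Candidate], intent: Intent) -> str:
    """Apply hard boundaries first, then rank by the declared priority order."""
    allowed = [c for c in candidates
               if c.predicted_csat >= intent.min_csat
               and c.refund_amount <= intent.max_refund_without_human]
    if not allowed:
        return "escalate_to_human"
    # Priority order: satisfaction first, speed only as a tiebreaker.
    best = max(allowed, key=lambda c: (c.predicted_csat, -c.predicted_minutes))
    return best.action
```

An agent given only "minimize resolution time" would pick the fastest option every time; writing the hierarchy down is what stops that.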
Another thing I want you to notice is that failures get more serious as we move up this hierarchy. When you as an individual sit down and screw up a prompt, it might waste your morning at worst. When you screw up context engineering or intent engineering, you're screwing it up for your entire team, your entire org, your entire company. The stakes get higher, and because they do, attention to detail matters more and the value of the work increases commensurately. Context engineering and intent engineering can each be a full-time role at a big company. And even where they're not, they're high-stakes human skills with a lot of transferable value.

Level four is specification engineering. We're only starting to talk about it now, even though the best practitioners are already doing it. Specification engineering is the practice of writing documents across your organization that autonomous agents can execute against over extended time horizons without human intervention. It sits a level above everything I've described, because the first three levels all focus on how you prepare work directly for an agent. Specification engineering is about treating your organization's entire informational corpus as agent-fungible and agent-readable. Everything you write has to be something an agent can access and do something with.

It's not really about prompting per se. It's not about an individual agent's context window. It's not even about the intent you've given agents. Specifications are complete, structured, internally consistent descriptions of what an output should be for a given task, including how quality is measured. Specification engineering is a mindset you bring to your documents that lets you apply agents across large swaths of your company's context with confidence that what the agent reads is going to be relevant.
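Treating documents as specifications suggests checking them the way you would lint code: are the required parts present, are the criteria measurable, and is the document internally consistent? A minimal sketch, assuming a spec is a plain dict with hypothetical fields (`goal`, `context`, `deliverables`, `acceptance_criteria`, `out_of_scope`); the fields and the vague-word heuristic are illustrative, not a standard.

```python
REQUIRED_FIELDS = ("goal", "context", "deliverables", "acceptance_criteria")
VAGUE_WORDS = ("nice", "good", "appropriate", "as needed")  # crude substring heuristic


def lint_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec passes these basic checks."""
    problems = [f"missing or empty: {f}" for f in REQUIRED_FIELDS if not spec.get(f)]
    # Completeness of quality measurement: every criterion should be checkable.
    for criterion in spec.get("acceptance_criteria", []):
        if any(word in criterion.lower() for word in VAGUE_WORDS):
            problems.append(f"criterion not measurable: {criterion!r}")
    # Internal consistency: nothing may be both a deliverable and out of scope.
    overlap = set(spec.get("deliverables", [])) & set(spec.get("out_of_scope", []))
    problems += [f"both required and out of scope: {item!r}" for item in sorted(overlap)]
    return problems
```

A check like this cannot tell you whether a spec is wise, but it does catch the gaps an autonomous agent would otherwise have to guess at.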
An interesting example comes from Anthropic's own struggles with Opus 4.5, which is one generation ago now. They were trying to get the agent to build a production-quality web app. If you give the agent only a high-level prompt like "build a clone of claude.ai," it tries to do too much at once, runs out of context mid-implementation, and leaves the next session guessing at what happened. The fix, it turned out, was not a better model. It was specification engineering: a pattern where an initializer agent sets up the environment, a progress log documents what's been done, and a coding agent then makes incremental progress against a structured plan every session. The specification became the scaffolding that let multiple agents produce coherent output over days.
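The initializer-plus-progress-log pattern can be sketched as two roles sharing files on disk. This is a minimal sketch loosely modeled on the description above, not Anthropic's actual harness: the file names `plan.json` and `progress.log`, the plan format, and the `do_work` callback are hypothetical, and no model is called.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def initialize(workdir: Path, features: list[str]) -> None:
    """Initializer role: run once; write a structured plan with every item pending."""
    plan = [{"feature": f, "done": False} for f in features]
    (workdir / "plan.json").write_text(json.dumps(plan, indent=2))
    (workdir / "progress.log").write_text("initialized plan\n")


def coding_session(workdir: Path, do_work: Callable[[str], None]) -> Optional[str]:
    """Coding role: read the plan, finish exactly one pending item, log it, stop."""
    plan_path = workdir / "plan.json"
    plan = json.loads(plan_path.read_text())
    pending = [item for item in plan if not item["done"]]
    if not pending:
        return None  # a fresh session can see from the files that all work is done
    item = pending[0]
    do_work(item["feature"])  # stand-in for the agent's actual implementation work
    item["done"] = True
    plan_path.write_text(json.dumps(plan, indent=2))
    with (workdir / "progress.log").open("a") as log:
        log.write(f"completed: {item['feature']}\n")
    return item["feature"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wd = Path(d)
        initialize(wd, ["login page", "chat view", "settings"])
        # Each call plays the role of a new session with no memory beyond what is on disk.
        while (finished := coding_session(wd, do_work=lambda feature: None)) is not None:
            print("session finished:", finished)
        print((wd / "progress.log").read_text())
```

Because all state lives in the plan and the log, no session has to guess what the previous one did.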
So the shift from prompt to specification mirrors a transition that happened in human engineering decades ago. When you're building something small, verbal instructions and conversations work really well. When you're building something large enough to require a team or span multiple sessions, you need blueprints. Anthropic needed blueprints in the Opus 4.5 example. And even though we've now moved to Opus 4.6, the need for specification engineering hasn't gone down; it's gone up, because Opus 4.6 can do even more work. That's true for GPT-5.3 Codex and for Gemini 3.1 Pro as well. The smarter models get, the better you need to get at specification engineering.

That's why I deliberately started this section by zooming out and saying the entire org's document corpus should be treated as a form of specification engineering. And yes, this is a fractal insight. You can also apply specification engineering to an individual agent task: What log does the agent keep? How do we assign tasks across this agent build? How do we make sure the agent has a clearly specified requirements list to work from? But all of that gets much easier to put together if you think of your entire organizational document corpus as agent-readable specifications. Your corporate strategy is a specification. Your product strategy is a specification. Your OKRs are a specification. Everything ends up being a specification your agent can use.
And that's different from context engineering, because the art of context engineering is shaping the context window so that it's relevant for the agent. Look at the four levels side by side. Prompt craft is you and the agent, crafting clear instructions. Context engineering is how you shape the relevant tokens in the context window. Intent engineering is how you communicate goals and objectives to the agent so it can work autonomously for long periods in a direction consistent with company strategy. Specification engineering is how you treat your entire corporate document structure, the knowledge and context that make the corporation work, as a form of specification. And yes, for individual agent runs, it's how you refine that into a good specification for the agent, which also starts to keep the context window cleaner than before.

So these start to interplay. If you write a good spec, if you have a good task log, and if the agent understands the spec in the broader organizational context, it's less likely to go off the rails because of intent engineering conflicts, and less likely to bloat with bad context. But the highest level is to think of specification as the way your organization does business. You specify the outputs you want. The agent does the work. The outputs are produced. That is the highest-level description of what business is going to look like in the next couple of years, and it starts with understanding how to specify.

This is where Anthropic's best-practices documentation for Claude Code becomes really revealing. The recommended workflow for complex features is relatively simple: Interview me in detail. Ask about technical implementation, UI/UX, edge cases, concerns, and trade-offs. Don't ask obvious questions. Dig into the hard parts. The agent then writes the spec with the human. I think that's an artifact of this moment in time. I think we'll get to a point where the agent only asks us about places where the broader specification corpus is in conflict or ambiguous, and we have to talk about what it means to us for this task to be accomplished, because the entire organizational infrastructure is going to be agent-readable.

So the practical skill going forward is not writing code, and it's not crafting prompts. It's the ability to describe an outcome with enough precision and completeness that an autonomous system can execute against it for days or weeks. I hope you're seeing how that's a fundamentally different skill from writing a good prompt in a chat window.
And the people who are excellent at one of these layers are not automatically excellent at all of them. Context engineers spend a lot of time thinking about how to compress tokens, get good tokens into context windows, and keep bad tokens out. That's a different mindset from asking whether your information environment is agent-translatable, agent-readable, and agent-fungible. We need all of these skills to bring AI effectively into the enterprise, or even into a small business, in 2026.

And I'll tell you, one-person businesses have the greatest advantage right now. If you're a one-person business and you can just make your Notion agent-readable, you're off to the races today. There's no gigantic effort to make all of your SharePoint agent-readable. It's simple: you get it into Notion and you're done.

This comes back to the core idea that in 2026 speed is going to matter, because agents are going to keep getting better quickly. What is days and weeks now will become weeks and months by the end of the year. The impact of getting specification engineering right is going to be even higher, and so is the value of translating all four levels into specific roles: people who are responsible, DRIs, teams who handle this. What I mean is that if you're at a large company, you should have people doing context engineering and nothing else. You should have people doing specification engineering, thinking about how agents can read the enterprise. And you should have people thinking about intent engineering: how to translate goals into a set of objectives an agent can read and value, and a set of verifiable guardrails the agent can follow.

Look, the mental model most people carry is that prompting means giving good instructions to the AI, and that fails for a very specific reason. I hope you're seeing it here: that entire model assumes synchronous interaction. In the synchronous human-AI partnership model, you're always there at the computer. You see the output in real time. You correct mistakes right away. You provide additional context when the model asks or when you notice it going off track.
Long-running agents break every single assumption in that model. So if you've relied on the assumptions of synchronous prompting, you have a structural vulnerability in how you think about AI. You need to start thinking about AI as if your real-time oversight is embedded in the specification before the agent begins to work.

The planner-worker architecture that's dominating production agent deployments reflects this reality. A capable model plans the work, decomposes it into subtasks, defines the acceptance criteria, and assigns the work out; then cheaper, faster models do the work. The planning phase, which you could call the specification phase, determines the quality ceiling. That phase (taking your specification and expanding it, enriching it, breaking it out, and planning against it) is what determines the quality of the overall system's work.
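The planner-worker split can also be sketched without any model calls: a planner turns a spec into subtasks that each carry an acceptance check, workers execute them, and nothing counts as done until its check passes. `Subtask`, `plan`, and `run` are hypothetical names, and the plain functions stand in for a capable planning model and cheaper worker models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Subtask:
    description: str
    accept: Callable[[str], bool]  # acceptance criterion, fixed at planning time


def plan(spec: dict) -> list[Subtask]:
    """Planner role: decompose the spec into subtasks, each with a checkable criterion."""
    return [
        # Bind s as a default argument so each lambda checks its own section name.
        Subtask(f"Write section: {s}", accept=lambda out, s=s: s.lower() in out.lower())
        for s in spec["sections"]
    ]


def run(subtasks: list[Subtask], worker: Callable[[str], str],
        max_attempts: int = 2) -> dict[str, Optional[str]]:
    """Worker phase: accept output only if it passes the subtask's own check."""
    results: dict[str, Optional[str]] = {}
    for task in subtasks:
        for _ in range(max_attempts):
            output = worker(task.description)
            if task.accept(output):
                results[task.description] = output
                break
        else:
            results[task.description] = None  # every attempt failed: flag for a human
    return results


if __name__ == "__main__":
    tasks = plan({"sections": ["Pricing", "Risks"]})
    print(run(tasks, worker=lambda description: f"Draft for {description}"))
```

The quality ceiling sits in `plan`: a worker can only be held to the criteria the planner wrote down.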
Execution without a good specification step produces broken work that requires extensive human rework to be of value at all. So the shift from fixing it in real time, which is what we do with a lot of prompting in the chat window, to getting the spec right up front changes your bottleneck skill. Real-time prompting rewards verbal fluency. It rewards quick iteration. It rewards a good eye for output quality. Specification engineering rewards completeness of thinking, anticipation of edge cases, clear articulation of acceptance criteria, and the ability to decompose really complicated outcomes into independently executable components.
Different people are good at these things in different ways. Some people are going to be naturally exceptional at synchronous prompting and they're going to struggle with specification work. And some people will be mediocre at chat-based interaction but might actually be excellent spec engineers. My challenge to you is this: don't accept whatever your natural propensity is as your ceiling. Think of this as a learnable skill and go after it.

Now, you might ask, if we're going after it, what are the foundational elements to learn? "Specification" is, ironically, very vague. "Nate, please tell me what you mean." I want to suggest that in 2026 we can define the primitives that go into good specifications in ways that are useful for us to learn, and I'm going to define them right here. I think these are the foundation we need if we want to get better at specifying and at the prompting skills that will matter in 2026 and beyond.

Primitive number one: self-contained problem statements.
This is actually Toby's insight, but it's not only his insight. Think back to what Toby said when he talked about the idea that we have to give the model everything it needs to do the work. Can you state a problem with enough context that the task is plausibly solvable without the agent going out and getting more information? The discipline of self-containment forces you to be clear. It surfaces hidden assumptions. It makes you articulate constraints you normally leave implicit because you trust the human on the other end to fill in the gaps. AI doesn't fill in gaps reliably. It fills them with statistical plausibility, and that's a polite way of saying it guesses in ways that are often subtly wrong. So if you're trying to train this primitive, take a request you would normally make conversationally, like "update the dashboard to show the Q3 numbers," and rewrite it as if the person receiving it has never seen your dashboard, doesn't know what Q3 means in your org's context, doesn't know what database to query, and has no access to any information other than what you include.
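One way to practice this, sketched in Python under the assumption that a brief can be reduced to a handful of named fields: the hypothetical `TaskBrief` template below turns each piece of context into an explicit slot, and `missing_context` lists the slots you left for the model to guess. The field names are one possible checklist, not a standard, and every value in the filled-in brief is invented for illustration.

```python
from dataclasses import dataclass, fields

# Hypothetical template: the field names are one possible checklist, not a standard.


@dataclass
class TaskBrief:
    request: str           # what you would have typed into a chat window
    definitions: str = ""  # org-specific terms, e.g. what "Q3" means here
    data_source: str = ""  # where the numbers come from
    audience: str = ""     # who reads the result and what they decide with it
    constraints: str = ""  # what the reader cannot infer on their own


def missing_context(brief: TaskBrief) -> list[str]:
    """List the empty fields: the gaps a model would fill with a plausible guess."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]


chat_style = TaskBrief(request="Update the dashboard to show the Q3 numbers")
print(missing_context(chat_style))
# ['definitions', 'data_source', 'audience', 'constraints']

# Every value below is invented for illustration.
self_contained = TaskBrief(
    request="Add a Q3 revenue panel to the sales dashboard",
    definitions="Q3 means fiscal Q3: August 1 to October 31",
    data_source="the revenue_daily table in the analytics warehouse",
    audience="regional sales leads deciding Q4 quotas",
    constraints="do not change existing panels; show figures in USD",
)
print(missing_context(self_contained))  # []
```

The checklist itself matters less than the habit: every empty slot is a place where the agent would otherwise fill in the gap with statistical plausibility.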
Writing for a reader who has nothing but the page is the level of self-containment you should challenge yourself with if you want to get better at this primitive.

Primitive two: learn about acceptance criteria. If you can't describe what done looks like, an agent can't know when to stop, or more precisely, it will stop at whatever point its internal heuristics say the task is complete, which may bear no relationship to what you needed. This is why the 80% problem is a big issue for agent system design, and the gap usually lives in the specification. Let's say the specification said "build a login page" when it should have said "build a login page that handles email and password, social OAuth via Google and GitHub, progressive disclosure of 2FA, session persistence for 30 days, and rate limiting after five failed attempts." If you're training on this primitive, you want to get to the latter, not the former. For every task you delegate, write three sentences that an independent observer could use to verify the output without asking you any questions whatsoever. If you can't write those sentences down, you probably don't understand the task well enough to give it to an agent. I have had that happen: I've been in a conversation with an AI agent, realized I didn't know enough to delegate the task, and had to come back later. That's okay.
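A minimal Python sketch of "three sentences an independent observer could verify," using three of the login-page criteria above. It assumes, purely for illustration, that the agent's result can be summarized as a plain dict; the keys (`auth_methods`, `session_days`, `max_failed_attempts`) are hypothetical, not a real framework's schema.

```python
# Hypothetical sketch: the agent's result is modelled as a plain dict describing
# the login feature it built. The keys are illustrative, not a real schema.
ACCEPTANCE_CRITERIA = {
    "Supports email/password plus OAuth via Google and GitHub":
        lambda f: {"password", "google", "github"} <= set(f.get("auth_methods", [])),
    "Sessions persist for 30 days":
        lambda f: f.get("session_days") == 30,
    "Login is rate-limited after five failed attempts":
        lambda f: f.get("max_failed_attempts") == 5,
}


def unmet_criteria(feature: dict) -> list[str]:
    """Criteria an independent observer would mark as failing, no questions asked."""
    return [text for text, check in ACCEPTANCE_CRITERIA.items() if not check(feature)]


# What "build a login page" tends to produce: plausible, but only partly done.
vague_result = {"auth_methods": ["password"], "session_days": 1}
print(len(unmet_criteria(vague_result)))  # 3

complete_result = {
    "auth_methods": ["password", "google", "github"],
    "session_days": 30,
    "max_failed_attempts": 5,
}
print(unmet_criteria(complete_result))  # []
```

Each key in `ACCEPTANCE_CRITERIA` is one of the plain-English sentences, and each value is how an observer would check it. If you cannot write the check, that is usually the signal that the task isn't understood well enough to delegate yet.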
It's good to realize you don't know enough before you assign the work, not after.

Primitive three is constraint architecture: what the agent has to do, what the agent cannot do, what the agent should prefer when multiple valid approaches exist, and what the agent should escalate rather than decide autonomously. These four categories, the musts, the must-nots, the preferences, and the escalation triggers, form the constraint architecture that turns a loose specification into a very reliable one.
The CLAUDE.md pattern that's emerging in the coding community is a practical implementation of constraint architecture. The best CLAUDE.md files are not long lists of rules. They're concise, extremely high-signal constraint documents: use these build commands, follow these code conventions, run these tests before marking a task complete, never modify these files without explicit instructions. The community consensus is very strongly that every line in the file needs to earn its place. Ask, "Would removing this line cause the AI to make mistakes?" If the answer is no, it really wouldn't, then kill the line. So if you want to train this primitive, before delegating a task, write down what a smart, well-intentioned person might do that would technically satisfy the request but produce the wrong outcome. Those failure modes end up being your constraint architecture. Encode them.
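Here is a small Python sketch of the four categories as a data structure, assuming you want to generate a CLAUDE.md-style file rather than hand-edit one. The `Constraints` class, its section headings, and the example rules are hypothetical; the one design choice it encodes comes from the community consensus above: empty sections are dropped, so nothing in the file is filler.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the four constraint categories rendered as a short,
# CLAUDE.md-style text file. Headings and rules are illustrative.


@dataclass
class Constraints:
    musts: list[str] = field(default_factory=list)
    must_nots: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)
    escalate: list[str] = field(default_factory=list)  # decide-with-a-human triggers

    def render(self) -> str:
        """Emit only non-empty sections, so every line in the file has a job."""
        sections = [
            ("Must", self.musts),
            ("Must not", self.must_nots),
            ("Prefer", self.preferences),
            ("Stop and ask before", self.escalate),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:
                lines.append(f"## {title}")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


repo_rules = Constraints(
    musts=["Run the test suite before marking a task complete"],
    must_nots=["Modify files under migrations/ without explicit instructions"],
    escalate=["Adding a new third-party dependency"],
)
print(repo_rules.render())
```

The "smart, well-intentioned person" exercise feeds the `must_nots` and `escalate` lists: each failure mode you write down becomes one line, and a line that prevents no failure mode has no reason to be there.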
Primitive four is decomposition. Large tasks need to be broken into components that can be executed independently, tested independently, and integrated predictably. This is software engineering's oldest lesson, modularity, now being applied to AI task delegation. Anthropic's long-running agent harness splits every complex project into an environment setup phase, a progress documentation phase, and incremental coding sessions, each independently verifiable. You get similar task decomposition automatically inside Codex. And a marketing content audit requires the same decomposition as a coding task, so I'm not just talking to engineers here. You would have to decompose your marketing content audit into quality scoring, gap analysis, recommendation generation, and so on. If you want to train on the primitive of task decomposition, take any project that you would estimate at a few days of work and decompose it into subtasks that each take less than two hours to do, have clear input and output boundaries, and can be verified independently of the other tasks. That is the granularity at which agents work best, and it's the granularity at which specification engineering tends to operate.

Now, in 2026, you do not have to pre-specify all of those two-hour tasks when you are writing a prompt, but you do have to understand what all of those tasks are. And you have to understand how to describe, for a planner agent, what done looks like and what decomposable pieces look like, in such a way that the planner agent can reliably break the work into those 50 or 60 subtasks. In other words, your job increasingly is not to manually write the subtasks for the agent. Your job is to provide the break patterns that a planner agent can use to split larger work up in a reliable, executable fashion. That's a level of abstraction even above decomposition, and that is a lot of where we're going as agents start to run for longer.
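As a sketch of the two-hour rule, assuming each subtask can be described by an hour estimate, named input and output artifacts, and a verification step: the hypothetical `validate` function below flags subtasks that are too big, unverifiable, or waiting on inputs nothing produces, and `execution_order` uses Python's standard-library `graphlib.TopologicalSorter` to sequence them. The marketing-audit subtasks, artifact names, and checks are illustrative only.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library since Python 3.9

# Hypothetical sketch: subtasks for the marketing content audit described above.
# Names, hour estimates, artifacts, and checks are illustrative.


@dataclass(frozen=True)
class Subtask:
    name: str
    hours: float
    inputs: frozenset[str]   # artifacts this subtask consumes
    outputs: frozenset[str]  # artifacts this subtask produces
    check: str               # how it is verified on its own


def validate(tasks: list[Subtask], given: set[str], max_hours: float = 2.0) -> list[str]:
    """Flag subtasks that are too big, unverifiable, or waiting on inputs nobody produces."""
    available = set(given) | {o for t in tasks for o in t.outputs}
    problems = []
    for t in tasks:
        if t.hours >= max_hours:
            problems.append(f"{t.name}: {t.hours}h is not under {max_hours}h, split it")
        if not t.check.strip():
            problems.append(f"{t.name}: no independent verification step")
        for name in sorted(t.inputs - available):
            problems.append(f"{t.name}: input '{name}' is never produced")
    return problems


def execution_order(tasks: list[Subtask]) -> list[str]:
    """Order subtasks so each one runs after the subtasks that produce its inputs."""
    producer = {o: t.name for t in tasks for o in t.outputs}
    graph = {t.name: {producer[i] for i in t.inputs if i in producer} for t in tasks}
    return list(TopologicalSorter(graph).static_order())


audit = [
    Subtask("quality_scoring", 1.5, frozenset({"content_inventory"}), frozenset({"scores"}),
            "every page has a 1-5 score with a one-line reason"),
    Subtask("gap_analysis", 1.5, frozenset({"content_inventory", "scores"}), frozenset({"gaps"}),
            "each gap cites the pages or topics it is based on"),
    Subtask("recommendations", 1.0, frozenset({"gaps"}), frozenset({"recommendations"}),
            "each recommendation maps to a listed gap"),
]
print(validate(audit, given={"content_inventory"}))  # []
print(execution_order(audit))  # ['quality_scoring', 'gap_analysis', 'recommendations']
```

In the planner-agent world described above, something like `validate` is closer to the "break pattern" you hand over: you describe the rules a good decomposition must satisfy and let the planner produce the 50 or 60 subtasks that pass them.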
Primitive five is evaluation design. This is critical to do not just at an individual level but at an org level. Organizations need to think about every level of AI deployment in terms of evals. How do you know the output is good? Not "does it look reasonable," which is how most people evaluate AI output, but can you prove, measurably and consistently, that this is good? If prompt craft is the art of the input, evaluation design is the art of knowing whether that input worked. And in a world where agents can run for a really long time, eval design is the only thing standing between AI-generated output you can't use and AI-generated output you can really use as-is.
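A minimal sketch of what "prove it measurably" can look like, assuming the recurring task is something checkable like pulling a total out of an invoice: a few test cases with known-good outputs, a pass rate, and a comparison you rerun after a model update. `baseline_model` and `updated_model` are toy stand-ins and the invoice cases are made up; in practice `generate` would wrap your actual model or agent call.

```python
import re
from typing import Callable

# Hypothetical sketch: `generate` stands in for whatever model call or agent run
# you are evaluating. The invoice cases and both "models" are made up.
EvalCase = tuple[str, Callable[[str], bool]]  # (input, check against the known-good output)

CASES: list[EvalCase] = [
    ("Invoice #12, total due: $1,200.00", lambda out: out.strip() == "1200.00"),
    ("Total due $89.50 (paid by card)", lambda out: out.strip() == "89.50"),
    ("No amount listed", lambda out: out.strip() == "UNKNOWN"),
]


def run_eval(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Pass rate: the fraction of cases whose output matches the known-good answer."""
    passed = sum(1 for text, check in cases if check(generate(text)))
    return passed / len(cases)


def is_regression(old: Callable[[str], str], new: Callable[[str], str],
                  cases: list[EvalCase]) -> bool:
    """Rerun after a model update: did the new setup score lower than the baseline?"""
    return run_eval(new, cases) < run_eval(old, cases)


def baseline_model(text: str) -> str:
    """Toy stand-in: pull the first dollar amount, or say UNKNOWN."""
    match = re.search(r"\$([\d,]+\.\d{2})", text)
    return match.group(1).replace(",", "") if match else "UNKNOWN"


def updated_model(text: str) -> str:
    """Toy stand-in for a new model that has quietly started inventing a total."""
    return "1200.00"


print(run_eval(baseline_model, CASES))                      # 1.0
print(is_regression(baseline_model, updated_model, CASES))  # True
```

The updated model still gets the first case right, which is exactly the kind of output that "looks reasonable" on a spot check; only the fixed cases expose that its pass rate dropped to one in three.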
If you want to train this primitive, build an eval for every recurring AI task in your world: three to five test cases with known-good outputs, run periodically and especially after model updates. This is going to catch regressions. It'll build your intuition for where models fail. It will create institutional knowledge about what good looks like for your specific use cases, your team, you, your org. You need to be doing this systematically.

If you're listening to all this and wondering where to start, I gave you those four layers in order for a reason.
Start by closing the prompt craft gap. Most people are worse at basic prompting than they think. You should be rereading prompting documentation; I've written a bunch on the Substack, and I'm writing more for this piece. You should do interactive tutorials. You can head over to AI Cred and see where you are on prompting. You should be building a folder of tasks that you do regularly, writing your best prompt against each one, saving the outputs as your baseline, and then revisiting them over time. Take prompt craft seriously.
Second, once you feel like you're starting to get a handle on that, build your personal context layer. You should be writing a CLAUDE.md equivalent for your work. I don't care if you use Claude or not. You still need your goals, your constraints, your communication preferences, your quality standards, and the institutional context that a new team member would need six months to absorb, all written down. Start AI sessions by loading this context. The difference in output quality should be immediate and obvious. Then get into specification engineering: take a real project, not a toy problem, and write a specification for it.

Then start to get into intent infrastructure. This is an organizational layer. If you manage people or systems, you can start encoding the decision frameworks your team uses implicitly. If you are an individual contributor, you should be encoding the decision frameworks you understand and trying to be a champion who pushes for this at the organizational level. A lot of teams like to talk about adopting AI. Talk about it in terms of building intent infrastructure instead. Talk together about what good enough looks like for each category of work. Talk about what gets escalated by AI versus what AI can decide. Write it down, structure it, and make it available to agents.

And then practice specification engineering. Take a real project, not a toy problem. Write a spec for it before touching AI. Cover acceptance criteria, constraint architecture, decomposition, and so on. Hand that spec to an agent and see what comes back. And oh yes, from an org perspective, start to think about every doc you touch as a spec that an agent will need to read and operate against.
Your org is a system of business processes, even if you're a team of one, and those business processes should be agent-readable and spec-able. This has some of the downstream implications Toby talked about when he said a lot of organizational politics is just bad context engineering at the org level. If we practice better specification engineering for our documents, we will expose a lot of the implicit assumptions we end up being political about inside orgs. We will make those assumptions agent-readable, they will become fungible, and we will start to have fewer issues. Practicing specification engineering is a way for us to clearly describe intent at organizational scale and to translate that intent into something agents can read. And yes, it nests down to individual agent runs and it ladders up to the full organizational context. That's why it's the last and most difficult skill to learn.

The progression here, from prompt craft all the way up to spec engineering, is not a ladder where you can abandon lower rungs. It's a stack where each layer makes the layers above it possible. You cannot write a good spec if you can't write good prompts. You can't build effective agent systems if you don't understand context engineering. You can't align agent behavior with organizational goals without understanding how intent works and how that plays into context engineering.
They all go together.

There's a final dimension to this that goes beyond AI, and I want to spend some time on it. I hinted at it when I talked about how Toby found that he communicated better when he got better at prompting. The best human managers I've worked with already operate with that degree of clarity. They give complete context when they delegate. They specify acceptance criteria to their team members. They articulate constraints. They're effectively following the four disciplines of AI input with their people, and that makes for effective leadership. What's happening right now, I think, if we step back, is that AI is enforcing a communication discipline that the best leaders have always practiced intuitively, and now everyone needs it in order to be effective. You cannot rely on shared context with the machine. You cannot just assume that AI will know. And that is a gift to us, because so many of our colleagues don't know what we mean either. How many times have you sat in a meeting where someone is referring to a document, you don't know what that document is, and you're afraid to ask? That is a wonderful example of the poor communication quality that goes into human meetings. This is not a framing you'll see in a lot of how-to-prompt courses. I think it should be.
The skill of providing high-quality input to intelligent systems turns out to translate between AIs and humans. It turns out to be a fundamental skill of the agent age, one that benefits us as humans in how we work together. I think the people who develop this collection of prompting skills for 2026 are going to end up being the leaders who run organizations where agents and humans both perform at their ceilings. And the people who don't, the people who are stuck with 2025 prompting skills, are going to wonder why their AI investments keep producing partial value while their human teams keep having alignment issues. The prompt by itself is dead. The specification, the context, the organizational intent: that is where the value in prompting is moving, because agents are starting to work for longer and longer periods and look in a lot of ways like junior employees. The specification done right turns out to be just what clear thinking has always looked like, made explicit, because machines don't let us be lazy about it. I'm really excited about the way that kind of communication clarity can clean up our organizations and our human-to-human communication as well. Good luck with prompting the humans and agents in your life. Cheers. And yes, there's lots more on this on the Substack. I think this is one that needs a really complete guide, so I wrote up a lot of extra material so that you can dive into what each layer of learning means.