
AGENTIC WORKFLOWS 6 HOUR COURSE: Beginner to Pro (2026)

By Nick Saraev

Summary

Topics Covered

  • The Overhang: Why Most People Use AI at 5% of Its Potential
  • AI Sent My Emails for Me in 15 Seconds
  • AI Models Code Better Than Senior Devs

Full Transcript

Hey, welcome to the definitive guide on agentic workflows for business. Now, agentic workflows have the potential to bring about what I think is one of the largest wealth transfers in human history. But very few people are currently talking about how to practically use them to improve their financial means. That's what this video is going to show you how to do.

Here's what you're going to learn: what an agentic workflow really is; how agentic workflows function via loops; a few common problems with agentic workflows and how to fix them; how to actually build these things, so IDEs, setting up your workspace, creating your first flow; the DO framework (directive, orchestration, and execution); Claude Skills, MCP, and other frameworks, what each one does, when to use which, and how they all fit together; how to test and validate agentic workflows; the best system prompts for agentic workflows, which I will give you; how to make your workflows self-annealing, aka heal themselves when they error out; how to move out of the IDE and into the cloud (I'll teach you how to create webhooks, scheduled triggers, and more); how to run multiple agents simultaneously (I'll show you sub-agents and advanced workflow parallelization); and finally, how to troubleshoot agentic workflows when things break.

If you don't know who I am, I built two AI-based service agencies to $160,000 a month in combined revenue. I've also consulted for a couple of billion-dollar businesses with AI. And I tell you this because I want to make it clear: while you are of course going to learn everything from the fundamentals all the way up to the advanced concepts today, this course has a business focus. My goal is to help prepare as many people as possible for what I consider to be the next stage of the economy. So what you will learn today is working right now. It is generating revenue right now, and you can use it to improve your own and other people's businesses right now. Please bookmark this and use the chapter feature to come back to it whenever you need, anytime. And I hope you guys are as excited as I am to get into agentic workflows. Let's get started.

This is a practical course. The whole point of it is to build and then use agentic workflows in real business environments. And that's because building is the most effective way to learn anything. When you build with your hands and get them dirty, you're forced to deal with concepts in a way that you never would have if you just sat back and passively listened. That said, before we get into the building, and there will be a lot of building and a lot of demos in this course, there are some foundational things about agents and workflows that I'd highly recommend you understand, because if you don't understand them, you're going to commit many hours to this course and only really be able to digest or extract a few percentage points of it. So what I want to do is maximize the efficiency of your time by helping you cover those concepts now. And by doing that, you'll be able to absorb the rest of the course a lot faster and a lot better.

So what do I mean by concepts? AI is currently in an overhang state. Current AI capabilities are very far beyond what most people believe, expect, or know how to use. If you graph this, what we have down here is sort of like the general public's perception of AI, okay, and their ability to use it. And what we have above it is sort of like the reality. You guys are going to see a lot of very crappily drawn lines in this course, so you might as well get used to them now. So this gap between the reality of the situation and what people believe AI is capable of is called the overhang.

The reason why this overhang exists, and the reason why people are only squeezing out a very small percentage of the actual value of AI, large language models, agentic workflows, and so on and so forth, is because right now most people are using them as glorified copy-and-paste tools. They are basically trying to drink the Pacific or Atlantic Ocean through a tiny straw. You know, they ask these galaxy-brain intelligences pretty dumb questions to begin with, to be honest. The models answer, and then all they do is copy the answer from one tab into another, which is obviously a very low-bandwidth, really bottlenecked way of working. They are not integrating AI into their business, like I'm about to show you how to do in this course. Instead, they're just dealing with it like an external, third-party thing.

Now, obviously, people are figuring out that AI is a lot more powerful than most people give it credit for, and courses like mine are helping them do so. But as they figure it out, the arbitrage window will close. And in case you didn't know, arbitrage is your ability to essentially produce some sort of beneficial outcome, revenue, or profit based off of a disparity in knowledge. And so, if you know this and the rest of the market doesn't, obviously there's kind of a gap there, right? And the market is willing to pay you to be somebody that solves that little tiny gap. Well, that window is closing, because people are learning about how this technology works. But right now, it's wide open, and you can make a ton of money with it. So, just as a demonstration to show you how powerful these models are, I'm going to have one in particular, called Claude Opus 4.5, do a pretty straightforward task for me.

This task is to compile a list of five local meal preparation companies that deliver around my area and then find their email addresses. I'm then going to send each of them emails with specifications from this email: I want, you know, 3,500 calories a day, 200 grams of protein a day; I'm doing a big bulk. Do this entirely autonomously, requiring no input from me. If you cannot find the emails of at least five, then keep on searching until you do. Most people don't realize that models are entirely capable of doing this sort of thing for you and essentially acting as, you know, an extension of yourself.

So it's starting off by searching for "meal prep delivery companies downtown Vancouver BC 2025". If I were doing this on my own, this is probably something that I would do as well, right? Very straightforward and logical. You don't need to know how the IDE that I'm using works. You don't need to understand the interface or anything; I'm going to cover all of this later on in the course. And as you can see, it's found me a bunch of meal preparation services. There's Fresh Prep, Two Guys with Knives, Crave Healthy, Fed, Fresh in Your Fridge, K-Bop, and then WellFed. Now, it's finding email addresses for each of these. So, as you can see, it's actually simultaneously running a bunch of searches on their websites to look for email addresses or contact methods.

A few seconds later, it looks like it could only find one email out of the four or five searches that it ran. So, what is it doing instead? It's now broadening its search. It's going on contact pages. It's looking for alternative solutions. Okay, it's now accumulated the email addresses in, like, a temporary database, and it's just going through and sending emails. It does so through what's called an MCP (Model Context Protocol) server that I've set up. I'll show that to you later. And boom. Now it is done. So, we've sent five emails. Down here, you can see it said, "I asked each company about custom meal plans, pricing for higher volume orders, and their delivery schedule to downtown Vancouver." We also included the requirements. I went through and actually found the email that it sent. It was something like this: "Hey, [company] team, I'm looking for a meal prep service that delivers to downtown Vancouver and that meets the following requirements. Daily calories: approximately 3,500. Daily protein: approximately this much. Focus on whole foods and healthy ingredients. Interested in learning more; do you mind letting me know if you offer custom meal plans, what your pricing looks like, and how your delivery schedule works? Looking forward to hearing from you. Thank you very much."
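For the curious, an email like the one above could be assembled with nothing more than Python's standard email library. This is a hypothetical sketch of what the sending tool behind my MCP server might construct; the function name, addresses, and defaults are placeholders I made up, and the actual SMTP send is left out so the sketch stays self-contained.

```python
from email.message import EmailMessage

def build_inquiry(company, to_addr, calories=3500, protein_g=200):
    # Construct the inquiry email; nothing is sent here, this only
    # builds the message object that a send step would hand to SMTP.
    msg = EmailMessage()
    msg["From"] = "me@example.com"  # placeholder sender address
    msg["To"] = to_addr
    msg["Subject"] = f"Meal prep inquiry for {company}"
    msg.set_content(
        f"Hey {company} team,\n\n"
        "I'm looking for a meal prep service that delivers to downtown "
        "Vancouver and meets the following requirements:\n"
        f"- Daily calories: approximately {calories}\n"
        f"- Daily protein: approximately {protein_g} g\n"
        "- Focus on whole foods and healthy ingredients\n\n"
        "Do you offer custom meal plans, and what do your pricing and "
        "delivery schedule look like?\n\nThank you very much."
    )
    return msg

msg = build_inquiry("Fresh Prep", "hello@example.com")
print(msg["Subject"])
```

The agent's real advantage isn't this template; it's that it found the companies, found the addresses, and filled the template in one autonomous pass.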

So, I mean, this is something I realistically probably would have sent myself. Is it in my exact tone of voice? Honestly, it's really close. This is more or less everything that I would send. There are no AI-isms. People on the other end of the line aren't going to know that I'm using AI to do this sort of thing. And it turned a process that realistically would have previously taken me maybe 20 minutes into something that took me literally less than 15 seconds. I mean, I wrote the thing, I pressed enter, and then I went. And what you'll see is that with the use of other bandwidth-improving tools, like voice transcription and stuff like this, you can actually have agentic workflows become more or less your interface for the internet. And I should note that I didn't even use a defined agentic workflow for this. I literally just asked an agent to do something, and it was super unstructured, and it still did a great job. Imagine when we wrap this in the framework.

I also want to cover this idea of a river of value. The way I see the global economy is as a giant river. Okay. Now, capital flows to whoever provides value. And essentially what occurs is that for many centuries that value has come from human labor, primarily physical to start, although eventually cognitive. And the more value that people could produce, the more downstream little tributaries of this river we found. And so this might be some person that's producing tremendous value, these might be other people, and so on and so forth. The whole idea of capital is that as solutions arrive in the economy that are more and more effective, they produce larger diversions of this stream. Okay? And so let's say this person Z is using agentic workflows. The idea is that over the course of the next few years, he or she is going to consume more and more and more of that river until essentially he's getting all of it. Those who position themselves as people like Z in this case will capture massive flows from the future economy, because agentic workflows aren't optional. They're something that is coming and being deployed right now.

The last thing I want to talk about is automation in terms of agentic workflows.

Now, a lot of people that watch my channel and are probably here are familiar with the idea of automation. They're also familiar with the idea of roles, and they've heard a lot of things about how AI agents are coming and whole fleets of teams are being replaced and so on and so forth. And this is kind of inaccurate. Rather than thinking about agentic workflows, which is what we're going to cover in this course, as being able to automate 100% of one role, I want you to think about it a little differently. I want you to think about agentic workflows as being capable of automating 90% of 10,000 roles. So as opposed to automating 100% of one role, we're automating, say, 90% of 10,000 people's roles in the organization. Now, if you automate 100% of one role, that's actually pretty valuable, don't get me wrong. If I could automate a software developer completely end to end, if I could automate a marketer end to end, obviously that produces some value in my organization. But agentic workflows, like a lot of technology, have gaps. The main issue is that human beings tend to always have a little bit more context than these things do, at least right now. And so even the ability to automate 90% of 10,000 roles, despite the fact that it's not 100%, is still tremendously valuable. If you just do the math, automating 100% of one person's role is equivalent to providing one unit of economic value, whereas if you automate 90% of 10,000 people's roles, you're providing 9,000 units of economic value.
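To make that arithmetic explicit, here's the same back-of-envelope comparison as a couple of lines of Python, treating full automation of one role as one unit of economic value (a simplification, obviously, since real roles don't all produce identical value):

```python
# "Vertical" automation: 100% of 1 role.
one_role_fully = 1.00 * 1            # 1 unit of economic value

# "Horizontal" leverage: 90% of 10,000 roles.
many_roles_mostly = 0.90 * 10_000    # 9,000 units of economic value

print(many_roles_mostly / one_role_fully)  # 9000.0
```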

As long as you structure your companies in a way that accommodates these things, they are very powerful. Now, I call this horizontal leverage, and it's very, very strong. Another way I want you to think about this is like the industrial revolution. Back in the good old days, well, I don't know if they were really good, but certainly back in the day, you had people like seamstresses who would, you know, knit various garments and stitch various things together. And maybe one of these seamstresses could produce, you know, 10 pairs of a specific type of clothing per day. Well, after the industrial revolution, obviously we didn't do a lot of this stuff by hand anymore. We had machines that did this stuff instead, maybe a loom. Before, a single seamstress could produce maybe 10 garments a day. After, one of these machines could maybe prepare 10,000 garments in a day. That said, the machine didn't fully replace that seamstress, because that seamstress just transitioned. Instead of being somebody that worked with their hands on building the garment directly, they became somebody that supervised whole fleets of machines that did it. Now imagine if, in this analogy, not only can we build and use a loom, we are capable of rebuilding that loom in any configuration in seconds. We don't have to, you know, smelt the metal and then hammer it and construct it and screw gears and all that stuff in order to build a machine. We could literally just use natural language. Obviously, that would be a lot more powerful, right? Well, that really is the idea of an agentic workflow. It is something that provides incredible horizontal leverage, and we can reconfigure it in seconds to do more or less whatever we want. And it's not an exaggeration to tell you that this is a

phase change, essentially, in a company's ability to automate things. So if you're familiar with automation platforms, in this case this is n8n, you'll know that most of the time, the way we currently build automated systems is through drag-and-drop nodes or modules. And so on the left-hand side here, I have a simple system set up. I'm not going to go through everything, because it's pointless; the point is not to learn a specific automation platform. The point is to learn how to automate platforms in general. But I have a specific automation here that just responds to some emails coming in for a cold email campaign. And as you see here, we have these nodes, and they do various things. Some of them do HTTP requests. Some of them do some data processing and formatting. Some of them call a Google Sheet. We have some AI functionality, and so on and so forth. They're all connected with these lines, which is basically the flow of logic through a system. And this is hunky-dory. It works really well.

Well, the new version of that workflow on the left, which obviously requires a lot of time, energy, and understanding to be able to parse and then change, is what we have on the right. Instead of dealing with nodes and specific software platforms, we use the universal translation layer, which is natural language, and just write it out in bullet points. So on the right-hand side, I have the exact same workflow, except set up for agentic systems, and all it is is a list of bullet points: when somebody replies to one of your cold outreach campaigns, Instantly should send a webhook. The system should look up the campaign in a Google Sheet to find talking points and example replies. It should then research the person who replied. It should then generate a short, friendly reply. If they said something negative, like "unsubscribe" or "remove me", we should skip them. If there's no knowledge base, we should skip them. Otherwise, we should send the reply automatically.
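To show just how plain that logic really is, here is the same bullet-point workflow sketched as ordinary Python. The helpers passed in (research, generate_reply, send) are stand-ins for the real integrations, the Instantly webhook, the Google Sheet lookup, the LLM call, the email API, so treat this as an illustration of the control flow, not any platform's actual code.

```python
NEGATIVE_PHRASES = ("unsubscribe", "remove me")

def handle_reply(reply_text, campaign, knowledge_base,
                 research, generate_reply, send):
    # Skip anyone who replied negatively.
    if any(p in reply_text.lower() for p in NEGATIVE_PHRASES):
        return "skipped: negative reply"
    # Skip if we have no knowledge base for this campaign.
    if not knowledge_base.get(campaign):
        return "skipped: no knowledge base"
    # Otherwise: research the person, draft a short friendly reply, send it.
    context = research(reply_text)
    draft = generate_reply(context, knowledge_base[campaign])
    send(draft)
    return "sent"
```

With stub helpers wired in, a reply containing "remove me" returns "skipped: negative reply", an unknown campaign returns "skipped: no knowledge base", and a normal reply gets drafted and sent.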

I want you guys to see that on the left-hand side, we had to spend months, maybe years, becoming skilled enough with a platform to be able to build systems that did this. And on the right, a toddler who has a rough idea in mind of what he or she wants to do can write it out in natural language. And not only can everybody else on a team interpret that, we can also change it at any point. If I wanted to add an additional step to my workflow, all I do is click on this, press enter, and just write it out, and the agentic workflow builder, and then eventually doer, using a framework I'm going to run you through later on in this course, will do it, and it'll do it extraordinarily, remarkably well. So that's a very fundamental change in how these things work, and hopefully it's clear to everybody here that workflows are no longer drag-and-drop builds like the one we see on the left-hand side. They're very much just basic logic.

So why is all of this stuff possible right now? It certainly wasn't just a little while ago. Well, there are three main reasons: intelligence, tools, and cost. On the intelligence side, model intelligence just crossed a threshold and became very, very good, seemingly overnight, although really we've been working up to it for quite a while. Frontier models like Anthropic's Claude, OpenAI's ChatGPT, Google's Gemini, and a bunch of others have gotten really smart. They score around 80% on a benchmark called SWE-bench Verified, and this measures real software engineering ability. This is not a crappy, cherry-picked demo. It wasn't included in the training data or anything like that. These are novel problems being solved in novel ways by models. And essentially, this is genuine, professional-grade work that is better than most software engineers'. Now, I would have considered myself a software engineer a couple of years ago. I'd say my skills have definitely deteriorated a fair amount since, because I've been focusing more on no-code tools and making money and stuff like that. But this stuff is so far beyond my own abilities as sort of a mid-level dev that it's not even funny. Most people that learn about this, and they're going to be learning about it pretty soon, will think that AI went from, you know, intern level to some sort of senior employee overnight. But this is just how knowledge works. Basically, any time you have a process that slowly gets better and better over time, most people don't see it until we hit a certain threshold, and then it almost looks like it went vertical. In reality, it's almost like the way that boiling water works, right? The temperature of the water goes up and up and up, and then eventually it boils and fundamentally changes state. You know, it goes from over here, where it's a liquid, to over here, where it's a gas. And although we're supplying more and more energy to this thing, we're not really seeing it change until, all of a sudden, boom, it's producing bubbles and getting all over the place. So, I see model intelligence in a very, very similar way. So, a lot of people talk about benchmarks.

Very few people actually show what the questions inside of a benchmark realistically ask. I think benchmarks are, for the most part, pretty artificial. A much better test of how good a model is is just how good you feel while using it. But it is important that we at least understand how benchmarks work, in order to really put the capabilities of agents in context. So here's one from Astropy. It's a misleading exception message. And basically, these models are so good at coding. Like, I mean, I tried to look through and understand what any of these actual questions meant and how to fix them. I'd probably be staring at each of these for like a day before anything made sense, let alone before I got to the point where I could realistically solve it. These models can do this sort of thing in seconds. So, the issue's problem statement: removing a required column from a TimeSeries raises a misleading error message. The error claims the "time" column is missing even when it's present. Instead, the error should list all missing required columns. Then it gives you a snippet of code with the actual class, TimeSeries. Right? So, looking at that, no idea what the hell it does. The bug: if "flux" is missing, the error still complains about "time"; the error message is factually incorrect. Your fix: detect which required columns are missing and report them explicitly. So, you actually have to go through and do this with the code.
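Since the benchmark's actual patch isn't shown here, here's a toy reconstruction of the kind of fix being described: collect every missing required column and name them all, instead of always complaining about "time". The class below is a stripped-down stand-in I wrote for illustration, not Astropy's real TimeSeries.

```python
class TimeSeries:
    _required_columns = ["time", "flux"]

    def __init__(self, columns):
        self.columns = list(columns)

    def _check_required_columns(self):
        # Fixed behavior: report ALL missing required columns explicitly,
        # rather than misleadingly naming only the first required column.
        missing = [c for c in self._required_columns
                   if c not in self.columns]
        if missing:
            raise ValueError(
                "TimeSeries object is invalid - missing required column(s): "
                + ", ".join(repr(c) for c in missing)
            )

ts = TimeSeries(columns=["time"])  # 'flux' was removed
try:
    ts._check_required_columns()
except ValueError as e:
    print(e)  # names 'flux', the column that is actually missing
```

The point of the benchmark task is that the model has to locate this validation path in a large unfamiliar codebase and change it without breaking anything else.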

Okay, here's one from a pandas-style question: loading a CSV silently coerces mixed-type columns instead of failing quickly, which leads to incorrect downstream computations, and then it provides a list. So, we now have models that are basically capable of looking at a thousand of these and solving more than 800 of them perfectly. I mean, if you gave me a thousand of these, not only would it take me like a year, I would probably get at least, you know, 50% of them wrong. And I'm somebody that has some exposure to this sort of stuff.
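The mixed-type issue above, silent coercion instead of failing fast, is easy to illustrate. This sketch uses only Python's standard csv module, since the benchmark's real loader isn't reproduced here; the function name and behavior are my own stand-ins for the "fail fast instead of coercing" fix.

```python
import csv
import io

def load_csv_strict(text, numeric_columns):
    """Parse CSV text and fail fast if a supposedly numeric column
    contains a non-numeric value, instead of silently coercing it."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for col in numeric_columns:
        for row in rows:
            try:
                float(row[col])
            except ValueError:
                raise TypeError(
                    f"column {col!r} has non-numeric value {row[col]!r}; "
                    "refusing to coerce silently"
                )
    return rows

good = "a,b\n1,2\n3,4\n"
print(len(load_csv_strict(good, ["a", "b"])))  # 2
```

Feeding it a column with a stray string raises immediately, rather than letting a half-numeric column flow into downstream computations.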

Imagine the average person. And so what I mean to say is that we are essentially empowering every human being on earth, or at least we have the potential to, if we were to actually distribute this technology and everybody were to know it to the level that you will know it by the end of this course, with the powers of a mid-level to even senior developer in many cases.

Another important point is how fast these models can operate. I mean, this is me asking ChatGPT 5.2 Thinking to just reason a little bit about the meaning of life. Check out the stream of output that it's providing. But you can go way faster than that. This is an example of a diffusion LLM that basically immediately processes and writes, I don't know how many hundred words, but extraordinarily quickly. You see that we just click generate, and then immediately after, you know, probably at least 300 words were instantiated. These models can run these reasoning loops extremely quickly behind closed doors. In addition, providers like Anthropic, OpenAI, and Google have all the compute necessary to run these things 10, 50, 100 times faster than you can yourself. So just imagine what's going to happen when that level of technology drips down to the rest of the economy. To be clear, these models, the ones that I'm using to build agentic workflows, are already extremely powerful and have automated the vast majority of my day-to-day work. They can automate the vast majority of your day-to-day work as well, or that of any of the companies you work with. But imagine the models in 3 months. Imagine the models a year from now. That's why learning how to build these sorts of workflows today is probably one of the highest-ROI skills that you can engage in.

The second thing is that tool integration is now standardized. There are protocols out there like Model Context Protocol, which standardizes how AI connects to external tools, databases, resources, and stuff like that. I'm going to be showing you how to use Model Context Protocol in pretty advanced ways that I don't think a lot of other people have covered in this course. I'm also going to be talking about some of the downsides of Model Context Protocol, like how initially it totally blew, but now it's actually pretty good and well supported, so it's worth us diving in. In addition to those tools through MCP, there are also some frameworks that have recently come out. One is directive orchestration execution. This is the framework I'm going to be using to build and then use our agentic workflows throughout the course. There are also platform-specific frameworks like Claude Skills for the Claude family of models. These formalize tool calling. And in case you have no idea what I'm talking about here: LLMs are really flexible, which is a great thing conceptually. It's great if you want to write poems, do creative writing, and have help responding to emails and stuff like that. But a lot of business functions don't depend on flexibility. What they depend on is the opposite: they depend on reliability. So in business we need to standardize, and tools are basically just standardized little things that we can use to accomplish business tasks. I like thinking of it like a caveman that, you know, is hunting saber-tooth tigers or something. If you're a caveman and you're hunting saber-tooth tigers, and every time you go up to a saber-tooth tiger you're completely empty-handed, what are you going to do? The first thing you're going to do is be like, "Holy crap, is that a saber-tooth tiger?" You're going to scrounge around on the ground to look for rocks and pointy, stabby things and, you know, sticks, anything that can buy you some distance and then maybe some effectiveness. Contrast that with if, beforehand, you had a little bit of foresight and you said, "Hm, I should probably build something that's kind of pointy and sharp." So you work all day and night and you put together a spear. Well, every time you encounter that problem of the saber-tooth tiger, what are you going to do? You're just going to pick up your spear and deal with it. That's just my really crappily drawn spear. That's sort of the same thing that LLMs use tools for. They encounter problems. When they encounter them a few times, they then develop tools that solve them, or use pre-existing ones through MCP. And in doing so, we can standardize the solving of business problems pretty easily.

Okay. The last thing is just the cost economics, and they finally make sense. When Claude Opus 4.5 dropped, it went from a cost of about $15 or $75 per 1 million tokens (input or output, respectively) to $5 or $25 per 1 million tokens. That's a 3x reduction. And newer models are even cheaper than that. The cost of intelligence, per unit of effectiveness, has plunged something like 40-fold in the last year.
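To put that price drop in concrete terms, here's the arithmetic on a hypothetical job (the token counts are made up for illustration; the per-million prices are the ones quoted above):

```python
def job_cost(input_tokens, output_tokens, in_price, out_price):
    # Prices are dollars per 1 million tokens.
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical job: 2M input tokens, 500K output tokens.
old = job_cost(2_000_000, 500_000, 15, 75)  # $30.00 + $37.50 = $67.50
new = job_cost(2_000_000, 500_000, 5, 25)   # $10.00 + $12.50 = $22.50

print(old / new)  # 3.0
```

At those volumes, a workload that used to cost real money per run becomes cheap enough to fire constantly.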

If I were to graph this, it would actually look like this. Now, I've been using models since GPT-3, way back in 2020, when it was initially released to a very small, select group of people that could access it, and so on and so forth. GPT-3, which is orders upon orders of magnitude dumber than this, cost more than the technology that we are dealing with right now. It is insane how quickly the price of knowledge work has plummeted. It's already gone down 40 times in just the last year. I imagine it'll probably go down another 40 times over the course of the next year, maybe even more. What that means is we can actually send large volumes of tokens to these things to replace the work of deterministic, old-school automations, like the n8n flow that I showed you, without it running a business ragged into the ground. There are also tons of price wars occurring between major providers, and there are a lot of geopolitical incentives between, you know, places in the East and places in the West to basically make these things as accessible and easy to use as possible. So, to make a long story short: this is new. Very few people understand the capabilities right now. So there are many billions of dollars that will shift as the market learns and adapts. It is much better to be an early mover than somebody that is affected by this technology without their consent or knowledge. What I mean is: would you rather learn about this stuff now, or would you rather learn about it in two years, when your boss or, I don't know, some client base of yours turns to you and says, "Hey, we no longer need you, because we have agentic workflows to do it"? I would much rather be the person that helps them build those agentic workflows than the person that's now sitting on my ass because I don't know anything about them. Hopefully, you are too.

too. Okay, so now that that big preamble's out of the way, let's learn about chat bots, agents, agentic workflows, uh, knowledge tools, and then actually get our hands dirty with some demos. I like thinking about knowledge

demos. I like thinking about knowledge tools as evolving over the course of the last 30, 40 or 50 years. I always think about it sort of like the step ladder on

the right where you have three rungs. At

the bottom you have documents. In the

middle you have chats and at the top you have agents. Over the course of the last

have agents. Over the course of the last 30 40 50 years we basically transition from knowledge in the form of docs to knowledge in the form of chats over the last 5 years to knowledge and action in the form of agents. And I'm going to run

you through what each of these look like now before actually using them in a real workflow. So documents are static

workflow. So documents are static knowledge. Hopefully they're pretty

knowledge. Hopefully they're pretty straightforward. It's oneway information

straightforward. It's oneway information flow. All you do is you read the

flow. All you do is you read the document, but it's not like the document can respond to you. We currently use documents everywhere in school and in business. We use them in legal

business. We use them in legal agreements. We use them in training

agreements. We use them in training materials. Once you write a document, it

materials. Once you write a document, it obviously stays fixed. That's a feature, not a bug, because it's great for permanence. Like if you're writing

permanence. Like if you're writing contracts or standard operating procedures that are immutable, aka it should not change. You don't want your contract or your standard operating procedure rewriting itself unless you want it to, right? In most cases, you

don't. So, u that's great. That's

don't. So, u that's great. That's

actually a feature, not a bug. Chat

bots, on the other hand, are not static.

They are dynamic. Chatbots were developed way back in the 1970s, realistically, but we only started using them for real knowledge purposes in maybe the early 2020s. They perform two-way interaction: you read the output, but you can also ask questions back. So, here's a crappy pass at GPT-4o where I just said, "Hey, what's up?" "Hey, Nick. All good on my end. Quick check-in. Zero fluff. I'm ready to help if you want to chat. If you've got a decision to make, whatever. What's on your mind?" This is now two-way knowledge interaction (the dreaded em dash and all). This allows you to do things like clarify confusing points. You can ask for research. You can dig deeper into topics. You can also modify the knowledge: you could upload a PDF, or you could make some statement, and then the chatbot has some additional context. I just think of chatbots like really smart colleagues who read everything you give them, but are confined to a chair.

You know, they can't move, and they can't do anything with it. So, essentially, all you can do is talk. This is how most people treat models today: as chatbots. They're dynamic knowledge, but they're still subject to this little window. They can only be communicated with, and copied and pasted from, through your ChatGPT or Claude output. Now, contrast that with agents, which I consider to be dynamic action. To make a long story short, this is two-way interaction, just like chatbots, except this time it acts. On the right-hand side here, you can see I have a flow that says run the thumbnail generator on a link. So, it's not just asking a question about the thumbnail generator; I'm actually having it do something.

And this is a real agentic workflow that I developed to build YouTube thumbnails like the ones you guys saw on my channel. What we see here is a fundamentally different interface. On the left-hand side, we have some of these nodes. The green ones here are actions that are being taken. These gray little sections over here are thinking nodes, which are where the model reasons extemporaneously, basically temporarily, and then discards those reasoning tokens. You can see that it's actually calling a script. You don't need to know Python to have the model do really cool things for you, but that's what's happening right here. And then down over here we have a bash output where it actually ran, and we have an output that we can then use, and so on and so forth. So you're given visibility into the reasoning. You're also given visibility into the planning, tool, memory, reasoning, and observation loop, and I'm going to cover exactly what that looks like in a moment. You also have autonomy and long execution times. Agents can routinely run for 5 or 10 minutes. Just yesterday night, I actually had an agent run for over 5 hours uninterrupted to build me a really cool system. As of today, I think of models like a mid-tier developer.

They're worth $100K a year or so in terms of their capability. But if you think about it, I'm spending 20 bucks a month for this, which is $240 a year, which is over 400 times cheaper. And not only is it cheaper, this thing works 24 hours a day, as I mentioned, or at least it can. You can do a lot of really cool things with models like this. So now is the time to jump on it.

A point to understand is that an agent is not a chatbot, despite the fact that they look really similar. Now, the way I see it, a chat is just an interface: some specific thing with messages that go back and forth, and a little window down here where you can enter your own information. The chat is just the app. The agent is what lives inside of the app. If you're familiar with crustaceans, crabs, those cute little things that crawl around on the ocean floor: they'll often find shells and then discard them when the shells no longer fit their purpose. So, like a crustacean that uses the shell of an older animal, an agent is just currently using the interface of an older type of knowledge tool, the chatbot. And I'm sure, over the course of the next few years, it's going to discard this, and we're going to have new interfaces that are even better. Okay, so let me show you guys the difference between chatbots and a really low-level agentic workflow that I put together that functions through an agent.

Down over here is the ChatGPT desktop app. This is really simple and easy; you can download it on ChatGPT's website. Super straightforward. I'm just going to say: hey, how can I scrape leads from LinkedIn Sales Navigator? When you're working with models like this, the input and output are pretty bounded, right? All you can really do is see what the model tells us: hey, here's the direct, high-IQ, zero-fluff rundown. Use this, scrape this, use this. This is cool, right? I mean, it's nice that we're getting information on how to do this, and a few years ago this would have been revolutionary. But rather than just have a conversation with the model and ask it how to do things, which is knowledge, I

can actually force a model to action using agentic workflows. So, in this case, I'm saying: scrape me 200 HVAC owners in the US; I want decision makers. It then checks to see if there are lead-scraping directives and execution scripts. This is just part of the framework that helps constrain the model's output, which I'll run you guys through in a bit more detail later. It's then going through and actually pulling a script together to do this thing for me. It then comes up with the idea of a test scrape: 25 leads. It's then going to verify the industry match, run the full scrape, upload to a Google Sheet, and then even go through and enrich it for me. In this case, the model is performing a search. It's then comparing the results of the search with what it thinks I want.

It's determining that there's a very low match rate, and so it's now adjusting its filters on the fly, completely on its own, to find leads with zero input. All I'm doing here is texting a friend of mine on my phone.

It's then verified that we're past the threshold. Now it's running a full scrape. It then went and actually got us a Google Sheet with all that information. I mean, it's pretty cool insofar as it's totally autonomous. It probably would have taken me a fair amount of time to come up with the filters and so on myself; this thing just did it entirely on its own. If you guys check the bottom right, we actually ended up getting almost 200 emails directly from this. We also got a bunch of phone numbers and a bunch of other really personal information.

So, what exactly is going on? There are five steps that an agent will follow every single time you send or receive a message. The first is planning. The next is tools. The third is memory. The fourth is reflection. And the fifth is orchestration. I think I called it observation before; my bad on that. But it's orchestration. I use a simple five-letter acronym for this, PTMRO, which helps me remember it. Hopefully it'll help you remember it as well. Now, these five components are as follows.

Planning is where you break down objectives into executable steps. Tools are the actions that an agent actually takes in the world; if you remember, it was calling various things to do what it needed to do. It then stored things into memory, which is how agents retain and recall information across tasks. There are different forms of memory: short-term, mid-term, and long-term, and there are different ways that works within an agent these days; I'm going to cover each of them. Reflection is where the agent evaluates and corrects its own work; as you saw there, we had an issue with one of the calls, and it went through and fixed the filter. And then finally, orchestration, which is where you coordinate multiple agents or complex workflows. We're going to talk about how to do that later on in the program, too.
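The five-step PTMRO loop can be sketched in a few lines of Python. This is a conceptual sketch, not a real framework: `plan_fn` and `reflect_fn` stand in for model calls, and the tool set is just a dict of plain functions.

```python
def agent_loop(goal, plan_fn, tools, reflect_fn):
    """Conceptual sketch of the PTMRO loop. plan_fn and reflect_fn
    stand in for LLM calls; tools maps names to plain callables."""
    memory = []                            # Memory: results retained across steps
    for name, args in plan_fn(goal):       # Planning: goal -> ordered steps
        output = tools[name](**args)       # Tools: act on the world
        if not reflect_fn(output):         # Reflection: did the step work?
            output = tools[name](**args)   # naive retry; a real agent revises the plan
        memory.append(output)              # Orchestration would combine these results
    return memory
```

A real agent re-plans on failure instead of blindly retrying, but the shape of the loop, plan, act, remember, check, is the same.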

Obviously, there's planning, and that's mostly goal decomposition. It's where a high-level objective gets broken into subtasks. For instance, if your high-level task is to eat at White Castle, it's not just "eat at White Castle," right? That's not enough to go and actually do the thing. What you want to do is break that down into various tasks. Maybe step one is you have to get in the car. Step two, and maybe you do this while you're in the car, or before, is you've got to research the GPS location. The third is you have to drive all the way over there.

And then the fourth is you actually have to order. And the fifth is you have to make a movie about it. Just kidding. But the point I'm making is that you take this high-level task and you actually break it down. And that is occurring every single time within an agent. You don't always see it, because it's typically buried within reasoning, and most people don't expose reasoning.

But this form of high-level goal decomposition occurs all the time. And it's important that it's done right, because if the agent screws up at the planning stage, the probability of it being able to do the rest of the task is very low; it's making a foundational misassumption. Now, an agent will identify dependencies within steps. It'll then sequence them logically, like the five steps I just gave you; the agent will actually reorder those steps as necessary. And good planning also means revising the plan when things change, because there's obviously only so much information that we have ahead of time. There are limitations to this, and Claude, GPT, and Gemini have pretty imperfect planning capabilities. So, as part of the building of the workflows that I'm going to show you later, I actually recommend doing a fair amount of the planning yourself. The reason why is an analogy: let's say I'm on the east coast of the United States and I want to go somewhere on the west coast of Africa. I'm this ship over here, and my goal is to make it to this port right over here. If I screw up at the very beginning, even by a few percentage points, and I give myself a range of possible outcomes, even a 1% problem with the planning, a 1% error, has massive downstream impacts over the course of the entirety of the task. If I'm really, really bad, I could end up in the middle of freaking nowhere. Or, if I'm really, really, really bad on this end, I could end up hundreds of kilometers, maybe thousands of kilometers, away from where I wanted to go. So, if you think about it, what effective planning really does is reduce those error bars.
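The compounding effect of a small planning error can be made concrete with a little trigonometry. The 6,000 km crossing distance here is just an illustrative assumption:

```python
import math

def offset_km(distance_km, heading_error_deg):
    """Lateral distance you miss the target by if your initial
    heading is off by heading_error_deg and never corrected."""
    return distance_km * math.tan(math.radians(heading_error_deg))

# A mere 1-degree heading error over a 6,000 km crossing:
print(round(offset_km(6000, 1)))  # roughly 105 km off target
```

A 1% mistake at the start doesn't stay a 1% mistake; it scales with everything downstream of it.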

It just allows us to go a lot tighter and a lot narrower, so the probability of us actually achieving the thing we want, aka going where we want to go, is a lot higher. If there was one place for you to exert your human intellect, it's at the planning stage, and I'll cover some practical ways to do that later. Obviously, there's DO, which helps by providing structured directives. I'm going to show you guys how you can just dump your company SOPs into a model to guide its planning. If you don't have company SOPs, I'm going to show you how to produce them really simply and easily.

Next are tools. Now, these turn LLMs into systems that are capable of real-world action. I think I covered the caveman analogy, ancient people building a spear or something like that, but you can also think of it as an ancient person building a house. They'll build the house the first time, and the house will be pretty cool; it might have most of the things that they want, some sort of straw roof or whatever. And then what's really cool is agents can go back to the tools and make them better. So maybe you want to build a window or something like that.

So the first iteration of the house doesn't have a window. The second one has a window. The third one has a door. The fourth one has a cool barbed-wire security system, and so on and so forth. But just to break it down: tool use is where agents interact with systems and services. In our case, because we're dealing mostly with digital services, that means things like calling APIs. That's a big chunk of tool use, to be honest. Then there's executing code. You don't need to know any of the code, since it does the coding for you, but it is still executing code. It also nowadays includes a lot of database work, because you don't want to store all the information directly in the context of the model. Then it also means things like browsing the web. If your computer were the entire world, the tools you personally use to interact with it are your mouse and keyboard, and some people are now using voice transcription tools, like myself. That's our input method to our world of the computer, right?

Well, it's the same thing with agents.

Tools are their input methods to real life. They need tools in order to break out of that little chatbot window and actually influence things that matter.

So the entirety of the intelligence of models, in the DO (directive, orchestration, execution) framework, in Claude skills, in a bunch of these different ways of thinking about agentic workflows, the entire point of the intelligence is just to help it use and then build tools. A good analogy is that tools are like the agent's hands, and the LLM is the brain. If you're a brain in a vat or a jar somewhere, obviously your ability to influence the real world is pretty limited, right? But give a brain some wires and neurons and some hands, and now it can actually start doing things. Unfortunately, right now tool quality varies a ton. There is a lot of variance between really good and really crappy tools. And just a few months ago, the variance was actually way larger, so we're getting better. I imagine future tool systems are going to be mostly pretty solid; there's going to be a lot less range between a really good tool and a really bad tool.

This is for a variety of reasons. MCP came out pretty recently, and there are a lot of people trying to capitalize short-term on MCP, so they're building a lot of really crappy tools. I'll show you guys how to avoid that, how to select really high-quality tools that matter, and how to build your own that are way better. The way I see bad tools: it's like if you give somebody a really crappy hammer and then expect them to build you a really nice cupboard or cabinet, the probability is low, right? If you want to build something really cool, you need cool tools. If you want to do something really cool, you obviously need to make sure those tools are as high quality as humanly possible.

So, here's one of the key insights of agentic workflows, and one of the reasons why I think a lot of people don't understand how this stuff works. When you standardize tools, and you turn them from vague ideas into actual concrete functions, you let anybody use them, regardless of the type of model that you're using, whether it's Claude or ChatGPT or Gemini. All of these models are smart enough to know how to use the tool. You also ensure consistent inputs and outputs, which is really, really important for business. And the cool thing is, you don't actually need to wait for other people to build them anymore.
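Here's a minimal sketch of what turning a tool into a concrete function can look like. The schema loosely mirrors the JSON-Schema style that tool-calling APIs use, but everything here (the weather tool, the field names) is illustrative, not any provider's real API:

```python
def get_weather(city: str) -> dict:
    """Hypothetical tool, stubbed out for the sketch."""
    return {"city": city, "temp_c": 21}

# Registry: each tool is a concrete function plus a declared schema,
# so any model that can emit a (name, args) pair can call it.
TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def call_tool(name: str, args: dict):
    """One dispatcher, consistent validation, consistent output shape,
    regardless of which model asked for the call."""
    spec = TOOLS[name]
    for field in spec["schema"]["required"]:
        if field not in args:
            raise ValueError(f"missing required argument: {field}")
    return spec["fn"](**args)
```

The value isn't in the weather lookup; it's that the model's fuzzy intent gets funneled through one auditable, validated entry point.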

All of these models are hyper-optimized for programming, so we're just going to let the model build its own tools. LLMs are very probabilistic, right? Their decision-making process is pretty opaque to us. I heard a great quote the other day, might have been from Dario Amodei, might have been from somebody else, that AI models are grown, not built. And I think about that pretty often. AI models are intelligences where we're only slowly figuring out how they work under the hood. We don't actually know; we don't have an established, consistent decision-making process that takes us from one point to wherever we want to go. Business requires interpretability. You need the ability to audit things, and so on. So rather than have this big probabilistic galaxy brain, which makes decisions and routes in ways that we have no idea about, we just give it very, very simple tools. That way, even if there's some deviation, maybe it gets all loopy over here, we know that it called a tool. And because it called a tool, we can obviously interpret that a lot easier, right? We have a sequence of steps, like 1, 2, 3, 4, 5, 6, and we go through the process. It's just way more straightforward. So, we just let an agent, which is optimized for coding, make its own tools. Then the agent will call the tools and interact with life for us. I want to show you guys how easy it is to build your own tools.

So, here I have a simple query: hey, how would you build a workflow that takes a video, cuts out the silences in said video, and stitches it all back together to deliver me the result? The cuts should look natural, like most YouTube jump cuts; basically, just try and stitch the empty space together. You know, this is a pretty complicated flow if you think about it. There are a lot of different ways you could build something like this, and none of them are exactly easy. So, what this is going to do is look for a couple of simple and easy ways to do this and then present them to me, because I went down here and selected plan mode, which is one of the different modes you can use in at least the Claude series of models. Keep in mind, depending on the models that you're using, it may be a little bit different. So now, once I have this plan in front of me, I'll be able to decide on how to do the workflow, and I can act as more or less a high-level director, letting this thing know whether or not I want to do something. Okay, next up it's asking me: are we doing this on short clips or long clips? Any preference on the defaults? And so on. I say short clips, defaults sound fine, MP4 is great.

Okay, I then have the plan in front of me, and if I wanted to build this, all I would need to do is click yes and auto-accept. And I think I will; that seems pretty straightforward. So, let's give it a try. While this is working, I'm just going to see if I can find an example of a video that I could feed into this. I've done this a couple of times previously, as you guys can see, so let me just find some really simple video, only a few seconds long, that we can test this on. Okay, I found an example here. It's just a short one-minute video clip of me doing a typical intro.

Now that this thing is building, I'm just going to move this to bypass-permissions mode. That'll allow it to operate autonomously without me. And once it's there, it's actually created it. That's great. As you guys can see, that only took us maybe 30 seconds or so. From here, I actually want to test this. Let's test using test_clip.mp4.

Now, I'm not actually expecting this to work the first time around, because most workflows don't work the first time around. It's all a process of progressive iteration. Essentially, if the workflow doesn't work, the error message is fed back into the agent, and then the agent will progressively build the agentic workflow, using the error messages to guide it in the right direction.
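That feed-the-error-back loop is simple enough to sketch directly. Here `build` and `test` stand in for the agent's code generation and test run; the names are illustrative, not a real API:

```python
def build_until_passing(build, test, max_rounds=5):
    """Sketch of progressive iteration: each failing test's error
    message becomes the feedback for the next build attempt."""
    feedback = None
    for _ in range(max_rounds):
        artifact = build(feedback)   # agent writes (or rewrites) the workflow
        ok, error = test(artifact)   # run it
        if ok:
            return artifact
        feedback = error             # the error message guides the next attempt
    raise RuntimeError("still failing after max_rounds attempts")
```

This is why you can alt-tab away: the loop doesn't need you until it either converges or gives up.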

In situations like this, I honestly just alt tab and then do something else.

Okay. And it actually looks like it ran through the entire test and was perfectly fine. That's crazy. What I'm going to do now is just watch the test, see how it is, and then we'll continue to go back and forth a few times until I have what I want.

Oh, by the way, I don't even need to find this file; I can actually just say "open it." Okay, so I'm noticing that the cuts are kind of abrupt. They're a little bit too fast for me. What I mean by that is, instead of cutting at the point I wanted it to cut, it's cutting a few seconds before. There are multiple ways around this. I could use a different approach to detect the cut points. I could have it manually move things over. I mean, if you think about it, I could do whatever the heck I want here; this thing's operating at the speed of thought. So, I'm just going to give it some very high-level instructions, and we'll see what it thinks. It's giving me a bunch of different options here. One of them is voice activity detection. I like this. Let's do this one. Okay, it's now testing with this new approach.
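Under the hood, a fix for abrupt cuts usually comes down to padding the cut points. This is not the script the agent actually generated, just a sketch of the logic, assuming we already have silence intervals from somewhere (e.g. ffmpeg's silencedetect filter or a VAD library):

```python
def speech_segments(silences, duration, pad=0.15):
    """Given (start, end) silence intervals in seconds, return the
    speech ranges to keep, leaving `pad` seconds of breathing room
    around every cut so the edit doesn't feel abrupt."""
    segments, cursor = [], 0.0
    for start, end in sorted(silences):
        seg_start, seg_end = cursor, min(start + pad, duration)
        if seg_end > seg_start:
            segments.append((seg_start, seg_end))
        cursor = max(cursor, end - pad)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments
```

Each kept range would then be cut out and concatenated, e.g. with ffmpeg. Tuning `pad` is exactly the kind of high-level nudge you give the agent rather than code yourself.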

All right, let's take a look at round two.

Okay, so it worked perfectly on the one-minute clip. So now I'm just going to run it on the three-minute test.

Okay, and it's just finished and then opened the next clip. Let's just see how that does. There is a cut point right here, I think. Let's see if that's good.

Cool. Nice. Looks like it did that cut.

That's cool. How about another one? Hm, I think it was right here.

Nice. It's solid.

Last one right here.

Cool. So, yeah, this one worked basically perfectly. The agentic workflow is, for the most part, now complete. So, you guys can see it took one back-and-forth. I gave it a very high-level, realistic list of what I wanted. I didn't really know what I wanted, to be honest; as most people who have done any sort of software engineering work know, clients usually have no clue how to scope a project, so you can sort of only take them at face value there. I went back and forth a little bit. You know, I was like, okay, this didn't work too well, is there anything else we could do? It gave me some other thing, so I tried the other thing. Hopefully you guys can see that this sort of loop is very straightforward and realistically only takes a few moments of your time. The most important part, I think, of my entire day is now just providing some sort of high-level nudge in one direction or another to agents like this when designing my agentic workflows. You know, if you just removed me from the loop completely, the resulting agentic workflow would probably suck, at least for now. But I'm just here to steer the ship, right? It's almost like an old-school Viking boat where people have to manually row: I'm just the person at the very front of the ship doing a little bit of steering. The agents are the minions doing my rowing.

At this point, I'm briefly going to cover memory. It's how agents maintain context. This isn't super important to know for building, but it is important if you want to understand how these things work under the hood. So, short-term working memory is basically the reasoning tokens that are relevant to the current task; they're stored temporarily. If you've ever seen a little thinking window, or a thinking tab with a little thing you can click to open, inside it'll say things like "the user wants to do this, the user is thinking about doing this." This is your short-term memory analog, in the way our human brains work. Your intermediate memory is your back-and-forth messages with the agent; it's the actual message chain that you're having. Those aren't removed like reasoning tokens are, so this is always stored and sent with every API call. Long-term memory is things that persist across sessions.

So they're variables that are stored in Claude, ChatGPT, etc. On the right-hand side here, I have that same message I sent earlier as part of our demo, where I scraped 200 HVAC owners, to show you guys how all of this memory works in context. Basically, this over here, and its replies, are what are called intermediate messages. Anything inside of this thinking tab is your short-term memory. And then long-term memory is things that are stored within my file space: things like my AGENTS.md, my Gmail accounts.json, my tokens. If this all seems like magic to you right now, don't worry; you're going to get to the point where you can actually understand and interpret everything within an integrated development environment by the end of the program. But I just wanted you guys to be on the same page that this over here is an intermediate piece of memory. It's going to include all messages that are sent and received between you and the agent. Everything in between, the reasoning loops and such, is short-term, whereas long-term tends to be files and system prompts.

Right now, one of the primary failure modes in agentic systems is context. And context, for those who don't know, is just all of the letters, words, and tokens that are being stored in a model at any given point in time. The way agents manage context limitations right now is by summarizing previous steps to save on tokens, compressing the full history into key takeaways. If you think about it, the way that I write and the way that the model writes isn't actually super token-efficient.

What it does is make summaries of these constantly. So if this is my actual chat window, that's the message the agent sent me, this is the message I sent the agent, this is the message it sent me back, and so on, what it'll do periodically, just to save on token cost, is summarize it in as high-density a form as humanly possible. We take maybe a 500-word context and chunk that down into a 100-word, or maybe a 50-word, context. It'll do so periodically, without losing the core details, just by rewriting it in ways that are a lot simpler. For instance, I could say, "Hello, how are you doing? My name is Nick Saraev." Or I could say, "hi - how you do? I'm Nick Saraev." And if you count up the total number of characters, the latter is obviously going to be a lot more efficient. They also don't store reasoning in the main loop.

generated temporary and then it disappears. It does store intermediate

disappears. It does store intermediate results externally by offloading the databases, files, and other vector stores. And then it'll now load the

stores. And then it'll now load the relevant context on demand to only pull in what is needed for the current step.
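As a rough illustration of that compression step, here's a toy sketch. The function names are mine, not from any framework, and summarize() is a stand-in for the LLM call that would actually write the summary:

```python
def summarize(messages):
    # Stand-in for an LLM call that rewrites old messages as key takeaways.
    return f"[summary of {len(messages)} earlier messages]"

def compress(history, keep_recent=4):
    """Replace everything but the most recent messages with one summary line."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

history = [f"message {i}" for i in range(10)]
print(compress(history))
# The 10-message history shrinks to 5 entries: one summary plus the last 4 messages.
```

The design choice is the same one described above: recent turns stay verbatim because they matter most, and everything older collapses into a high-density summary.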

You can build this in explicitly using something called RAG, or retrieval-augmented generation, which I'll talk about later, or you can just let the model do its own thing, and it does a pretty good job of it. When we make it to reflection, that's where the agent self-evaluates.
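At its core, reflection is a check-and-retry loop. Here's a minimal sketch; attempt() and looks_ok() are stand-ins for the LLM step and the validation check, and the names are invented for illustration:

```python
def attempt(task, feedback=None):
    # Stand-in for an LLM step; a real agent would act on the feedback.
    return {"task": task, "note": feedback or "first try"}

def looks_ok(result):
    # Contrived check that fails the first attempt so the retry path runs.
    return result["note"] != "first try"

def run_with_reflection(task, max_tries=3):
    feedback = None
    for _ in range(max_tries):
        result = attempt(task, feedback)
        if looks_ok(result):        # self-evaluate the output
            return result
        feedback = "previous output failed validation; adjust approach"
    raise RuntimeError("gave up after retries")

print(run_with_reflection("summarize inbox"))
```

Without that loop, a single failed step kills the whole run, which is exactly the "breaks at the first hiccup" behavior described below.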

So that's where it examines its outputs to detect errors and then assesses whether or not what it wanted to do actually worked. It identifies when approaches are failing, it knows when to pivot, and it just self-corrects. This is really the intelligence of the model, to be honest. If you don't have this reflection loop, you will just have a script, like a typical Python script or an n8n, Make.com, Zapier, Gumloop, or Lindy automation, that breaks at the first hiccup. This is also really important in what's called self-annealing, which I'm going to cover a little more later, but it's essentially the way an agentic workflow can run and also heal itself as it encounters errors and so on. Finally, we have what is called the

orchestration or coordination layer. The way that I think of it is this: you've got all of these steps, right? Planning, tool use, memory, then reflection. Orchestration doesn't exist within the loop; it sort of sits outside of it, or maybe around it, and it's responsible for shuttling information from step to step. And that's really cool, right? It looks at the results of the plan, feeds that into the right tools, enters what it needs to enter into memory, then looks at the results of the reflection and changes the next loop of the planning, and so on and so forth, indefinitely. I think of it as the brain that combines all the components we just talked about, similar to how your brain combines inputs from your ears, your eyes, your nose, your skin, your mouth, and your memory: it factors everything in, and that's what thinks and ultimately comes up with decisions. Now, there are a couple

of different approaches right now for orchestration. There's an approach with CrewAI that uses role-based team structures: up at the top you have some sort of manager, and underneath you maybe have a marketer and a software engineer; the manager sits above the marketer and the software engineer, the marketer has some interns, the software engineer has some juniors, and so on. This is one way of doing it, and it's one that CrewAI has done reasonably well with, this role-based team structure framework. It's kind of like an organization, and I think that's just looking at things the way a human being would. I think there are actually much more efficient ways to organize, so I don't personally do this. With the directive orchestration execution framework, and then Claude skills, what we do instead is basically give the AI access to both high-level instructions and tools to execute with. And this AI over here is sort of like that orchestrator we were talking about before. It just looks at the high-level instructions, looks at the tools, matches up the two, does stuff, stores things in memory, and then loops over and over in that PTMRO loop. Claude skills is kind of similar; it just organizes the instructions. If we visualize this for you guys, it basically stores things in a folder. This folder contains both the high-level instructions, the specific tool use, and any additional resources, and the model now accesses one folder instead of accessing two different folders. Really, the point I'm trying to make is that no framework is perfect yet. I imagine the real best framework of the future is just going to be a combination of all of these, taking the best parts and leaving the crappiest parts.

But they are all improving rapidly as the space gets more and more mature. So my recommendation is, we're not going for perfection here; we just want what works. In my case, I use DO because, you know, I came up with it, and it's a big part of all the content I'm producing now. It works reasonably well right now. Sure, maybe there's another framework out there that'll get us from 97% accuracy to 98.5%. I'll worry about that framework when it's here. For now, I'm going to do what I can with the 97.

Okay, we're now talking text. This is the universal interface. When I want to talk to my model, I do so through text, right? Even when I talk to my model by voice, like you can do in Claude or ChatGPT, what's really occurring is that most of it is being transcribed into text. Now, agents, if you think about it, are actually a step back in terms of our interfaces for now. Back in the day, and when I say back in the day I mean very recently, most people used these drag-and-drop no-code tools, and those are actually really pretty: they're easily interpretable and you can see how the data flows. Now we've basically said, no, screw that, we just want a bunch of words on a screen, which obviously has a bunch of issues in terms of presentation and our ability to visualize and understand things. Right now we are taking a step back in terms of the interface. It's sort of like the 70s, 80s, and 90s, when most people coded and built things on computers through DOS or Linux terminals, right? It was text in, results out. That's it.

Everything is just some sort of terminal or prompt. And in this way, I think it can be really intimidating for people, because they just see a bunch of text and they're like, "Oh, I'm not a programmer. I don't learn through reading and writing, I learn through seeing." I think that's fair, and it's a totally okay criticism to make of these things right now. I imagine future systems are going to go back to a visual interface; we just don't have them yet. And as I mentioned earlier, my whole goal is to make do with what we have at the moment. I imagine over the course of the next couple of years, somebody's going to build an amazing visual interface, probably in conjunction with one of these agents or agentic workflow builders, and then we'll have something that combines the best of both worlds, natural language and visualization. But

right now we use some tools. And those tools, as of the time of this recording, are Cursor, VS Code, and Antigravity. That's where most agent interaction happens today. That is the text-heavy interface you guys saw earlier as part of the demo, where I just talk to the model through a chat box and watch it update files and so on. On the left hand side, I have some recommendations to make things feel a little more natural. I personally use speech-to-text tools like Whisper Flow and Aqua. These are really simple, straightforward transcription tools. They allow you to feel like you're talking to an employee more than typing at your computer. I'm going to show you guys a bunch of practical examples of me using this. But for now, let me give you

guys a demo. On the left hand side here, I'm just talking to my model. I basically converted a workspace from the directive orchestration execution framework to the Claude skills framework. You guys are going to see both of those later. But for now, I just want to ask it how things are going and whether it can tell me something about the changes. So I'm just going to hold down a key on my computer, Fn: "Hey, can you tell me a little bit about the changes that we just made?" I let go, and then I press enter, and now I'm basically talking to my model. Of course, I still have to press the enter key. Future iterations of this will probably change that, but in this way I'm maximizing bandwidth. Human beings can speak a lot faster than they can type, but they can also read a lot faster than they can listen, so this is typically how you optimize both of those.

All right, so what I have here are five Claude Code instances. I'm running the latest model

of Opus, Opus 4.5, at least as of the time of this recording. You guys may have some later versions. Just to show you the variability of model outputs, I've set all of these to plan mode. What plan mode essentially means, to make a long story short, is that they can't take actions without my express or explicit approval. They write a plan for me first, and then I verify the plan. So, just to show you guys how different the various forms of these plans are, I'm going to open up five tabs, open up the reasoning and thinking panels here, and then we're just going to evaluate how different all of the answers are to the same simple question: what are some ways to send automated proposals? So I sent that to all five.

And you'll see that as we proceed through here, there are a variety of different routes these models follow. After each one does its research and plans, you end up with five answers. You'll notice that all five of these answers are different, meaning there is no procedural, simple, step-by-step result here; the models are doing different things every single time. This first one says, "What type of proposal?" So it's asking me some questions. The second one actually just went through and wrote me a big list of different options I could take. This third one wrote me sort of a combination: it asks me some questions and gives me some common automation triggers alongside more questions. This one here gives me four options. And then this one here gives me a little table. And

this is okay. I mean, obviously I'm arriving at roughly the same sort of answer regardless. But I want you guys to understand the way businesses work: somebody does something, like fill out a form or require an invoice to be sent, and this level of variability in and of itself is way too much. There's no way we could meaningfully add value to a business, whether it's our own business or someone else's, with variability like this, with 30, 40, 50% variance in answers. What we need is this: when we generate an invoice, the invoice needs to be basically the same every time. When we generate a receipt, the receipt needs to be the same every time. When we send an email, maybe an onboarding email or whatever, it should be the same every time. When a new form comes into our system and we need to qualify it, we should use the exact same qualification framework every time. Any serious company at scale that has this level of variability in its processes won't be a serious company for long.

This is why raw large language models are very difficult to use in both mid-market and enterprise-style applications. The reason is that LLMs are probabilistic, not deterministic. I touched on this earlier in the course, but let me run you through how a large language model actually works under the hood. A while back, I actually built a large language model. Well, kind of a small language model. This guy Andrej Karpathy built a big GitHub repo showing people how to train their own text-based mini GPT. I went through the whole thing and built my own mini GPT, and it was really instructive, and I've since learned a lot more about large language models and what's going on under the hood. So let me give you guys a very brief demonstration. If you guys

understand this, you'll go a lot further toward getting how these agents work under the hood. What large language models are, basically, are machines that operate off of a distribution of outcomes. What I mean by this is that they are statistical pattern matchers. A lot of people think that large language models predict the single best next word, but they don't do that. Instead, they predict a statistical distribution of options that they could pick from. What I mean is, if I say "hi, how are" and then leave a little space, and you feed this into a model, you might think you're going to get the most likely next token, right? That's sort of like universe A: you think you'll just get the word "you" and then maybe a question mark. But what you actually get is a whole graph of different outcomes and possible words to choose from. This one might be "you". This one might be the word "things", right? "How are things?" This one here might be "your", for instance. And what happens is we use this concept of temperature and top-p to basically randomize the process of choosing the next token. So while "you" may statistically be the most likely next token, maybe with something like a 98% confidence score, we're not always going to pick "you". We're going to have some cutoff, which is sort of what top-p is, and then pick from one of those three or four options, and we do so with a level of what's called stochasticity, or randomness. That means you can't actually predict what the large language model is going to do every time. Now, this isn't a bad thing.

This is actually a good thing, because think about it: if we could predict what every large language model was going to do, there would be no reason to have a large language model. If you trained something that always output the exact same thing every time, there would be no way for the model to reason flexibly about things. It would essentially just be a giant series of dominoes that knock over one after the other (those are some really crappy looking dominoes), and we'd be able to predict everything that's going on. Anyway, a model's randomness and stochasticity is actually a big chunk of how it's capable of solving problems and reasoning for us. But what I'm trying to say is that there's a level of randomness added to every step of the process. Right? So the first thing is they predict a distribution of options.

What that means is there is some randomness, some statistical error or inaccuracy, right there. Next, we can set the temperature and top-p. These are settings you'll find in the parameters of most large language models nowadays, and those settings also introduce some randomness into the process. You now have architectures like the mixture-of-experts architecture, which is basically where they don't just have one large language model do this; they test it simultaneously across four or five large language models and then pick the most commonly voted answer. Believe it or not, this introduces some additional variance. And then, even at temperature zero, tiny input variations can produce wildly different outputs because of randomness. Obviously, there are probabilities at every step. In math, these are basically called compound probabilities. I don't mean to make this a math thing, but if you're working with AI, you might as well learn at least a little bit of the math underneath it, because it'll help you understand how all these things work. Essentially, these compound probabilities make it very unlikely that you'll achieve the exact same outcome every time on the large language model's own.
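Under the simplifying assumption that steps fail independently, per-step success rates multiply, which is the whole compounding problem in one line of code:

```python
def chain_success(per_step: float, steps: int) -> float:
    """Overall success probability of a pipeline whose steps
    each succeed with probability per_step, independently."""
    return per_step ** steps

for n in (1, 5, 10, 20):
    print(f"{n:>2} steps at 90% each -> {chain_success(0.9, n):.0%} overall")
# ->  1 steps at 90% each -> 90% overall
#     5 steps at 90% each -> 59% overall
#    10 steps at 90% each -> 35% overall
#    20 steps at 90% each -> 12% overall
```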

And so what happens is you have error rates that compound catastrophically. I'll give you a quick example. Let's say you have five steps in a process. You want the large language model to go out into your email inbox and pick the best email, then summarize that email, then feed that summary into some other model, then have that model combine the summary with a bunch of other summaries to give you a big digest of the day. If you have five steps and each of them is 90% successful, the way the math actually works is this: although every individual step may be 90% successful, if you multiply it out, 90% for step one times 90% for step two times 90% for step three times 90% for step four times 90% for step five, you don't end up with a 90% success rate across the entire process. You end up with a 59% success rate. Essentially, the first step might be 90%, but multiplied by the second step that's 0.81, then 0.73, then 0.66, then 0.59, until eventually your total error rate is significantly higher and your success rate is significantly lower. And as you add more and more steps to the process: at 10 steps, it's a 35% success rate; at 20, it's a 12% success rate. This applies even if models are 95% successful at specific tasks.

What ends up happening at every step of the task, a good way to think of it, is that the total range of outcomes gets bigger and bigger and bigger. There are super successful outcomes, quasi-successful outcomes, unsuccessful outcomes, and catastrophic outcomes, right? And this range, in business, is nowhere near tight enough for most companies to trust systems like this. Now, because most business workflows are multi-step, and because people have typically tried doing things like this with dumber, simpler models and no frameworks, most raw LLMs are actually just not usable in business, aside from copy-paste outputs, which is why people tend to use them that way. Just as an aside: imagine you were a business that made $100,000 a month and you sent a wrong invoice 5% of the time. What sort of impact do you think that would have on your business? Do you think it would have a 5% impact? No, it would have more like a 95% impact. If I'm one of your clients and you send me the wrong invoice even one out of 20 times, I don't think I'm going to work with you the 21st time. So, the

root cause here is that we're asking probabilistic systems to do deterministic work. Probabilistic is that big, sort of uninterpretable thought process in Claude that I showed you guys earlier. Deterministic is what businesses use: one step going into the second step, going into the third step, going into the fourth step, and so on. That's what business is, and the best businesses productize and standardize everything, whereas this other thing operates in the realm of probabilities, which ultimately we can't use on its own. So what's the solution here? Well, it's not necessarily just making LLMs smarter. Although, keep in mind, the smarter the models get, the less error and variance they typically have, and that's great. But the actual solution is that we don't have to wait some unspecified amount of time for model intelligence to get there. We just build a framework around those models that turns these really rickety outputs into something we can still use, despite the fact that there's variability in the process. We give them defined nodes and steps between each important thing that we want, and in that way, because we're shortening the total gap, models are capable of performing economically valuable work. So what we're going to do is wrap this super galaxy-brain intelligence in a framework, and this framework is going to allow us to control it for beneficial purposes, ultimately business ends. Okay, so how

do you actually do that? Well, this is where you get into DO, the directive orchestration and execution framework. What we do is separate concerns. Directives, up at the very top, provide very clear, unambiguous instructions to the system. These are documents, which, if you guys remember, were the first rung on that knowledge ladder. Orchestration, if you think about the PTMRO loop, is where the large language model does its thing: it chooses what to do and in what order. And then execution scripts do the actual heavy lifting. We don't do that with the model itself; we do it with little snippets of code that the model has built, tested, and then retested over and over again. I typically do this in Python right now, but you can do it with whatever programming language you want; the models tend to be pretty good at, I want to say, most of them equally. The reason this works so well is because of this concept of separation of concerns.

Essentially, anything that is deterministic, aka something a business would rely on, say an API call, some data transformation, some file operations, actually goes into code. Code is the same every single time. If you give it input A, it'll always give you output B. There's never any variability unless you specifically program it in. So it's really interpretable, it's very clear how it works, and you never need to wonder, hm, is that doing what I wanted it to do? Because it's only going to do what you told it to do.

And then we leverage the really flexible, cool parts of AI to make judgments, to make routing decisions, and so on. Code is really reliable; it's also super fast and precise. LLMs are flexible, adaptive, and handle ambiguity really well. So what we're doing is combining the best of both parts. We combine AI's incredible ability to route and be flexible with deterministic code's extraordinary ability to run really quickly, really precisely, and really repeatably. When you do this, you get the best of both worlds, and you can make a ton of money with it. That's how agentic workflows work in a nutshell.
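A toy version of that split might look like this. The tool names are invented for illustration, and the routing decision is faked with a plain string where a real workflow would have the LLM make the judgment call:

```python
# Execution layer: deterministic scripts. Same input, same output, every time.
def send_invoice(client: str) -> str:
    return f"invoice sent to {client}"

def send_receipt(client: str) -> str:
    return f"receipt sent to {client}"

TOOLS = {"invoice": send_invoice, "receipt": send_receipt}

# Orchestration layer: the model's only job is deciding WHICH tool to run;
# the tool itself behaves identically on every call.
def orchestrate(llm_decision: str, client: str) -> str:
    return TOOLS[llm_decision](client)

print(orchestrate("invoice", "Acme HVAC"))  # -> invoice sent to Acme HVAC
```

The point of the structure is that the only probabilistic part left is the one-word routing choice; the invoice itself can never come out 30% different.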

What's interesting is that you probably would not have understood any of this had you not watched the last hour to hour and a half of content all about the basics and the foundations. Some other reasons LLMs are really, really bad at basic operations: when I say basic operations, I mean math. Up until quite recently, LLMs couldn't even count the number of letters in a word. That's something you could build a Python script to do in 0.1 seconds. And if you have a big list of numbers or something and you use an LLM to sort them, it's kind of like hiring a PhD to count inventory. It's just not the best cost basis on your end: you're going to spend way too much money and get way too little of a result. Hence why we push the deterministic tasks to scripts and reserve the LLM processing, with its tokens, for actual thinking. It also makes everything cheaper.

Just for the purposes of demonstration: say I gave an LLM a really simple task. I have all of these letters arranged in a list, and let's say this list hypothetically isn't just six letters long; it's 10,000 or 100,000 items long, just really, really long. So pretend I put this thing together and give it to an LLM. If I had the large language model sort this list, it would have to run billions upon billions of mathematical operations. If I gave it to a Python script, it could do the entire thing in one function call; I could probably run it in 5 seconds on my own, without a large language model at all, and the script itself would take milliseconds. If you look at the actual compute time and resource usage when you use deterministic scripts for simple mathematical operations like sorting a big list, you can do it 10,000 to 100,000 times faster with deterministic code. And then it's also, for the most part, free, because it's running on your CPU, or extraordinarily low cost because it's running on some very affordable cloud CPU or GPU. The gap only grows the more you do. So instead of having the large language model do math for us, we build a calculator tool and then say, "Hey, can you call the calculator tool to do the math for us?"

In this way, obviously, we're maximizing the best of all possible worlds. So now I want to show you the difference between using a large language model's native intelligence to do something most would consider very simple, which is just sorting a list, and using a Python script to do it instead. I'm showing you this because there are so many advantages to using procedural, deterministic tools like Python scripts that it's hard for me to know where to begin, but I wanted to give you guys this as a representative example. What I've done up here is I've had an agent assist me with the creation of a brief demo list that I'm going to sort. The first thing I'm going to do is tell it to sort the list on its own: "Sort the list using only your native LLM intelligence. Do not make use of any tools. Time yourself and at the end, let me know how long it took."

What I'm going to do now is let it run. And you'll see that when its native LLM intelligence does the sorting, it takes significantly longer to do so. We can see the time it's taking by expanding this reasoning tab and scrolling all the way down; you can see it's actually manually outputting every token. Here we go. And now it's gone through and sorted the list alphabetically by name. It told us it didn't have its own internal clock, but realistically, as you guys can see, and you can probably timestamp the video, this took something like 30 seconds from start to finish. Now I want you to see how quick it is when we just run a script to do it instead. Now: run the script.
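The script being called can be as small as this; the demo data here is made up, not the actual list from the video:

```python
import time

# Hypothetical demo list; the one in the video differs.
names = ["Riverside HVAC", "Apex Cooling", "Metro Heating", "Blue Flame Air"]

start = time.perf_counter()
result = sorted(names, key=str.lower)  # deterministic: same input, same output
elapsed_ms = (time.perf_counter() - start) * 1000

print(result)
print(f"sorted in {elapsed_ms:.3f} ms")  # typically well under a millisecond
```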

So, what it's going to do is instead it's just going to call said script, then it'll immediately sort this with significantly higher levels of accuracy on the right hand side. Now, I should note that the amount of time it took me

to call the large language model and actually have it do the thing, that's a bunch of latency here that we're not actually taking into account.

Realistically, this took 53 milliseconds. The LLM is saying 3 to 5 seconds, but as you can tell, it doesn't really understand its own internal processing, so it's closer to, you know, 15 to 30. That is several hundred times faster. And not only is it several hundred times faster, a point I'm going to make repeatedly throughout this course is that it's also several hundred times freer, because running a Python script to sort a list on your own CPU, or even on a cloud CPU once we get into posting webhooks and hosting these things on servers that aren't ours, is essentially free. It's occurring in the space of a neuron in your brain firing, and this thing's doing a whole buttload of work. And you can see even down here it said this is the core argument for pushing deterministic work into tools.
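To make that concrete, here's roughly what such a sort script looks like. This is a sketch, not the actual script from the video; the file name, the JSON structure, and the function name are all my assumptions:

```python
import json
from pathlib import Path

def sort_items(path):
    """Sort the 'items' array in a JSON file alphabetically, in place."""
    data = json.loads(Path(path).read_text())
    data["items"] = sorted(data["items"], key=str.lower)
    Path(path).write_text(json.dumps(data, indent=2))
    return data["items"]

# Demo data modeled on the list in the video (water filter, compass, etc.).
Path("items.json").write_text(
    json.dumps({"items": ["water filter", "compass", "watch", "matches"]})
)
print(sort_items("items.json"))
# → ['compass', 'matches', 'watch', 'water filter']
```

The sort itself runs in milliseconds on any CPU, which is the whole point: the LLM's only job is deciding to call it.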

The LLM handles decision-making, whereas the script handles execution. That's a major part of how we're going to build these agentic workflows later on. So in a nutshell, my whole point is: reserve your large language model calls for judgment, and let code handle the rest. By doing so, things will be significantly faster, significantly more reliable, and significantly cheaper. This is where the DO (directive, orchestration, execution) framework comes into play, and it's how we're going to be building out the rest of the workflows in this course. Let's talk a little bit more about how to actually do this.

Okay, so unsurprisingly, right now everything to do with agentic workflows happens in what's called an IDE. If you're unfamiliar, IDE stands for integrated development environment. IDEs look like this, and you've seen them multiple times already throughout this course. What they are is basically programming environments. To be clear, agentic workflows are not IDEs; this is just a way that we communicate with them. If you remember way back in the beginning of this course, I talked about how chats were sort of like an interface, and agents were things that lived inside of the interface, almost the way that a crustacean has shells and can change shells at will. Well, right now, because programmers usually build stuff, and because agentic workflows are composed of the same things that programmers use to build, we just happen to do them in an IDE. But I want you to know that this is likely to change. Now, I don't love IDEs, because they're really overly technical for a lot of newbies. People that don't understand this stuff look at all the lines on the page and all the different partitions and sections and go, "Holy crap, Nick. This is way too complicated. I'm not a technical person. I don't want to deal with it." But what I want to do in this course is disabuse you of the notion that you have to be technical in order to understand what's going on. This is just the same thing as a bunch of instrumentation panels on a car. The very first time you step into a car, you don't know how the odometer works, what the gear shift is, how the radio works, and all that. This is the exact same thing. I'm currently getting my pilot's license, and let me tell you, the instrumentation panels on even the oldest and cheapest of aircraft are the way I imagine IDEs are to people that have never touched these things. So I entirely empathize with you, and I'm going to walk you through it all in a moment. So, as mentioned, IDE stands for integrated development environment.

I think of it as basically Microsoft Word for code instead of natural-text documents. IDEs are composed of workspaces, and this is the same language basically any IDE will use: you write, organize, run, and manage everything in one place. And it's important for me to note how this came about historically, because otherwise you'll wonder why the hell we chose this. Well, the reason is that back in the day, we used to have five or six different tools. Programmers would use tool number one to write their code. Then they'd use tool number two to test their code. Then they'd jump over into tool number three to run their code, tool number four to host their code, tool number five to commit their code into a repository so they could save it, and tool number six to do something else. And so there was just so much switching going on, right?

We had to jump from tool number one to tool number two, and so on. And then somebody said, "Wait a second. Why don't we just combine all of these into one unified tool? Sure, the interface will probably be an absolute cluster, but it'll simplify things and alleviate some of the context switching." And that's basically what happened. We stuck them all into this one tool. And this tool is really like 20 or 30 tools simultaneously, which is why it looks so complicated. Now, over the course of just the last year or so, IDEs have gotten way smarter, and I mean smarter as in AI. In the last year, basically every IDE has added some form of AI chat capability. Old-school ones like VS Code, and I'm going to cover what all these are in a minute, added built-in AI assistance quite recently. And newer tools like Antigravity, a big one that Google just released, are now less like coding workspaces; they've eliminated and streamlined most of the UX, so it's almost all AI-based agent stuff. Basically, the line between writing code and just directing AI to do it all for you through natural language is blurring really quickly, and that's one of the motivations behind our course, actually. So this over here is VS Code's logo. This over here is Antigravity's. And this over here is Cursor's. These are three relatively popular tools that I'm going to touch on in a bit more detail, and then I'm actually going to walk through VS Code and Antigravity just so you can see how all this stuff really plays out. In a nutshell, if you're going to be comfortable with agents, you need to be comfortable in an IDE.

That's just the whole goal of today's module. So three areas of your IDE.

module. So three areas of your IDE.

There's a file explorer on the left.

There's an editor panel in the center and then there's an agent chat panel on the right. Let's cover all of them in

the right. Let's cover all of them in detail. On the lefth hand side, we have

detail. On the lefth hand side, we have the file explorer. The file explorer almost always looks something like this.

All this is is it's just another way that you guys can explore files. Just

like on a Mac or a PC, you have the native file explorer. Here, your files are just arranged vertically as follows.

This little tab just means that this is a folder, and if you click on one of these, it will open and expand, and then you'll be able to see all the files within. So just as a sanity test, this first folder is .claude, and there are a bunch of other files inside of .claude. Same thing here: .devcontainer, .prompts, .tmp, .venv. You might be wondering, "Nick, what the hell do any of these things mean?" I'll be honest, I have AI handle most of that. I don't even know, nor do I really care.

Coding is not the point of agentic workflow building. All I'm doing is giving high-level instructions, and I have the AI deal with the how. Next up, we have a directives folder, as you see here, and an execution folder, as you see here. I also have a folder called for_youtube in my workspace; this is where I store things like this course, node modules, prompts, triggers, right? What you'll notice is that eventually we run out of folders, these little things with the tabs, and then everything else is just a file. So I have this file here, this file here, this file here. We've got a ton of files in the workspace. But hopefully now you've looked at it and squinted hard enough that you at least understand there's nothing magical going on here. This is just a file explorer. So just like with any other file explorer, you can create files, rename files, delete files, and organize everything you want from here. For agentic work, at least in our case with the DO framework, this is also where the directives and execution folders live.
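Put differently, a DO-style workspace is just folders with a job. As a sketch (the two folder names are from the video; the file names inside them are hypothetical placeholders):

```
directives/                 # natural-language instructions: what to do and when
  upwork_scrape_apply.md
execution/                  # deterministic scripts the agent calls to do the work
  scrape_upwork.py
AGENTS.md                   # always-loaded context describing the framework itself
```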

As we saw earlier, I had the directives folder here and then the execution folder. I'm going to dive into those and show you what they look like in a moment. And really, the way to think about this whole thing is as a filing cabinet. Okay, that does not look like an F, but we're going to roll with it regardless. This is just your filing cabinet for your agent, and that is how I want you to think about it moving forward. In the middle of the page, you have the editor panel. This is typically in the center, although some IDEs will vary; that's okay, I'll cover two instances today. When you click on a file, this is where it opens. So for instance, as you see here in this middle panel, I have a file open called AGENTS.md.
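For a sense of what lives in a file like this, here's a hedged sketch of how such an AGENTS.md might open. This paraphrases the lines visible on screen; the real file isn't shown in full, so the structure below is a guess:

```
# AGENTS.md -- always injected at the top of the agent's context

You operate within a three-layer architecture that separates concerns
to maximize reliability:

1. Directives    -- natural-language descriptions of what to do
2. Orchestration -- you, the agent, deciding which tools to call
3. Execution     -- deterministic scripts that do the actual work

Remember: LLMs are probabilistic; most business logic is deterministic.
Reserve model calls for judgment and push everything else into scripts.
```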

Now, we get into system prompts and how to control these models through long-term context later on. But this is basically a file that you add to any workspace, and it gets injected at the very top of your agent's context, so the agent always sees it, 24/7. In my case, what I do is give it some high-level instructions describing my framework: hey, you operate within a three-layer architecture that separates concerns to maximize reliability, because of the same things I just taught you. LLMs are probabilistic; most business logic is deterministic; so on and so forth. Okay? So, we'll cover this file later, but for now, I just want you to know that you can open multiple files in tabs, just like a browser. You see here how this is sort of like a tab? Well, you can have multiple other ones open, too. I could have another file here, and another file here, and another file here. You'll notice that some of these letters are different colors. You see how this one's blue, this little right arrow is green, this text is white, and this is sort of orangey? Well, the reason is that this is a natural language file; it's called markdown, which is a specific format. But when you're dealing with code like Python and JavaScript and Node, there are just so many different types of text that coloring it makes it a little easier on the eyes, and you can tell what's going on faster. So in the case of markdown,

which is the format that my natural language, almost-plain-text files are in, if something is in blue, it's a header. So you know this is a header of some kind, right? Same thing over here: this is a header, or it's bolded. If something is in orange, you know it's written in code format; anytime you write something in code format, it's done with these little backticks. If something is in white, odds are it's just normal text. If something's in green, it's a comment or something like that. This depends on the format, and typically we only use two or three formats in agentic workflows, so you're going to figure this out really quickly. Nor does it really matter, to be honest, because you never actually read files. And that takes me to a great point: you can look at files in the editor panel, but you'd almost never manually edit them. My rule of thumb is that if I'm manually editing a file, I am doing something horrifically wrong, because there's no real reason I should be. I just communicate with my agent, and it does it for me. Even if I want to change a specific file, I won't go into that file. I'll just say, hey, change this specific file to do this, give it a one-line description of what I want, and it'll go through and do it in the most efficient way. In this way, I'm almost like the CEO of my own company. I mean, I am the CEO of my own company, but here I'm like the CEO of my own agent company: I give very high-level instructions, and it's the agent that interprets those instructions and does things. So that's two out of the three sections. The third is the agent chat panel, which lives all the way on the right. The agent chat panel is hopefully very familiar to you.

It's the same sort of thing as any chat interface from the last four or five years. In this case, I just said, "Hey, what's up?" It then read through AGENTS.md; as I told you, it always reads through this at the very beginning of every run. And then it says, "Hey, not much. Just ready to help. What are you working on?" So, this is your primary interface. This is really where you're going to live. It's such a primary interface that modern IDEs like Antigravity have basically done away with everything else except for this, and you just talk to it all day. So, you'll type your instructions here, and the agent will respond. You can even see the thinking tab over here with the reasoning process as it's deciding what actions to take. That's really cool for interpretability reasons, and it's also just one of my favorite things to watch, because you're seeing the AI's internal monologue. It's also useful when you're building agentic workflows, which we're going to cover quite shortly, so that you can stop the agent if it makes a mistake, see where an error is, do your debugging, and so on.

Finally, just an obligatory section on code. I know code is really intimidating for a lot of people. I want you to know that all scripts are is text written in a hyper-specific way. This over here is what's called Python. Do I know what's going on over here? I mean, yeah, I've done some coding in Python, so I can look at this and kind of interpret it, but I can't do so very quickly, and I don't know what's going on for the most part. You don't actually need to have any clue what's going on in the code these days in order to do really powerful, effective things with it, because, as I mentioned earlier, AI is just a way better coder than you. So, if you find yourself opening coding scripts, you're probably doing something wrong. I never actually have a page like this open, because it makes no difference to me. Now, if you do find yourself opening this for whatever reason, I want you to know that a Python script, or a script in whatever language you're using (Python is just one of many), is just a set of instructions for the computer to follow. It's the same sort of thing as the bullet points I was showing you at the beginning of the course where I was describing an Instantly auto-reply bot. It's a set of instructions written in a way that the computer understands, but it's literally just text sitting in a file. It doesn't do anything on its own. What you have to do in order to turn it into some sort of execution is run it.
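Concretely, "running it" is one terminal command. A minimal sketch, with a made-up file name:

```shell
# Put one instruction in a file, then tell the interpreter to run it.
echo 'print("hello from a script")' > hello.py
python3 hello.py
# prints: hello from a script
```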

And that just means telling the computer to run the instructions. Typically, you'd do this through the terminal yourself: you'd find the file, see that it's called something like script.py, then go into the terminal and, very intimidatingly, type the command yourself, and if you mistype even one character, it's not going to work. Well, guess what? You no longer have to do that. The agent does all the coding for you, and then it also runs the code for you. That's what makes it such a powerful orchestrator, and that's why I live entirely in the editor. Agents just run all the code. I just say, "Hey, run my Upwork scraper."

Do I have to know the format to execute it? No, I don't. I just say, "Do the thing I want." It'll do some thinking, find the specific file I'm referencing, and then go and run it. And so now this is actually running. It handles the entire execution loop autonomously. That's the whole point of agentic workflows. So don't worry about being hyper-precise. If you spend too much time being hyper-precise, you're kind of wasting it, because models, as I mentioned, are just millions of times faster than us. They think extraordinarily quickly. This is really just the domain of the model.

Communicate with it almost like you'd communicate with an employee or staff member. You'd say something like, "Hey, Pete, run the Upwork scraper. Give me the results, post it to Slack, and then give me the Google Sheet URL." Or, "Hey, could you send Sandy an email about X, Y, and Z? Use the email template." Just speak to it like you'd speak with an employee, not like you'd speak with a programmer, and you're going to do a lot better. When you do this, your IDE becomes essentially a visual chatbot where you can just watch the agent work 24/7. And that's where things get really cool and really powerful. So, back in the day, when we didn't have agents, we had to create a lot of this stuff manually. What I have open here on the right is the terminal.

The terminal is essentially the command-line interface you'd use to communicate with your computer to get valuable knowledge work done, usually programming work. And so before, I couldn't just say, "hey, write me a script that does X, Y, Z." Why? It would say "command not found." The terminal only works in the context of specific commands. Instead, I'd have to use python3, for instance. I'd actually have to open it up and then, I don't know, do something by hand. So, let's just do x = 5, y = 10, and x + y equals what? 15. As I'm sure you can tell, this is pretty laborious, and obviously it's a highly specialized domain of knowledge that you have to learn in order to communicate with things in this way. Well, if I clear all that out of the way: with our previous example, we had a list, right? That list looked kind of like this. It was a big list of items with water filter, compass, watch, matches, and so on. So back in the day, if I wanted to build a script to do this, I needed a tremendous amount of domain-specific knowledge to be able to put together scripts like this. What this does here is actually sort the list. It's python3 -c "import json; d = json.load(open('item.json')); d['items'].sort(key=lambda ...)" and so on. I mean, this is a whole other language you have to learn. It's like me trying to write an essay in Portuguese or something. The amount of time and energy it would take for me to know how to do just this one thing would be immense. And sure, I can do it, and then my list gets nice and sorted, but the amount of work I had to do to get that done is tremendous. Contrast that with our agent. All I'm going to say is: write me a simple function to sort this file alphabetically, then execute it. It's going to do some thinking to begin: first it reads the file, then it sees the structure, then it writes the script and executes it basically immediately. The amount of time that previously would have taken me, somebody with no knowledge of how to do this, is probably on the order of a day at least, just to write that one script, let alone all the others, and this thing can now do it in just a few moments. You offload the coding to the model, have it put together these deterministic scripts, which are a lot more reliable, and then you just sit back and orchestrate.

Okay, so IDEs, as I mentioned, are kind of like code editors, and they've been around for quite a while, at least 15 years. They weren't designed with AI agents in mind, but the new breed of IDEs gives agents access to everything: editor access, terminal access, even browser access. So there are three main options I want to talk about today. Each has different trade-offs, and your choice depends on how much flexibility versus simplicity you want.

The first is Antigravity. I'm actually going to open this in a moment and run through it in a lot more detail. But basically, this is Google's brand-new agentic development platform, launched super recently, and it's very, very good. It's designed primarily for their Gemini class of models, but it supports other providers as well. It has the cleanest and simplest interface in the bunch, by far the lowest learning curve, and it looks something like this. On the left-hand side, it has the file explorer. On the right-hand side, you have your agent. And you'll notice the middle is actually empty, with the ability to open up the agent manager, code with the agent, or edit the code inline. For the most part, this thing is really simplified, and it knows that you don't really give a crap about what the files look like.

Obviously, if you open a file, it'll open in the middle, but for the most part it abstracts all that away; you just communicate with the model and it does what you want. Next is VS Code, which stands for Visual Studio Code. This is a much older platform; it's actually the platform that most other platforms are based on nowadays. It was built by Microsoft, it's their free code editor, and it's very, very popular. The big draw of Visual Studio Code is its extensibility. You can't see this that well, but over on the right there's this little extensions tab, and VS Code has a massive supported library of all the different extensions you could want. These extensions are pretty cool. For the most part nowadays, we just use the Claude Code extension, GitHub Copilot, and the like: AI model extensions that add AI functionality into your editor. But there are some cool things you can build in with extensions that let you use whatever the heck you want with it. So, I see this less as a specific AI editor and more as a really general editor that a lot of people are used to; they just import extensions to turn their editor into a hyper-optimized AI one. I'm going to be showing you this one as well, just because it's very popular.

Finally, I want to chat a little bit about Cursor. Cursor was actually one of the first AI editors on the market, an editor built specifically with AI in mind. I don't really like using Cursor these days myself. Obviously, AI is baked directly into every part of the platform, but for the most part I just find Antigravity better in every way, shape, and form. It's a very similar interface to what you're used to: there's a file explorer, there's an editor, and so on. The file explorer, which you can't actually see in this screenshot, is usually on the left-hand side. In the middle, you have the big code editor, and on the right-hand side, you have both a chat and a composer. Same sort of vibe as Antigravity. Aside from that, it just has access to everything. I'm not going to cover this one, because while it's somewhat popular, it's not as popular as the other two options, and I want to be mindful of everybody's time.

Okay, so let's start with Antigravity. Pretty straightforward stuff. On the left-hand side, we have that file explorer I talked about earlier. In the middle, we have the editor, which is where you can open specific files and change things.

And on the right-hand side, you have the agent window, which is where you talk with agents. So, just to be clear, I sent this agent a message saying, "Hey, what's up?" and it tells me, "Hey, I'm ready to help. I see you've been working on a variety of workflows recently, from YouTube transcript analysis and PandaDoc proposals to lead scraping. What would you like to tackle today?" To cover the middle section here: as I talked about earlier, markdown (.md) is the file format we put a lot of instructions in. You'll notice we have blue headers over here, orange text over here, and the rest of it is white. What I've opened up is a simple directive called the Upwork scrape-apply system, which scrapes Upwork jobs matching AI automation keywords, generates personalized cover letters and proposals, and outputs to a Google Sheet with a one-click apply link. The whole idea behind the system, and I'm going to show you how to build ones just like this in a moment, is that you can automate most of the process of applying to an Upwork job, Upwork being a freelance platform. This sort of stuff is going to very quickly become an integral part of most people's workflows. So as you can see here, we define some inputs, we give it some tools, we give it a filter. You may be thinking, "Good lord, Nick, did you write all this?" No, of course not. I had AI write all of this for me based off some simple bullet points. It's very meta: you use AI to come up with the instructions for another AI model. In that way, you are literally just a person giving some minor instructions; you're acting more as the motivator than anything else. Okay, remember I talked about how on the left-hand side there'd be a couple of different folders, directives and then executions.
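A directive file itself is just markdown. Here's a hedged sketch of what something like the Upwork scrape-apply directive might contain; the headings and the script name are my guesses based on what's described on screen, not the real file:

```
# Directive: Upwork Scrape-Apply

## Inputs
- Keywords: AI automation

## Tools
- execution/scrape_upwork.py   (hypothetical script name)

## Steps
1. Scrape Upwork jobs matching the keywords.
2. Filter out poor-fit listings.
3. Generate a personalized cover letter and proposal for each job.
4. Output everything to a Google Sheet with a one-click apply link.
```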

I'm just going to open up directives and show you around a little bit. As you can see here, I have a bunch of these different flows set up. One of them was Upwork scrape-apply, but there are, I don't know, another 15 or so: create proposal, cross-niche outliers, deep research, pitch, and so on. Let's say I'm in the building process of an agentic workflow. What I'm going to do is ask this to help me out: "Hey, is there anything that I could do to the create proposal directive to improve it? Suggest some alternative approaches."

I'm going to enter that in, and now the model is going to come up with some ways we can make things better. It's going to do so with the directive structure. We injected a prompt into its AGENTS.md (or CLAUDE.md, or GEMINI.md; there are multiple ways to initialize system prompts), so it has all the context about what I mean. And this is how Gemini's UX works: analyze and improve the create proposal directive. It gives me the reasoning loop over here, progress updates, a big plan, and then I get some interpretability, some access to its thoughts. At the end of it, we end up with: "Hey, you should add a human-in-the-loop review step. You should try a web enrichment option. You should handle variable token counts. You should do robust JSON handling. You should do a dynamic follow-up email." That's pretty cool. I like the idea of number two. Number two sounds great; why don't we give that a try? All I'm doing is asking it for its opinion. I went through, and I didn't like four out of the five, but I did like the second.

So, now I'm just going to have this model go to the directive and then update it to include a web enrichment step. It's then built me a plan that

step. It's then built me a plan that looks pretty straightforward and easy.

I'm then going to okay this. What I

really like about Gemini is it shows you the tracked changes really easily. And you can see here that it's now provided an additional step called research client.

Understand the client's brand voice and current context, so on and so forth. If a website URL is provided or can be inferred from the email domain, then use this thing to fetch the client's landing page, analyze all this information, and output a brief summary. So I like this.

I'm going to accept it. And then I'm going to say, "Yeah, sounds great. Let's give this a try."

As part of this specific workflow, I have the model ask me a bunch of questions about the client. To be really, really straightforward here, I'm actually just going to open up ChatGPT and take a screenshot of this. I'll feed this in and say, I'd like you to give me a bunch of example data here. I'm feeding this into a model for a demo, for a YouTube video.

I'm then going to have ChatGPT construct a big list of demo information, and then I'm going to feed that in in a second.

Okay, as you guys can see here, I have a bunch of data sets. It gave me ten. I'm just going to use one, use this information for the demo.

Cool. And now I'm sort of orchestrating multiple AI models. I am certainly using ChatGPT as a copy-paste sort of thing, but I just wanted to show you guys that this is data that is, in a way, real.

It's data supplied from outside the system that I'm feeding into this workflow. I'm not having Gemini itself, within its own context, come up with it. I'm giving it a bunch of information from outside. Okay. And

at the end of it, I actually have a fully functional proposal over here for Bright Path Learning with an AI-powered student success predictor. How cool is that? We have all of the problem statements, the solution statements.

It's really clean, pretty nicely done. It even includes some information here about pricing and so on and so forth. So, these are actual proposals that I sent to actual clients.

As you guys see, we just generated a bunch of demo information for a hypothetical demo client and meaningfully altered a workflow in something like 30 seconds of actual work. Everything else was me just waiting for the model. Okay, so that was Antigravity.

Now, I just want to show you guys VS Code, and one of the reasons is that you can open up the same workspace in multiple different IDEs. You could create a workspace and then run it in Antigravity, run it in VS Code, send it to your buddy who operates in Cursor. There's so much you could do here. It's fully interoperable. The only thing that really matters is the agent itself and then the workspace. You could swap out Gemini for GPT-5.2. You could swap that out for Claude Opus. There are just so many different options here, obviously, but I just want to give you guys a view into the fact that all this stuff is interoperable. It doesn't actually matter what you use, so just pick whatever makes sense to you, whatever you enjoy.

Okay. So VS Code works very similarly because the two are very heavily inspired by each other. On the left-hand side we have the file explorer. Right now I have the agents.md file open. If I go over here, you can see it's actually in the root directory, so I'm going to give that a click. That opens up the instruction file. I'm then feeding in some very simple information here, just saying run my Upwork scraper. It's gone through, generated proposals, and pushed them to a Google Sheet. Same sort of idea. If I open up this Google Sheet, I have information about specific Upwork jobs.

This took a few moments, which is why I didn't do this in real time. In my case I was running a really simple workflow. I didn't want to edit a workflow here; I actually just wanted to use one. And you'll see that there is a distinction between the building of workflows and the using of workflows. In my case, I'm now using a workflow, not building it, which is why I just had it say, "Hey, let's run this thing." The color scheme is slightly different. I'd say VS Code looks a little bit older, of course. But the most important thing that distinguishes VS Code from a lot of tools is just how big the extension library is. They really do support a tremendous number of extensions. If I just type the letter A, you'll see that there are hundreds of extensions.

This is the search bar for all of the extensions. I could scroll down this thing for hours and probably never run out. Hell, I could probably do this for the next two months and never run out of extensions. So, that's pretty cool.

There's just a ton of different things you could do depending on what you're doing. There are code formatters, themes to change the colors, and stuff like that. You can kind of think of this as, I don't know who here plays video games, but it's kind of like Skyrim mods, Oblivion mods. You can just modify it to do whatever the heck you want, which is really awesome. Okay, you guys have now seen Antigravity and VS Code in action.

Let's talk a little bit more about the workspace itself. I've shown you guys how to operate within a workspace, but how do you actually set it up? Well,

first thing is you have to obviously create a workspace. That's really easy.

Anytime you open one of these IDEs for the first time, the first thing it'll say is, "Hey, you should create a workspace." So, assuming you've done that, now you're inside the workspace. What we have to do now is set up the folder structure that our agent can understand and navigate. We also need to give it some instructions so that it knows how we structure the folder and why. And if you think about what I'm doing with you guys, and what I did with the agent via the agents.md file, I'm basically giving it a whole education as to why we use the DO framework, why we're using this to begin with. And I find that sort of context is really important. It's like a training session for your agent. Get them up to speed. Have them understand the methodology and the philosophy behind why you're using them that way, and they'll typically work a lot better than if you just tried to raw-dog it. So I think about this the same as setting up a desk for an employee at your organization. They need

to know where everything goes. They need to have the base things set up, the base folders, and so on and so forth. Then once you've given them that structure, they can excel within it. I'm going to cover a lot more about this in the DO section, but for now just know that I would consider a well-organized workspace essential. So what is the actual project structure? Well, let me show it to you. We start off with the workspace itself. And you can name the workspace whatever you want. Now

underneath the workspace, you then have two major folders. You have directives over here, and right over here you also have execution.

Now, inside of directives, let me show you guys what that would look like. You

have a bunch of files. So, you would have, for instance, scrape_leads.md.

You might have another one, upwork_apply_bot.md.

These are your high-level instructions, where all of the top-level information goes.

You know, like: hey, start the scrape-leads flow by asking the user what leads they want to scrape. Once they've supplied those leads, ask them what platform they want to use. Just some very high-level stuff.

Now, underneath that, as I mentioned, we have the executions, the actual Python scripts that correspond to the directives. So over here, for instance, and let me make this really, really simple to see, we'd have things like apify_scraper.py (Apify is a scraping platform). Underneath that we'd have, I don't know, upwork_scraper.py. Maybe underneath that we have upwork_applier.py or something like that. And what essentially occurs in your directives is you just say somewhere within it, "Hey, step three, I want you to call apify_scraper.py." It reads that in the directive and then it just knows which execution to call. I have some recommendations here, of course: use subfolders for inputs, outputs, prompts, and reference materials. So that is sort of what the directives and the executions are. But if you, let's say, have a bunch of files that you feed in routinely as resources, you can absolutely add a resources folder. The

only two folders that I would consider required in the DO framework anyway are just directives and executions. And

depending on the framework, people have different ideas about this, but you can add whatever other folders you want. You could add a resources folder. A common folder to add is a tmp folder; that just stands for temporary. Sometimes agents need to create files temporarily to do things; they use files as scratch pads. My friend Gio was telling me yesterday about an experiment somebody did where he had a chat room in an agents.md file: he had multiple agents run simultaneously and add things to the chat room. Obviously the world is your oyster here, and I'm not going to try to force you into a specific way of being, but there are a variety of other folders that I would probably

include as well. I'd use clear naming conventions so the agent knows what lives where. For instance, if my thing scrapes leads, I would call it scrape_leads. I wouldn't call it s_l with some terse naming convention.

I mean, these tokens are cheap, right? Be very descriptive with the titles of your files. And if you have any documentation, like the high-level context, your agents.md, and so on, make sure to

include that as well. I've talked about the directives and execution folders, so I'll leave that. Directives generally hold things in markdown, which is important to understand; markdown is just a way to mark up text a little. An execution is typically in Python, although that depends. And

this is just that simple separation between what you do and how you do it. The directives are what you do, and the execution scripts are how the thing actually happens. I don't want to beat a dead horse here. The number one other thing you guys really need to understand is this idea of an

env file (.env). When you're working in any sort of programming environment, you typically don't want to store passwords, secrets, and API keys in the code itself. You want to store them in

a separate area, which programmers have created a convention around, called your .env. That's just where you store all of your API keys, all of your credentials, and so on.

And the idea is, instead of saying, "Hey, use this API key" in your directive, you just say, "Hey, grab your API keys from the .env." That way, if you ever wanted to share your directives later on, you could do so really easily.
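To make that concrete, here's a minimal sketch of what the .env handling inside an execution script might look like, using only the Python standard library. The key name UPWORK_API_KEY and the file contents are made up for illustration; real projects often just use the python-dotenv package instead.

```python
import os
import tempfile

def load_env(path):
    """Minimal .env loader: KEY=value lines become environment variables."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comment lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Demo: write a throwaway .env with a placeholder key, then load it.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# secrets live here, never in the directive itself\n")
    f.write("UPWORK_API_KEY=demo-key-123\n")
    env_path = f.name

load_env(env_path)
print(os.environ["UPWORK_API_KEY"])  # -> demo-key-123
```

The directive then just says "read credentials from the .env" and never contains the secret itself.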

You would just copy and paste them. And

I'm going to cover how to share and set up cloud-based instances later on. A lot

of people ask me why these naming conventions exist, why it's called .env.

Some things in technology just are. You

ever ask yourself why JPEG files are called JPEG files? It's actually named after an organization, the Joint Photographic Experts Group. It's just a thing that was decided decades ago that we all follow now. And if we changed the name, other people wouldn't understand what the files are. So it's just easier to stick with a name that is widely recognized by basically everybody. So we just call these things what everyone calls them, and that's okay.

Likewise there are some conventions right now between the models themselves.

So for instance, I talked about system prompts, things that you inject at the very top of any model conversation, and there are a bunch of different conventions right now. CLAUDE.md corresponds to Claude. GEMINI.md is for Gemini. Cursor has its own rules file. AGENTS.md is sort of a general one that's supposed to be a fallback in case you don't have the specific one. And you

know what I do? I just throw all of these in my main project root so that whatever model I use, I have the exact same thing. So I will copy the same content from AGENTS.md to CLAUDE.md to GEMINI.md to Cursor's equivalent. This interoperability is really, really easy.
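That copying step can be sketched in a few lines; the filenames below are the conventions mentioned above (Cursor's exact rules filename varies by version, so it's omitted here), and the workspace root and prompt contents are placeholders.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: a workspace root where AGENTS.md is the canonical system prompt.
root = Path(tempfile.mkdtemp())
(root / "AGENTS.md").write_text("# Workspace instructions\nFollow the DO framework.\n")

# Mirror the same prompt under each model-specific filename, so whichever
# agent opens the workspace picks up identical instructions.
for name in ("CLAUDE.md", "GEMINI.md"):
    shutil.copy(root / "AGENTS.md", root / name)

print(sorted(p.name for p in root.iterdir()))
# -> ['AGENTS.md', 'CLAUDE.md', 'GEMINI.md']
```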

And these names only matter because somebody said, "Well, we should probably have some configuration file. Why don't we just call it CLAUDE.md? We'll use capitals because that'll stand out and make it specific and differentiable." Then other people jumped on that bandwagon, and that's how it is. If you give a GEMINI.md to Claude, then Claude isn't going to understand what that is.

It's not going to automatically ingest it. But if you give a CLAUDE.md to Claude, it will. If you give an AGENTS.md to Codex or Cursor or whatever your various models of choice are, they'll understand what's going on.

The really cool thing is you just create the structure one time, and then the agent works with it for every project going forward, which is one of the reasons I love this. The initialization is so easy that I now don't even tell people to initialize it themselves. I just give the agents.md file to anybody I want to set up, and I say, hey, have your model do it. They go to their agent and say, "Hey, can you set up my workspace according to this file?" and it does so automatically. How cool. I want you guys to know that as you get better and better with the IDE, this feeling of overwhelm will decrease. But at the beginning, it is totally normal to feel overwhelmed by the menus and the

panels and the buttons and all the keyboard shortcuts. It's just like a beginner pilot looking at cockpit instrumentation. I think I told you guys that I'm working on my pilot's license, and it is really intimidating. That's exactly how I tried to put myself in your shoes when explaining this. I wish

somebody had explained pilot instrumentation to me the same way I'm explaining IDE instrumentation to you. But you don't need to learn everything at once. And

hopefully it's clear: as long as you understand those three things, the file explorer on the left-hand side, the editor in the middle, and the agent chat on the right-hand side, you're already 80% of the way there, and you can build and use agentic workflows for your own business. The goal isn't to master every feature here. It's just to be comfortable enough that the IDE doesn't slow you down. Okay, so let me show you how you can easily build proposals and high-quality PDFs and

visual assets with Agentic Workflows.

This is an example of a workflow that I use all the time in my day-to-day business. Immediately underneath this I have a sales call transcript.

Essentially, we feed in these sales call transcripts and tell the model, hey, I want you to generate a proposal with it. So what am I going to do? I will literally just say "generate a proposal using the below transcript." Then I'm going to press enter. What's going to happen is this model immediately starts looking through the existing directives, which I'll talk about more later in the course. It'll find contact

details and everything that we need in order to actually send the proposal because I removed the email from this specific one. I am going to supply just

a demo email. What its reasoning is doing is extracting the main problem areas, the main solution areas, the things we talked about, and also the pricing. Immediately afterwards, it's going to ask me for the email address. This is a demo, so I'm just going to provide my own.

And once it has this information, it can proceed with the generation of the asset. So it's now formatting this the way I want the proposals to look. Keep in mind that I had no real work here aside from copying my transcript over. And even

that is unnecessary. I could have pulled it directly from the transcript provider, Fireflies, but I wanted to show you guys how malleable this sort of thing is. Whether you copy and paste it in, or whether you put in an API call to some transcript endpoint, it works the same regardless.

Great. And it's finished. Now what it's going to do is send a quick follow-up email.

And the email was sent successfully just using an MCP server that I set up. And

now we get a summary as well as a link so we can view it directly.
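In the demo the send goes through an MCP server, but as a plain-Python stand-in, this is roughly the follow-up email an execution script might assemble with the standard library. The addresses and link are placeholders, not real values from the demo.

```python
from email.message import EmailMessage

# Assemble the follow-up message; sending is a separate step.
msg = EmailMessage()
msg["From"] = "me@example.com"
msg["To"] = "client@example.com"
msg["Subject"] = "Proposal from our call"
msg.set_content(
    "Hi! Here's the proposal we discussed:\n"
    "https://example.com/proposal\n"
)

print(msg["Subject"])
# To actually send it: smtplib.SMTP(host).send_message(msg)
```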

When I open this up, you can see the proposal document right here. It

includes, you know, your problem areas.

Number one, your revenue is unpredictable because you're relying on referrals and sporadic outreach. One

month may bring three clients, the next month brings zero. The feast or famine cycle makes it impossible to plan hiring, delivery capacity, or growth investments with any confidence. This is

all stuff that the AI came up with. You

know, I chatted about this briefly on the call, of course, but everything else here, the tone of voice and everything like that, came from a very simple high-level prompt instruction plus a brief example. The actual workflow

here took me maybe 15 minutes to set up end to end. And as you can see, now with just a prompt I can generate high-quality sales proposals within seconds. So, this is what you are going to learn how to do. You're going to learn how to set up workflows, not only for things like generating proposals, although I absolutely recommend that if you're in any sort of service business with sales calls, but

we can do more or less anything. I've

set up dozens of workflows to automate many of the mundane routine business tasks that I have. Things that just a few years ago, people probably would have raised an eyebrow at you and thought you were crazy for suggesting you can automate something like this.

All right, it's now time to talk about DO: directive, orchestration, and execution. Up at the very top of this, you can see that I've written "three-layer software architecture."

That's because that's what DO is. It is a three-layer system that we wrap around an AI agent to help constrain its outputs and take it from a probabilistic thing that's all over the place to something very

standard, consistent, and deterministic.

So at the very top of this system is your directive layer. Of course, this is going to include workflows and SOPs. And

by the way, if you don't know what SOP means, it stands for standard operating procedure. And standard operating procedures are very common in any sort of business, which is one of the reasons I like DO so much: all you really do is import your standard operating procedures in

whatever business you are working with, whether it's your own or a business you're helping. Then you just say, "Hey, turn this into a directive as per DO." And

boom, you're done. You now have an AI agent that does tasks your company needs done. So up at the very top, the first layer, is the directive. Now underneath, you have the orchestration layer. Your orchestration layer is your AI agent, or AI employee in a way. And you'll also see that not only did I put a little robot face here, but I also put a person. The reason why is because it's actually pretty similar to how most organizations work.

You have some high-level directives. Those directives are read by employees or other people in the business. And then what they do is make decisions about how to accomplish those high-level directives. This is where they perform coordination, task management, and stuff like that. And

what they do with those decisions is call or use tools. Now, if you're an AI agent, you're going to be using mostly software tools, as expected. Hell, if you're an employee, for the most part you're also using software tools. Think of the tools an average employee uses in any organization: Google Sheets, Excel, Microsoft Word, Docs, right? All of those are analogous to the tools our AI uses or creates to accomplish things. Okay. So down at the very bottom here, you have the execution layer, and this contains tools.

It contains Python scripts and so on. It's primarily responsible for action and output. I don't want people here to be scared or worried about DO. It's a lot simpler than you may think. The thing is, we just need to frame it as a three-layer software architecture in order for the rest of the course to make sense. To be clear, DO is literally just a folder structure plus a system prompt. And

pretty much all frameworks out there right now for agentic workflows work similarly. All we do is set up a folder called directives and a folder called execution. Then we add some files, like an AGENTS.md, CLAUDE.md, or GEMINI.md, as our prompt, and then we might add API keys, etc. Again, the .env is literally just a convention that some programmers made forever ago.
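The whole skeleton can be sketched in a few lines of Python. The file names (scrape_leads.md, upwork_scraper.py) are the illustrative ones used in this section, not required names, and the directive text is a paraphrase of the example above.

```python
from pathlib import Path
import tempfile

# Build the DO skeleton: a workspace with directives/, execution/, and tmp/.
root = Path(tempfile.mkdtemp()) / "my_workspace"
for folder in ("directives", "execution", "tmp"):
    (root / folder).mkdir(parents=True, exist_ok=True)

# A directive is plain markdown; one step names the execution script to call.
(root / "directives" / "scrape_leads.md").write_text(
    "# Scrape Leads\n"
    "1. Ask the user what leads they want to scrape.\n"
    "2. Ask what platform they want to use.\n"
    "3. Call execution/upwork_scraper.py with those inputs.\n"
)
(root / "execution" / "upwork_scraper.py").write_text("# placeholder execution script\n")

print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
# -> ['directives', 'directives/scrape_leads.md', 'execution',
#     'execution/upwork_scraper.py', 'tmp']
```

Hand an agent this structure plus a system prompt explaining it, and the reading-the-directive, calling-the-script loop is everything DO needs.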

So, it's great for beginners primarily because it's intuitive and really easy to understand. And it's also really cool for businesses because we can just copy and paste SOPs directly in. For example, a company I'm currently working with does marketing specifically for dental practices, and they do about $2 million a year. And

when I introduced agentic workflows to them, I was in a meeting with the director and started discussing how I thought we could probably automate a couple of previously non-automatable tasks with agentic workflows. He's like, okay, so how do we start? And I was just

like, well, you guys have a knowledge base. Why don't I just feed the entire knowledge base in and see what happens?

And within 15 minutes or so, we had procedurally turned most of those things into agentic workflows.

We had all of the API keys. We had everything we needed preset, which was lucky, because a lot of the time you have to jump around and finagle various services. But yeah, within 15 minutes we had turned this into DO, and we now have a workspace that the director, the managers, and myself can use to do like 90% of the economically valuable work. Is that

going to lead to some headcount reduction? Probably. I mean, when you automate 90% of people's roles, obviously you need to take a step back and start doing more management-style work rather than getting your hands dirty. But yeah, that's just a very simple and straightforward example of something I have actually just now done.

The reason why DO works so well is really the whole stochasticity idea. And stochasticity, for anybody wondering why the heck Nick is using all these crazy words, is just a way to formalize randomness, I would say. It's a little bit different, but for our purposes you could use that. So if this is the total range of possible outcomes, okay?

You could get this outcome over here, or that outcome over there. All DO does is reduce this so that the range of possible outcomes is a lot more narrow. And so, for the most part, we're operating within a very tightly bounded range of possible outcomes for our system. It can do this or it can do that. And we do this through

the separation of concerns. It's just a lot more reliable. This lets me get to 2 to 3% error rates on a lot of business functions. That dental marketing business I was talking about earlier is a great example. It's really not more complicated than that. I

also like to think of it like bowling. I don't know if you guys have ever gone bowling, but this is going to be my crappy bowling pin drawing. Typically, the way bowling works is you have gutters on the side, and if you are not very good at bowling, a lot of the time the ball is going to veer off into the gutter and then you're screwed,

right? So as a total newbie, one thing I really like doing is asking them to set up the guardrails. So

I say, "Hey, do you mind setting up the guardrails for me?" Then they set up these little guardrails that basically prevent the ball from landing in the gutter. And so

what ends up happening is the ball will bump off a wall and I still get to hit some pins. That's all DO is for agents. It just constrains them. We give them some guardrails, and then we significantly improve the probability that they do something we want. So

I'm going to go into detail here and be very comprehensive, because this is the framework we're using for the rest of the program. You've already seen me use it a bunch through the various demos I've created. Now I just want to provide context for everything.

If some of this stuff is repetitive or if you think you already know this stuff, that's okay. I would recommend just watching it regardless. Try and

internalize as much of this as possible, because this is the same idea that any framework is going to use. So the

directives obviously are SOPs written in natural language as markdown files.

Markdown is very important. The files will all end in .md, which stands for markdown. Generally speaking, it's just a sort of markup language.

A markup language just formats text. So

this is plain text for instance, right?

"First, SOPs are written in natural language as markdown files." A marked-up version of this might be: first, let me make sure I got this right, you add some stars, so "SOPs" (now this is bolded text) "are written in natural language" (and now it's like quoted text) "as markdown files." What

we're doing is we're taking text and then we're just marking it up. We're

adding some structure to it basically.

Markdown is just one way to do so.
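A quick way to see the weight difference between markdown and a heavier markup language like HTML (which comes up again below) is to write the same heading-and-bullet structure both ways; the snippets here are illustrative.

```python
# The same structure expressed two ways; markdown needs far fewer
# characters (and therefore tokens) than the equivalent HTML tags.
markdown = (
    "# Layer 1: Directives\n"
    "* First, SOPs are written in natural language.\n"
)
html = (
    "<html><body><h1>Layer 1: Directives</h1>"
    "<ul><li>First, SOPs are written in natural language.</li></ul>"
    "</body></html>"
)
print(len(markdown), len(html))  # markdown is much shorter
```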

So, for instance, this slide on the page is actually markdown underneath. I actually used AI to help me convert a big 17,000-word document into a slideshow.

And so, this was actually a heading. And

the way you use headings in markdown is with little number signs. So, for instance, if I wanted to write this big heading, I would have written "# Layer 1: Directives."

Underneath that, you have bullet points.

Bullet points in markdown are little stars. So "* First, SOPs are written," right? All of these little characters are just ways to add formatting to text. And the reason we do this for our AI agent's directives is that formatting allows us to add a lot more informational content to the text. It also allows us to

structure things, so it's not just one giant massive text dump. We get to add new lines. We get to add tabs for indentation. Basically, we add a bunch of structure to things as opposed to it just being a blob, right? We

basically convert it into something a lot more interesting. We have spaces, we have little bullet points, and, funnily enough, the structure of the text kind of looks like a face. It allows us to impart a lot more information per token, and it's also token efficient. There are other markup languages as well. One you've probably heard of before is called HTML. With HTML, the way you mark things

HTML. With HTML the way you mark things up is you use a variety of tags. And so

tags are these little angle-bracket things. If I were to try and write the same thing in tags, it would be significantly less token efficient, and so I'd actually have written way more total tokens, which obviously would have consumed a lot of my context. So instead of that, okay, instead of the HTML body, h1, "Layer 1: Directives", closing h1, whatever, all we're doing to accomplish the same thing is I literally just do a number sign. Obviously, this is one character.
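To make the token-efficiency point concrete, here's a quick sketch comparing raw character counts for the same (hypothetical) heading written both ways:

```python
# The same heading written in Markdown and in HTML tags.
markdown_heading = "# Layer 1: Directives"
html_heading = "<h1>Layer 1: Directives</h1>"

# Markdown spends 2 characters on structure ("# ");
# HTML spends 9 ("<h1>" plus "</h1>").
print(len(markdown_heading))  # 21
print(len(html_heading))      # 28
```

Character counts aren't exactly tokens, but the ratio is similar: less markup means more of your context window is left for actual instructions.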

The tag version is, I don't know, however many characters, way more, obviously, just to demonstrate the same structure. Okay, so that's markdown. Now,

these define your goals. They define

your inputs. They define your tools, your expected outputs, edge cases, and ultimately a lot of other things that you can define. I don't proclaim to have the perfect directive creation structure. I'm going to show you my own directive creation structure, and it tends to include all these things, but in general, you just want to provide high-level overviews. Now, the way I write these, or the way I have AI write these, is I write them like I'd instruct a competent employee. I would make them clear, but I would not micromanage. And really, AI does this for you. All I do is describe the what and the high-level hows of my task in markdown, and then I just trust the agent to figure out the rest. I'm going to remember to drink this tea because it is going to get cold. Damn, that stuff's good.
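To make that concrete, here's a sketch of what a directive written this way might look like. Every name and detail here is hypothetical; the point is the shape: plain markdown, high-level, no code.

```markdown
# Scrape Leads

## Goal
Scrape leads matching a location filter and save them to a Google Sheet.

## Inputs
- Location (e.g. "Texas")
- Number of leads (default: 100)

## Process
1. Run the scraper script with the location and amount.
2. Upload the results to the leads Google Sheet.
3. Reply with the sheet URL and a count of rows added.

## Definition of done
- The sheet URL exists and contains at least the requested number of rows.

## Edge cases
- If the scraper returns fewer leads than requested, rerun with wider filters.
```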

So directives obviously live in the directives folder in our workspace. The

way I separate each directive is as a separate markdown file that covers one workflow or one capability. For

instance, I would have a scrape_leads.md file, but I wouldn't have a run_business.md file, just because (and maybe we'll get to this point later) that is a lot to ask from the model. The model typically starts looping and doesn't really understand the various edge cases and stuff like that. I constrain these into modular directives, and then later on I can group them with umbrella directives. Not umbrella to the point where it's literally "hey, run my whole business," but umbrella to the point where it's "hey, run the onboarding flow" or something like that. So some examples: lead_scraping.md, proposal_generation.md, email_enrichment.md, and so on and so forth. I highly recommend making the names descriptive. Logically speaking, the name is the only way the model can tell what's going on here. You can of course add other forms of structure to the text. You could add what's called YAML front matter, which I'll talk about a little bit more later on. But for the most part, the model just consumes the name and then uses that name to determine which workflow it's going to use. If I say, "Hey, I want you to scrape some leads," obviously it's going to run the lead scraping one, right? But if I just called that l_s.md with some cryptic naming convention, it would have no idea what it's doing. So it's very important here to be descriptive. Don't use acronyms. Don't use anything that complicates the names of the directives if you want the agent to be able to use them as best it can.
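For reference, YAML front matter is just a small metadata block between two `---` lines at the top of the markdown file. A hypothetical example (the field names are illustrative, not a required schema):

```markdown
---
name: scrape_leads
description: Scrape leads for a given location and upload them to a Google Sheet
---

# Scrape Leads
(the rest of the directive, in natural language)
```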

A very important point is that directives contain no code at all. There is zero code within a directive. All directives are natural language instructions. We don't have any code, no executables. And really, there's very little technical here. I may include some URLs, like, "Hey, go to this URL in order to get information about this." But I'll never actually include any sort of code or executable.

The reason why is because we want these directives to remain readable by all humans within the organization. And they

should just make sense to all people within the company. If your directives are so technical and confusing that the average staff member within the business could not read them and understand what's going on, you've screwed up. The whole idea is that you want to lower the barriers to entry so that anybody in your company who is systems-minded (they don't have to be technical, but they have to know systems) can actually just improve things. You'd be like, "Oh, yeah, take a look at that directive and let me know if there's anything you think I'm missing." And then they just read it in natural language and they go, "Oh, you know, sometimes customers ask for X, Y, and Z. We should probably add some logic there." Right? You want that person to actually be able to substantially improve the organization. You don't want it to just be a black box. Because that's one of the main benefits of this, right? We're making this really, really interpretable, removing bottlenecks across the organization so people can see and understand how the systems in the business work. Okay, so next up we're going to talk about layer two, which is orchestration. This is kind of

like the who. Orchestration is basically a competent project manager. A good project manager in business rarely does the hands-on work themselves. They're basically a nexus: that nexus takes information in and then puts information out. This might be person one, person two, person three. They're going to take inputs from these three sources, do some thinking, and then ultimately go and delegate additional work to persons one, two, and three. So they make routing decisions at the end of the day, and they take advantage of available tools. If you think about old-school no-code flows like n8n and stuff like that, this job was basically done by you, and you orchestrated it once when you built the flow. You'd say this node goes to this node, this node goes to that node, maybe this thing loops around a little bit, and then eventually we do this node or something like that. This is a decision you made once, when you built the flow. What's really cool is that the orchestrator basically does all of that on its own. As a practical example: the orchestrator instead just compiles all the tools, and then at runtime it decides, hey, I actually want to do this, then this goes over here, after that's done it goes over there, we loop back three times, start over here, and then finish over here. And because it's flexible, it can adapt to whatever situation exists at the time you ask it to do things. You just give it tools, and it does all the routing and stuff like that for you. Obviously, we want to provide at least some structure, right? We don't want to just give it a bunch of tools and say, "Hey, figure it out." That's what our directives are for. So it does ensure work gets completed according to those. But the flexibility here allows it to deal with situations like when something breaks: how to diagnose the problem rather than just crash with a 404. And later on, if you use sub-agents like I recommend throughout the program, we're going to have a documentation flow that will not only go through the workflow end to end and diagnose any problems, it'll actually go back and document, for the benefit of future instances of the agent, the changes it made, things the agent needs to keep in mind, logical errors that agents typically make, API exceptions that don't really make sense, and so on and so forth.

All right, layer three is execution, which is the how. So, logically

speaking, execution is deterministic.

It's very modular. It's very

straightforward. That doesn't mean it's simple. The execution scripts are stored in the execution folder. I typically just use Python for this. Why? Because the programming language doesn't really matter, to be honest. And when you have Python, at any point in time, if you needed to, you could convert it into whatever you want. You can convert Python into Rust, into Node, into Java, whatever language you want, really. These things are all essentially just conversions of natural language at this point. Anyway, each script handles just one thing: one job or one task. I'll give you an example using what we talked about earlier. If I have a scrape leads directive, that's the high-level workflow, right? Now, this workflow isn't just going to have one scrape_leads.py script. It might actually have multiple different scripts. Depending on what you're using, it might have a scrape_apify.py, it might have an upload_to_gsheet.py, hell, it might even have, if you have to make some interface or something, a present_to_user.py. But the point is these things all just do one thing really well. So this one scrapes Apify really well. This one uploads to a Google Sheet really well. This one presents to a user really well.
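As a sketch of what one of these single-purpose scripts looks like, here's a hypothetical scrape_apify.py skeleton. The actual scraping call is stubbed out (this is not the real Apify API); the point is the shape: explicit command-line parameters in, one job done, results out.

```python
import argparse
import csv

def fetch_leads(location: str, amount: int) -> list[dict]:
    """Stub for the actual scraper call. In a real script this would
    hit your scraping provider's API via its client library."""
    return [{"name": f"Lead {i}", "location": location} for i in range(amount)]

def main() -> None:
    # Explicit parameters, so the agent passes e.g. --location Texas --amount 200.
    parser = argparse.ArgumentParser(description="Scrape leads for a location.")
    parser.add_argument("--location", required=True)
    parser.add_argument("--amount", type=int, default=100)
    args = parser.parse_args()

    leads = fetch_leads(args.location, args.amount)
    with open("leads.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "location"])
        writer.writeheader()
        writer.writerows(leads)
    print(f"Wrote {len(leads)} leads to leads.csv")

if __name__ == "__main__":
    main()
```

The agent would invoke it as `python scrape_apify.py --location Texas --amount 200`: same inputs, same outputs, every time.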

These are just tools that an agent can use in order to do some task. What happens is, because they're deterministic, they do the exact same thing every time when given the same inputs. If I were to just raw-dog it and feed some prompt to my agent, "Hey, I want you to scrape Apify for X, Y, and Z," and I had no tools and no directives, it would eventually figure out what I wanted to do. But if I did it 10 times, on run one it would go from here to here, on run two it would feed back, and on run three, you know, we'd just have fundamentally different executions every single time, right? When the exact same inputs are provided to the exact same execution scripts and you get the exact same outputs, it becomes very obvious what the model needs to do. You heavily constrain the inputs and outputs, and you essentially just provide a simple rule. If I say, "Hey, scrape Apify for Texas, for 200 people," it'll actually feed that in as parameters to the scrape Apify script. It'll have dash-dash location equals Texas, for instance, and then dash-dash amount equals 200 or something like that. And because we are being extraordinarily explicit here, there's never any misunderstanding. The agent always knows what to expect, and so do you. Another example here would be a scrape_apollo.py. That would scrape leads from Apollo, but maybe you also enrich the leads. Well, now you have an enrich_clearbit.py; maybe that enriches company data via that tool. Maybe you then have a send_email.py that sends emails via a specified service, and then a create_pandadoc.py that generates proposals. What you'll quickly realize is that when you build a sufficiently large library of tools, you can have multiple directives reference the same tools. For instance, the send_email.py: maybe as part of my scrape_leads.md directive I always send an email with a summary of the leads, right? So maybe somewhere in there I say, hey, generate the leads, scrape them with Apollo, and then send an email. Well, what about the create_pandadoc.py? Maybe I have a generate_proposal.md, and the generate_proposal.md also needs to send an email. What's really cool is that when you define these atomic functions, both directives can call the same execution script. And because we've optimized the hell out of these execution scripts by rerunning and self-annealing and all this stuff, which we'll talk about later, this is really robust and it basically works every time.

Execution scripts are not AI, for the most part. They don't hallucinate. They don't make things up. They basically either work correctly or they throw a clear error, so there's no ambiguity. There's a programming term here called unit testing, which basically means you can isolate a script down to its bare-bones function, just its input and its output, and test exactly that.
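For example, a unit test for a helper inside an execution script might look like this (the function and its rule are invented for illustration):

```python
def normalize_email(raw: str) -> str:
    """Hypothetical helper from an execution script:
    trim whitespace and lowercase the address."""
    return raw.strip().lower()

# A unit test checks just this one input -> output contract.
def test_normalize_email():
    assert normalize_email("  Nick@Example.COM ") == "nick@example.com"

test_normalize_email()
```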

You can version control them, so you have a log of updates, and you can optimize them independently. You could start with some sort of serial flow, where it does step one, then step two, then step three, and after a few runs maybe it'll come up with a more efficient way to do things. For instance, maybe it'll split and parallelize steps one, two, and three and then recombine the outputs for some API call. The options here are virtually limitless. But because these scripts don't guess or hallucinate, you can just incrementally improve them over time. I had

this question come up the other day, so I figured I'd answer it in this course.

Nothing says you can't actually use AI inside of your scripts. For instance, you might have a script called process_leads_with_claude.py that grabs the leads from a Google Doc or a Google Sheet and then passes them all through Claude and has it tell you something about each lead, whatever you want it to say. Well, you can still use AI to do that for you, right? It's still passing things into Claude. It's just doing so in a much more predictable way, because you are defining it within a single workflow as opposed to giving it full orchestrator access. For instance, your process_leads_with_claude.py would probably start by reading the sheet; that's probably what happens under the hood. After it reads the sheet, it'll then send each row to Claude. When you do that, you'll have a specific prompt that is not quite deterministic, but as deterministic as possible: you set the temperature really low, it expects the same outputs for the same inputs, and so on. After you're done with that, maybe you add an update-the-sheet step or something. So you can call OpenAI, Anthropic, or Google at your whims. I do it all the time within my flows, and it's actually a pretty big chunk of how I do things. I also call neural networks and stuff like that, and I use various libraries. You don't have to do it all with old-school Python automation. I guess the point I'm trying to make is: make these execution scripts very atomic, make them do one thing, and make them as deterministic as possible. This will significantly improve the quality of your end result.
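Here's a sketch of what a script like that process_leads_with_claude.py might look like. The actual model call is stubbed out, since provider APIs differ; in a real version you'd swap in the Anthropic (or OpenAI, or Google) client there, with the temperature set low.

```python
import csv

# The same fixed prompt template is used for every row,
# which keeps the AI step as constrained as possible.
PROMPT_TEMPLATE = (
    "Classify this lead as 'hot' or 'cold' and explain why in one sentence.\n"
    "Lead: {lead}"
)

def call_model(prompt: str) -> str:
    """Stub for the LLM call. Replace with your provider's client
    (e.g. an Anthropic messages call with a low temperature)."""
    return "cold: placeholder response"

def process_leads(path: str) -> list[dict]:
    # Read each lead row, send it through the model with the same
    # prompt shape every time, and collect the verdicts.
    results = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            prompt = PROMPT_TEMPLATE.format(lead=row)
            row["verdict"] = call_model(prompt)
            results.append(row)
    return results
```

The AI is still in the loop, but only inside one well-defined step of one script, not steering the whole run.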

So why does this DO model work? It works because it plays to everybody's strengths. When you do not constrain the outputs of LLMs, they're really unpredictable, right? They'll try anything, and when they fail, they fail spectacularly. It might be like they work 80% of the time, but the 20% of the time they don't, they'll blow up a building or something. Pre-built tools replace the construction of tools on the fly. Because the LLM is running pre-built tools, it doesn't have to make them from scratch every time, which reduces the total number of steps you have to take to get there. A really simple analogy for this is: imagine giving somebody a recipe versus asking them to invent a new dish every time. If I just said, "Hey, can you make that paella recipe that you've been making me recently?", the likelihood that I'm going to get the paella I want is probably a lot higher than if I just have them go off the cuff every single time. They'll know the flavoring, the ratio of ingredients I like, the various steps it takes, how to put the mussels in, just tons of stuff. Whereas every time they invent this new dish, this new paella 3.0, obviously they're just going off their own biases and randomness at that particular moment. So, in addition to directives and executions, we also have two essential configuration files. And

it's actually in practice a little more than two, but I just call it two because it's a system prompt and then it's a .env file. agents.md contains the instructions injected at the start of every conversation with the orchestrator. Now, these are named according to your IDE environment, so this could be claude.md, gemini.md, or whatever it asks for, cursor.md, whatnot. I would just always have all of these simultaneously. The reason why is that if you have all of them, you can move into any new IDE or any new agent or any new model and it'll immediately understand what you're saying. In this way, you could theoretically have rate limits for your Gemini model, rate limits for your Claude model, and rate limits for your OpenAI model, and you just open all three of them in tabs and have them all work on things to minimize the probability of running over anything. Most models at this point are pretty similar; we've kind of converged to really similar accuracy ratings and scores on stuff. So aside from preference and stuff, this is how you keep those costs low. In addition, your .env file is where you store all your API keys and your credentials. What this ends up looking like, for instance, just using that Claude example from earlier: if we want AI to do something, we would actually have, rather than Claude, an Anthropic

API key, ANTHROPIC_API_KEY, and then you just have the key itself right over there. Then you'd have an OPENAI_API_KEY.
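As a sketch, the .env file is just KEY=value lines, and an execution script reads those values from the environment instead of hardcoding them (the key names here are common conventions, not requirements):

```python
import os

# A .env file is just KEY=value lines, for example:
#   ANTHROPIC_API_KEY=sk-ant-xxxx
#   OPENAI_API_KEY=sk-xxxx
# A loader (e.g. the python-dotenv library, or the agent's own setup)
# puts those lines into the process environment; scripts then read
# them instead of embedding secrets in code.

def get_api_key(name: str) -> str:
    key = os.environ.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```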

Then you'd actually store that key over there as well. And you just dump these in; it would be a massive list of all the credentials and keys that you'd ever want. Your execution scripts, instead of having to hardcode the key, would just say, "Hey, go into the .env and find it instead." And there are very simple programs that do that sort of thing for you. Just

so we're all on the same page, what agents.md actually does is act as your persistent context. It's injected automatically at the beginning of every session, so you don't ever have to repeat yourself. It also explains the DO framework structure to the orchestrator. So everything that I've done here, we're basically going to turn into an agents.md file and give to the orchestrator so it understands what's going on. We're going to give it to our agent and be like, "Hey, make sure to do it this way, because it's reliable and because execution scripts are pretty deterministic," and so on and so forth. So it's really meta, right? Everything I'm telling you right now, we're just going to tell to the agent.
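For a picture of what that looks like, a stripped-down agents.md along these lines might read as follows (entirely illustrative; yours will differ):

```markdown
# Agent instructions

## Workspace layout
- directives/  - natural-language workflows (what to do)
- execution/   - deterministic Python scripts (how to do it)
- .env         - API keys and credentials (never hardcode these)

## Rules
- Read the relevant directive before acting; follow it step by step.
- Prefer existing execution scripts over writing new code on the fly.
- If a script fails, read the error, attempt a fix, and retry before
  asking the user for help (self-annealing).
```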

We're just going to do it in a very context-compressed way. This will also define the error-handling behavior, so the agent does not spiral when something breaks. And then, obviously, what's really cool is you can just make your agents.md better and better. I find routine edge cases that I didn't handle in my agents.md probably once a week, and then I just add a line to it, and the next time my model doesn't make that mistake. It did not always self-anneal, for instance. I just realized that, huh, there are some situations where my model solves the problem itself and other situations where it comes to me for help. Why don't I just make it explicit: "Hey, I want you to solve the problem yourself." That is what resulted in the self-annealing concept. All right, so let's actually go and have AI set up directive-orchestration-execution for us. I'll show you the system prompts, agents.md, .env, and everything. Okay, so let's actually build our very first real agentic workflow together. The first thing you need to do is open up your IDE.

In my case, I'll be using Visual Studio Code for this demo. Not because I think it's better than Antigravity or anything like that, but just because I want to show you that you can use whatever you want; it's all interoperable these days. Anyway, the very first thing we need to do is create a new workspace. So I'm going to head over to the top left-hand corner and say Open Folder. From here, I'm going to (at least on a Mac) click the New Folder button. Then I'm going to name it "YouTube workspace" and click Create. Once I'm in it, I'll click Open.

Next up, we have to create our system prompt file. I get a lot more into detail about these later, but for now, I'll open up a new file and name it claude.md. I'm going to paste in one of the examples that you can get from the top link in the description; that is my system prompt. Then I'm going to save. Next, I'm assuming you've already downloaded Claude Code. If not, head over to Extensions and type, in this case, Claude Code, but realistically whatever model you want. Give that a click, then click Install over here. You're going to need to sign in and all that. But assuming you have your own key, and assuming you have your own account set up on at least a $10 or $20 a month plan, you're good. I'm then going to go to the top right-hand corner, click this little Claude Code button, and now I'm just going to move back a bit and start asking it to help me. Now, what I want

to do is build a simple email onboarding flow. Essentially, when somebody joins my organization as a client, I want to send them a brief email saying, "Hey, thanks so much for joining. Really looking forward to having you. Here's a link to a kickoff call that you can schedule." This is a super easy and straightforward thing to do, and you can of course set up systems to do this outside of agentic workflows. I'm just showing you this because I think it's probably the most straightforward example I can think of to show you how to chain together three or four things. We'll progressively design more and more complex workflows. But for now, I need to talk to this model and have it do things. If you notice, on the left-hand side I don't actually have the workspace itself set up; I just have this claude.md. So the very first thing I'm going to do, down here, is go into bypass-permissions mode. Whatever model you're using probably has a bypass permissions mode nowadays. And I'm just going to say: set up my workspace in accordance with claude.md. I mean, I could have said whatever; I could have said just "set my workspace up" or something like that. What it's going to do is read through claude.md, understand how this works, and create a full directory structure based off that. Okay, it's adding a bunch of information: webhook .md files, talking about the deterministic and execution layers, and so on. Now it's going to go through and verify the final setup. And now it's giving me a brief summary. Okay, great. Now that I have this set up, I want to show you how easy it is to actually build this workflow. All I'm going to do is give it a very high-level natural-language instruction of what I want.

"Hey, I'd like to build a brief onboarding workflow. Basically, I want to be able to tell you: onboard client email@example.com, and then have you send an email to that new client that introduces them to our company, gives them some background, and then invites them to a kickoff call using a calendar link." Then I'm going to press Enter. You'll notice that because I'm using my voice, sometimes this text is a little bit misformatted. That's okay; it doesn't need to be perfect. This model is smart enough to understand what's going on.

It's going to ask me some questions. What should I use to send emails: SMTP, Resend, SendGrid, whatever? What's the company info? What's the URL? Now, I would obviously need to go get this information and come back to it. But you should know that I don't even need to know any of that for sure. Hopefully it's clear: I just want to send through my own Gmail account. So I'm just going to say: "Sorry, I don't know what any of that means. I just want to send a welcome email from my Gmail account." And I'm going to provide it my own .com. "For company info, I'll just give you a brief list of bullet points whenever you send the email. And for the calendar link, just use an example calendar link for now." Cool. I'm giving it some high-level instructions here, and it's going to walk both of us through finishing this workflow.

The first thing it will do, if we open up our directives folder, is build this onboard_client.md. If I go up here, you can see there's now an onboard_client.md with a bunch of high-level directives containing this information. Now, you'll see that it's installing dependencies and so on. It doesn't fully understand what to do here, but that's okay. What it's doing next is walking us through a one-time setup with our Google information. So what I'm going to do is create a new app-specific password. Let's just call it "YouTube example". Then I'm going to go over here and paste it in. This is now going to take the app password and actually use it to update the .env file.
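Under the hood, the Gmail-sending execution script it generates will look something like this sketch (a minimal version using Python's standard library; the generated script, email text, and variable names will differ):

```python
import os
import smtplib
from email.message import EmailMessage

def build_welcome_email(to_address: str) -> EmailMessage:
    # Build the message separately from sending it, so it's easy to test.
    msg = EmailMessage()
    msg["Subject"] = "Welcome aboard!"
    msg["From"] = os.environ.get("GMAIL_ADDRESS", "me@example.com")
    msg["To"] = to_address
    msg.set_content(
        "Hi! Thanks so much for joining. "
        "Book your kickoff call here: (calendar link)"
    )
    return msg

def send_welcome_email(to_address: str) -> None:
    # Gmail accepts app-specific passwords over SMTP with SSL on port 465;
    # both credentials come from the .env file, never from the code.
    msg = build_welcome_email(to_address)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["GMAIL_ADDRESS"], os.environ["GMAIL_APP_PASSWORD"])
        server.send_message(msg)
```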

It says the app password is saved; we're all set. First, I'm going to ask it what the onboarding email looks like. This looks pretty reasonable. I'm now going to go through and edit this template so that we send what I think is a higher-quality template every time. Okay, I just spent a few moments here putting together this onboarding email. It says: "Hi, [name]. Thanks for choosing to work with us. We're excited to have you on board. Here's what happens next: we hop on a quick kickoff call to align on goals. You meet the team and get synced with your project manager. From there, we'll map out a plan tailored to you, and you'll receive daily updates until the project is complete. Book your kickoff call here." A very straightforward template. I basically just want this to send every single time. So it's going to go and update the directive, and presumably the execution script, to always reflect this information. And then finally, I'm just going to say: onboard nick at nickleclick.ai. At the end of it, you can see we now have a really well-formatted, simple onboarding email. This whole workflow only took me a few seconds to put together. Hopefully you see the power for non-technical people, even people that don't understand what app keys are or env tokens or anything like that, to meaningfully integrate with the software we're using. All

right, so now that we've seen a little bit about how to set things up, how do you actually go and create really good directives? Well, you need four things. You need a clear objective statement, aka what this directive does. You need some form of input specification: what data does the agent need to actually get started? You need a step-by-step process, which is a sequence of operations, scripts, and expected outputs in natural language. And then you also need a definition of done, which is quality criteria. How do you know the agent has actually succeeded? It needs to be able to grade itself based on its output. For instance: you'll know you're successful when you have a Google Sheet URL with at least 100 rows filled in, something like that. You should also, of course, include edge cases: any known exceptions, quirks with an API, things that come out as error codes that shouldn't, common failure modes. You should include all of that in the directive. You should also describe fallback behavior, like: hey, if the Apollo scraper we're using fails, try the Instantly lead enrichment tool instead. And unlike old automations, you

instead. And unlike old automations, you don't have to like build this massive complicated error handling function.
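Putting those four pieces together, a directive might look something like this. The file layout, tool names, and thresholds here are illustrative, not a template from the course:

```markdown
# Directive: scrape-leads

## Objective
Scrape B2B leads matching the user's filters and write them to a Google Sheet.

## Inputs
- Target industry and location (from the user's request)
- `GOOGLE_SHEETS_ID` from `.env`

## Process
1. Run `execution/scrape_leads.py` with the user's filters.
2. Deduplicate the results by email.
3. Append the rows to the Google Sheet.

## Definition of done
A Google Sheet URL with at least 100 rows filled in. If fewer than 100,
rerun with wider filters until the minimum is met.

## Edge cases and fallbacks
- If the Apollo scraper fails, try the Instantly enrichment tool instead.
- If every source fails, return a short explanation rather than a partial sheet.
```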

Unlike n8n or Make.com or any of these visual coding tools, you don't actually have to go through and create these error-handling flows. You just add one line: "hey, if this happens, then do this." It's so much simpler. The directive also includes some sort of instruction saying what to return if everything fails gracefully. A lot of systems do fail really gracefully; they don't even really tell you that they failed. If you expect 100 leads to pop up, or 100 YouTube videos to come from your YouTube video scraper or whatever, and only one comes back, it'll technically have completed without anything erroring out. So there's no real built-in way for the model to know, unless you make it hyper-explicit, what happens when things don't go to plan. That's why you need a definition of done. And then you also need something that says: hey, if this does fail gracefully, if we're under 100 records (let's say that's our minimum), rerun it over and over again with wider filters until we get to 100; don't return this to the user until we have at least whatever was asked for. All right. For my next system, I basically want to build a CRM manager for ClickUp.

ClickUp is one of many CRM tools you could use. I really like it because it's simple, it's fast, and it includes a bunch of functionality that weaves together different tools: it has built-in messaging, it obviously has documents, I can store my knowledge bases in there, and so on. But I want you to know the specific tool doesn't really matter at all. You can build this sort of thing out in basically any CRM, so long as it has the ability to connect via API, MCP, and that sort of stuff. So basically what I have here is a really simple CRM setup called "template creative agency"; I'm going to pretend I'm a creative agency here. You can see there's a sales pipeline, and inside the sales pipeline I have people like Nick Sarif, Peter Jackson, Peter Smith, Sally Lozen (her last name's Lozen), Koth Arllan, and so on, basically stored on this cool little table. And what happens, like in any CRM, is that people come in through this intake stage, like Bast Sarif, and then they're assigned a status. Then, as they're updated, I move them to things like meeting booked, proposal sent, and closed lost or closed won, depending on whether or not they accept the contract. However, I don't really want to interact with it manually anymore. I think it'd be really cool if I could weave this into other workflows, like the onboarding workflow we made earlier. So, how do I do this? I'm just going to ask it to build this for me. I'd like you to be a wrapper around my ClickUp CRM. I want to be able to ask you to do anything inside ClickUp, then have you automate the process for me. This will also allow us to connect to other workflows we build around my agency. All of the CRM information is stored inside the (let me head back over here and see what it's called) template creative agency space.

Give me three ways we could do this.

Okay, it's now going to create everything that I need. The first option is a direct script library: it'll create a set of execution scripts for common ClickUp operations with a master directive that routes requests. That's pretty cool, but I would have to invoke it every time. Then there's some sort of conversational idea. Then there's also a webhook bridge. I like the idea of number one, but I want to see if there's a simpler way to do this. Is there any simpler way to do this? Like, is there an MCP or just anything that wouldn't require us building a specific step for every request?

It's going to go through and reason first. So it's going to check whether there's anything out there that would allow us to do this more easily. What it's doing here is using a web-search sub-agent. Believe it or not, we're going to talk a lot more about sub-agents later, but sub-agents have pros and cons. When you use sub-agents, things typically take a lot longer to finish, but the pro is that you isolate the context, which means you don't need to worry about inserting all this stuff into the main flow. Cool. So this is sort of what I wanted to do initially. I'm kind of cheating here, because I know MCP is just a simple and easy way I could build something like this, and I'll show you more about it later. But as we see here, there's an official one and also a non-official one. What I'm going to do is say, "Hey, let's do the official. How do I get my API token?"

Okay, it's giving me some instructions here. So I'm going to head over; I just need to regenerate this API token. First I have to put my password in. Just bear with me. Next, I'm going to copy this token over, and then I'm just going to head back and paste it. One thing you'll find models do pretty often (and I don't know if this is because they want to conserve their own token usage or something) is that instead of just doing the thing for you, they'll often say, "Hey, I'm going to find information on how you can do the thing." What's super powerful is to just say, "Okay, great. Do it." Looks like we need some more information here. So we need to go to ClickUp in our browser, look at the URL, and get the team ID. I see it right over there. Let me just paste it in. Okay, and now all I need to do is restart Claude Code. So let me click this little X and head over here again. I double-tap on the page to create that new file.
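For reference, the file it writes is an MCP config along these lines. The server package name below is a placeholder (I'm not pointing at a specific ClickUp server), but the shape (command, args, and env) is the standard one:

```json
{
  "mcpServers": {
    "clickup": {
      "command": "npx",
      "args": ["-y", "clickup-mcp-server"],
      "env": {
        "CLICKUP_API_TOKEN": "pk_xxxxxxxx",
        "CLICKUP_TEAM_ID": "9012345678"
      }
    }
  }
}
```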

Okay. And now I have an MCP. So let me just give that a click. When you type /mcp, you can now see the MCP servers you have. And I'll say, "Awesome. Can you create a new record for me?" Because this is an MCP, it's a general solution, not a specific solution, so we need to give it some information: what type of record? Where should it go? I'd like you to act essentially as my ClickUp wrapper. Keep in mind that this is a new instance, so I need to provide it some high-level instructions again. So: all conversations are going to be related to that space. I'd like you to store this information somewhere; that way, the next time I ask you to do this, you'll do it the first time. Go and learn about the space first. New lead: Peter Rockwell. Okay. And now, when I say "new lead Peter Rockwell," it's creating a lead in that space. Pretty straightforward. Let's go check and make sure it's good. And as you can see here, we now have a meeting URL link as well as a status of meeting booked.

Hopefully it's clear. I could talk all day about this and give it all the information it needs to manage my ClickUp CRM for me. So that's one way to do it, with an MCP, which is really straightforward and super simple. Let me show you another way we can do this, just using the ClickUp API instead. I'm going to exit out of this and create a new Claude Code instance. I'm going to say: hey, can you uninstall the ClickUp MCP and remove anything in our environment that has to do with ClickUp? I'm doing a demo. Then I'm going to bypass permissions, so I just don't have to worry about it; it's just going to do it all for me. Hey, I'd like you to build a series of ClickUp directives so that I can automate the process of adding records, updating them, and so on. I basically want you to act as my ClickUp wrapper. I want to do this via API calls. We previously tried MCP, but I'm doing a demo and just want to use the API instead. Okay, it's now building this out systematically. It's going to start by building a base ClickUp API client. It's then going to create CRUD scripts: create, get, update, delete.
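As a rough sketch of what those generated scripts might look like: the endpoint paths below follow ClickUp's public v2 API, but the environment-variable name and the default status are assumptions, and a real generated client would add retries and error handling.

```python
import json
import os
import urllib.request

BASE = "https://api.clickup.com/api/v2"

def endpoint(task_id=None, list_id=None):
    """Create lives under a list; get/update/delete live under the task."""
    return f"{BASE}/task/{task_id}" if task_id else f"{BASE}/list/{list_id}/task"

def call(method, url, payload=None):
    # ClickUp expects the raw personal token in the Authorization header.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode() if payload is not None else None,
        headers={"Authorization": os.environ["CLICKUP_API_TOKEN"],
                 "Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def create_task(list_id, name, status="intake"):
    return call("POST", endpoint(list_id=list_id), {"name": name, "status": status})

def update_task(task_id, **fields):
    return call("PUT", endpoint(task_id=task_id), fields)

def delete_task(task_id):
    return call("DELETE", endpoint(task_id=task_id))
```

Each directive then just tells the agent which of these scripts to run and with what arguments.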

Then it's going to create directives for each operation. Finally, it's going to update my env template with a ClickUp API key placeholder. I did just remove it, so I'll most likely have to add that in again. What's really cool is that I know nothing about any of this stuff, and it's just doing it all completely automatically right now: writing all the directives, all the executions, literally everything I need. The reason I'm showing you multiple different ways to do things is because there almost always are multiple different ways to do things. With AI and agentic workflow builders like this, it's not necessarily that one approach is better than the other. Sometimes I'll try an approach and, whether the API isn't cooperating or it's just not logistically reasonable, I'll abandon it halfway through and just do another one. There's no reason I have to commit to something that isn't working, and I can always change things. Nowadays, the barrier isn't really whether something is possible; the barrier is basically just how much time I want to spend guiding or steering the ship to get the thing done for me. Okay, it's now going through and adding all the information we need. I gave it the API key, as you could see above. It's going to loop over as many times as it takes because of what's in the CLAUDE.md. Eventually, it will solve its own problems through a process called self-annealing.
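Stripped of the model, the self-annealing loop is nothing exotic. This is a toy sketch of the shape (not Claude Code's actual implementation): run the script, and on failure hand the error back to something that patches it, then retry.

```python
def self_anneal(run, fix, max_attempts=5):
    """Keep running `run` until it succeeds; after each failure, hand the
    error to `fix` (in practice, the agent patching its own script)."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return run()
        except Exception as exc:
            last_error = exc
            fix(str(exc))  # the agent edits the script based on the error
    raise RuntimeError(f"Gave up after {max_attempts} attempts: {last_error}")

# Toy stand-in: a "script" that fails until it has been patched twice.
patches = []

def flaky_script():
    if len(patches) < 2:
        raise ValueError("missing API key" if not patches else "bad team id")
    return "task created"

print(self_anneal(flaky_script, patches.append))  # prints: task created
```

In the real setup, `run` is "execute the script and capture stderr," and `fix` is the agent rewriting the script; the definition of done in the directive is what tells it whether `run` actually succeeded.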

And then we'll be able to do things like create tasks, delete them, update them, and so on. So it's just running through and testing all of the various scripts it put together: the creating of a task, the deleting, the cleaning up, and so on. Let me give it some more high-level instructions, just to tell it I really want it to work within that template creative agency space. I'd like you to do all of your tasks solely in the template creative agency space. Update everything, and whatever else you need to, in order to reflect this. Then create a new lead called Nick Sar. Cool. Looks like it already knows what it needs to do. So now it's going to create the lead, and you can see it's even given me a link to the lead so I can pull it up and see it for myself, which is pretty cool. Awesome. Why don't we see if this has access to some other fields? Do you have access to custom fields? Okay. First it's going to look at the custom fields in this list; then it's going to see if we can set the appropriate one. Nice. That's pretty cool. So whereas the other one could not set custom fields, this one can, which is pretty sweet. As you can see, sometimes there are pros and cons to different approaches. This one was really awesome. So, to be honest, I now basically have a whole CRM manager. Great. Delete the record; that was just for demo.

I'd personally say that having some sort of CRM wrapper like this, with the power of current technology, is a non-negotiable. This thing just makes our lives so much easier. And what's really cool is we can weave flows together: when somebody becomes a new client, for instance, we could automatically send that onboarding flow, then maybe even reflect it by adding a comment or something like that. These things will supercharge any CRM very, very quickly. Okay, I want to talk a little bit about Claude skills. This is really similar to DO, like we just chatted about, but it's specific to the Claude family of models. You can't use the same Claude skills structure I'm about to show you in, say, Gemini or OpenAI's GPT 5.2 or whatever; it's very, very specific to Claude. That said, all of these model families now have their own versions of this, so I wanted to cover probably the most popular one, just so we're all on the same page.

I care a lot about interpretability and modularity, so I want to be able to use the same workflow setup in model A versus model B versus model C, and Claude skills are obviously hyper-specific to Anthropic's models. This was their attempt to standardize agentic workflows into reusable, portable packages. And just like DO, it's a folder structure: it contains instructions, scripts, prompts, and resources that Claude will load every time you call something. So it's just a slightly different folder structure that includes a file called SKILL.md, and I'm going to run you through that in a moment. The way skills work, in a nutshell (just ignore the left-hand side of this graph, because I think it's a little more complicated than we need right now): you have your agent, and your agent organizes things into these skills folders. So it's skills/, then whatever the skill you want it to know is. In this case, there's a skill called bigquery. Then you'll see there's a capital SKILL.md with a data-sources.md and a rules.md. Over here, there's an NDA review, which includes a SKILL.md. The SKILL.md is just your directive, right? And you'll notice that it's in markdown. Everything else here is entirely up to you. So it's sort of a loose framework right now, where people are just dumping in whatever the heck they want the agent to have access to. It's also a way you can modularize things. Basically, you'll have a big directory called skills, and underneath that you'll have things like, say, bigquery, docx, pdf, scrape-leads, and each of these is a folder itself. So it's very similar to DO; it just takes a slightly different approach. Instead of having the executables and scripts stored in other folders, like an execution-scripts folder, it stores it all in the exact same one. The way I treat a SKILL.md is as an instruction manual that Claude reads first. There's one slight difference in how the markdown file is written, insofar as it uses what's called YAML front matter. YAML just stands for "yet another markup language," by the way, which is really funny; there are like a million different ways to do this.
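Concretely, front matter is a small block between triple dashes at the top of the SKILL.md. The scrape-leads skill below is a made-up example; `name` and `description` are the core fields, and `allowed-tools` is the optional field shown later in the video:

```yaml
---
name: scrape-leads
description: >
  Scrapes B2B leads with the filters the user asks for and writes them
  to a Google Sheet. Use when the user asks for a lead list.
allowed-tools: Bash, Read, Write
---
```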

Basically, this is a short (I don't know, 100-200 character) description of what the skill does. With the directive-orchestration-execution framework, I don't usually use YAML front matter (I just have it whipped up), although I think it would be an improvement. Instead of just naming something really descriptively, this actually provides some context: hey, this script does X, Y, and Z; hey, this skill asks for this thing. Then, at runtime, Claude will load the skill based on whatever task you're asking it to perform, just based off the YAML front matter, which saves a lot of tokens: it doesn't have to read the whole thing. So this is just a small block of metadata at the top of the file. There's a name field, a description field, and then a purpose field, and I'll show you a concrete example in a second. It's separated out like this, and when the agent searches through your skills (say you ask, "Hey, I want you to scrape some leads"), it'll just load this. So it's way, way shorter. Small metadata allows it to load only a few hundred characters at a time, as opposed to big chunks, and to understand what the skill does without reading the whole thing.

Now, there's also a big library of pre-built skills for common tasks, mostly relating to documents. These are skills that have been hyper-optimized over the course of tens of thousands of runs. You can think of them as execution scripts and directives that are really, really well self-annealed and really powerful. So we can do PDF creation, Word documents, Excel spreadsheets, PowerPoint presentations. The quality is surprisingly good. And because so many people have run these things, and because they've been optimized to hell, they tend to execute super quickly and to be pretty reliable. All right, let me show you some Claude skills in action. Let's talk about how to build things in Claude skills format instead of DO format; I want you to see that it's more or less the same thing, just highly Claude-specific. So I have a simple task in front of me here: I want to create a new Claude skill called generate-report.

And I want it to build a weekly weather report with publicly available information from some API. I just Googled "weather API" and pasted it in there; I don't even know if it's going to work, but we'll figure it out alongside each other. I also said I want it Canada-specific, just because I'm Canadian, i.e., this report should be all about the weather across Canada. Now, the last thing I need is some sort of template, so I'm going to see if I can download a free report template. Let's see. It's going to open up a bunch of tabs. What do we got here? 2035 annual report. That looks ridiculous. Okay, this one looks pretty cool. Can I just download this whole thing? Anyway, I'm just going to go over to Canva here, and then I'm going to download this as, what are we going to do, PDF? Let's just do PDF. We'll do all pages. I'll click download. Once I have this, I'm then going to provide the file to Claude Code. I have a template file; I'll just drag this over to it and call it "orange and black modern annual report," and that's what I want you to use. Go. Awesome.

So it's going to pull that file and then, because it knows how to generate Claude skills sort of natively, go through the whole process. Okay, it's going through and creating the skill directory structure. It's then writing the SKILL.md with instructions. It's doing a fair amount of stuff, so I'm just going to head over to skills and see where this would be. Okay: generate-report, right over here. And inside there's a SKILL.md. Then there's also a scripts folder; this is where we're going to insert the scripts. It's now going to go fetch a bunch of weather data. The cool thing about Claude skills is this little YAML front matter. It's called YAML, and the front matter is just everything between these three dashes. Here we have the name, a brief description, and also some allowed tools, which is really cool: you can get very granular with how you give your agent access to these workflows. And what's cool is that it only loads this into context before deciding which skill to use, so you save a fair amount of tokens, because it doesn't have to read every single file, right? Okay, I'm then going to go get an API key. Okay, it looks like OpenWeatherMap is not free, despite saying that it is; I'd need to sign up and then enter some payment information. So don't use that. What I've done here is just said, hey, it's not free, so find a source that is free. So now it's going to go and find me something that realistically is. Looks like it found an alternative source called open-, so it's just going to rewrite the skill with that information in mind. Now that it's done a little bit of work, it's testing the skill. Okay, looks like it has now generated me a file. Let's just say: open PDF.

Cool. And now we have it: Canada weekly weather 2025, table of contents, national overview, weather highlights, West Coast, Prairies, Central Canada. So you can see it's very, very easy to create a template using a PDF; just drag and drop that puppy in, and boom, you now have native intelligence capable of interacting with tools like this to generate, honestly, a very clean and very sexy proposal document. Pretty straightforward, huh? This is just one of many asset-generation workflows you could do. Hopefully you see that you could now generate proposals in a flash, or any PDF in a flash, customized assets or slide decks or whatever the heck you want. It really only takes a data source, the template itself, and you waiting around five minutes or so as it self-anneals and then generates. Let's talk a little bit about Model Context Protocol.

So this is essentially USB for AI. The idea is that it's a universal adapter that lets any assistant, whatever the model family, connect to any data source interoperably. Now, when I say USB: a while back you had so many different types of USB. You had USB 1, USB 2, USB-A, and (I don't actually know if this one's real) what felt like hundreds of different USB configurations, basically hundreds of different cables. Then eventually somebody made USB-C, and people realized this was just the superior format, and then either regulations (depending on where you live) or heavy market incentives pushed everyone to produce USB-C, because if we all standardize to one adapter, I can buy any device, slot it into any other device, and it will just work. I don't have to carry around 20 different types of cables; I just know this sort of adapter is going to make everything work, easily and conveniently. That's essentially what MCP is: we're just doing that for our AI agents. It was introduced by Anthropic back in November 2024. It's a standardized way for AI assistants to connect to any external data and tools. And this isn't just Claude, to be clear; they made this for everybody. It works with the OpenAI family of models; it works with the Gemini family of models. The whole idea is to eliminate the need for those custom "USBs" for every connection: just a universal translator. Imagine there was a language anybody on planet Earth could speak, so that when you meet a person who doesn't speak your language, you both use the same one. It's Esperanto, but for AI agents; that's basically it. There are two main pieces to understand: MCP clients on one hand, and MCP servers on the other. The clients are basically our AI apps: things like Antigravity, VS Code, Claude Desktop, ChatGPT. Remember how earlier in the course I said that chats are just the interfaces agents are borrowing right now, because we don't have a better interface? Well, that's essentially all a client is: an interface. The client is the tool that houses the agent; it's the shell around it. And what it does is connect to servers, and those servers are based on specific tools. So, for instance, there's an Apify MCP server. In addition to an Apify MCP, there's an Apollo MCP, a Google Drive MCP, a Sheets MCP. And the point is that whatever client you're using at the time (maybe Antigravity, in this case) just calls the specific MCP whose configuration files you include in your workspace. So in Antigravity I might have an Apify MCP, a Drive MCP, and a Sheets MCP, and then I just say: hey, can you look at my Drive for whatever file, turn that into a big CSV, and feed that CSV into Apify? And assuming these three MCPs are good (because there's a lot of quality variance in MCPs right now), it can actually do what you want it to do. You can also store high-level directives that explain how to chain these together even more in-depth and more reliably, and then the MCPs are essentially just your execution scripts.

Right now, there are three main ways that MCP servers communicate with MCP clients. There are resources, which are structured data: documents, code, database records, and so on. There are tools, which are functions your agent can call; these are analogous to execution scripts on our end. And then there are prompts, which are basically system prompts for specific things; they guide how the model should interact with a specific server. Hey, use this execution script when you want to do this function; hey, call this resource; don't paginate through all of them, only fetch the first 50 lines. These are just high-level instructions that help the model do things more reliably. The whole idea of MCP is really just to make the entire web accessible to our agents.

Every tool gets its own MCP server, and your agent only loads the ones you absolutely need. This means you never have to build custom tools from scratch (though I think it's pretty easy and pretty great to get yourself that functionality), and you get to give your agent breadth out of the box with very little effort on your part. In addition, you can also build your own custom MCP servers. The value here is that not only will your own agent use it, you can share it with other people. And by sharing it, you can either have people pay you to build the MCP server, or, say you're an API company that builds an MCP server around your functionality, you can make things more accessible and increase your company's revenue. It's very, very easy to build these things with AI assistance. When MCP came out it was very difficult, but now it's super easy; I actually built one in ten minutes the other day, without ever reading any MCP documentation, and it did something really cool for me, which I may talk about in a future video. This means you can create specialized tools for specific workflow needs any time you want. And then if other people within, say, your organization want to use it, you just share the MCP server. It's always going to work the same out of the box, because it's the same server, and now multiple people can iterate on and improve it, not just you. So the main question I get at this point is: why don't we just use MCP for everything? Sounds great, right?

Maybe we should. Well, the reason why is that MCP takes a lot of tokens, and the more context a model deals with, the dumber it gets. If you fed the exact same request to two models, except prompt one said what you wanted in, I don't know, 10 words, and prompt two said the exact same thing but wrote it really inefficiently and made it really, really long, the model with the short prompt would almost always perform better. Maybe the short one would have a 99% success rate, whereas the long one would have an 85% success rate or something. What I mean to say is there's a very strong relationship between token count in context and performance. This is improving as models get more intelligent, but essentially, as the context gets longer and longer, performance almost always declines. It's not

exactly linear, because usually when you provide more context, there's actually a little bump in performance until you get to a certain point, and then it starts declining. Below that point, we didn't really provide enough information for the model to know what's going on; past it, maybe we provided a bunch of examples or whatever, which is why it does better. But inevitably, the more information you add that isn't relevant to your task, the more tokens you have in that prompt, the crappier your outputs are going to be. And the issue with MCP is

it actually loads pretty much all of its available functions into your agent's context window. Now, there are some developments that are fixing this, like runtime MCP servers, where your AI makes an intelligent determination about which MCP servers to load. But MCP as a framework is still pretty new, and a lot of the MCP servers out there are pretty crappy. So regardless, we're loading a ton of tokens into a context window.

Every function will have a name, a description, and a schema, which usually comes to a few hundred tokens. What that means is that if you connect five servers and every server has 10 tools (like if you connected the Drive server, and it had, I don't know, get file, read file, share file, and so on), every single one of those has a name, description, and schema. We're getting really high up in the tokens already, right? If you have 300 tokens per definition, even five servers with 10 tools each means 15,000 tokens. And that's before you've done anything. So it's like you're already on that graph I showed you earlier: if this is your performance when your token count is really low, you're probably already down over here.
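To make that arithmetic concrete, here's the back-of-the-envelope math as a tiny script. The 300-tokens-per-definition figure is a rough assumption for illustration, not a measured value:

```python
# Rough estimate of the context consumed by MCP tool definitions
# before the agent has done any real work. Figures are illustrative.
TOKENS_PER_DEFINITION = 300  # name + description + schema, roughly
SERVERS = 5
TOOLS_PER_SERVER = 10

overhead = SERVERS * TOOLS_PER_SERVER * TOKENS_PER_DEFINITION
print(f"{overhead} tokens loaded before the first user message")
```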

You have some loss in performance, which is just ultimately not efficient for business purposes. And you're probably wondering, well, Nick, how bad is it really? What I want to do here is show you a quick example on some older models (keep in mind that for research on these things to exist, the models necessarily have had to be out for a while): how their accuracy on tasks scales with the number of documents in the input context. The number of documents in the input context is basically equivalent to tokens here; one document in this case is probably equal to something like 1,000 tokens. At the very beginning, when the context is quite small and we only have five documents in the input context, this model, GPT-3.5 Turbo 16k, performs very well, maybe somewhere around 75%. The second we double that, accuracy is slightly over 65%. Double it again, and now it's almost down to 60%. And if we 1.5x that, it's somewhere between 50 and 60%. So performance really drops off extraordinarily quickly. And so to make

extraordinarily quickly. And so to make a long story short, the reason why this happens is really similar to what I showed you guys earlier on in a demo where like if you just have one token and then you have three potential tokens

here, you know, basically every single time you are forced to compute like the next token in a sequence, the total variance of the things that you could be generating just kind of go through the

roof. And so that's that's what's

roof. And so that's that's what's occurring here. In order for you know

occurring here. In order for you know this model to somehow know that the right answer is over here obviously it needs to somehow maintain some degree of accuracy and coherence. And that just becomes less and less and less and less

likely uh the more tokens that you generate. Now obviously it doesn't

generate. Now obviously it doesn't happen this quickly. It happens over the course of many thousands of tokens nowadays. But back in the day when I was

nowadays. But back in the day when I was working with um just the base vanilla GPT2 the output quality was super sensitive to the number of tokens the input prompt. Like if you added an

input prompt. Like if you added an additional five tokens and those tokens were not very high quality tokens, they didn't really add a lot of value. Like

accuracy would plunge off a cliff. Screw

documents here. Pretend like we're just talking number of tokens. At five it might be 70, but at 10 it would literally jump down and so on and so forth. So anytime you try and get to any

forth. So anytime you try and get to any reasonable answer, you're already working super super below um you know total accuracy limits. Here's another

example: memory retrieval accuracy. Basically, if there's some token buried super deep in the context of a model with, say, a 2 million token context window, it works at small scales: when there are only 30,000 tokens in the prompt, the model sees and finds the token pretty much 100% of the time. But if there are, I don't know, 2 million tokens, it'll actually forget about it a massive chunk of the time; it won't even realize that the token is within its context. Basically, its ability to retrieve things from its memory (intermediate memory in this case, which is just the chat and the prompt) plummets. Finally, you

can see here a needle-in-the-haystack sort of example, very similar to what we were talking about earlier: as the number of tokens goes up, you see a massive decrease in the model's ability to meaningfully keep track of things. And this is just the way intelligence works, right? The more things we're trying to juggle and keep in our head simultaneously, the higher the likelihood that we forget any one of them. So, as a demonstrative example, let's say I wanted my agent to write me an absolutely beautiful poem all about the meaning of life and our place in the universe. I say, "I'm a big fan of Maya Angelou and Pablo Neruda is wonderful as well. Please make this short but also punchy and very beautiful." If you think about it

beautiful." If you think about it logically, like this prompt right here is a certain number of tokens and I can count that here. I'm using a service called wordcounter.net. It doesn't count

called wordcounter.net. It doesn't count tokens, it counts words. But if you want the number of tokens, you basically just grab the number of words, then you multiply it by, you know, uh, 1 divid 0.7 approximately. If I do that math,

0.7 approximately. If I do that math, this is somewhere on the order of like 67 tokens. But I want you to look

67 tokens. But I want you to look really, really closely at what I just wrote here. Are all of these words

wrote here. Are all of these words required in order to get the model to do something for us? Like what is the information density of this sentence?

Hello. Is that required? Probably not,

right? I could probably realistically remove that. could. It's kind of a long

remove that. could. It's kind of a long way to say can. Can can you is kind of a long way to just tell it to write something. So, write me an absolutely

something. So, write me an absolutely beautiful do I need that? No. Write me a beautiful poem all about no about the

meaning of life and our place in the universe. I say

universe. I say emulate Maya Angelou Pablo Naruda.

Short punchy and I don't actually need to say very beautiful because I just said so earlier up here. Now, if you compare what I just

wrote initially, at 47 words, to what I wrote here, at 22 words, notice how I basically said the exact same thing as the first prompt in terms of pure information density; I just did it in less than half the words. So now, instead of 67 tokens, this is probably somewhere right around 28 tokens. What that means, walking back to our example, is you can

realistically, significantly improve the ultimate quality of an output just by refactoring the sentences you feed into a prompt. Instead of "hello, could you write me an absolutely beautiful poem all about the meaning of life" or whatever, I could create a new prompt instance and say the exact same thing, on one line instead of two. And although it is very difficult to determine the quality of a poem quantitatively, what's occurring statistically is that the quality of the short prompt's poem will be better than the quality of the long prompt's poem, simply because I wrote the request in a shorter, punchier way. If you think about that graph of quality versus prompt length, instead of being way out on the long end, in this example I'm probably near the sweet spot. The reason I'm showing you this is that it's exactly what models are doing under the hood: instead of writing in laborious, long ways, they compact the words you're saying into as high an information density summary of your prompt as possible. And they have a couple of strategies to do this.
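The rough word-to-token arithmetic from the wordcounter.net example can be sketched like this. The 0.7 words-per-token ratio is a loose rule of thumb rather than a real tokenizer, and the two prompts below are paraphrases of the ones above:

```python
# tokens ≈ words / 0.7: a loose rule of thumb, not a real tokenizer.
def estimate_tokens(text: str) -> int:
    return round(len(text.split()) / 0.7)

verbose = ("Hello, could you write me an absolutely beautiful poem all about "
           "the meaning of life and our place in the universe? I'm a big fan "
           "of Maya Angelou and Pablo Neruda is wonderful as well. Please "
           "make this short but also punchy and very beautiful.")
terse = ("Write a beautiful poem about the meaning of life and our place in "
         "the universe. Emulate Maya Angelou and Pablo Neruda. Short, punchy.")

# Same request, roughly half the tokens.
print(estimate_tokens(verbose), estimate_tokens(terse))
```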

I don't know if you guys have seen reasoning tokens, but the way reasoning occurs is actually in a very high information density way. They have specifically trained the model to write in a way that is shorter on tokens rather than longer. If you look at other models out there, like gpt-oss-20b, for instance, or maybe gpt-oss-120b (these are open-weight models that OpenAI released a little while ago), you'll notice a very peculiar thing when you expand the reasoning tokens: it writes super short. It says "need to define X but also Y but maybe Z." And you're like, what the heck's going on? This is like an alien, really short-form way of writing. Well, the reason it writes that way is that it's just much higher information density. And the higher the informational content in your prompt per token, the better the response you're ultimately going to get. Another strategy

that models will use is compaction. What I mean by this: every time you feed a prompt to a model, it's also going back and feeding in every message that you and it have ever sent to each other in the same chain. Compaction basically takes the entire history of your conversation and summarizes it: "Summarize everything we've talked about so far." It condenses it all into a very succinct message. The way compaction works is that once we hit a certain token amount, which could be, say, 50% of the total number of tokens allotted, this summary is fed into the next instance of the model. So a future instance of, in this case, Claude Code would have access to more or less the full summary. Sure, we'll miss some details, but a lot of those details aren't really that consequential anyway. Think of how many fewer tokens this is than literally my entire conversation history from start to finish.

Another big issue: when your agent calls an MCP tool directly, the entire response goes into the context. So if I wanted to pull

a document from Google Drive, for instance, I would actually have to store the entire thing in my context, at least the way models are right now. If I wanted to query a Google Sheet for 10 rows, and all 10 rows had 20 columns each, well, now I have 200 additional cells within my context. Meaning your agent can hit the context ceiling really fast and burn a ton of money when you use generalized MCP tools: not tools you build yourself, but ones other people build for you without really optimizing the process.
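The compaction strategy described a moment ago can be sketched in a few lines. Here, `summarize` is a hypothetical stand-in for an LLM call, and word count is a crude token proxy; both are assumptions for illustration:

```python
# Compaction sketch: once conversation history exceeds a token budget,
# collapse everything but the latest message into one summary message.
def compact(history, token_budget, summarize, est=lambda m: len(m.split())):
    total = sum(est(m) for m in history)  # crude token estimate
    if total <= token_budget:
        return history  # still fits; nothing to do
    summary = summarize(history[:-1])  # stand-in for an LLM summary call
    return [summary, history[-1]]

# Stubbed usage: a real summarize() would call the model.
history = ["long message one", "long message two", "latest question"]
print(compact(history, token_budget=5, summarize=lambda msgs: "[summary]"))
```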

Last thing I'm going to note on this: not all MCP servers are created equal. A lot of servers are rushed to market to capitalize on the hype. I know a couple just off the top of my head that are super poor: they don't return any good error codes, they don't even interact with the APIs correctly, and tons of people are unfortunately struggling because of that. Some good examples are Perplexity's and n8n's servers, but there are some really bad examples too. I'm not going to name names, but some are a complete joke. In general, you'll know when you start interacting with a bad MCP server: it's just going to flag a bunch of errors, and your model's just going to be dumb as hell. You can tell pretty quick. All

right, so let me show you how easy it is to connect the Google Drive MCP server.

We've already done a little bit of MCP.

I've obviously wanted to tease that throughout the course to keep you interested and engaged, but this time I'm going to do a full, comprehensive walkthrough. We're going to connect this to our agent, and then we're going to use it to perform a really simple operation. I just want you to notice how seamless the integration is. Once it's set up, I don't even have to set up the directive or the script or anything; I can just communicate with it in plain language, and it can go in and call the appropriate tools for me. Let's

talk MCPs. Now, as I've mentioned, Model Context Protocol servers differ in their quality. Some were made pretty hastily; others were made very carefully and are very high quality. Because of this, you do have to be a little bit careful and be open to some trial and error when adding your own MCPs. Regardless, I'm going to show you how simple and easy it is to do. First of all, there are tools and websites out there like mcpmarket.com and mcpservers.org whose sole job is to basically

categorize and list all of the good MCP servers out there. As you can see, there's an MCP for Trigger.dev, an MCP for OpenSpec, FastAPI, Pipedream, PAL, and on these sites they're basically rated based on their quality: the higher up, the better, right? So if you want the ability to automate browser interactions for large language models using Playwright, this is the MCP for you. If you want Chrome DevTools, this is the MCP for you. If you want to automate, I don't know, Sereno specifically, then this is the one for you, and so on and so forth. What I want to do in this video is show you just how easy it is to set one up. You guys have already seen me do this for ClickUp, although that wasn't the point of the tutorial. What I'm going to do in this demo is just be a lot more specific about it. So, the simplest and easiest way to get up and running with an MCP is just to ask your agent. I'm going to say, hey, I want to set up a Gmail MCP so that I can send emails on

demand from my email address. Then I'm going to give it some details so it knows this is a Google Workspace sort of address. Let's see what it does. First, it's going to look and see whether there's some email MCP already. It's probably not going to find one. It really does help to open up these thinking modules. Now it's going to say, "Hey, I see you've already set up SMTP email for this address, but here are two approaches. First, you can do quick SMTP. Second, you can do the Gmail MCP." Obviously, I want the Gmail MCP. Let's do the Gmail MCP.

I want you to do everything you can for me. Typically, models will give you instructions and stuff like that, but it's much better just to have them do it all for you. Anytime you don't really know what to do, or it's laborious or involved, just see how much the model can do for you. And that's what it's currently doing. Okay, cool. It actually ended up finding a previous OAuth instance somewhere on my computer; I should note it was not in this folder, I just asked it to get up and going. It's running into some issues here because I haven't actually done this for this MCP before, which is understandable. Now it's going to add some entries to my Claude config. Okay, now it's asking me to sign in, so I'm going to sign in right over here. Cool, it says the authentication was successful and we can close this window. Now I just need to restart Claude Code, then go /mcp to manage MCPs, and you can see that I have my Gmail MCP connected.
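For reference, the entry the agent adds to the config looks roughly like this. This is a hedged sketch: the exact file (a project `.mcp.json` versus a global Claude config), the server package name, and the env keys are assumptions that vary by client and server:

```json
{
  "mcpServers": {
    "gmail": {
      "command": "npx",
      "args": ["-y", "example-gmail-mcp-server"],
      "env": {
        "GMAIL_CREDENTIALS_PATH": "/path/to/oauth-credentials.json"
      }
    }
  }
}
```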

And now I can just say, "Hey, send an email to Nicholas orgmail.com saying what's up." Boom. Just sent me the email. Fantastic. That was easy.


Okay, that's cool. Now that we've sent the email, obviously we have to talk about how to set up your own MCP servers, which is way cooler. So how do you actually go about this process? Well, I didn't actually know until quite recently. I just asked how I would create my own MCP server, and now it's giving me a bunch of knowledge: here's how to create your own server using Python. So, hypothetically, just for the purpose of this demonstration, I want to set up a really simple MCP, one that just does something really straightforward: it reads my website, maybe holds some information about it, and returns that information. So I said, "Create a simple custom MCP server whose sole job is to interact with this website www.leftclick.ai."

Now, in case you didn't know, leftclick.ai is my business. We are the definitive AI growth partner for fast-moving B2B companies. Essentially, we build outbound growth engines that leverage AI to do things like personalize emails, find leads, and so on; I talk about it a lot on my channel. Literally all I want this MCP to do is be a resource for this website. I want people to be able to download it and just say, "Hey, tell me about LeftClick," and have it call the MCP. Is that something you need? No, obviously not. But you don't need MCPs in general. MCPs are just convenient, nice little wrappers around functions.

Moving back to Claude Code here, you can see that it has now created an mcp-servers folder. What it's doing next is writing the server Python code; I have no idea what that Python code looks like. After that, it'll create some TOML for dependencies before providing registration instructions for me. Okay, it looks like it just finished. It creates a server that exposes five tools: get company overview, get services, get booking link, get case studies, and search site. So that's pretty easy. It's asking, "Hey, do you want to register this with Claude Code?" I'll just say, "Great. Sounds good. Register."
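To give you a feel for the shape of that server, here's a dependency-free sketch of the tool-registration pattern. A real server would use the official MCP Python SDK (FastMCP exposes a very similar decorator); the tool bodies and the booking URL below are hypothetical placeholders:

```python
# Minimal sketch of MCP-style tool registration: each tool is a named,
# documented function the agent can discover and call on demand.
TOOLS = {}

def tool(fn):
    """Register a function as a tool, keyed by its name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_company_overview() -> str:
    """One-paragraph overview of the company."""
    return "LeftClick: AI growth partner for fast-moving B2B companies."

@tool
def get_booking_link() -> str:
    """Link for booking a discovery call."""
    return "https://leftclick.ai/book"  # hypothetical URL

# The agent sees each tool's name and docstring, then calls it by name:
print(TOOLS["get_booking_link"]())
```

The server's job is mostly just to expose this registry over the MCP protocol so any client can list the tools and invoke them.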

It'll go through the rest of that process for me. Okay, so now I'm going to open a new instance of Claude Code. Again, I'll go /mcp to check status. It's now loading my servers, and you can see we have the leftclick server available. I'll go to bypass permissions and then say "tell me about leftclick." What occurs when this happens is that because we have access to the MCP data, it'll actually find that and get me information about it. That's what's happening right here: we called the MCP server as opposed to doing something else. Maybe I'll ask what the booking link is; the reason I'm asking is that I saw there was a booking link feature. So it's going to call the get booking link function. Here it is: leftclick.ai, book a call to schedule a complimentary 30-minute discovery call. Now, in my case, I don't think I actually have a calendar connected, which is why it just gave me the text and told me where to find it. But hopefully it's clear: you can build your own MCP servers super easily. So,

why build your own MCP servers to begin with? Well, generally speaking, I probably wouldn't put together MCP servers for most things these days unless I wanted to share them with others. A creator building an MCP server for all of his followers to use is a pretty good option, so maybe if there's something cool I want to share with you guys, I might do that and make it publicly available. But aside from that, why would you build an MCP server instead of maybe using Claude skills or DO? I've had a lot of people ask me this: Nick, why don't you recommend MCP more often? And the reason is that it's just not really required. MCP is positive insofar as it standardizes the ability to call tools, but it's also negative insofar as it loads a ton into context. What you're not seeing here is how many tokens I am essentially consuming by having this MCP server. If I go slash and then

server. If I go back slash and then write the word context, you'll see that it actually includes a bunch of information about my context usage. And

so of the basically the entire conversation we've had so far, um I've used 1.4% in the system prompt, which is just the um you know, claude.mmd, 7.4%

in my system tools, which is just something I don't have control over. And

you'll see that there's 8.2% 2% of my entire context window dedicated just to MCP tools. The rest of the stuff, 0.6%

MCP tools. The rest of the stuff, 0.6% 0.6% of my messages. And so what's really really kind of annoying is that this thing has basically filled up about half of my entire contact window. And

really I just have like a bunch of really simple tools. Leftclick at

company overview, uh, Gmail send email.

You know, this is eating up a ton of my total token space if you think about it.

The left click server itself is uh almost what I guess that's like 3,000 or so over 3,000 3,300 or something like that um of my tokens. And you know these tokens aren't free. I spend money to use

these tokens. I also obviously every

these tokens. I also obviously every time I make a message and you know have some output um the number of tokens in my prompt it does affect the output quality which we're going to talk about later. So, for the most part, I don't

later. So, for the most part, I don't actually recommend using MCPS unless it's something hyper standardized or unless it's like a one-click thing and uh unless, you know, you're building one that you want to, you know, share maybe with your team or maybe with like a

group of people. All right, so now let's talk about building the workflows. I've

built a bunch of workflows for you throughout various demos, but now I want to provide a systematic approach so you can do it yourself really easily and straightforwardly. First major principle: everything begins and ends with your system prompt. That system prompt, as we know, is typically called agents.md, claude.md, gemini.md, or cursor.md, and there are many more naming conventions; I'm not going to cover them all. The name basically just needs to match whatever your IDE or agent looks for, and the content should be identical regardless of what you call it. Now, for DO specifically, I'll show you exactly what mine looks like in a sec. This system prompt, or agents.md or claude.md or whatever, is basically just a supercharged prompt. When you communicate with ChatGPT in your browser and say, "Hey, I want you to do whatever for me," that's a pretty short prompt. This one is a prompt that's inserted every time, and it's just super long, super intense, super comprehensive, and it covers more or less all of the edge cases and ideas that you want the model to have. It should explain your framework. It should also explain your thinking, what you want it to do at every step, and more. This is how you customize your agent, essentially, so it's not just a cookie-cutter vanilla agent that functions the same for everybody else. The prompt right now is kind of the moat. Now, I do recommend you copy and paste mine, because it's pretty good out of the box. But there are some important things I'd like you to make sure to include, regardless of whether you're using mine or somebody else's. The first is you should explain

the framework. Whatever framework you're using, whether DO or Claude skills, you should actually explain it to the model. You should tell it where the resources are: hey, directives are in the /directives folder; hey, you should use /tmp if you want to store temporary files, and make sure to delete them after you're done. I also find a lot of success in explaining the rationale behind the framework; it reduces the error rate significantly. So I don't just say, "Hey, you're using the DO framework." I say, "Hey, right now, as a large language model, the probability that you can do things completely on your own without any framework is pretty low. Because of that, I'm using a framework called directive orchestration execution. Here's how it works: directives store the instructions, orchestration is you, and execution scripts do the work. By using this framework, you significantly reduce your error rates," and so on: here's why you should do this. We actually convince the model. You almost have to get buy-in from the model, and when you get buy-in, the resulting outputs are a lot higher quality. The second thing you should include is an explanation of self-annealing. Now, I'm

kind of cheating here because I haven't actually gotten to this point yet, but bear with me. Self-annealing is the process of the model fixing its own mistakes without coming to you first. Rather than just breaking like an old-school automation, self-annealing means that if there's an error, you feed that error into the model, the model reasons, solves it, and finally updates itself so that it doesn't run into that problem next time. In a nutshell, self-annealing allows the model to become more resilient, not just to get back to working. Every time something breaks, it's a feature, not a bug, because it reveals weak points in your flow that you didn't even know existed. I'm going to tell you all about self-annealing and go really in depth with system prompts and the like later on, but for now, it's sufficient that you just know what it is.
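Mechanically, self-annealing is just an error-feedback loop. Here's a minimal sketch; `run_step` and `call_model` are hypothetical stand-ins for your execution script and an LLM call, and the accumulated notes play the role of updates to your directive:

```python
# Self-annealing loop: run a step, feed any error back to the model,
# persist the proposed fix, and retry before escalating to the human.
def self_annealing_run(run_step, call_model, max_attempts=3):
    notes = []  # accumulated fixes, like a directive the agent keeps updating
    for _ in range(max_attempts):
        try:
            return run_step(notes)
        except Exception as err:
            # Ask the model to reason about the failure and propose a fix
            notes.append(call_model(f"Step failed with: {err}. Propose a fix."))
    raise RuntimeError("Still failing after retries; escalate to the human.")
```

The key property is the last line: the human is the escape hatch, not the first resort.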

The third thing you need to include is a sense of autonomy. What do I mean by this? Well, I let the model know: hey, my goal is for you to run autonomously without me. You are an agentic workflow. I say you should test each system on its own, identify mistakes on your own, and loop repeatedly until you make it work. I also say, "Hey, be careful when you're sending API calls or consuming my tokens for testing reasons." And then I have what's really just a rule that says: come to me only if you absolutely need to. I don't want you to come to me unless you are 100% confident that you cannot solve this without my human input. And that's very, very rare. When you do this, your model gets significantly more autonomous, and you really change it from a co-builder programming thing into a coworker, a co-employee. At the end of the day, directives and execution scripts are basically living documents, so if there's an error or a constraint that you find, you should instruct your agent to update them. Cool. So,

talking a little bit more about building, if you have SOPs, you're actually already halfway to having strong agentic workflows. All you really do is you just open your IDE. You drag

your existing SOP document from, you know, your knowledge base or your company PDF or your company uh one drive or Google Drive into your workspace. You

just say, "Hey, I just uploaded a file into the workspace. Could you turn it into a directive and build the execution scripts to make it happen?" Now, if it's a really simple SOP, let's say something that doesn't even need an execution

script necessarily, just an AI-prompt kind of thing, it'll just do it, and really quickly. If it's

a complex one, it may ask you to verify its approach: "Hey, here's some ideas that I have. What do you think I should do?" "Okay, yeah, let's pick the first one. Let's proceed." When the agent does this, it'll create the directive in /directives, build whatever scripts are needed and store them in /executions, and then, if it doesn't have API tokens or whatever, it'll just ask you to add them to an .env file. This works

really well because SOPs are literally already directives. They contain everything the agent needs: the goals, the steps, the inputs, outputs, and edge cases. If yours are written correctly, all you're doing is translating one human-readable document into another human-readable document in the form of directives.
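For instance, a short SOP for a weekly reporting task might translate into a directive shaped roughly like this (the file name, fields, and script names are hypothetical, just to show the shape, not a prescribed schema):

```markdown
# directives/weekly-report.md

Goal: Every Friday, compile the week's sales numbers into a summary email.

Inputs: CRM export (CSV); recipient list from .env (REPORT_RECIPIENTS)
Outputs: summary email sent; archive copy saved to /reports

Steps:
1. Run executions/fetch_crm_export.py to pull this week's data.
2. Summarize totals by region.
3. Send the summary via executions/send_report_email.py.

Edge cases: if the CRM export is empty, stop and flag for human review.
```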

You're not really getting the agent to like come up with anything new. It's

just reformatting and translating into a more token efficient format. All you're

really doing is converting a recipe into a format that some sort of robot chef can follow. You're basically programming this thing. If your SOPs aren't very good, believe it or not, this is actually an opportunity to make them better, because your agent, knowing that it does not have everything it needs to do the task, will ask clarifying questions. This will force you as a systems engineer to resolve ambiguities that a human being might just figure out without you explicitly having to write them down. The resulting directive

ends up being a lot better than the original SOP a lot of the time. And it

means that your messy docs become an opportunity to actually clean up your processes and become a clearer company.

I think that's really underrated, but companies in general tend to bury the lede. A lot of the time they don't actually make explicit or verbalize all of the knowledge within the business.

It's like, oh, just ask Pete for whatever. Send an email to this person.

I mean, your agent will say, well, who the heck is that, and why does that matter? Can we just include the information that we need in order to do it? Now, if you have a big wait step or something, it'll be like, "Okay, to be clear, why do you want me to wait?

What is the purpose of this?" And so the very building process itself can actually help significantly upgrade your business. Now, let's say you have no documentation. Well, if you don't have any pre-existing documentation or SOPs, no problem. We can still make this work.

What you do is you begin with some very basic bullet points that describe your ideas surrounding the agent. I use

really plain conversational language. I

will literally write down what I want to do as if I'm explaining it to a colleague. I have a bunch of people on my team, and a lot of the time this is messages that I would have sent to them.

So sometimes I literally just go into Slack and I say, "Hey, I want you to do X, Y, and Z. It should be this. It should be that." After I'm done explaining it like I'd explain it to a colleague, I then just copy and paste it into my agent. Do not

overthink the structure. Don't overthink

the format. Just get your ideas down.

Agents are really good at formatting this. You can also use voice prompts, like you've seen me do a bunch. And then

you can refine and add detail later as you test and learn and try different approaches. The really cool thing is you don't actually need to know how to code at all. You just need to know how to explain what it is that you want, which I think is a far more achievable skill.

This is a real prompt from a lead generation system that I just built. I

said, "Hey, scrape leads from Apify based on the industry and location I specify. Then verify 80% match my target market before doing the full scrape. When you're done, enrich missing emails using a secondary service like Anymail Finder. Then add everything to a shareable Google Sheet and send me the link." Pretty straightforward and pretty simple, huh? All right, let me show you a practical demo. All right, let's build another agentic workflow together. This

one I want to be a lead generation or lead scraping workflow. You guys might have seen me build these sorts of things before on my channel. I love building them because they are so high leverage relative to what I used to have to do

back in the day. So, I figured I'd just bring you guys alongside me for uh one of the new lead scraping workflows that I'm going to put together. So, the first thing I'm going to do, just like I always do, is I'm going to give it in natural language a set of instructions

to Claude. I'm using a voice transcription tool. So I'll say, "Hey, I'd like to build a lead generation workflow that scrapes publicly available information

to get me a list of B2B leads. What are

the three best approaches for this?"

Now, I kind of know what I want to do here, but I want to show you guys how you can use an agent not only as a builder, but also as something to assist you with ideation. So what this is saying is we could start by using LinkedIn Sales Navigator or similar tools to identify decision makers by title, industry, and company size, then enrich with contact data via APIs. That

sounds pretty good to me. So I'm going to need some additional tool. That's

okay.

Let's go with the first. I think I've heard of a few different tools we could use to do this. PhantomBuster is one.

There's another one called Vain. Which

do you think is best for our approach?

How should we go about this exactly? So,

it's now going through and performing a bunch of research on these tools. Okay, now it's gone through, performed a bunch of research on all of the tools that we could use, and has since recommended a pipeline. So, that

sounds awesome. I really like this. So I'll say: let's do it. Yes, I already have a Sales Navigator subscription. Let's do it. Build out a pipeline. I also already have a pre-existing subscription to Anymail Finder, which is an enrichment tool. So, why don't we use that as part of our flow? I want you to build this using the DO framework. Let

me know if you need anything.

So now what we've done is we've basically taken our demand, or our request I should say, and pared it down into a much higher-probability build path, just based off a couple of back-and-forth questions. If you think about it, the total amount of time it takes an agent to build something is pretty short, all things considered, but it's still like five or 10 or 15 minutes. If you screw up and go down the wrong path, in order to walk back and start fresh, you're probably going to have to spend another 10 or 15 minutes having the agent rebuild the next thing. And so, at a very high level, giving it a tiny bit of input initially is super powerful, and it's also a big time saver. So, I usually recommend going back and forth at least a little bit while it does its searches, using your own human knowledge to pare down the total possible number of paths. So it's

going through building a Google Sheets pipeline, a LinkedIn lead-gen and lead-enrichment pipeline, and an Anymail Finder client. All right, once it's almost done all of the scripts, it's going to create a directive just to tie everything together. Do all this for me.

Okay, I'm now having it wrap things up.

We can now start giving it a test.

Obviously, it is one thing if a model tells you that it is good to go. It's a complete other thing whether or not the flow actually works. So, we always have to verify that the flow works with a real test. Okay, it's now testing out Anymail Finder, testing out the Google Sheets connection.

Looks like it found an issue with the way that it was going to do the connection. I added a credentials.json file here just from another workspace, which is basically an OAuth thing. I didn't generate this thing; I had the model generate it for me. It's now going to ask to authenticate for the first time. Anytime you connect to a new Google credential with OAuth, you're going to have to do this. Now I have the browser authentication, so I'm just going to pop over here and connect this. This is a great opportunity for me to point out a common issue that people have with agentic workflows. It's where they essentially have the model generate a test case for them. So in this case, that's what's occurring here.

Test_leads.csv.

It then uses the test data essentially to test end to end to see whether or not the flow works. That's not good enough because if you think about it, the model just created a bunch of scripts. So the

test case that it will come up with is most likely going to be in the same format that all of the rest of the scripts and so on and so forth expect.

What's way more informative is for us to do this entirely based off new data. So that's what I'm going to do next. I don't really want to export the leads from Vain myself. I instead want you to do all that for me.

Okay. And it looks like it now is ready for a test. So I just need to give it a Sales Navigator URL and it'll do everything, or I could run it myself with one command.

That's cool. What I'm going to do is I'll just go back to LinkedIn Sales Nav here, and I have a link. Basically, what happens on LinkedIn when you want to find something like a list of people is you generate a search on the left-hand side. Then you just copy over the URL and paste it in. So I'm just going to paste this in and see what happens. We'll just test it and see. All right. And

now it has found 231 prospects. So it's

going to go through and scrape the 231 profiles via Vain, then enrich with Anymail Finder before exporting to Google Sheets. Okay, it had some issues with a particular API call to Vain. It has since self-annealed and automatically fixed it all, so it's just continuing down the building process on that first run. Once it has finished this first run, I'm just going to ask it to do a second run.

And I'm going to do it completely from scratch. So it's going to be like a cold start: I'm going to instantiate a fresh Claude instance, one that has no idea what the heck's going on. Then we'll see how it goes. Okay, one of the outputs was buffered. That just means it was basically in a loop repeating. So I just paused it and said, how are we doing? Looks like it's still running. Python is buffering the output. We're

just going to wait for the completion.
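While we're waiting, it's worth seeing why a run like this is slow. A first-draft script usually sends one request at a time, so the total time is the sum of every network wait; overlapping those waits with a thread pool collapses it. A minimal sketch, with the enrichment API simulated by a sleep (the function and field names here are made up for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def enrich(lead):
    # Stand-in for a real enrichment API call: ~50 ms of simulated network latency.
    time.sleep(0.05)
    return {**lead, "email": f"{lead['name'].lower()}@example.com"}

leads = [{"name": f"Lead{i}"} for i in range(40)]

# Serial: one request at a time -- total is roughly 40 x 50 ms.
t0 = time.perf_counter()
serial = [enrich(lead) for lead in leads]
serial_s = time.perf_counter() - t0

# Parallel: 20 requests in flight at once -- total is roughly 2 x 50 ms.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    parallel = list(pool.map(enrich, leads))
parallel_s = time.perf_counter() - t0

print(f"serial: {serial_s:.2f}s, parallel: {parallel_s:.2f}s")
```

The same idea applies to true bulk endpoints: if the API accepts a batch of records per call, one batched request beats even a thread pool.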

Sometimes some of these tool calls can take a fair bit, and that's what's happening with Anymail Finder. The reason why this is actually good for us is that I get to show you guys later on what it looks like to optimize a workflow realistically. And I know this because I've done a fair amount of enrichment at this point. You do not need to take this long to enrich 200 records; you could probably enrich 200 records in maybe 15 seconds or so through bulk requests. The first time an agent ever builds a workflow, it's going to do so in as simple a way as humanly possible, typically through serial requests, which just means it's sending one request at a time, waiting until the request is done, then sending another request after that. But what you can do with a lot of workflows is parallelize them, which means you could actually send 200 requests simultaneously and then wait for the outputs of all 200 in the same time block, as opposed to independently. So I'm still going to wait for this thing to finish, because I want this test to be done end to end at least once. After that, we're going to look into ways to make this faster through parallelization. Okay, so I got a little bit bored and I just said, hey, could we make this way faster? It has since offered to batch all of these requests. So that's what it's going to do next, and let's

see how quickly it performs. While I'm doing that, let me just create a new search. Maybe instead of United States

search. Maybe instead of United States residents, um I want to search Canadian residents. [gasps] That way, we'll be

residents. [gasps] That way, we'll be able to split test this very quickly and easily. As you can see here, we have 31

easily. As you can see here, we have 31 results. Uh maybe we'll also do posted

results. Uh maybe we'll also do posted on LinkedIn, so maybe 45 or something like that. Okay, no, it's just 20. If I

like that. Okay, no, it's just 20. If I

deselect this, how many do we get? 683.

Too many. Why don't we just do Vancouver instead? I want between 50 and 100.

Okay, 66. That's perfect. So, this is going to be the URL I use to test the totally fresh app. It's now just going to go through the process of self-annealing, running, testing, and so on and so forth. Looks like it found 139 valid emails out of the 231 sent. Now, it's just going through and updating the script a couple more times. Cool. It's gone through and found me a bunch of leads; I can open up the spreadsheet to get 159 rows. So these are all of the records with email addresses.

There were more records that didn't have email addresses, but we just left those out. Obviously, this is pretty solid, but I want to, number one, make sure that we're documenting this.

So, I'm going to head back over here, and I'll say make sure to document all changes, both directives and executions.

Once it's done with the documentation, I'm then going to open up a totally fresh instance, and then go through, update, and test. Cool. And

it looks like it did some updating.

That's pretty solid. What I'm going to do next is just open up a new instance of Claude Code, set it to bypass permissions, and I'll say,

"Hey, here's a search URL. Scrape these

using our pipeline."

All right. So now this is a totally fresh Claude Code instance. Let's see how it performs. It's going to start by thinking: it's checking the directive for LinkedIn scraping, which is great. That's what we wanted. It's then going through here: the URL is a Sales Navigator search, and it has a bunch of information here. It's going to check how many leads are available. Cool. Found 66 prospects. It

is now going to perform the full scrape.

Okay. And it looks like we got 45 out of those 66. So this did work on a totally fresh list. Took me about 4 minutes. I got a little bit overeager and I was like, "Hey, are you done yet?" But realistically, this works pretty well. So, I mean, there are a couple of different approaches that I could take here. Obviously, I could make this better, make this faster. I could set up approaches to dump all of this into the Google Sheet instantly using bulk requests. I could do a lot of stuff, and that's what I want to talk about next. But for the purposes of this demonstration, this is good to go. We

have essentially created a workflow to completely or almost completely automate the entire process of scraping LinkedIn.

Obviously, there is still one manual step, which is that we need to provide the LinkedIn Sales Navigator URL, but that's something we could reasonably automate as well if we'd like. So,

here's what you don't need to specify.

You don't need to know which APIs to use or how they authenticate. You also don't need to know how to structure the code or handle an error case yourself. And

you don't even need to know any Python, any JavaScript, or any programming language. The agent's whole job is to abstract that complexity away from you and turn it into natural language. A

really cool hack that I'm using a lot more now is that I don't just have the agent solve it with one approach. I actually have the agent produce three approaches simultaneously. Then I either pick one of the three, whichever makes the most sense, or, and this is kind of neat, I have parallel instances of my agent generate all three directive and execution scripts based off each approach. I then just test their outputs and rate them: I test them on things like how fast each is, how reliable it is, and how

cheap it is, and then I just pick the best performing one, and then that's it.
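That test-and-rate step can be sketched as a tiny harness: time each candidate, apply a toy score for speed, yield, and cost, and keep the winner. Everything here (the three approach functions and the scoring weights) is illustrative, not a real benchmark:

```python
import time

# Stand-ins for three pipelines built by parallel agent instances.
def approach_a():
    time.sleep(0.02)
    return {"leads": 120, "cost_usd": 0.50}

def approach_b():
    time.sleep(0.01)
    return {"leads": 100, "cost_usd": 0.10}

def approach_c():
    time.sleep(0.05)
    return {"leads": 140, "cost_usd": 2.00}

def rate(name, fn):
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    # Toy scoring: more leads is good; time and cost are penalties.
    return name, result["leads"] - 10 * elapsed - 10 * result["cost_usd"]

scores = [rate(n, f) for n, f in [("a", approach_a), ("b", approach_b), ("c", approach_c)]]
best = max(scores, key=lambda pair: pair[1])[0]
print(best)
```

In practice the "score" is whatever you care about for that workflow; the point is just that picking a winner becomes a mechanical comparison once each variant has run.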

Why three approaches? Well, if you think about it, the cost of exploring multiple approaches is basically free. Well, not free, tokens are not free yet, but they are very cheap compared to the cost of intelligence, and each approach covers a big chunk of the search space. Basically, if this is the amount of space you have to search through in order to solve your problem, rather than have your agent go manually one by one and do the whole thing on its own, you can quarter it. In my case I said three, but you could totally have four, and then have four agents working independently and simultaneously. I can't draw simultaneous executions here, but just assume that's what's happening: you explore that search space in like a tenth of the time. When

you do this, I recommend you have it run in a temporary folder. So, you say, "Hey, do this in a temporary folder.

Don't do this in the main directive-execution framework, because I'm actually giving this to a few of your brother and sister agents to run simultaneously to figure out the best approach." There are a couple of trade-offs with every single way that you build. The first is speed versus cost: do you need it fast, or do you need it cheap? Obviously, we're looking for situations where we have both, but a lot of the time you have to make trade-offs. Next is reliability versus complexity. The simple solutions do break less often: if you can store things in one execution script, it's way faster and better than if you store things in 10. The next is breadth versus depth: whether you cover more ground or go really, really deep on a few items is going to change how your agent constructs things. And then finally, sometimes you just need human judgment to weigh these things. So I would recommend at least asking your agent, "How would you do this?" before you actually have it go and build every approach. If you think about it logically, this steering is the highest-return-on-investment time that you will ever spend across your entire agentic workflow career. And the reason why is really some of what I talked about earlier. If you just look at any process that has variability in its outputs, this variability grows over time as you proceed through the process, just because there are more and more steps possible, right? And so

right now, this is kind of like the range of all of the possible decisions that the model could make. Well, if you think about it, the one thing that you have the power to do at the very beginning is steer what direction this thing goes. So let's say hypothetically my goal is over here, right? If, at the very beginning, literally from the first step, the model is already headed in the wrong direction, it doesn't really matter how much time and energy it takes to build things, right? But if you can just reorient the approach down over here, then your solution is actually in the range of all possible outcomes. I call this steering, just like steering a car. Let's say you're driving a real straight-line track, and your car at the very beginning of the track is already starting to veer off a little bit. Obviously, the most important thing you can do as a driver is steer it so that it goes basically as straight down the middle as humanly possible, right? And that's ultimately something that really takes a minute or two. I wouldn't

recommend trying to outsource everything to the model, like the thinking itself.

The first version of anything you build probably will not be perfect. And the

first versions of a lot of the things that I build do suck, but that's okay.

That's actually one of the points. DO really depends on iteration. So just run the workflow a few times, watch what happens, open up the reasoning loop, and then just take some notes on what's slow. Hey, I don't really like this.

Hey, this takes forever. Is that

necessary? Hey, I don't like how this had to call this API. Hey, this is a little too expensive. How can we do it cheaper? Just tell the model what it is. You're not going to hurt its feelings. It's a form of intelligence that none of us can really quantify, so don't anthropomorphize the damn thing. What'll happen is the agent will diagnose the problem and then implement a fix. And ideally, assuming that you have it in your system prompt, it'll also update both the execution script and your directive, which means

next time you run from a fresh instance, it will already know the solution. And

that's typically what I recommend. I

recommend running it, fixing it, getting in that testing loop over and over again. And when you really want to verify that this thing works, you just open it up in a new instance and have it run. Every problem that you encounter will make your system stronger if you're smart. Edge cases will get handled that you never anticipated.

And after a few iterations, you will have a robust workflow. I've heard a lot of people use the term "battle-tested," and I think battle-tested is about as real and accurate a way to describe it as any. You'll have something that has been there, done that; it has seen all possible instances of the problem because it's run 10 or 20 times, so it sort of knows what to expect. You basically go from a workflow that the very first time it runs is maybe 80% reliable, to one that's 90% reliable, to 95%, to 97%, to 98%, and so on and so forth until it's like 99.25% or something. And maybe this is the theoretical limit that you reach.
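That reliability ramp (80, 90, 95, roughly 97 and 98, and so on) is what you'd expect if each test-and-fix pass halves the remaining failure rate. This is a toy model of the iteration loop, not a measured claim:

```python
# Start at 80% reliability and assume each debugging pass halves the failure rate.
failure = 0.20
reliability = []
for _ in range(5):
    reliability.append(round((1 - failure) * 100, 2))
    failure /= 2

print(reliability)  # [80.0, 90.0, 95.0, 97.5, 98.75]
```

The curve flattens as it climbs, which is why the last fraction of a percent takes the most iterations to earn.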

All right, let's build a lead-gen flow start to finish using everything that I've talked about so far. You remember how earlier we created a lead generation workflow? Well, what if, instead of just using one Claude instance to generate it, we used multiple Claude instances to generate the lead generation workflow in parallel? We'd be able to create higher-quality workflows that are most likely better, because we are able to search more opportunities and options. If that doesn't make sense to you, I'm just going to copy and paste the same thing that I pasted in here.

Instead of "three best approaches," I'll say "five best approaches," and I'll say "be comprehensive and give me all possible options." And then, instead of "publicly available information," I'll say "HVAC companies in Texas" to get me a list of B2B leads and their emails.

Okay, great. Once I give this parent agent some room to think, what I'm going to do is open up a bunch of additional Claude Code instances. So: new, new, new, new. We're going to have five in total. What I'm going to do is just set things up so we can see them all.

Next, I'm going to provide some scaffolding. So I'm just going to say, "Hey, your task is to build a lead generation workflow according to the below details. I'm giving similar tasks to five other agents. Since you're operating in the same workspace, to minimize the probability of a conflict, do all your work in a new tmp/test3 folder." And then what I'm going to do is just feed in all of this.

So, I'm going to say boom boom boom boom.

And then boom. And now I'm actually just going to run all of these simultaneously.

What's cool is this is going to create new folders inside of this tmp folder, which are not going to interfere with our other directives and execution scripts. I can now remove this top-level script here for simplicity. And now it's going to go through and just create all of these.

Not all of these are at the exact same stage, obviously, but this test2 directory structure, and test4 when it gets created, they're going to just do their work in there. So in this way I'm capable of exploring a large number of options in a very short period of time. I mean, obviously I can take a brief high-level look at one of these things and say, okay, this one most likely has the highest probability of working, but it's much easier if I just explore them all. Then, anytime I run into a hiccup with one of these flows, I take a look at what the hiccup is, and if the hiccup is so big that it would be a pain in my ass to deal with, I just drop that one and don't continue. Then, for the survivors, once I have a pretty good-looking workflow, I'll test them all side by side, ask them to go do a scrape, and then once I've done the

scrape, I can just compare and contrast results. What's really sweet is that when all these things are done, I can sometimes combine the best of each, and then I can say, "Hey, build a unified lead generation workflow that combines the best of X, Y, and Z." And then it'll, you know, find 30% of leads with one approach, 30% with another approach, 30% with a third approach, and so on and so forth.

Anecdotally, it feels really cool to be able to manage and orchestrate this many simultaneous builders. I don't usually do five at a time, but I just wanted to demonstrate that you can explore a very large search space in a very short period of time. So, after a few minutes, these are now beginning to finish. The one on the left-hand side has tested the pipeline with a full batch. Just going

to take a peek. See, we've now generated four of these files. We then have our pipeline summary, and now we just need to enter some API keys essentially. Now,

the issue is I've yet to give it a Google Places API key or a Hunter API key. So I'll just say, "Could you set up the Google API key for me? I don't have Hunter, but I do have an email finder. Please use this instead." Over here, Apollo.

Okay. And then one of these wanted a Sales Navigator URL for HVAC companies.

So, I'm just going to go HVAC. And then

geography. Why don't we just go Texas because I think that's what that was.

Rest of this looks pretty reasonable.

It's 4,000 results. I just want a really simple one. So I'm just going to add the "changed jobs" filter: 54 results. That way, we should only get 54. Go back here, and then I'll feed in the URL.

I then see an Apollo API key. Yes, Apollo API key. It's then going to go through and give me instructions for one of my API keys. So I'm going to head over here to the Google Places API. What I want is the new Places API, apparently. So, I'm going to enable this. And now it's just a process of getting API keys for everything, really.

Copying the API key; just going to paste that in there. This one is now testing. This one is going to test. This one is now testing too. And then we just have these two over here, which are in the process of

building. This here ran into an issue with one of the scrapers. So, it's

decided to pivot and use an Apify API token. That's cool; I don't mind that. This one here on the left is now doing some debugging and so on and so forth.

That's okay. I don't need to be a part of this. All I'm doing is overseeing, and if any one of these workers needs me for anything, I'll provide it. All right. And we are just testing across the board. We got 50 leads running for most of these tests.

Some of them are 10. That's okay. I'm

seeing this task over here is running into some issues. Namely, the Apollo API key that I provided earlier was for a totally free account, so it doesn't look like it can actually go and enrich them. This one here on the left looks like it's pretty solid; it's since found a verified email address. That's pretty cool. I did no work here, I just let it run. This over here is doing a batch email scrape, and this right over here is now running a pipeline test with a fixed client.

I've actually forgotten what's going on over here on the left, so I'll say: describe what is occurring, top to bottom. So this is scraping the Google Places API for terms like HVAC contractors and heating contractors. It's going across 50 Texas cities. Then it gives me a big list of leads. It's then enriching with emails before exporting to Google Sheets. So, that's pretty cool. Let's run this on a test of 50. Meanwhile, over here on the right, we did run it on a test of 50, and it looks like we ended up with 26 email addresses. That's pretty badass. I should note that not all of these are valid. I'm seeing here one of them is for somebody that works at Neuralink. So, the probability of that being a valid lead is kind of off. I'm going to want to double check that.

So, I'm going to go back here and I'll say: I noticed one of the leads was for Neuralink. How are these filters? Are they super accurate? Make sure to double check. Meanwhile, this one over here on the left-hand side is doing some enrichment. This is now actually testing to see how many of these leads are HVAC related. So, we're seeing a bunch of these are HVAC related, and a bunch of these are not. So the search that we're going to be providing here is presumably going to have to be a little bit more specific. I can't just, you know, head over to LinkedIn Sales Nav, copy and paste something with the term HVAC, and then have it work 100% of the time. Okay, on the right-hand side, this is now giving me some high-level instructions on how I can do the search better. So that's nice. HVAC and refrigeration equipment manufacturing. Why don't I actually go ahead and just do this? So I'm going to remove this keyword HVAC, and what I want to do is click industry.

Go down here. I see HVAC right over there. I'm going to include that. This is 341 results. So then I'm just going to copy this and paste it back in. Let's run a test on 50. Cool, cool, cool. Looks like this lead flow here worked really well. 18 out of 20 businesses had websites. 13 out of 20 had emails. Meanwhile, we happened to get Satya Nadella, the CEO of Microsoft's, email over here. That's always fun. Okay, cool. And now we have a whole list of steps right over here in the middle. So, that's awesome. Gives me a brief description of what's going on. And yeah, I mean, I like this. So, why don't I actually see a result? Where are the leads? Looks like it's going to find me the leads. Texas businesses with emails. Then it has them all over here.

This is cool. So hopefully it's clear at this point: I could do pretty much whatever I wanted, right? And we've actually gone through and explored a tremendous amount of search space in a very short period of time. I could, for instance, just send the same message to all five: hey, show me the results in a Google Sheet. You know, I could then standardize the test and ask all of them to do 20 leads simultaneously, and then I could have them really quickly test to see which one delivers me the highest degree of accuracy on the leads. I could also disqualify a couple. Don't really like this one. I mean, it's working, it just found me three with verified emails, but I'm seeing that it's using an Apollo endpoint, which isn't 100% right. It's kind of crazy, because we're not supposed to be able to use Apollo in this way. We should be having to pay a fair amount of money. And you know, I think there are a lot of things that realistically anybody could do. You could also just use all five of these, but yeah, I just wanted to show you guys what that looks like. So, what I'm going to do is I'm just going to pretend that I've now selected three, and I'm going to

say: "Excellent. Turn this into directives, or merge these directives and executions with the main branch, your approach one, then update everything to ensure that the file paths etc. are correct." That's actually really cool. I wasn't expecting this to do anything with Apollo. I mean, I fed it my API key, which is free, but normally they don't allow you to see any of that. And finally it ended up finishing, and it since merged my directives with the main directives folder. So I actually have the Texas SOS lead gen directive right here. What I could do now is I could test it, I could rerun it, I could optimize it by just asking it to do things faster and faster and faster. And yeah, I was able to accurately assess that this is the flow that I wanted in light of five other ones. Total cost of this was no

other ones. Total cost to this was no more time than it would have taken me to do the first. Sure, I did spend some of my um in this case Claude Max plan usage, although keep in mind that we're

talking cents on the dollar here. I also

spent a few dollars on Google Places API. You know, I would have spent a few

API. You know, I would have spent a few dollars over here. I spent a few HTTP calls over here and then, you know, some Ampify tokens over here. Realistically

though, this allows you to do 5x the tests for like just a couple of dollars per workflow build. Way cheaper than anything um that N8, make.com or Zapier would have charged you just for like

development and testing costs alone. And

we get to do it through self-annealing and have a very robust, reliable workflow to boot. So, how do you actually improve these workflows over time? And when I say this, I mean practically. Like, how do you actually cut through the noise and do this thing in a way that is consistent and reliable? Well, you just ask. I literally just say: can you make this faster? Can you make this cheaper? Over and over and over again, like 30 times. I say, list 10 approaches to make this thing cheaper. List 20 approaches to make this thing faster. Most of the approaches will not work, but I will use my human judgment. After it opens up and gives me 20 possible opportunities, I just pick the one that I think makes the most sense, and we proceed with that. Then I repeat the process over and over again until my workflow is significantly faster and significantly more optimized.

That said, because I think a lot of people have probably stumbled on this, I do have a rule, and my rule is the order of magnitude rule: I don't actually do this anymore unless I can get at least a 10 times improvement in a key metric, for instance time, cost, or accuracy. Because a workflow running in 3 minutes versus 2 minutes is technically a 33% improvement or whatever, but it's not actually meaningfully better for me. And the amount of time that I take to implement it, multiplied by the introduced error risk of doing what is typically an approach that trades time, money, or accuracy off against each other, means that I'm usually losing. If you think about it, it's basically: what's the metric we want? We want time, right? And the degree to which the time gets better is sort of related to the degree to which the cost and the accuracy may go down. So the amount of time that I spend on this, in addition to the introduced error rate, means that this only really makes sense to do if there's a very clear path to making your flow 10 times better.

What's an example of this? I used to scrape tons of leads using a serial approach, and I found that it took forever. My serial approach was something like 20 minutes for 2k leads. If you do the math on that, that's about 100 leads a minute. I came through and tried optimizing the hell out of the serial approach in every way, shape, and form that I could. I tried changing the compute that I was using, I tried changing the Apify actors I was using, I tried changing the API requests that I was making to Google Sheets, and stuff like that. And I was only really able to get this down to maybe 15 minutes. That is a 25% improvement in time, of course, but a lot of the time this isn't even my bottleneck. It doesn't actually matter if it takes 15 minutes or 20 minutes, because I'm not utilizing the leads 100% anyway. What I ended up finding was an approach that batch parallelized them. So instead of 2k leads in 20 minutes, it basically sent 100 leads at a time, 20 times, and finished in approximately 1 minute. This, for example, is a 20 times improvement. This is something that I'd actually do, and it actually worked. But the whole serial-optimization detour, that rabbit hole, was just a total waste of my time, because it turned the flow into an unreliable mess.

So my rule is: I don't make small optimizations anymore, because they reduce accuracy and reliability for marginal gains. I would only do this on something where I actually see an order of magnitude improvement being possible. What are some examples? It's like moving from software encoding to hardware encoding. You don't need to know what that means. Just make sure that when you ask the model and you see words like that, it's like, okay, I should probably use the hardware encoding. Parallelizing, or using what's called multiple threads, or using multiple service workers simultaneously: these are things that usually do provide an order of magnitude jump. Sometimes you can fundamentally change the order of operations in a workflow. But in general, unless the model expects that this is going to provide at least a 10x boost, I don't really recommend doing it.

What is really cool is that every workflow that you build becomes a permanent asset in your library. And I mean this both in the way of directives and execution scripts as well. Your library ends up infinitely reusable. If you think about it, you could open up any workspace in any IDE or agent model.

You could also copy directives and execution scripts over to anybody else's workspace, like your friends' or your colleagues'. You could put it on GitHub with GitHub Codespaces, something I'm going to talk about soon. You could reuse automations the exact same way that you do in drag-and-drop no-code tools like n8n, Make.com, or Gumloop, but you do it with natural language instead. Your blueprints, if it makes sense now, are just a bunch of words on a page, which are much, much more portable. And over time, your IDE will become basically a giant treasure chest that you can deploy anytime you want, anywhere you want.

So, for instance, what my library can do right now: automated lead scraping, automated email enrichment, automated personal replies on campaigns that I run, because we're predominantly a cold email agency. I can initiate high-quality voice agent calls. I literally just say, "Hey, call this person. Hey, I want you to call people on this list. Hey, I want you to split into 20 threads and then call 20 people." I could do automated proposal generation. I could do slide deck creation that actually matches my tone of voice and looks pretty good. And all of it is customized to how I communicate. It is not generic AI slop. So it's pretty cool. Obviously, I didn't build all this stuff overnight. It took me a fair amount of time, a few days, well, a few weeks now, to really put the finishing touches on all these.

But yeah, at the end of the day, this thing can basically be your terminal for life. A real example from my actual day-to-day was automating my Skool posts. I kept forgetting to post a weekly community call thread. I did it three weeks in a row, which is really embarrassing, especially because I like to make it clear that if I don't do the foundational, fundamental things that I promise people I will do, then why the hell am I entitled to their money? So, I gave a bunch of people refunds. I asked my agent, Claude Opus 4.5 at the time, if automating this was straightforward. I had never even really thought of this before, but I was basically just like, "Hey, I keep forgetting about this thing. Man, I really suck. Any ideas?" And then it's just like, "Oh, yeah, we could totally automate that." So, it went and found a pre-existing Skool system that I had built, which just handled the authentication and the logging in. Then it built a simple scraping spec and figured it out in 3 minutes flat, and I automated my Skool post in 3 minutes flat using a simple schedule timer, which I'll talk about later. So now it just happens for me, which is incredible, and it's super easy and super straightforward.

You can solve so many tiny little problems in your life using tools like this. So once you've built individual workflows that work really well, you eventually transition to what I call metadirectives. At the end of this, what you will essentially have is giant families of workflows that do various things. For instance, I will have a marketing workflow umbrella, and this is a family of workflows that does things like scrape leads, create ad copy, do voicemail drops, whatever, right? And what this umbrella workflow, this metadirective, does is tie them together. So, for instance, if you have a bunch of separate workflows for, I don't know, a welcome email, the setup of a workspace, and the copywriting of an email, this is sort of like an onboarding thing, right? You could just tie all these together with a new client workflow that does all of them in sequence. I recommend storing the directives separately in order to make this happen. I don't recommend having one giant new client workflow that's four quadrillion lines, because it's much easier and more maintainable for the model to load only what it needs into context at any one particular time. But this becomes really powerful, because they just chain all of the existing capabilities together.
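The chaining idea can be sketched as a thin wrapper that runs existing workflows in order. The directive names below are invented for illustration; in a real setup each step would be a directive file the agent loads and executes:

```python
# Hypothetical sketch of a metadirective: a wrapper that calls individual
# workflows in sequence, passing shared context between steps.
def run_directive(name: str, context: dict) -> dict:
    # Placeholder: a real agent would load `directives/{name}.md` and
    # execute its steps, returning outputs for the later steps to use.
    print(f"running {name}")
    context[name] = "done"
    return context

def new_client_workflow(client: str) -> dict:
    """Metadirective: chain the onboarding workflows in order."""
    context = {"client": client}
    for step in ("welcome_email", "workspace_setup", "email_copywriting"):
        context = run_directive(step, context)
    return context

result = new_client_workflow("Acme HVAC")
```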

Instead of you having to go 1, 2, 3 through four or five workflows, you turn that into one workflow, and then every time you want all of these done in sequence, you just call the big workflow, not the individual workflows. It also means that when you prompt the model and use it as an assistant or whatever, you could just say, "Hey, I want you to do the X, Y, and Z onboarding workflow," and then you can just step away, have a nice cup of tea or something, and come back and everything's okay. You don't have to get interrupted all the time. And when you combine that with the infinite reusability of these workflows, this becomes really, really powerful, because then you can just send your new client workflow to the other three account managers on your team, and they can run it every time they get a new client. Or, as I'm going to show you later, maybe you could attach it to a schedule trigger or some sort of webhook so that it just runs autonomously without you. Hopefully that makes sense.

Now, we're starting one of my favorite topics in directive orchestration and execution, and in agentic workflows in general, and that's this idea of self-annealing. First, let's

talk about annealing in a general sense. Annealing is the process of heating a piece of metal and then slowly cooling it down. Basically, what happens is that previously the molecules in the metal are kind of all over the place, but when you heat up the metal, they end up moving to their lowest energy state, and they end up looking kind of like a crystal lattice, which is really badass. And then we cool it down, which sets this into some really strong, robust piece of metal. Blacksmiths and so on have been doing this for many, many generations. It removes a bunch of these internal, weird misconfigurations of the atoms and creates a stronger, more stable structure. So people do this with swords and devices and pieces of metal all the time in real life. It's cool as hell. And today I wanted to talk about a similar concept in agentic workflows. What if we had the ability to stress test our workflows as well, to make them significantly more resilient? Turns out we do. When we build instruction sets, prompts, or directives for our agents, I want you to think of them as looking something like what we see on the left-hand side here.

In short, these are pretty rough. We have some idea of how we want the workflow to develop. Maybe we want it to start here, then go over here, and over here, and then here. But we don't really have a strong mechanism to do it. All we really have so far is just an outline. When you say step one, do X; step two, do Y; step three, do Z, all this really is is a couple of bullet points on a piece of paper. And even if you have an agent produce a workflow for you in directive form, it's not super tight. What self-annealing does is: every single time we run into some error, issue, or opportunity for improvement, the system reinforces that flow. So if the left-hand side is what we have on the first day, the right-hand side is after maybe 60 days of you using an agentic workflow. Instead of being this small little pissant line on the left, we have a super strong, battle-hardened protocol. Every one of these little shields is some form of retry logic. It's so much beefier. There are validation steps that go into place. Maybe you have human-in-the-loop at specific steps you didn't realize you needed before, and so on and so forth. So if I'm somebody designing a workflow, despite the fact that I start over here on the left-hand side, at the end of the self-annealing process my workflow actually becomes super robust and very resilient as well. So that concept is self-annealing: instead of brittle systems that break every time you error out, like with n8n or Make or whatever, these systems just strengthen over time.

The secret ingredient is adding a level of thoughtful error handling to your system prompt. The whole idea is that when you do this, it will learn and it will adapt. Problems essentially stop being problems in the error sense, and they start being opportunities for you and the model to build in edge cases, error handling, and unexpected steps that you just didn't understand the first time, because a lot of the time the only way to know is just by doing a bunch.

So when you enter the self-annealing loop, essentially what happens is there will be some sort of error. Immediately after, you diagnose where the error is coming from, then you attempt some sort of fix. After the fix, you update: you actually update the workflow, the execution script itself, and then you just rotate over and over again. Finally, eventually, this stops erroring out and becomes successful. And when it becomes successful, all we do is some sort of documentation upgrade. We let the directive know: hey, this is a common issue that previously used to happen a lot; we've since reinforced against it, and it's a lot better. And the next time the loop runs, let's say it eventually hits some sort of error. Well, guess what happens? We run the same thing. We go through an error, then we diagnose, then we fix, or attempt to fix, I should say, and then we update. And we just loop over and over again until we can no longer loop. So this is really that four-step process. The agent will continue until the operation succeeds or it hits some super unfixable wall, something that actually requires a human being.

Even when something is unfixable, you'll find that an agent will often find a creative workaround. So, for instance, say one of the things you asked for is 50 leads, or, maybe I always use leads because I'm just super in that business, let's take a step back and say you are looking for 50 blog posts on a subject, right? Your whole job is to take these blog posts and use them to create something. Your definition of done is: you get 50 blog posts from your scraper. Well, let's say the scraper only returns 40. This loop will start and continue. And maybe the reality is there just aren't any more blog posts on the internet about this. Well, your model finds a creative workaround, by maybe changing one of the filters in how it pitched the first thing, and it lets you go from 40 to 50, technically accomplishing what you were looking for despite the fact that it is a fundamentally different process. Now you're using maybe a different set of filters, and although it didn't work 100%, it worked 80%. The model will then give you a notification or ping you to say: hey, this mostly worked, let me know if this filter is okay too. So then you provide some feedback or whatever, and it actually cements the fact that this filter is okay too, preventing it from ever happening again.

And in that way, every cycle will leave the system a lot more robust and reliable than it was before. So, as a business owner, somebody that's been doing stuff like this for the better part of the last decade, I like thinking about agents and agentic workflows as basically mini employees. And in business, when you hire a bunch of people, you quickly realize that you can bin human beings into two camps. You could have employee A, who I'm going to consider the blocker, and you can have employee B, who I'm going to consider pretty self-capable. In the situation of employee A, anytime they have a problem, and I've hired a lot of people like this, that problem is now your problem. So: hey boss, I tried doing XYZ, couldn't make it happen, could you help me with this? Meaning, this is the sort of person that cannot proceed without your intervention. Every time they run into an issue, well, now it's your issue as well. All work grinds to a halt, not just theirs. This is the sort of person that makes the same mistakes over and over again, doesn't seem to learn, and ultimately you become the bottleneck for their productivity. They almost require you to micromanage them in order to succeed.

I'm sure there are some business owners watching this video. This happens very often, and it's one of the easiest and simplest tells that you probably shouldn't hire a person: they run into issues and can't actually self-mitigate them. Employee B, on the other hand, is a star performer. They encounter the same problems, but they have a simple SOP. The SOP is: even if I don't know how to solve the problem, I'm going to try on my own first, and so they'll only escalate when it's absolutely necessary. They respect your time. They document solutions when they run into them, so that your team never hits the same issue twice. They make a statement in your Slack: hey guys, ran into XYZ problem, just wanted you all to know that you can fix this by doing XYZ solution. Sometimes they even run a quick session to teach others what they learned. Now, if I gave you a choice between these two, which one would you choose? Obviously, you'd choose employee B, and I think most business owners would too. Well, self-annealing agentic workflows behave like employee B, not like employee A. And so we're giving them a level of autonomy that I think a lot of people previously would have considered insane.

But I think the definition of insane is going to change pretty quickly as these models get more and more intelligent. How do you actually enable this cool process? It really just boils down to a small set of instructions in a prompt. You just add to your CLAUDE.md, GEMINI.md, AGENTS.md, whatever, a key thing that changes its default mode of problem solving. The default mode of problem solving with these programming agents is usually: hey, if I can't do something, return it to the user and ask them what they'd like me to do. Which makes sense, because for the most part these sorts of models are used predominantly in enterprise coding applications now, where a small change can actually result in a big downstream problem. But if we're building simple agentic workflows that are modular and unit-testable, and we're just using them in our IDE, that doesn't apply to us.

So all we say is something along the lines of: hey, when you encounter an error, first diagnose it, then fix it, then update your scripts and directives to handle similar errors in the future. And I always add something like: try super duper hard before escalating to the user. What happens over time is that the workflow will look very different several weeks later than it did on the initial implementation. Retry logic will be added automatically in instances where one-off failures occur. It'll do things like self-retry loops. If you guys are in the programming space, you'll know there's stuff like exponential backoff.

There are various forms of error handling, like logging, and so on and so forth. And because it is hyper-optimized to program really well and understands these things out of the box, it'll just do them for you. Which means edge cases that you never anticipated get handled as your agent encounters them. Efficiency improvements occur organically: bulk endpoints, parallelization, multiple workers. If there's a request that you made initially in your directive, like: I want this to occur in under 5 minutes every single time, after you run this, just make sure to check how long it took; if it takes more than 5 minutes, iterate on solutions. If you have simple little blockers or decision or router points in there, agents will naturally do a lot of this stuff for you, which is really cool. And then obviously you can also just ask: "Hey, make this thing better. Make this thing better. Make this thing better." In this way, your system continuously optimizes itself without any form of ongoing intervention, which is the coolest thing ever in practice. That said, when you guys start getting really deep into self-annealing and you have workflows that do a lot of their work themselves, safety becomes a much bigger portion of

the conversation than it ever was before. Like with N8N and Make.com

before. Like with N8N and Make.com workflows, the biggest potential issue was basically that you just like turned it on and you forgot to turn it off and then it just continued consuming your credits or operations or whatever longer

than you realistically wanted it to, which charges costs and so on and so forth. But most APIs, most systems, and

forth. But most APIs, most systems, and most automation platforms now have some sort of built-in detection for this, or at least thresholds that you could set.

So it's not that big of a deal. But with fully autonomous AI, especially AI that we're proposing to give total, permission-bypassed access to a system, safety becomes much more important. I was just reading a thread the other day where somebody let Gemini run autonomously for something like two days. It checked in, it had a cool little workflow loop, but when they went back to it, they realized they hadn't put it in a container. They'd given it full system access, and it deleted their whole C or D drive. Anybody in the know understands that if you delete your whole C or D drive, your computer is basically screwed; you have to do a fresh install. And that's just on your own machine. You're also giving this thing access to the internet, so if you have cookies or API keys around, even a 0.1% risk per step stacks up over a long period of time. Take 99.9% raised to the power of 1,000 operations: at the end of that process, there is only about a 36% chance that the model actually does what you initially intended, despite the fact that, on an individual basis, every step was 99.9% secure and logical.
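As a quick sanity check on that number, the compounding is just exponentiation; a few lines of Python (my own illustration, not from the course materials):

```python
def compound_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps

# 99.9% reliable per step, run for 1,000 operations:
print(round(compound_success(0.999, 1_000), 3))  # 0.368, i.e. about a 36% chance
```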

The more steps you have, the larger those error bars become, like I've drawn a few times now. So what this means is we really do have to put at least some sort of guardrail on the model so it doesn't screw things up completely. Now, there are a few simple ones that I do. My processes are never a thousand steps; I might be dealing with a five or ten-step process, so I typically don't have to go much further than this. But if you want really autonomous, long-running agents, you need to develop what are called harnesses for them, which I cover later. Basically, here are four things I would always do. First, I always ask the model to confirm before making API calls above a cost threshold. A lot of APIs have the ability to check usage.

So I'd actually add a little step that says: "Hey, make sure to check usage. If you've spent more than, say, $5 in the last few minutes, do not continue; let me know, send me a notification." Second: "Never modify credentials or API keys unless I explicitly tell you to." That's valuable because a lot of the time it'll do things like reformat your API key, and sometimes it'll delete keys it thinks it doesn't need anymore, which is a big pain because you then have to go back to the platform and reissue the key. Third: never remove secrets from .env files or hardcode them into the codebase. Models are already really good at this, but I always like having it explicit, because if I share something with somebody at any point and it contains my Anthropic API key, those guys now own my ass. And finally, although this does eventually run into a limit, I have the model log all self-modifications as a changelog at the bottom of the directive. That basically lets me look at any point in time and ask: what was the sequence of events? What was the order of operations? I do this in a GitHub-style format, sort of like a commit, if you know what that means.

And it's really simple: usually just a one-sentence or one-paragraph explanation of the change we made and how it worked. The reason this is valuable is that if you're not using version control, and I know for a fact a lot of people won't be, you at least have a changelog the model can walk through: before this I was doing X and it worked okay; then I tried Y and it worked worse; so let's move back to X. You should also just accept that some rules will occasionally be broken. That's just how these things are. We know at this point that agents are probabilistic; 100% compliance with everything is just not realistic or achievable. So despite our best efforts, there will always be some edge-case failure, although it's getting a lot better with time. This is just a trade-off we have to accept any time we use AI. AI multiplies our leverage by thousands upon thousands of times, but in doing so it also multiplies accuracy and reliability issues. Again, even if our workflows are 99.9% accurate, if you run them enough times, say a thousand times, those errors compound and you end up with a total process that's only about 36% successful.

Well, a human being can typically spot that earlier. But a human being also typically doesn't do a thousand operations in a row; there's usually some checkpoint or guardrail. With agents, you can do a thousand operations like this. So even though our accuracy levels are still really high, because we're giving agents so much autonomy, because they lack some context human beings have, and because a lot of people would argue they're not as intelligent as the most intelligent humans, failures are going to occur and there's nothing you can do about it. So I plan for graceful recovery, not perfect prevention, and I'd recommend you do too.

Cool, let's chat about using these workflows. I want to make clear that this program is about both building workflows and using said workflows, and the two are not the same. Building a workflow and using a workflow are two very different things.

When I build a workflow, my agent is essentially being a programmer for me. When I use my workflows, that's the DO side of things, the directive-orchestration-execution idea: my agent is just executing a sequence of steps that a previous iteration of an agent built. These agentic workflows are mostly about the using side. Building them, while important, is just a small part of actually living in your IDE and getting things done. And to that point, I have an important thing to say: the interface to everything is now just a text box. My actual day-to-day work now occurs almost entirely through a single text box, in Antigravity or Visual Studio Code, and I just have the agent do everything, using the tools I've painstakingly set up over the last few weeks. I'll have it generate my YouTube thumbnails, generate scripts I can send to people, generate pitch decks and proposals for people interested in working with me, analyze my call transcripts, and so on. But I don't do it in individual software applications. I don't do it in Fireflies and Google Drive and PandaDoc and Qwilr and all these other platforms; I literally do it all through a single text interface. And this is just the way high-leverage work is now going to be done, at least until we come up with a better alternative, which may arrive at some point, but I wouldn't hold out for it.

For a lot of people, a single text box feels like a downgrade. If you think about it, we've spent decades learning software through visual interfaces and menus, and GUIs, graphical user interfaces, are the current standard. Compared to the clicking and dragging they're used to, a lot of people also consider typing really slow and tough. Some people type at 50, 60, 70 words per minute; I have family members who can't type more than 20. That's obviously very slow relative to dragging things around and clicking buttons. So there's no obvious right way to do this; it's open-ended and unfamiliar, and I'm sure we'll eventually converge on a really cool visual approach that combines the best of both worlds. But there are ways to make the text-box workflow a lot more natural and efficient, which I want to talk about.

The first is to switch to voice transcription tools. In case you didn't know, you can now just say whatever you want to your computer, and there's something like a 99.9% chance it will understand you and turn it into text. The reason this is valuable: the average typing speed is 50 to 70 words per minute, which is really slow bandwidth, while the average speaking speed is 150 to 200 words per minute, three to four times faster. You've been listening to me talk at between 150 and 200 words a minute on average; sometimes I'm a little slower, maybe around 130, other times faster, maybe around 220. But in general, I'm speaking maybe three times faster than most human beings type, which is very important. And nowadays models are pretty smart, so you don't even need to organize your thoughts in a hyper-specific way. Back when I was using GPT-3, in the good old days, you had to be extraordinarily precise and concise with your prompts, because even 10 additional tokens could really hurt the intelligence and steerability of the model. Nowadays, though, I can have prompts that are thousand-word text dumps: I'm in my car driving somewhere, I click the voice transcription tool, and I just talk, and it does a really good job of turning that into something useful.

The highest-bandwidth way of communicating with computers, at least right now, is the following; nobody really talks about this. You transcribe your speech as input, which gets you to roughly 200 words a minute, so your input bandwidth is now 200 WPM. Then you don't have it talk back to you, like the ChatGPT voice mode does; instead, you read the output, because most people can read 300 to 500 words per minute if they skim, and most people will skim in some way, shape, or form. Some people can go much faster, up to around a thousand.
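The back-of-the-envelope math here is simple enough to check. The WPM figures are just the rough averages quoted above:

```python
typing_wpm = 60      # mid-range typing speed
speaking_wpm = 200   # upper end of conversational speech
reading_wpm = 300    # lower end of skimming speed

input_speedup = speaking_wpm / typing_wpm    # dictating instead of typing
output_speedup = reading_wpm / speaking_wpm  # reading instead of listening

print(f"input:  ~{input_speedup:.1f}x faster")   # ~3.3x
print(f"output: ~{output_speedup:.1f}x faster")  # ~1.5x
```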

In that way, you have roughly 200 words per minute of input and up to 1,000 words per minute of output as you skim to the relevant material. The old way, typing, is 50 to 70 WPM; with voice, it's around 200. So we're roughly tripling to quintupling bandwidth on both sides of the exchange, call it quadrupling overall. I'd recommend just doing that moving forward; it's way simpler. The only situation in which I actually type now is when I absolutely have to, because there's some hyper-specific file I need to reference on my computer, and even then I'll usually just copy the name and paste it manually. From here on out, when I say the word "prompt," assume I'm generating all of it with my voice; you've also seen me do this in multiple demos, so I'll proceed assuming you know that. So how do you actually use workflows? Well, it's really simple.

Hopefully you've already seen it: we just ask. There's no need to memorize the exact name of the directive. The agent typically knows the directives exist because we've included that in our system prompt, and it'll scan for matches automatically. You do, of course, need to provide the data your directive's input schema requires. So if your directive says, "I want you to include the name of a person," because we need that name to generate some form of proposal, and you just say, "Hey, do the thing," it'll look at it and reply: "You're currently lacking this input. What's the name of the person you wanted? Let me know and I'll create that for you."

Really, this is just like ordering food: the kitchen needs to know what dish you want and any modifications. You can't just say, "Get me food." You need to say, "Can I have the hamburger with a side of fries, please?" There's a level of specificity here. You don't have to go super deep, but you also don't need to overthink it. I'm pretty specific with requests that I know have specific input requirements. In the case of getting leads, I could absolutely just say, "Hey, get me some leads today." Obviously, it would then ask me a bunch of questions, I'd feed the answers in, and I could mess about with my directive. But I'd much rather say, "Scrape 200 HVAC companies in Texas, verify the emails, personalize them, and give me the Google Sheet." That takes two seconds longer than the first version, but because I'm at the helm of the ship, I can steer it in a much straighter line toward what I want. The more steps you put in an AI's hands, the more chances it has to make errors. Remember that error rates multiply: if I have a 90% chance of doing the first thing correctly and a 90% chance of doing the second thing correctly, I have a 0.9 × 0.9 = 81% total chance. Ideally we're dealing with higher rates, but let me show you how that plays out. In the first version, I say "get me leads." What happens? It interprets my request, goes to the directive, then says, "We don't have any leads. Hey Nick, can you send me some?" I provide the leads, it goes through another process, and the total success rate ends up at, say, 81%. If instead I say, "Scrape me 200 HVAC companies in Texas, verify their emails," and so on, it's only been one step, so I've significantly reduced what's called the compound probability of error. When you're specific, you also reduce the back-and-forth, which lowers your overall failure risk, and it's just faster. So that's how I do it. If you're not sure what's available, you can just ask: "Hey, what workflows do I have?"

Eventually, after you've designed enough directives, it does start to get a little overwhelming for both you and the model, and there are strategies to accommodate that, like sub-agents, which we talk about later. For now, just know that if you don't know what's available, absolutely ask your model. You can also ask the model to refactor your directive base: "Are there any directives that look really similar? Any executions that look really similar? Run a comprehensive refactor and group them in ways that make sense." You obviously have a lot of freedom to do this on your own. Now, for really complex workflows, I'll usually just paste in the context rather than typing it all manually. For example, rather than asking the model to make a Fireflies API request for me, I'll just paste my call transcript directly in. It takes approximately the same amount of time.

But this way it's exact, and there's no room for error. Another really common thing I do is go to a website, select all, copy everything, paste it into the model, and say, "Hey, build me a proposal from this website." Obviously, I could instead tell it, "Make an HTTP request to this link," and it would work through that. But it's the same thing; it takes me the same amount of time either way, and from the model's perspective it doesn't matter: everything gets inserted into context the same way. It can be a big time saver, since HTTP calls, API requests, and database access can take some time to set up. So if you're using this as a user, executing your workflows through the orchestrator, you can absolutely co-create with it: go on websites yourself, copy-paste stuff in, it's no big deal.

The next thing I want to do is talk a little about how to work through API documentation with agentic workflows. As you'll remember from a previous demo, I built a workflow that took LinkedIn Sales Navigator URLs, fed them into a scraping service, did a couple of other things, and ended up giving me a big list of leads in a Google Sheet. So how exactly do we do this sort of thing in a reasonable way? Obviously, we could just tell the model, "Build XYZ with this service's API." But what you'll quickly realize is that models will spend maybe 50% of their time just looking up API documentation and the other 50% running into some sort of error. For instance, say I take this API documentation, go over here, feed it into the AI, and say something like: "Tell me about this API documentation."

The first thing it'll do is take the link and try to access it with some sort of web search tool. The thing is, not all API docs are created equal: some documentation pages don't actually include all the information we need, and some don't return things the way we need them. Here it's saying the page is fairly lightweight on specifics: no detailed endpoint schemas, rate limits, or code examples; you'd need to log into the dashboard to get the full OpenAPI spec with the request and response schemas. That's kind of weird, because we can see all the information right here, right? Well, that's the thing: some of these API pages only load through JavaScript, so realistically the model isn't actually capable of accessing the docs. If I asked it to find the endpoints, it could eventually do so, but it probably wouldn't do so very well.

So I ask: what are the API endpoints here? It looks for more information, looks for a spec with more detail about the page, and runs through the same thing it just did a moment ago, probably to no success. And here you see it report that the page uses JavaScript to render the UI, which means the endpoints aren't in the actual HTML. So now it's starting to guess at the JSON structure of the API. Sort of annoying, right? It doesn't actually provide that information. So what else does it do? It starts looking for other people's write-ups of the API, blog posts, and so on. The information it finds isn't terrible, but be clear about how long this takes and what resources it consumes on our end. If I type /context over here, you can see we've already started filling up our message context. MCP is still the biggest consumer, since this is the same setup we were using before, but messages are already at 1.4% and we haven't even done anything yet. Imagine if this kept looping on its own for another 30 seconds; we'd probably get up to 3, 4, 5% or more. So to prevent all of that, a lot of the time with APIs I'll just open the specific pages I want. We wanted the POST endpoint, the GET endpoint, and there was a URL check right over here. And I'll just copy all of it in directly.

"These are the API docs; list the endpoints for me." Now, instead of having the model do all that searching itself, which, if you think about it, is an additional step that compounds error probabilities, I've copy-pasted everything in, which means it gets things right on the first try. It won't go back and forth or guess at various API endpoints. I basically have everything that I need.
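For what it's worth, the kind of call the model ends up producing from pasted docs is usually a few lines of `requests`. Something like the sketch below, where the base URL, payload fields, and header scheme are placeholders rather than the real service's schema:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; keep real keys in env vars, not code
BASE_URL = "https://api.example.com/v1"  # placeholder base URL

def start_scrape(search_url: str) -> dict:
    """POST a scrape job and return the parsed JSON response."""
    resp = requests.post(
        f"{BASE_URL}/scrapes",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": search_url},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors instead of silently continuing
    return resp.json()
```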

"If I wanted to make a simple API call to the POST endpoint, what would that look like in Python?" Now it goes through and gives me all the information I need. Pretty straightforward. Okay, great, let's do it. Now, I should contrast that with a few other APIs out there that are optimized directly for AI, large language models, and agentic workflows. One in particular is the Apify API. I want to say these guys are a leader here, but other services are catching up and doing this sort of thing as well.

Obviously I could feed all of this in as plain text and it would do a good job, don't get me wrong. But what you'll see is that there are now "Copy for LLM" buttons at the top of the page: copy for LLM, view as Markdown, open in ChatGPT, open in Claude, open in Perplexity. It actually packages the information for AI models, and it's just a Markdown version of everything we saw on the page. Because it's Markdown, it's already significantly more efficient, and AI natively understands how to traverse it. This is a brief example of APIs accommodating AI models and agentic workflows: providers are anticipating that agentic workflows are going to come and swallow up everything, so they're making all of their documentation available as very token-efficient Markdown like this. So if I want the model to check the documentation, I just copy this and say: "Tell me about this API." It goes and accesses the page first to grab all the Markdown data, and what's cool is that despite it being a fair amount of text, it does so very quickly. Once it's done, it gives me a big overview.

Then I can ask follow-up questions: "What kinds of endpoints are most common?" And as you can see, it immediately provides a bunch of information. Pretty sweet, right? You would not believe how much money on the internet is available for the taking if you just know how to connect APIs. And nowadays, to be honest, you don't even really have to know how; you just need to be able to communicate to a model that you want to connect to an API. If you can say, "Here's an API, connect to it really quickly and send a test query like XYZ," and it does, then you can swoop up a large chunk of the economically valuable work on freelancing platforms: the simple one-off integrations businesses commonly need. "I'm using X platform, but it doesn't have a one-click Zapier integration; how do we connect to their API?" It sounds scary and intimidating, but you can solve it really easily, not just for yourself but for other people, with a tool like this. In terms of actually running this stuff: when a workflow is new, for the first 10 or 15 times, I actually recommend watching it work end to end.

That seems like a big time investment, given that workflows can take 30 seconds to a minute to execute, but I don't think it's anywhere near that big a deal. If you watch the reasoning for even one or two executions, you typically learn more about what the model is actually doing under the hood than you would from three days of autonomous runs. So you can iterate very quickly and make the workflow very good; you don't have to stretch that iteration process over weeks or months. What's cool, too, is that when you watch workflows, you develop intuition about the reasoning the model goes through, and I honestly think there's no better skill to develop right now than intuition for how models think. These models are going to run our economy very soon, and in many ways they already are. So if I'm going to spend time working, a big chunk of it should be spent developing an intuition for how these models actually function. It's also really satisfying; it's super cool to watch the model solve problems and draw logical conclusions from the information I've provided. And it's usually pretty easy to pinpoint when the reasoning goes sideways: the model will say, "Wait, maybe I should use this approach," and you're looking at it thinking, "That's not the approach to use," which means you can cut a huge amount of time by pressing escape, pausing the run, and saying, "Sorry, it's actually Y." It's way easier that way, and co-creating with the model again builds good intuition for how your workflow is supposed to work. Now, if I'm handling a really long workflow, like my video editing workflow whose full execution can take 45 minutes because of the FFmpeg steps, I'm obviously not going to sit there and watch it, because most of it is just the script executing and my hardware running. So I'll open an extra agent window and use what are called background tasks.

Background tasks depend on the model provider and interface you're using. Claude introduced background tasks a while back, and I've been using the Claude family of models quite a bit recently, so that's easy. What I'll then do is set up a hook in my IDE to play a sound when the task is done. Hooks connect to specific points in the workflow: if my workflow takes 30 minutes as a background task, when it finishes I can have my computer go ding and tell me it's completed. I'll show you an example of that later. There are also native system notifications, but I find sounds more reliable for getting my attention, since I get a lot of notifications nowadays. To set up hooks, depending on the platform, you just create a mini workflow that triggers the sound or animation: give it a sound you like and say, "Set this up so that when you finish operating, a hook triggers this sound and plays it natively on my computer; that'll help me direct my attention back to you for the next step." Claude has really good documentation on hooks; most people who have built hooks have done so with Claude, so you can check their hook docs for specifics. The common use case, as I mentioned, is playing a sound when the workflow finishes so you can check and verify the output, but you can also play different sounds for human-in-the-loop, "action required" type steps. Okay, here's a brief, practical example of me setting up a hook.

First, I'll say: "Hey, how's it going? I'd like you to set me up a hook that plays a nice chime sound every time one of my agents is done with a task. That way, I'll know to go back to the task, because I normally have you alt-tabbed while I'm doing other things."
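For reference, the agent typically ends up writing something like the following into your Claude Code settings. This is a sketch based on Claude Code's documented hooks format in settings.json; the Stop event fires when the agent finishes responding, and the afplay command is a macOS-specific example (on Linux you might swap in paplay), so treat the details as assumptions and check the hook docs for your setup:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "afplay /System/Library/Sounds/Glass.aiff"
          }
        ]
      }
    ]
  }
}
```

The nice part is you never have to write this by hand; you describe the behavior in plain language and the agent edits the settings file for you.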

Claude already knows this is the Claude Code hooks feature: shell commands that execute in response to events like tool calls. So now it's giving me all this information. First, it's going to do some research. Then it's going to actually write a script to run the Claude Code hook. All right, and it's now adding the hooks configuration with a little glass sound. I don't know if you heard that, but that was it: it just finished. So yeah, it did just finish. I'm going to pretend I'm alt-tabbed somewhere, not paying attention, and now I'm hearing the chime. So it looks like every time it plays, I can hear it directly. Okay, I'm going to check the hooks, and I'm just going to start a new Claude Code instance like it's telling me to do. "Hey, how's it going?" Perfect. And now I hear the chime. It's that easy. You can now set up, let's say, five of these simultaneously.

One, two, three, four. I'll open all of these in separate tabs, and then I'm going to send the same prompt to all of them: "Write me a funny poem." Now I'll send to all: one, two, three, four, five. Nice. This thing has gone through and written me funny poems, and I got a bunch of chimes, too. Hopefully you can see how this could be helpful if you were working in a Claude Code instance without notifications enabled and you were on another tab. In practice, when you're juggling a bunch of things and trying to stay in context while also monitoring or orchestrating some AI flow, a big chunk of your time is just completely wasted time where you haven't given the AI its next instruction. The simplest way to economize that time is to have some sort of notifying flow: play a nice chime, or set it up so the Claude window pops up every time it's done. That way, you'll quickly come back, give it additional instructions, and double up on the return on your time.

Now, when any workflow completes, you're almost always going to get a deliverable: a link or a document or a summary or something.

You'll also usually get some sort of report of what happened during the execution. My recommendation is to review the output, confirm it meets your needs, and if it does, tell the model. Let it know. Say, "This worked great." If you had to go through some trials and iterations to get there, let the model know that this is what you want and to update the directive and execution, unless that's already done. Most of the time this happens automatically, but it's cheap, almost free, to say so every time you get a really good output.

As I mentioned previously, individual workflows are really useful, but I actually think chaining them together is where the real magic happens. I always provide that umbrella analogy, and I like how my umbrellas are getting more and more sophisticated as this course goes on. I don't think I used to see that little thing up there; that's really badass. This is, say, your marketing umbrella or your new-client-onboarding umbrella. You take all the individual workflows you've created, group them under it, and next time you can run all of them by just saying, "Hey, trigger the new-client-onboarding automation."

This solves the manual handoff problem with deliverables. Say you build a lead scraper and an enrichment workflow. The scraper runs, finishes, and says, "Hey, we're done." Then you have to take that link and say, "Okay, now do the enrichment workflow." "Okay, now we're done." Then you have to take that and say, "Okay, let's actually send the emails." "Okay, now we're done."
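Here's a minimal sketch of that umbrella pattern in plain Python. The three step functions are hypothetical stand-ins for your real execution scripts; the point is that one top-level call replaces three manual handoffs:

```python
# Hypothetical stand-ins for the real execution scripts.
def scrape_leads(niche):
    """Find potential customers in a niche (stubbed)."""
    return [{"name": "Jane Doe", "company": f"{niche} Co"}]

def enrich_emails(leads):
    """Attach an email address to each lead (stubbed)."""
    for lead in leads:
        lead["email"] = lead["name"].lower().replace(" ", ".") + "@example.com"
    return leads

def write_first_lines(leads):
    """Add a personalized opener to each lead (stubbed)."""
    for lead in leads:
        lead["first_line"] = f"Loved what {lead['company']} has been doing lately."
    return leads

def onboard_new_client(niche):
    """The umbrella: every step runs in order, no manual handoffs."""
    leads = scrape_leads(niche)
    leads = enrich_emails(leads)
    return write_first_lines(leads)

leads = onboard_new_client("fintech")
print(leads[0]["email"])  # jane.doe@example.com
```

In practice each stub would call one of your tested execution scripts, and the deliverable (here, the enriched list; in my case, a Google Sheet) only comes back once the whole chain has finished.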

It's much better for me to eliminate that handoff process completely and only check in once the entire thing has completed, assuming I've verified that every individual step does what I want it to do, because otherwise you're basically the bottleneck. I can't tell you how many times I've had ten Claude instances or ten Gemini instances open and just forgotten to proceed with one of the steps. It asks, "Would you like me to send the email?" And later I'm wondering, "Where the heck's this damn email?" Then I look back and realize, "Oh, I didn't actually tell it to continue. I wasted an hour."

I've covered similar examples, but here's another one. Lead scraping is really popular: you find potential customers, then you enrich their emails, then you generate personalized first lines. I do this using a personalization workflow I've shown you multiple times, but essentially it's all batched under an end-to-end new-client workflow. When I get a new client, it goes through, analyzes the client's niche, scrapes leads, enriches the emails, and writes personalized first lines before giving me a Google Sheet. It's kind of cool, because this is all stuff I used to do manually, step by step. As you get to higher levels of abstraction, eventually we'll have things like "do all of the marketing for this campaign," and it'll do a really good job.

When does the agent actually require our help? Well, sometimes the agent genuinely cannot fix something automatically. It's rare, but when it happens, it'll typically just ask you directly. Usually, it'll provide a fair amount of context, which is good.

Now, the questions are: what was it trying to do, what went wrong, and what options exist to fix it? Your job is literally just to look at that and say, "Okay, let's do this then," or "Update the directive to do this," or "Are you sure you fully tried? Have you researched all of the solutions?" or something along those lines. So you're not only a high-level decision maker; a lot of the time, you're also just a motivator. Honestly, I can't tell you how many times I've had one of these agents loop for ten minutes trying to build something, get really close, and just not be able to get the API spec right. Then I say, "Could you research the API spec?" and it goes, "All right, yeah, I'll go research the API," actually does it, and gets it right on the first try. It sounds weird, but a lot of the time agents don't just need decisions made; they also need some level of motivation.

I've also found that sometimes an agent gets stuck in a really silly loop. It'll literally do the same thing over and over, then try the same next solution over and over, then chain those two together and go back and forth, back and forth. Who knows why this happens; I'm sure the smarter the models get, the less it will occur. But when it happens, you just pause it, look at the reasoning, see what's going on, and say, "Hey, you've been doing these two things for the last twenty minutes. Could you please not do that anymore? Instead, do research on the best solution before proceeding." The reason you do this is that iteration is actually really cheap, so it's much better to do something than nothing.

I mean, the cost of sending this one message is cents on the dollar, and the potential upside is very, very big. When there's a massive disparity between the cost and the upside, it would take many, many runs of the thing failing completely before it stopped returning ROI, and in my experience it's usually capable of getting there on the first or second try.

So when should you jump in, and when should you let it run? In other words, when do you want a human in the loop? The way I decide whether to use human-in-the-loop in a flow comes down to two questions: what is the magnitude of the outcome, and what is the sensitivity to quality? If the magnitude of the outcome is really big, meaning this single task matters a ton for my business, I step in. If it's very sensitive to quality, as in very small errors create disproportionately large problems, I also step in. And if it's high on both, you absolutely want a human in the loop.

A really simple example is cold email templates and outreach sequences. I do a lot of these; it's part of my day-to-day at Leftclick. I find that when you have AI do 100% of this, performance is pretty trash, and I can actually graph the reason why. There's basically an uncanny valley. Let's say this axis is quality and this axis is perception, running from zero to one.

Notice how it doesn't really matter how much quality we put in until we reach some phase-change level, and then all of a sudden it goes boom and becomes really, really good. So for my cold email, if I have AI write it (and AI has gotten better over the years; maybe it started over here, and now it's here, and here, and here), it doesn't really matter how good AI is at this process, because the sensitivity of the perception of my email campaigns is very, very high. There's this uncanny valley effect where a tiny improvement in quality massively improves the perception. So in situations like this, where the model just can't seem to climb that last step, it obviously makes sense for me to review the output quickly, change two or three words, and boom, all of a sudden the quality is up here. Did I objectively change the quality a ton? No. But did the perception massively change? Yes. And it might have taken me a few moments of work. So I find this really, really important for cold email templates and outreach. Given the volume of the task, the fact that I'm sending this stuff to tens of thousands of people, I would almost always have a person look it over before it runs. Because what if I'm off by one degree here? I've just wasted 10,000 emails. I might as well have spent two seconds to fix it, sent it to 10,000 people, and gotten much better results.

Same thing with financial documents like invoices and even proposals. I automate the hell out of my proposals, don't get me wrong, but I have a human-in-the-loop stop: I look at the proposal before I send it out. Imagine you accidentally added an extra zero or something. It's very, very unlikely, right? But even if you screw up a number 0.1% of the time, because your AI system misinterpreted what you said or your voice transcription tool got it wrong, the time savings you get by not looking it over are not at all equivalent to the negative impact on you, your reputation, and your business if you don't look it over. So anywhere a few percentage points of quality make a massive difference to the impact, generally anytime impact and quality have this sort of relationship (pardon me, I didn't draw that; I think my tablet's malfunctioning), you generally always want a human in the

loop. On the other hand, there are a lot of tasks out there with really low sensitivity, where the volume of the thing matters a lot more than being perfect, so you might as well let it run completely autonomously. A good example is web scraping: it's not a high-sensitivity task, and models are pretty great at it. Creating multiple drafts or variations for later selection is another design pattern I use all the time, and I don't need to steer it much, because the whole idea is for it to generate me a bunch. Generally, anything that scales linearly with quality, where the amount of quality and the amount of impact have roughly a one-to-one relationship, I'm okay with running autonomously. Even if perfect quality is up here and the automated version is over here, the time I save by having it automated at, say, 70% of the full thing is typically worth way more than whatever the actual impact improvement would be.

Now, some things should

improvement is. Now, some things should not be automated at all. I don't

actually think that you should have voice agents doing any sales calls for you. And this is something I see so many

you. And this is something I see so many people do. Like if you're offering a

people do. Like if you're offering a call, you clearly care a lot about the outcome of the call, right? It is a hightouch sales conversation. And you

know, if there's even a.1% chance that somebody thinks that there's not a real human being talking to them, it's like a robot. That's going to have a much

robot. That's going to have a much bigger impact on the quality of that deal than 0.1%. Right? So it's not a linear relationship between that at all.

And you know, some things I just don't automate. Like would I automate the

automate. Like would I automate the calling of my client or something? No, I

I wouldn't. At least not right now at current levels of tech. Maybe if um agentbased calling becomes better and like more socially acceptable later. But

for now, no. What I would do is I would like automate the process of coming up with a bunch of information and context about the client. I would automate the process of doing research on the client.

These are all things that scale pretty linearly as I was talking about, right?

So, I'd have some big dossier of information in front of me to save me from having to manually go through hours and hours of LinkedIn research, but um I would actually just make sure that the actual calling part is me, right? It

just doesn't make sense. It's too

sensitive of a process. Research, on the other hand, a lot more linear. There's

some situations that genuinely require empathy and judgment, but you can often convert them into situations you just automatically say yes or no to. A good example is Amazon. Amazon has basically automatic refund disbursement if, I think, your refund rate is below something like 2%. So if there's an issue with your order, and you don't ask for refunds very often, and you say, "Hey, there's some issue with this, could you give me a refund?" they will automatically say, "Yes, refund granted." And you're like, "What the hell? I didn't even tell anybody, I didn't even send a photo, it's fully automatic." Well, look how much time and energy they save by doing that. So you can reconstruct sensitive customer situations, quantify them, and then totally automate them. But in situations where you genuinely can't, say somebody with a shakier refund history, you're going to need a way to pass that off to somebody with empathy and judgment. So yeah, I would not automate things just for the sake of automating them. I'd only ever automate something if it actually made a bottom-line difference to my business. Things like lead scraping, research, and accumulating large data sets all make a large difference to my bottom line, so I'm happy to automate them. But the calls and whatnot? It's all just me, baby.

At the end of the day, your goal is supervised autonomy, not babysitting. So I

just talk to them like Slack messages. I don't use formal syntax or precise technical language. I just DM them the way I'd DM my colleagues, with the colleague replaced by my agent. For example, I was running a YouTube workflow the other day to edit one of my videos, and I said, "Hey, could you run the YouTube editor for the new file? Make the cuts a little bit tighter." It took the average cut distance, decreased it a little, and reran the YouTube editor. I said I liked it, so it updated the flow to use that setting next time. Same thing with voice transcription in general: just speak naturally and send it. It'll understand you.

Okay. So manually triggering these workflows is actually just the beginning, and that may be frustrating to hear this many hours into the course, but this goes a lot deeper. Right now, we're opening our IDE.

We're talking to our agents and starting the flows ourselves, which is fine if you have ad hoc tasks and one-off requests. It's fine when you work eight hours a day, nine to five, at your desk, where you can get things done. But as I'm sure you'd imagine, the "auto" part of the word "automation" is pretty important. So how do you have these things run automatically, without your involvement? Well, these are called event-driven workflows. For instance, say a new lead fills out your website form. You want a workflow that automatically replies and books a meeting, right? But what if the new lead comes in at 5:30 and you leave for home at 5:00? Or what if a customer sends a support email? Your agent does the triage, writes the draft, and routes it to the right person for sending. That's great and all, but what are you going to do, wait until the next day, look at your inbox, and then do the triage? That defeats the purpose. So how do we actually build these things?

There are also schedule-driven workflows. Maybe it's 9:00 a.m. on Monday and you want a weekly report to generate itself. Do you really want to come in every Monday and say, "Hey, generate my weekly report"? Of course you can, but it's nice if some of these things happen automatically for you. Maybe the weekly report summarizes your work and sends it to your boss or your client along with your timetable. Same for other things on specific schedules. Well, that's what we're going to learn about next: webhooks and scheduling.

Now that you know everything you need to know about agentic workflows in order to build and use them, it's time to take these things, which up until now have been

constrained to your own device or your integrated development environment, and put them in the cloud, where they can be triggered by means other than you actually prompting. To do this successfully, which I'm going to call "cloudifying" my workflows, we don't actually upload the orchestrator itself. Remember the loop, where we have the directives, the orchestration layer, and the executions? What we don't upload is the orchestrator. All we really do is upload the execution scripts themselves, the deterministic parts. You can also upload the directives if you want to provide context to a model later on, in case it needs to edit things, but for the most part, you just upload the execution scripts. I'm going to show you how to do that, plus some alternatives. The way to think of it is as creating many small APIs that each do one specific thing reliably. The same concepts apply whether you're using DO or other frameworks like Claude skills.

Now you may be wondering: Nick, what's fundamentally different about this versus what we were doing before? What's fundamentally different is that there is no LLM. Instead, we're just creating our own API, with a defined input and output, and using LLMs to build it really quickly and easily. The reason is stochasticity, the randomness of models: their tendency to eventually diverge from what you wanted them to do, given enough time steps. LLMs are probabilistic, with a bit of randomness in every direction. When they're working in your IDE, for the most part, you're around. Even if you're not looking at it this second, you'll probably look at some point over the next hour, so if it has an issue, you're watching and you can course-correct. But if it's 3:00 a.m. and this is running unattended with full system permissions, that level of variability is a liability. So we're taking the AI out of the cloud loop entirely.

Additionally, instead of having slightly different routing decisions like we see here, we're going to force them into one routing decision every time, using what's called server-side logic.
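As a sketch, server-side logic just means the route is hard-coded. The node functions below are hypothetical placeholders; what matters is that the list fixes the order, so every run executes node one, then node two, and so on, with no model deciding anything:

```python
def validate_input(payload):
    """Node one: reject malformed requests up front."""
    if "email" not in payload:
        raise ValueError("payload missing 'email'")
    return payload

def enrich(payload):
    """Node two: add derived fields (stubbed)."""
    payload["domain"] = payload["email"].split("@")[1]
    return payload

def format_output(payload):
    """Node three: shape the final response."""
    return {"status": "ok", "result": payload}

# The route is data, not a decision: same nodes, same order, every run.
NODES = [validate_input, enrich, format_output]

def run_workflow(payload):
    for node in NODES:
        payload = node(payload)
    return payload

print(run_workflow({"email": "jane@acme.com"})["result"]["domain"])  # acme.com
```

Because nothing in this chain is probabilistic, a run at 3:00 a.m. behaves exactly like the run you tested at your desk.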

Because your execution scripts do the same thing every time, you never have to suffer that variability. It's always: execute node one, then node two, and so on through node n. All we're doing is taking those execution scripts and deploying them as standalone cloud functions. No LLM in the loop, just an API on a schedule or responding to webhooks. The intelligence is used to build the execution scripts, not to run them. In this way, you can think of it as deploying your own mini app. A good way to picture it: your agent is the architect, and your cloud workflow is the building. Architects design buildings all the time, but it's very rare that they live in the buildings they design. Our agent is architecting a beautiful building, and we're going to put execution scripts in there to live instead.

This obviously loses a fair amount. It takes our agentic workflows and changes them back into traditional, procedural workflows. They can't adapt to unexpected situations on the fly, and they can't self-anneal or ask clarifying questions when things get weird. You're going back to old-school automation behavior: it does exactly what you told it to do, nothing more, nothing less. But if you think about it, by the time your workflows deploy, they should be pretty battle-tested, as I mentioned earlier, from having run dozens of times locally, where you've already worked out all the kinks in your IDE and debugging is easy. If something breaks, you'll still get error notifications, and the really cool thing is you can fix it with your agent. If you're using a modern platform like Modal, models can read the errors from Modal really easily, so you can just say, "Hey, I think this workflow is broken; fix it," and it will do the debugging process for you. You keep the ability to debug; you're just not doing it in a live loop, because in a live loop, if it doesn't do what you wanted, the results could be catastrophic and go all over the place.

I could sit here and give you a way to do this that includes the orchestrator directly in the environment, with the agent listening and constantly modifying things. I've tried this now in a few actual businesses, and despite the fact that it's very shiny and sexy, and people go, "Wow, I can just query my LLM on some cloud container somewhere and have it do whatever I want via webhook," we're just not there yet. I'm pretty sure we will be at some point in the next couple of years, but for now we're going to leave the orchestrator out of it completely and use our agentic-workflow-building skills to build APIs really quickly that we can then call.

The platform I use for all of this is called Modal. Modal is not the only platform out there; there are others like Trigger.dev, and I'm not associated with any of them. Modal is just a good product.

Trigger.dev is a good product too; we've set up some workflows there, and there are a couple of other builders that essentially do the same thing. But the way Modal works is really simple: you take a Python script and turn it into a cloud function. It's also pay-per-use, so when your workflow isn't running, it spins down and costs nothing.
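To give you a feel for it, here's roughly what a Modal-wrapped execution script looks like. This is a deployment sketch based on Modal's documented decorators; the app name and function body are made up, and Modal's API evolves, so treat this as illustrative and let your agent write the real version against Modal's current docs:

```python
import modal

app = modal.App("lead-enricher")
# Web endpoints need FastAPI available inside the container image.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image)
@modal.fastapi_endpoint(method="POST")
def enrich(payload: dict):
    # Your deterministic execution script goes here.
    email = payload["email"]
    return {"status": "ok", "domain": email.split("@")[1]}
```

Running `modal deploy` on a script like this hands you back a public webhook URL; when no requests come in, nothing runs and nothing bills.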

You'll get a web hook URL just like you would from make or nad. And it's also very cheap, especially for Python based execution scripts. They gave me $5 of

execution scripts. They gave me $5 of credits the beginning of this month and I think so far I've used like 3 cents.

So very very very affordable. The best

part is you don't need to know anything about any of these platforms to be honest. They're built for agents and so

honest. They're built for agents and so agents know how to crawl them and traverse them and set things up really easily because their documentation is fantastic. All I really had to do in

fantastic. All I really had to do in order to do this, which I'll show you in a moment, is say turn this into a cloud function. And then it did everything

function. And then it did everything else. Now, the web hook URLs that modal

else. Now, the web hook URLs that modal gives you can be called from anywhere, including by other agents. And then it also allows people at regardless of whatever skill level you are to set up this sort of web hook or event- driven

flow. It's sort of like nadn or make.com

flow. It's sort of like nadn or make.com or you know gumloop or zapier any one of these platforms these will expose these little web hook urls right and you take these web hook

urls and then you give them to services like I don't know um clickup or instantly or pandadoc or whatever the heck you want right well this is exactly what modal does it's just instead of giving it to you in sort of this visual

way um we just do it through natural language we're like hey set this thing up and then give me a web hook URL so that I can call here's what the request body is going to look like. Cool. We

done. Awesome. Thank you very much. That

said, I wanted to take a couple of steps back here in case people don't know what webhooks are. If what I just said made no sense to you, that's okay. I'm going to cover it.

First of all, a webhook is literally just a URL that triggers your workflow when something hits it. An external system, like a CRM or a website form or Make or n8n, can call that URL automatically. It's just like a doorbell: when somebody presses it, your workflow wakes up and runs. You don't have to be there to do it. If you've ever done any home automation stuff, switches or whatnot, it's the same idea. There's some URL somewhere, some destination, it could even be your website, and when somebody visits it, it triggers something that does something else. Obviously, the something else in this case is going to be our automated workflow.

If I had a URL like this, let's say it's my nick-thbot.webhook.com, I could do anything with it. I could literally just enter it into my browser and press enter, and it would trigger a flow. Or I could send an HTTP request, which is a web request, through make.com, n8n, or any other no-code builder. I could do it through my terminal. I could do it through an agent. But basically, this is just a destination on the internet.
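To make the doorbell idea concrete, here's a minimal sketch in Python using only the standard library: a toy webhook endpoint that runs some "workflow" logic whenever a request hits its URL. Everything here is illustrative (the payload and handler names are made up); real platforms like Modal or n8n hand you the URL instead of you hosting it.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    """A URL with some logic attached: POST to it and a 'workflow' runs."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # This is where your real workflow (lead scraper, proposal
        # generator, ...) would run; here we just acknowledge the ring.
        body = json.dumps({"status": "workflow triggered", "received": payload}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

def start_webhook_server():
    """Bind to a free local port and serve one request in the background."""
    server = HTTPServer(("127.0.0.1", 0), WebhookHandler)
    threading.Thread(target=server.handle_request, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"

def ring_doorbell(url: str, payload: dict) -> dict:
    """Pressing the doorbell: an HTTP request to the webhook URL."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    server, url = start_webhook_server()
    print(ring_doorbell(url, {"lead": "Demo Corp"})["status"])  # workflow triggered
    server.server_close()
```

Any external system that can send an HTTP request, whether a CRM, a form, Make, n8n, your terminal, or an agent, would call that URL the same way.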

Okay, that's like a node, and when somebody accesses the node, it runs some logic, and depending on whether or not the input fits its specifications, it'll continue and then call whatever you want. So webhooks really are just URLs with some logic attached to them. That's more or less it, and they're very, very common in any sort of automated scenario. All

right. So, what is the agent doing behind the scenes in order to set this up for you? Well, it'll review our agents.md and our claude.md and our gemini.md and so on, just to understand the setup first. Ideally, somewhere in there you would say, "Hey, as part of your work, one of the things you do is set up cloud webhooks or cloud scheduled workflows on Modal. Here's how to do so." What it then does is look at your existing execution scripts for the workflow you want to deploy. It'll wrap everything in a simple format that Modal likes, proper decorators and whatnot, and if there are any prompts or API keys or whatever, it'll ask you for them, although I find most of the time it's plug-and-play. It's just like, "Oh, I have the keys, let me convert them into Modal's format."

Once deployed, you get a simple URL. This is the node that gets called, the phone number that other systems can ring in order to make something happen. And then in whatever service you're using, because this is obviously being triggered by some service, some notification from Slack or some incoming webhook from Instantly or whatever, you just give them the webhook URL. A lot of the time there's a field that says, "Hey, what's the webhook URL you want us to send results to?" and you just put it there. The request just needs to match the format the agent expects, which is usually what's called JSON, or JavaScript Object Notation. You don't actually need to know JSON nowadays. All you need to do is be able to recognize it. It typically starts with some curly braces, and when your agent sees it, you can just copy and paste whatever you see in the webhook documentation. It'll go

from a demo to actually doing stuff really, really quickly, which is fantastic. If you don't know how to connect stuff, you literally just ask: "Hey, how do I set up ClickUp to call this webhook when a new lead comes in?" The agent, whether it's Claude or Gemini or whatever you're using, will walk you through it step by step, especially if it's a platform-specific UI thing. A lot of the time it'll just say, "Oh, here's the link. Just go to this link and then you're done." You don't need to spend hours Googling stuff or chatbotting stuff. This is exactly what the tools are good at, so don't sweat it.

And to take that one step further, if you want it schedule-driven instead of webhook-driven, you just use something called cron. Again, this is something very native that's supported by Modal and our agents out of the box. You just say, "Hey, can you run this thing at 5:00 p.m. every single day?" and it'll do it. No complex configuration. You just describe when you want something to run, and it'll handle all the syntax and deployment details. That's mildly annoying for me, because I spent a lot of time learning cron way back in the day when I wanted to schedule simple things.

But yeah, it's just like setting a recurring calendar reminder; you're just doing it for your workflows. So, God bless the fact that we're at the point where technology can do all that for us, because good lord do I not want to have to learn another scheduling syntax again.

Okay, so some example prompts. You just say, "I want my weekly workflow report to run automatically every Monday at 9:00 a.m." It'll set up the cron for you, deploy it to Modal, and so on. The agent will figure out the rest. Whatever your timing is, whether it's every minute, every hour, every year, every 2,000 years, you can set this stuff up really, really easily. Don't

sweat it.

There is usually some misunderstanding in Modal about API keys and tokens and credentials and stuff like that. Inevitably, you will need to connect one platform to another, and there is always going to be some inherent risk in uploading a secret to a server. So just keep that in mind: by making things cloud-accessible, you're introducing a little bit of risk. You're basically setting up a server on the internet, and anybody can theoretically access it if they know your credentials, password, whatever. Your agent will prompt you naturally. It'll say, "Hey, this script needs your Apollo API key. Should I use what's in your .env?" All you do is say yes, or say no, or say, "Hold on, use this one instead." The way Modal really works is it stores these credentials as an encrypted secret, which is separate from your code, and the credentials are only used when somebody calls the webhook. So they're never actually sitting in the codebase. It's similar to how we separate our code from the .env file in our IDE. Very, very common. It's not specific to agentic workflows; it's the same way professional engineering teams do this sort of thing.

And then your IDE basically just becomes your command center. I obviously do both cloud workflows and local workflows, and I have all of them operate from my IDE. I'll say, "Hey, run this workflow," and it'll go, "Okay, this is a cloud workflow, so I'm going to call this webhook URL." Then it'll create its own request and send it to my own server, which is kind of cool. Although keep in mind that when you do that, as I mentioned earlier, you remove the agentic part, the self-annealing and so on and so forth.

What's really cool, though, is your IDE helps you get this done too. What you end up with is specific agentic workflows made to automate the process of uploading things to Modal, which is pretty sweet.

What are my recommendations around when to actually turn something into a cloud workflow? First, scheduled workflows. If you have stuff like a daily report, a weekly summary, or some sort of recurring scrape or HTTP request, you can do that in Modal, no problem. Second, if it's event-triggered, meaning it's very timely and you need to do something within a few moments of some other request coming in, then set up the webhook functionality like I talked about, and boom. But if it doesn't fit one of these two categories, believe it or not, it's probably best to stay local. If it does not need to run when you're not around, it's probably better to run it while you are around, because as I mentioned, these agentic workflows multiply your leverage like crazy right now, but they also multiply the error bounds. So you should probably be around in case one does something you don't want it to do. And if you're just hanging around by your computer for 3 or 4 hours a day, keep in mind that you're now capable of doing 30 to 40 hours of work in those 3 or 4 hours with agentic workflows, so it's not like you're really losing much here. You're multiplying your leverage, as all technology has done. But there are of course some instances and automations where you just always want to run the thing automatically, and that's what this is for.

Last thing I really need to mention about this is logging and monitoring.

Now, if something happens in your IDE, it's typically pretty easy to see where it went wrong. Why? Because you have little reasoning windows you can pop open. It's very easy to poke around and go, "Okay, I can see there was a problem here with this HTTP request," and so on. But right out of the box, in the cloud, you don't have access to that, and most of that logging functionality is not around. Cloud deployments don't have it. What that means is your agent needs to explicitly build the logging into the code. It won't always do this, and when it doesn't, the debug process can take quite a while. That said, if you learn how to build in some form of observability (that's what this is called in programming) from the start, it becomes a lot more straightforward.

My own personal monitoring setup is a dedicated Slack channel called Agentic-Cloud-LOG for all cloud workflow updates. Every time a workflow runs, it automatically sends an update to my Slack channel letting me know whether it was successful. That gives me a pretty superficial, high-level version of observability. If a message shows up, I know it worked. If it doesn't, I know it didn't. It's not as in-depth as it could be, but it's simple enough that I can just look at it, go to my agent, and say, "Hey, I noticed this thing isn't working. Can you double-check what's going on?" Then it can do its loop on its own. I don't need to be around, and I can continue working on something else while it does that. But if I didn't know, then obviously that would be a problem.

I've seen some ways people have built automated systems where they'll automatically take an error notification and send it back to another Claude or Gemini or GPT 5.2 instance or something and basically say, "Hey, there was some error with this thing. Fix it." And it'll do it completely autonomously. I think that stuff can be kind of cool. Although keep in mind most people aren't building 3,000 webhooks a day, right? So that's usually not the actual bottleneck. The bottleneck is more like, why are you building this webhook in the first place? I don't really want to mislead people here and have them build these cool automatic self-fixing loops when it doesn't matter all that much in the first place. Not to mention, the probability of it entirely fixing itself without introducing more errors is pretty low. Hopefully you understand what I'm trying to say. Okay, so it's pretty easy to do that.
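The Slack-channel reporting described above can be sketched in a few lines: Slack's incoming webhooks accept a JSON body with a "text" field, so the deployed script just needs a tiny helper at the end of each run. The workflow names and URL shown are placeholders.

```python
import json
import urllib.request

def build_run_report(workflow: str, ok: bool, detail: str = "") -> dict:
    """Format one workflow run as a Slack incoming-webhook payload."""
    status = "succeeded" if ok else "FAILED"
    text = f"[{workflow}] run {status}"
    if detail:
        text += f" ({detail})"
    return {"text": text}  # Slack's incoming webhooks accept {"text": ...}

def post_report(webhook_url: str, report: dict) -> None:
    """POST the report to the Slack webhook URL at the end of a run."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(report).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Example (a real Slack URL looks like
# https://hooks.slack.com/services/T000/B000/XXXX):
report = build_run_report("create_proposal", ok=True, detail="took 1.4s")
print(report["text"])  # [create_proposal] run succeeded (took 1.4s)
```

That's the entire "superficial, high-level observability" layer: one message per run, success or failure, enough to know when to send the agent back in.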

You just say, "Hey, when you deploy to Modal, make sure to add logging that sends me a Slack message every time it runs. Here's my Slack webhook URL." If you don't have one, you can ask it: "Hey, get me a Slack webhook URL." If you're using Discord or something, you do the same thing there. If you want a text message or an email instead, you can obviously set that up on your end as well. Pretty straightforward. I also say stuff like, "Hey, could you give me a status check on all my Modal deployments? How are they going?" It'll go through all of the Modal deployments and run through their logs. It has access to Modal's API, and as I mentioned, the docs are pretty straightforward, so you end up getting everything you need from a check-in like this. You can do it manually, you can do it based off a Slack notification, you can do it based off an email notice. There are a lot of ways to handle errors here. The reality is you just need to know to do this. If you don't, you're going to have a bad time.

In the future, we will have cloud-native agents. Instead of leaving the orchestrator out of this, we're going to insert the orchestrator in, and we're going to maximize agent accuracy as models get more intelligent and people design better frameworks to deal with this. It'd be pretty cool, right? If you think about it, what you could do is just send a natural language query to, let's say, nyx-agent.com. This is my agent, with a question mark, which is a query parameter that says, "Run the lead scraper." It would then go through the agent loop: it would do planning, it would do tool use, it would check its memory, it would do some reasoning and reflection before finally doing the orchestration. But as I mentioned, right now the error bars are a little too high. It will be pretty cool, though, because once we're there, you'll be able to set up a whole ecosystem of cloud agents that talk to each other and hang out. You'll have one agent here, Nick's agent, then you'll have Peter's agent, and then Sam's agent. Peter's agent will say something to Nick's agent, which will query Sam's agent for more information, and they'll decide on something together. You could even introduce payments into this sort of structure, and more. Early versions of this do exist today. I've published some videos exploring some of them; just check out my channel.

They're just a little too high-risk right now, and it doesn't really make much sense to do it all yourself yet. Okay, so I'm going to walk you through an actual Modal webhook deployment. I have a bunch of prompt templates and stuff like that; you can get all of it in the link at the very top of the description. Let's actually go through setting up webhooks in Modal.

All right, now let's talk about how to take the directives that are inside your IDE and put them on the cloud, specifically on a service called modal.com. In case you were unaware, Modal is what's called serverless infrastructure, meaning they have virtual servers that they spin up on demand, on the fly, every time you want them to do something. What's really cool is that most of the time, serverless infrastructure falls into one of two camps. One: it's online all the time, and it's always charging you some usage per minute, second, week, month, whatever. Two: it's offline, but then it has to start. This is termed a cold start, and cold starts typically take a lot of time and energy. So if you have a flow that requires instant reaction, like a lot of the executions you realistically want to host in the cloud, you don't actually get it instantly; you get it after a minute or two. What's really cool is Modal solves both of these problems. You can just take the execution scripts you developed and put them on Modal, so long as you have the right system prompt, and have them work essentially instantaneously. So, what you do is create an account on the service, and I should note that I'm not affiliated with them. Do whatever you want; there are a variety of other ways to do this, but this is definitely the simplest one.

They give you a bunch of free credits, at least as of the time of this recording. And it's worth noting that I've used Modal now for at least two weeks, maybe three, and I've used 4 cents out of the $5 available. Realistically, you're not going to run out of this credit usage just as a test. I can't imagine how far $30 in free credits would take you. If you're just using agentic workflows for yourself or for a small to mid-size business, this will take you really far.

So it's not free, but it's virtually costless. Once you're done, because we added all the information into our claude.md and our agents.md and so on, pushing one of our flows to Modal is actually really easy. All we need to do is get some authentication going and then find the specific flow we want. I want to do the create-proposal one, so I'm going to speak to my agent: "Hey, I'd like to create a Modal webhook for create_proposal.md. I basically just want to replicate the functionality of that flow and do it on the cloud instead. Get me a webhook URL for this."
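For context, the file the agent writes and deploys for a request like that usually looks something like this. Treat it as a hedged sketch, not a definitive implementation: the app name, secret name, and proposal logic are made up, and the decorator names here are from Modal's docs at the time of writing (`@modal.fastapi_endpoint` superseded the older `@modal.web_endpoint`) and may differ in your version.

```python
import modal

app = modal.App("create-proposal-webhook")

@app.function(
    # Credentials live in an encrypted Modal secret, separate from the code.
    secrets=[modal.Secret.from_name("proposal-api-keys")],
)
@modal.fastapi_endpoint(method="POST")
def create_proposal(payload: dict) -> dict:
    """Runs when something hits the webhook URL Modal assigns on deploy."""
    # Your execution script's logic goes here; this stub just echoes back.
    client = payload.get("client_name", "Unknown Client")
    return {"status": "ok", "proposal_for": client}

# Deploying with `modal deploy this_file.py` prints the public webhook URL.
```

The agent handles the wrapping and deployment; you just keep the resulting URL.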

So now it's going to read my pre-existing system prompt, which includes a bunch of information all about this. All right, it's almost done working through the Modal webhook. As part of the system prompt, we set up what's called a webhooks.json, which is just a giant list of all the different webhooks we have. I should note that before, it was empty, so all it did was populate it. Now it's getting some information about the webhook we set up, and it looks like it was deployed successfully. So we actually have a webhook now, available at this URL here, nick-90891-cloud-orchestrator-directive and so on. It looks like it takes all of our information in as follows. We could hardcode all of these inputs, or we could have AI generate them. So what I'm going to do is just have it run: "Okay, great. Could you run a brief example, then return the URL when it's done?" And it looks like at the end of it, we got our proposal, which is right over here. Let's take a look and see how it did.

Demo Corp AI automation pilot: it has some brief problem areas and some brief solution areas. You'll remember we built this earlier in the course. And yeah, we now have essentially an automated proposal generator. Obviously, I wouldn't just send an HTTP request to this with this exact information. This is a little bit short. I'm not going to call something Demo Corp, nor am I going to call the problem "manual data entry taking 20 hours per week." I'm going to go into a lot more detail. So just for the purposes of this, I'll say: "Great, please update the documentation. Every time I call this, I want to make sure the demo I'm providing is really complete. So lengthen the paragraphs for the benefits and the solution statements. Make things longer in general and significantly more realistic. Then rerun the test."

And opening up the new proposal, let's see what this one looks like. Cool. I guess it took my description of "long" to mean we should write the title long, too, but these look significantly better. Check this out. We now have way more customized information here. Yeah, this is much, much better. Awesome.

So, what did we learn today? We learned that it's actually really easy to set up a webhook. All we really need to do is take our flow, which in our case was the creation of a proposal, and send it to our agent alongside some system prompts that describe how to upload agentic workflows to the cloud.

Obviously we need to add our documentation and so on and so forth.

The really cool thing about Modal is that it's just one click and takes like two seconds.

You just go get your Modal API key and paste it in here; it'll ask you to do so. In terms of how to create the token, you just click "new token." The token secret is on the right, so that's what you copy, and then you paste it directly in here when it asks for the Modal token, and boom, you're done. And yeah, that's how to do it with webhooks.

Okay, now that we've set that up, let's go through setting up scheduled triggers in Modal as well. This is different from webhooks, obviously, because now we want to run on a schedule, not just based off of some event that comes in. Last time we did this with webhooks; let me show you instead how to do it with a schedule trigger. Maybe instead of running this via webhook call, I want to run a really simple workflow, probably some lead scraper or something like that, every 5 minutes. So what I'm going to do is just tell it which thing I want to run and how often I want to run it. Everything baked into the system prompt makes it super easy, and it'll just tell Modal to run this using what's called cron. "Hey, could you send a welcome email to nickleclick.ai every 5 minutes? I want you to set up a Modal cloud scheduled trigger to do this for me automatically."
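Behind that prompt, the scheduled function the agent deploys is a small variation on the webhook version: the same Modal app structure, but with a `schedule` argument instead of an endpoint decorator. Again a hedged sketch under the same caveats; the app name and email logic are placeholders.

```python
import modal

app = modal.App("scheduled-welcome-email")

@app.function(schedule=modal.Cron("*/5 * * * *"))  # every 5 minutes, UTC
def send_welcome_email() -> None:
    """Modal calls this on the cron schedule; no webhook or event needed."""
    # A real version would call your email provider's API here.
    print("welcome email sent")

# `modal deploy this_file.py` registers the schedule; the dashboard then
# shows the clock icon and a "run now" button for manual runs in between.
```

`modal.Period(minutes=5)` is an alternative to `modal.Cron` when you'd rather state an interval than a cron expression.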

Cool. So now it's setting up the Modal scheduled function to send the welcome email every 5 minutes. First it checks for an existing scheduled-function pattern, realizes there isn't one, and then just adds the scheduled welcome emails. Cool. And now we have it: the scheduled welcome email is live, scheduled every 5 minutes. So that's what that looks like in cron. What we're going to do now is send one.

What's really cool is that when you add them, you can actually see the various schedule triggers. There's one here with a little clock icon that says "every 5 minutes UTC." If I click on it, you'll see there are no scheduled calls that have gone out yet, but there is one in 1 minute and 9 seconds. And Modal is cool because it allows you to run in between scheduled runs. You can just click that little "run now" button, and when you do, it'll actually do the thing. You can see here that it took 3 seconds to start up the server and 1.47 seconds to actually send. Finally, if I go to the email address I specified, you can see that it actually sent the email. In this case, it created a basic onboarding email template. If I wanted to update it, I'd just tell my agent, "Hey, change this so it's a welcome email from whatever to whatever." I could even give it a template. I could give it whatever I wanted to.

And just so you can see it actually run, I'm going to wait until this counter goes down to zero so you can see what occurs when you set up a schedule. It's pretty straightforward. At the end of the day, since we're no longer using directives on our cloud servers, all we're really doing here is running a Python script. Because it's a Python script, these things execute nearly instantly. And that's really, really helpful: rather than having to wonder whether the thing was sent, rather than waiting through a really long startup time or sending and receiving things to or from Anthropic, we execute pretty quick. And as you see, because we just finished the previous query within the last few minutes, we didn't even have to wind down the server. So this one took 0 milliseconds to start, and the execution time was under 1 second. We just did this whole thing in less than a second flat, which is really cool. Heading back over here, you see that we now have the same email: "This is your scheduled welcome email." And then we also have that 5-minute block we talked about. It's almost 10:00 p.m. UTC, which is why the time says that. Cool.

So, hopefully I've convinced you that setting up these sorts of webhook-based triggers and schedule-based triggers is actually really easy. That definitely isn't the bottleneck here. Before, with no-code platforms like Zapier and n8n and make.com, you had to be a lot more precise. Now you just get the URL. And what can we do with the webhook URL? Well, now I can connect it to whatever service I want. I could very easily set it up so that when one of my prospects moves to the "send proposal" stage in my ClickUp CRM, for instance, which by the way I can control completely agentically using the agentic workflow I set up previously, we then trigger the webhook, and maybe that occurs automatically as well. And in this way we build a full end-to-end, completely automatic flow with webhook URLs that I can share within my organization or give to other people.

And that's it. You now know how to build workflows that essentially run without you. The next step is to take this to the next level. Right now we've been running agents sequentially, which just means one at a time. But imagine a future where you could actually run multiple agents simultaneously. That's what this next chapter is about: parallelizing your work to multiply your output. Essentially, you're going to go from one employee to a whole team.
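The one-at-a-time versus all-at-once idea looks like this in miniature. The three "agent tasks" here are stand-in functions that just sleep rather than real agents, but the pattern (dispatch in parallel, then recombine the outputs) is the same:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def agent_task(name: str) -> str:
    """Stand-in for one agent working on one task."""
    time.sleep(0.2)  # pretend this is a long-running agent run
    return f"{name}: done"

tasks = ["task one", "task two", "task three"]

# Sequential: total time is roughly the sum of the individual runs.
start = time.perf_counter()
sequential = [agent_task(t) for t in tasks]
seq_time = time.perf_counter() - start

# Parallel: dispatch all three at once, then recombine the outputs.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(agent_task, tasks))
par_time = time.perf_counter() - start

print(sequential == parallel)  # same outputs either way
print(par_time < seq_time)     # but the parallel pass finishes sooner
```

With real agents the "executor" is just your set of open terminal panes, but the recombine-at-the-end step is identical.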

Instead of finishing task one, then doing task two, then doing task three, we're going to do tasks one, two, and three in one fell swoop, then recombine the outputs. And we can do this arbitrarily, basically all the way to n workers or threads or instances of an agent, so long as you set up the environment right.

Okay, so how do you set up multiple agents simultaneously? Well, spoiler alert: all you're really doing is opening multiple terminal instances. Nothing super magical here. In VS Code or Antigravity or any terminal-based workflow, they all provide the ability to open multiple panes, which lets you run Gemini, GPT, Claude Code, whatever you want, in different terminal windows. My favorite way to do this right now, sort of my optimum, is three. I don't really work with more than three simultaneously unless we're doing long background tasks, just because I find my attention starts wavering and I start losing effectiveness at remembering what I'm doing. I always just do this vertically: left, middle, and right. I'll show you examples of all that stuff in a minute. So instead

of just doing all of this within a single IDE, you can also be kind of smart about it. Most models are at approximately the same level right now. If you look at three different models, they're basically all capping out at similar levels of intelligence. There are differences between them, but most are trained on the same data in similar ways, so they're all reaching similar levels right now. So if you find yourself with an IDE or a model (say, Gemini within Antigravity) that has stricter rate limits or higher costs, instead of running three instances of, let's say, Claude against each other, you could run one instance of Claude, one instance of Gemini, and one instance of GPT 5.2 or something. By doing all this simultaneously, the frontier models will remain at a similar intelligence level, and you'll also get some slightly different ways of doing the work, which can be beneficial if you're still in the building stage or the doing stage rather than running this stuff at really high scale. And because we have the same initialization files (agents.md, claude.md, gemini.md, etc.), there's no functional difference for the model. As a result, instead of hitting the threshold where you pay $200 a month for the plan (I think this is the Claude Max plan or something like that) and then have to pay another $100 in credits after you hit that threshold, we get to use three models and keep each of them below that threshold the entire time. I'm going to show you this and a bunch of other setups in Antigravity, and have you run through practical ways to do this. Now,

Another thing I wanted to mention is practical limits on parallel agents. I find that in practice, two simultaneous agents is probably the average baseline that I like sticking at. Four agents is what I consider to be my soft max before things start getting counterproductive. It seems really cool when you have a million tabs open and all these agents working on things. You feel like a superpower, right? But you're not actually being productive, you're just feeling productive. So instead of being in a situation like that, where most of the agent time is actually spent waiting for you to see the tab and do something with it, I want you guys to know that feeling busy is not the same thing as actually being busy, and feeling productive is not the same thing as being productive. This is a good way to help monitor that. I stick to three to four. Any more than that and you're probably just shooting yourself in the foot.

Okay, so I've talked a little bit about this before, but when you don't know how to build a workflow, you have a couple of approaches here. You can obviously just say, "Hey, can you build a workflow for me that does this?" as a first pass. That's fine. But an advanced way to do it is to say, "Hey, can you give me three approaches to build this thing?" What you do is take those three approaches and give them to either separate models or separate instances. Then, once they're all done, you test to see which one scores the best. So maybe this one here scores 75%, this one scores 84%, and this one scores 99%. What are you going to do? Obviously, you're going to use the 99% one, right? It's the best combination of speed, cost, accuracy, and so on. In doing this, rather than getting a subpar solution and then slowly making a bunch of changes to get to this point, you can just run these three agents in parallel and get three times the total search space, instead of manually going through the process one by one. I want you to imagine dividing this into three sections and having three of these little snakes go at the same time, which is just much, much faster, and then ultimately building something that is way better and way more scalable.

How do you do this? Really straightforward. Just send that brief list of bullet points describing what you want to build to one agent. Then say, "Can you generate three distinct approaches with in-depth steps for each? I'm going to send this over to another model. Also, give me some pros and cons so I can understand the trade-offs up front." This will take you a few minutes up front, but it'll also save you a lot of time, because if you go with a subpar solution initially, two or three hours down the line you may still be working out bugs or kinks or ways to make things faster.

Whereas, if you just started with the right architecture right off the bat, you would have had all that stuff solved. Once you're done with that, it's pretty easy. Just open three separate instances of your agent, one for every approach. Give each agent a dedicated working folder. I like doing this in /tmp, so I'll do something like /tmp/1, /tmp/2, /tmp/3, and then copy over a prompt saying, "Hey, you're currently working in this folder. The reason why is because we're creating three copies of a similar build with three different approaches. I want to do it here so that we're not crisscrossing files and so on." I'll show you guys a brief example of what that looks like in a moment. Once you're done, you just review all three outputs side by side. Pick your favorite approach based off the actual results and the theoretical assumptions. Then you move the winning solution into DO, or whatever it is that you're using, Claude skills and so on. Once it's moved over, you obviously also have to retest everything, because if you don't retest everything when the files are moved over, there may be issues with file references and that sort of thing. So this lets you do three builds in the same amount of time. Best one wins. You can obviously do exactly what I'm talking about not just for the building, but also for the doing. You can run dozens of agents. And there are also things like background tasks, which allow you to run agents in the background so that you can still do something else in parallel on top of it within a single thread.
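The dedicated-folder setup can be sketched like this. It's a minimal illustration, assuming a folder naming scheme (`approach-1`, `approach-2`, ...) and prompt wording that are my own placeholders, not a fixed convention:

```python
import pathlib

# A hypothetical one-line brief; replace with your own bullet points.
BRIEF = "Build a lead-scraping workflow that outputs a CSV of enriched contacts."

def make_workspaces(approaches, root="/tmp"):
    """Give each approach its own isolated folder so parallel agents never
    crisscross files, and drop in a prompt telling each agent where it lives."""
    folders = []
    for i, approach in enumerate(approaches, start=1):
        folder = pathlib.Path(root) / f"approach-{i}"
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "PROMPT.md").write_text(
            f"You are currently working in {folder}.\n"
            "We are creating three copies of a similar build with different "
            "approaches, so stay inside this folder.\n\n"
            f"Brief: {BRIEF}\n\n"
            f"Approach to implement: {approach}\n"
        )
        folders.append(folder)
    return folders

# One folder per approach; point one agent instance at each PROMPT.md.
make_workspaces(["queue-based", "single-script", "MCP-driven"])
```

Each agent instance then gets pointed at its own PROMPT.md, and after the runs finish you compare the three folders side by side and promote the winner.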

So I've talked a lot about building agentic workflows until now. But what I wanted to do here is give you guys a brief demonstration of what using agentic workflows looks like in my day-to-day. To be clear, I personally do a few things day-to-day. Number one, I run Leftclick, which is a growth/outbound AI-enabled agency. We basically help you go to market for a product or service, or scale up an existing product's outreach, using AI and lead-scraping mechanisms like you see here. We let you build completely autonomous outbound pipelines that don't rely on you or your team. You just end up with a bunch of booked meetings to sell your service in your or your salesperson's calendar. The other main thing I do is create content like this. I make YouTube videos, and I write big long guides on how to build with agentic workflows and stuff like that. So I'm constantly juggling between these two things. The third thing is I run a Skool community, actually a series of Skool communities: one called Maker School over here and one called Make Money with Make over here. So I have a fair amount to do on a daily basis, as I'm sure you can imagine. I have to do things for Leftclick that are kind of old-school agency things: I need to create proposals, scrape leads for my clients, onboard them, and so on. Then I have to do things for Skool: I have to manage replies, send and receive DMs, answer people's questions, and so forth. Plus, I have to do things for YouTube, like creating scripts and monitoring YouTube for competitors. So let me give you a brief example of what me doing all three of these things simultaneously would look like in an agentic workflow.

So the first thing I'm going to do is have this run through basically my end-to-end agency flow using a demo kickoff call transcript that I'm pulling up from my /tmp folder. This is just plain text. I could pull this up from Fireflies or any other transcription tool if I wanted; I've just stored it as plain text inside /tmp for simplicity. So I'll say: run the post-kickoff flow for the demo kickoff call transcript over here. Maybe I'm just getting started for the day and I want to see what sorts of YouTube outliers there are. With those YouTube outliers, I'll be able to ideate a new video or something like that, come up with an outline, and so on. So I'll say: run the YouTube outlier workflow and find me between 10 and 20 outliers for agentic workflows. This is what I'm going to be doing a fair amount of today because, as you guys can see, I'm recording a video on agentic workflows, and it's sort of the hot topic right now. And over here on the right, I'm obviously managing my Skool community, and I've built up some agentic workflows to help me pull relevant questions, comments, and things like that from Skool: pull the top 10 most recent Skool posts from Maker School. So now I have these three Claude Code instances basically running in the background for me. And all I'm going to do, as somebody attempting to be economically productive, is sit here, watch over these, and chime in where necessary.

So over here on the left-hand side, it's asking me some simple questions. Because I'm doing a demo here, I'll say nick@leftclick.ai, do the lead gen with the modified query, and then everything else too. Cool. Over here on the right-hand side, I see that we're done with my Skool posts, so now I have a bunch of information about this. Looks like Suam recently posted a cold email guide. So I'm going to say: Suam's cold email guide, run me through it step by step. This over here in the middle is using the TubeLab API, which is part of one of the agentic workflows that I put together to go and scrape me a bunch of outliers. One of our members was kind enough to share with us how he made $500,000 in about six months or so using Instantly, which is a cold email tool, along with a lot of the same principles that we talk about here. He ran through and actually provided a ton of info, and I'm just curious what that looks like. I could of course use the Skool UI: I could log into Skool and scroll through the post myself and so on. But I set up an agentic workflow to do this. Why? Because it becomes really easy to do really cool things with agentic workflows inside of Skool. Like, hypothetically, I get a lot of questions, right? And what I did was build a RAG, or retrieval-augmented generation, tool that essentially looks, every time somebody asks a question, to see if something similar has been answered in the community before. If so, it actually goes and gives me the link. Then, as I respond to them, I can just copy the link over and say, "By the way, if you want a much more detailed explanation, check out this post."

So, what I'm seeing here on the cross-niche outlier sheet is that it looks like we're not including all AI-based results. That's probably because, realistically, there just aren't any competitors for agentic workflows yet, because I've kind of coined the term. So that's great for me. What I'm going to do now is have it run some sort of outlier scraper for terms like "AI agents" instead. That should give me a fair amount of stuff to work with. Anyway, on the right-hand side here, now we're done with this. This is great. Fantastic. Comment: extremely valuable guide. So what I'm going to do is use my Skool system to go through this, get the post ID and so on, and then actually send a comment on that post saying "excellent" or "extremely valuable guide". If I open this up and scroll all the way down to the bottom, you can see that I just left a comment here saying "super valuable guide". So I basically get to communicate with Skool, a service that previously required a graphical user interface, entirely through an agentic workflow instead, which is fantastic. I'm sure future versions of agentic workflows will be able to recreate the UX in any flavor or way that I want, but for now, this is pretty cool for me. I don't mind.

Over on the left-hand side, you can see we came up with 15 leads. The reason I did 15 and not, say, 1,500 is just that it was trying to be mindful of my token costs; it knew I was doing this as part of a demo. We've actually already gone through and got what I think is nine emails, which is cool. Then, if we scroll a little further down, this actually went through and uploaded leads to the campaign, which is pretty sweet. It then even added things to a knowledge base, and even went as far as to send a summary email to my client (in this case I just used my own email), basically telling them, "Hey, we're done with the campaign," and so on. What's really cool is it also gave me three links. So I'm just going to open up these three links, which take me directly to my cold email tool, where I can actually see the campaigns that it came up with.

"So, this might sound crazy, but hear me out. I want to generate $50,000 in revenue for [company name] in the next 90 days. If I don't hit that number, I'll work for free until I do. How? LinkedIn thought leadership. I run a company. We've spent six years helping 200 partners at professional services firms turn LinkedIn into a revenue channel: accounting firms, consultancies, financial advisers, executive coaches. Our clients regularly close $50K deals directly from LinkedIn. Some see 3 to 10x follower growth, and most start getting two to three inbound leads per month once the content machine is running. I know this is bold, but I'm confident we could do something similar for you. Would you be open to a quick chat? No pressure, just a conversation."

I mean, this is just one of three campaigns with two split tests each. Obviously, while I would consider this copy very punchy, and probably higher quality than like 80 to 85% of all the copy that other people are running for campaigns like this, I'm going to take a look at it and maybe make some minor changes before I actually go through the process. But it's still pretty great, right?

I did notice that there was an issue here where the Gmail MCP was not authenticated. Because I was showing you guys how to authenticate MCPs in another video (a demo I did a few hours ago), it unauthenticated my MCP. Obviously, if this occurs, you need to reauthenticate, right? So what I would do in this case is say "reauthenticate MCP," and then it would just go through that process. On the right-hand side here, I'm going to say something like, "Hey, what sorts of questions have been asked in the last 24 hours that I can answer?" So now I'm going to get a list of questions on the right-hand side. That's pretty straightforward. While I'm doing this, I'm reauthenticating my Gmail MCP, which is going to trigger OAuth, which is pretty cool. In the middle here, we're still scraping more outliers: would you give me the highest priority ones? Over here, we now need to restart the Gmail MCP server, so I'm just going to restart Claude Code. The new OAuth flow should capture a refresh token: "Let me know once you've completed the browser authentication and then I will start again." Cool. So what I'll do is go new, type /mcp, and auth my Gmail MCP.

Over here on the right-hand side, you can see some people have asked some questions. Emil's asked some questions about client delivery when you're offering a lead gen system: for how long should you sign up the client, and how long can you keep providing new leads for the company? For how long are we typically running campaigns for clients? On average, I run campaigns for a minimum of 90 days. I didn't used to do this, but I found that 90 days was sort of the sweet spot, as it typically takes some stopping and starting before you figure out the right offer combination and the right lead targeting. When I started, I went month-to-month entirely. I'd probably recommend that in your case just to keep friction low, but hopefully this helps give you an understanding of the various ways you could put something like this together. And we have another question here about 400 bucks. Well, first off, nice job on the 400 bucks. The JSS tanking is hard to hear. My recommendation would be to send him a message letting him know that immediately after you finished your contract, you had a massive JSS dump. This is an Upwork thing. And softly imply that this will unfortunately have serious consequences for your ability to get future work. I would also ask him if there's anything you can do to improve that Job Success Score, whether it's going back and providing free or additional work, etc. It looks like on the third he put some copy together, so I'm just going to say: show me the copy.

Cool. And now this is going to go through top to bottom and then send that info. What's cool is this also formats my text for me, so I can just dump all this in. It's now going to authenticate, so I'm just going to head over to my email. Looks like it's successful, so I can go back here. This looks pretty solid. I would probably remove the "just", because it doesn't offer a lot of value. If you've worked with somebody in their niche, I would recommend mentioning that, since it's usually considered positive social proof. The "would you be open to a 15-minute call about this?" as the last question is a little weak. I would probably be hyper-specific with the times I'm asking for, i.e., "Could you do..."

Okay, over here on the left-hand side we have the Gmail MCP, so I'll just say: send me a hello email to nicholas@gmail.com. Over here we have the output of our agent, so let's take a look at this. Looks like it's saying that a lot of these are related to ICE agents, which is sort of a political thing that's going on right now, which is why we're getting these outliers. Obviously, that's not what I'm going to be doing. I really care about looking for AI agent outliers, and I do see some of these are more agent-related. "AI agents that actually work: the pattern Anthropic just revealed." We have the thumbnail right over here. That's cool. Google Workspace Studio between these two. Sam Altman looking quite menacing. These are pretty funny, honestly. Cool. So I have some reasonable outliers here, which is nice. I'm probably not going to be able to do the political ones, and I'm not really making content like that or talking-head stuff, so I can avoid those. But hopefully you guys see that now I have some outliers I could work with that have just been released in the last few days, and maybe I could start modeling my content around them or something like that. Meanwhile, the MCP now works, so we did fix that. And I've also sent three messages within Skool, so I'm just going to take a little peek at that. Cool. Just sent that, just sent that, and right over here just said that, and you can see it's also formatted my text for me and so on.

Okay, so I don't do this because I think any of these three particular workflows I'm running are super powerful or super incredible or whatever; these are just things that I had to do today, and I figured I would run through them with you guys. This is a practical look at the day-to-day work that I do within my agentic workflow IDE, and hopefully you see how this is a very simple and easy way to multiply your leverage. I mean, I just did a whole end-to-end workflow for, admittedly, a demo client, but a demo client nonetheless, on the left-hand side. In the middle, I ran an outlier detector. And on the right-hand side, I interacted and engaged with Skool posts much faster than I could manually; it automatically formatted my text, found good questions for me to answer, and so on. You guys can use agentic workflows in your IDE in the exact same way for whatever knowledge work you need to do. Whether you're copywriting campaigns, scraping leads, or just organizing your CRM or adding things to a record, it is now entirely possible. And I hope you also see that there is a split between the building of a workflow and the using of the workflow. The building is something you do once, and the using is an opportunity to make a return on that building-time investment over and over and over again, basically every day. I don't really think it's a far cry to say that most people could probably automate 50% or more of their day-to-day work using flows like this, and at minimum at least make it 50% more enjoyable or easier to do.

So next, I want to talk a little bit about sub-agents. Why sub-agents? Because context windows fill up really, really quickly.

Most people don't realize this, but current models have a context window of around 200,000 to around 1 million tokens in certain instances. That sounds like a lot, but when you add tools, all of this context disappears much faster than you would think. Specifically, detail-oriented tasks burn through context really quickly because of that loop I was telling you about. Debugging burns through context very quickly for the same reason. Any sort of MCP burns through context really quickly. And before you know it, half of your whole context window of, let's say, 500,000 tokens is filled with intermediate garbage that significantly reduces the probability of a successful output. Now, this phenomenon, where there's a bunch of garbage in your context window that leads to poor-quality outputs, is called context pollution. Pollution is essentially where that intermediate memory, that sort of mid-term memory I talked about way back at the beginning of the course, gets cluttered with a bunch of irrelevant noise. Now, scientists have been working with these models for quite a while. As I may have mentioned at some point in the past, AI models these days are more grown than they are built, so it's very much like a natural phenomenon that we are testing. And what they've found, consistently across thousands and thousands of tests, is that the more tokens in a context window, typically the poorer the quality. And the relationship looks something like this.

And the reason it looks like this is because over here on the very left-hand side, you probably have zero tokens, right? So if it's fresh and you ask it to do something with no context or whatever, it'll do an okay job. If you add a bunch of context and tell it, "Hey, I'd like you to do this. Here are a couple of examples of past instances of this run correctly. Here's a bunch of context. Here's a bunch of links," performance actually goes up in the short term. What you'll notice is that as you go on and on and start filling it with more irrelevant garbage, performance and output quality go down a lot. Now, back in the day with GPT-2 and GPT-3, when I was starting 1SecondCopy, my content writing business, this was super, super important. It was so important that I actually trained all of my writers not to use more than 256 tokens at a time. So imagine that: we had to stick under 256 tokens with our prompt. Essentially, if we went any over that, we found quality went off a cliff. In our case now, we can use significantly more than 256 tokens; this peak here is probably somewhere closer to 10K or so, not 256. So we're sort of blessed in that way. But still, there is that relationship between more stuff in the context window and poorer quality. So we need to make sure that, all else held equal, we try to minimize the number of tokens in our context as much as possible. Now that we understand that, on to sub-agents.

The way sub-agents solve this is through isolation of context. The idea is that in order for something to be a sub-agent and not part of the main agent, it gets its own fresh, clean context window to work in. So all you do with a sub-agent is give it a task, let it do all the messy work in its own space, and then return only the relevant findings. As a quick little demonstration here, let's say this is a chat back and forth between you and your agent. This is you over here; this is your agent over here. Every time you ask it something, it sends something back, and so on. Imagine what happens every time you send a call. Essentially, we stack all of these up, and so our total context, if you think about it, is that block up there, plus this block, plus that block, plus that block, plus that block. How many blocks is that? We're just counting: that's five blocks. And let's say each one is a thousand words, so you're actually sending like a thousand words per block. What that means is that on the next query, we're sending a total of five blocks of context plus the thing that we asked, so maybe 6,000 in total. What sub-agents allow you to do, instead of having this 1,000 here, is pretend that this over here is actually a sub-agent loop. What we do is eliminate this completely, and then we eliminate that completely. So what ends up happening is that the model, instead of storing the results directly in the context, only stores the outputs of that response. To make a long story short, all we're really doing is asking the sub-agent to do something. It deals with all of that stuff internally, in its own head, and then just spits out a brief summary plus the results that we asked for.
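The block-counting arithmetic above works out like this. It's a toy calculation under simple assumptions: each exchange is treated as roughly 1,000 tokens, and a sub-agent summary as roughly 100.

```python
# Without sub-agents: all five past 1,000-token blocks ride along
# with every new call, plus the new 1,000-token query itself.
history = [1000, 1000, 1000, 1000, 1000]
new_query = 1000
print(sum(history) + new_query)   # 6000 tokens sent on the next call

# With sub-agents: two of those exchanges were delegated, so only a
# ~100-token summary of each remains in the parent's context.
pruned = [1000, 100, 100, 1000, 1000]
print(sum(pruned) + new_query)    # 4200 tokens, same findings retained
```

The messy intermediate work still happened, it just happened inside the sub-agent's own context window, so the parent only pays for the summaries.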

If you guys are keen, you'll notice that this is very similar to how reasoning tokens get discarded after use to keep the total token count down. Remember that thinking tab you can open if you want to see what's going on under the hood? Well, those tokens aren't actually added to what I talked about here; those tokens disappear. So it's the exact same thing. Whether it's reasoning or sub-agents, both of these strategies are meant to reduce the total amount of garbage polluting the context window. And the data backs this up. Anthropic, a company that didn't exactly coin sub-agents but is definitely the leading force behind them with Claude Code, ran a test where Opus was the lead, and Opus essentially controlled a bunch of sub-agents and had them do a variety of smaller tasks before reporting back their findings. It found that this outperformed single-agent Opus by over 90% on research-based tasks. Now, I should note that's research, right? Not all tasks are research-related. Obviously, research involves a ton of tokens, so sub-agents here did way better than they probably would on most other tasks, relative to the standard. But there are some circumstances where sub-agents do perform significantly better even in day-to-day use, and that's why I'm talking about it.

You'll know that I really haven't given a crap about sub-agents or anything like that until now; this is a very recent phenomenon for me. People have been talking about sub-agents for the better part of the last two years, and every time they'd ask, "Nick, why aren't you using sub-agents?" I'd say, "Because it's pointless." Sub-agents as an architectural addition just complicate things; they don't actually make things easier. Models for the most part can handle tasks on their own. You don't need to try to develop some big fancy framework. Well, model intelligence has gotten to the point where we can actually make use of these things now, so long as you're nuanced and kind of smart about how you do it.

do it. So the catch between this is there's implementation complexity because you are now inserting your own biases and how you think the model should operate. Then you're also

should operate. Then you're also compounding errors. What do I mean by

compounding errors. What do I mean by compounding errors? I mean, you know, if

compounding errors? I mean, you know, if you think about it, there's a step here where in order for my parent agent to send something off to a child or sub agent, it needs to summarize what it is that it wants the sub agent to do. And

so that right there is a step. And that

step might be like 99% accurate. But as

we know, if you have a bunch of things that are 99% accurate, if you add enough steps into the process, eventually that turns out into something that is much less than 99% accurate, right? It might

be like uh I think my example was 99.9% stretched out over a,000 tasks was 36% accuracy at the end of it. So you know the more uh steps you have like summarization steps sending to this this

does some summarization sends back the more area you're inserting in the process and the higher the variability is. So basically what you need to do is

is. So basically what you need to do is you just need to find a situation where the added error as a result of the additional steps is outweighed essentially by the beneficial effect on the context. And there's no real

the context. And there's no real non-trivial way to know this right off the top of your head. Like you need to test this. You need to try this. Now

test this. You need to try this. Now
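That compounding arithmetic is easy to check for yourself; this tiny sketch just mirrors the numbers from the example:

```python
# Per-step accuracy compounds multiplicatively across a chain of steps,
# so even 99.9%-accurate steps decay badly over a long enough pipeline.
def chain_accuracy(per_step: float, steps: int) -> float:
    return per_step ** steps

print(round(chain_accuracy(0.999, 1000), 3))  # 0.368, i.e. roughly the 36% figure
print(round(chain_accuracy(0.99, 100), 3))    # 0.366: 99% over 100 steps is already as bad
```

This is why every extra summarize-and-hand-off hop between agents has to buy you something real in exchange.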

since I've tested this and trying this, my recommendation is to stick to two sub aent types for now. And there's in in particular just two that I'm going to talk about. Before I tell you what those

talk about. Before I tell you what those two are, the other two big wins from sub agents are there's context management.

Your main agent will stay super clean, and it'll only have things that are highly relevant to what we want. So let's say you delegate to a bunch of sub agents that have MCP access. Those sub agents are the ones that load up all the context and the MCP tools. Then they do the job and report back. If your sub agents are atomic enough, we can do that over and over again and actually make some real headway without polluting the context window.

The second win is parallelization. Sub agents can all run simultaneously. What you'll find when you delegate to sub agents, like I'll show you later, is that a single agent can spawn multiple, and those multiple basically all run on their own and report back whenever they're individually finished. So if you've ever seen Gemini or Claude do research, typically what'll occur is it'll spin up three or four research sub agents, because that's native to their architecture, and it basically just waits until all three or four of these are completed. But these don't occur top-down. It's not like this one finishes first, this one second, this one third, this one fourth. These are all individual processes. So this one might finish first and report back, this one could finish second, this one third, and this one fourth.

It's a very interesting phenomenon that you guys have probably seen but maybe not fully understood where it comes from yet. A good example of that parallelization is if you want to scrape a bunch of leads. I do tons of lead scraping, hence why it's always my example. You don't need to scrape all of these one by one. You don't need to scrape, let's say, 30,000 leads independently through some big serial process. You can actually just have your parent agent spin up three sub agents, and maybe every sub agent itself uses some form of parallelization to do its task. And I know this sounds really fancy, you're probably like, does it actually work? What you're doing is basically just cutting down the total amount of time it takes to do this thing.
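The fan-out/fan-in shape being described here can be sketched in plain Python; `scrape_chunk` is a made-up stand-in for whatever a sub agent actually does, not the course's real scraper:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_chunk(chunk):
    # Placeholder for a sub agent's work: in reality each lead would
    # involve real network time, which is what parallelism saves.
    return [f"lead-{i}" for i in chunk]

def scrape_all(lead_ids, workers=3):
    # Fan out: split the job into one chunk per worker.
    chunks = [lead_ids[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scrape_chunk, chunks)
    # Fan in: the "parent" just stitches the lists back together.
    return [lead for chunk in results for lead in chunk]

print(len(scrape_all(range(30), workers=3)))  # prints 30
```

The parent's consolidation step is trivially cheap compared to the work each worker does, which is exactly the argument for delegating the heavy lifting.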

And then what occurs is, once these are all done, they report their results back to the main agent. Then the main agent's task is really just consolidating them, putting them together. If you think about it, stitching together three lists of things is a lot easier a task for a parent agent than actually going through the orchestration of scraping that many leads. If something previously takes 3 hours sequentially, with the spin-up, the scraping, and then the wind-down, it might only take 30 minutes in parallel, because you're consolidating those fixed spin-up and wind-down costs, and your parent agent just gets the results.

In terms of the technical and logistical bits of where sub agents live, they're defined as markdown files. Exact same thing as the directives. Nothing really different here. In Claude Code specifically, they're included in .claude/agents. So this is a top-level folder with another folder underneath it. And then if you want to go global, as in have them accessible across your entire set of projects, you put them in your user directory's .claude/agents.

The disambiguation there isn't super important. If you want sub agents to only have access to a specific workspace or project, the project folder is how you do it. But if you want them to have access to everything, you put them in the global one, and that way sub agents can work across your workspaces. Now, other agentic coding tools do follow similar patterns. There is no consensus, at least not as of the time of this recording, on how Gemini is organizing its sub agents, how Codex and so on are organizing theirs. But rest assured, everybody has their own little framework, and it's all about the system prompt, right? You can absolutely just have these models spin up the equivalent of the Claude Code version of sub agents. It's just a matter of doing a little more heavy lifting up front.

The anatomy of a sub agent file right now is: you have the name, then the description, and then, really important, the permissions, meaning which tools the sub agent can access. Tools in our DO framework, for instance, are going to be directives and executions. After that, you have the system prompt. And just like we have system prompts across the entire workspace, we also have sub-agent-specific system prompts. You guys don't actually need to know any of this.
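For reference, a minimal sketch of that anatomy as a Claude Code sub agent file (YAML frontmatter plus a system prompt body); the `reviewer` name and the exact field values here are illustrative, so check your tool's docs for the precise schema:

```markdown
---
name: reviewer
description: Reviews execution scripts with fresh eyes and reports quality issues.
tools: Read, Grep, Glob
---

You are a code reviewer. You receive only a directive and an execution
script, with no other context. Judge the script purely on its quality:
documentation, simplicity, error handling. Report specific, honest
findings back to the calling agent. Do not edit any files yourself.
```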

I just say, make me a sub agent that does X, Y, and Z, and this sort of stuff is just baked in, at least to the Claude family of models as of the time of this recording. It'll most certainly be baked into other ones as well. So yeah, you don't need to create these yourself. You can just ask the agent to do it. Here's an example prompt: literally just "create a sub agent called document that gets called after every workflow to consolidate changes in the directive and execution scripts." It'll go through a process of creating the thing, I'm going to show you what that looks like in practice, and yeah, you're done. Your agent will generate a file, put it in the correct folder, and then it's immediately available.

Talk about something recursive, huh? It's agents creating agents. I should note that agents can create the definition of an agent, but an agent can only spawn a sub agent. Sub agents can't spawn more sub agents themselves. And this is like a memory constraint. They don't want sub agents to be able to spawn more sub agents, because essentially you'd end up in a situation where your parent agent spins up two sub agents, those each spin up two more, those spin up two more, and so on until your CPU is as hot as the surface of the sun, not to mention some safety and security concerns. So really what happens is it's limited: your parent agent can spin up however many sub agents it wants, but they all report back to that parent agent. So what are those two sub agents I talked about that I personally find genuinely useful?

They're not required, to be clear. You can absolutely use DO, or whatever other framework you want to build with, without sub agents. But I found that these actually improve the accuracy and quality of my execution scripts, and they're a joy to use as opposed to something laborious and time-intensive.

The first is the reviewer sub agent. A main issue with building directive orchestration executions or Claude skills is that your orchestrator will write a bunch of code. And so if you ask it, "hey, how's this code looking?", it's going to be biased towards thinking that code is correct, because it probably just ran it a bunch of times and sees some correct runs in its history. The unfortunate thing is that's kind of like asking somebody to read their own essay right after writing it. Any experienced writer will know what you want to do is take a little bit of a break. Take a deep breath, go sit somewhere else, do not look at or read that essay. Come back to it maybe an hour or two later, because when you do, your mind is no longer polluted by all the biases and your own flavoring of thought surrounding how good that essay is. You come back with fresh eyes, and you can tell whether it's some of your best work or some sort of mediocre work.

Reviewer sub agents work basically the exact same way. Instead of the orchestrator, which remembers all its decisions, we give the code to something that can actually see a lot more clearly. The reviewer gets loaded with completely fresh context, which is just the directives and the executions we built. We then ask it to evaluate the script purely on its quality. In short, it acts like a second pair of eyes. We give it no context about what this thing is for, and the idea is it needs to determine the context through the code. Meaning the code has to be documented, has to be pretty straightforward to understand and read, has to be written simply. And if it has no context whatsoever, it'll be able to look at the code and go, "hm, that seems kind of weird, because most other code like this would probably have some error handling, but this one doesn't. I think this should probably build in some error handling." Then it can provide suggestions back to the main agent, which is biased, to actually go and build the thing. How do you do this?

Well, your main agent just calls sub agents automatically when you define them in the system prompt. So in agents.md: "after you create any script, use the reviewer sub agent to check its quality." That's a totally okay thing to write somewhere in your agents.md or system prompt. While it won't be 100% reliable (it will do this up until the context window gets polluted enough), it's a pretty reasonable thing to do, and I find just having this probably improves my accuracy a good 5 to 10%. In addition, you can obviously also ask the model to do things manually. So you could say, "Hey, that's great. Call the reviewer sub agent, just make sure everything's okay." Or, "Call our reviewer and ensure this is fine." Or, "I want you to make some edits; after you're done making those edits, ping reviewer, double check that it's okay, and if it's okay, give me the thumbs up." These are all just flavors and variants of things you can ask your agent.

Obviously, your mileage varies and it's up to you. The second sub agent I recommend building is a document sub agent. This one updates directives based on what the system has learned over time. You know, after your workflow self-anneals for a while inside of your IDE, sometimes the agent will forget to update the directive. That's just because, as I mentioned, it has a ton of context, so it's going to forget some of the things you mentioned initially in the system prompt, like "hey, I want you to update your directives." So what the document sub agent does is it reviews scripts and then updates the directives to reflect their current behavior.

A lot of the time in practice, you'll have some issues with your script, so the agent will go and update the script over and over and over again, and the directive will be untouched despite all that time spent updating the script. Then, on a fresh instance of a new agent, maybe tomorrow or the next day, you try running the workflow and it goes, "hm, this is weird. I tried running the execution script, but it looks like it wants different parameters. What's going on here? I followed the directive." And then there's a big debugging step, and then it fixes it, but it takes, I don't know, 5 or 10 minutes. Well, just call your document sub agent and have it rectify everything right then and there instead. What you do is you give it read access to all files and write access just to your directives. So it can read through all of your execution scripts, but it can't make any updates to them, and it can update the directives to match the execution scripts. This is pretty simple, too.
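A least-privilege sketch of that document sub agent, again with illustrative field values (per-path write restrictions may need your tool's permission settings rather than frontmatter alone, so treat the `tools` line as an assumption):

```markdown
---
name: document
description: After a script changes, re-reads executions and updates directives to match.
tools: Read, Edit
---

Read every execution script under executions/. Compare each one against
its directive under directives/. You may edit files in directives/ ONLY;
never modify an execution script. Summarize what you changed when done.
```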

Create a sub agent whose job is reviewing scripts and updating documentation so everything aligns, and just call it whenever you update a script. Anytime you make a change, your main flow will then call the document sub agent to do some review. The document sub agent will review the scripts and summarize the changes automatically, since it's sort of trained to do so with its prompt.

Now, as I mentioned before, the really cool thing about sub agents is they don't just work in sequence. They can work in parallel. What do I mean by parallel? Well, just like opening new tabs, sub agents let you run tasks in parallel. Just like opening three or four instances of Gemini and asking each to do a different thing, you could just run three or four sub agents within a single window. Your parent agent has the ability to run multiple agents concurrently and then wait for the results of all of them. And so, as I've talked about many times, if you have some parent A, it can now whip up B, C, and D, then combine the results into some result E, loop that back around, and use that result to proceed, instead of doing everything sequentially. Because sequential runs can take a fair amount of time, right? If every single step takes, I don't know, 20 minutes, that's 20 minutes here, 20 minutes there, 20 minutes there. Why not just consolidate them all and only have one 20-minute step? Parallelization is probably one of the freest wins in computing, to be honest, because most of your CPU cores and GPU cores are literally just left idle 99% of the time. This is a good way to make use of them.

When you do this, the context window will also stay really small. It's usually under a couple thousand tokens in the main thread, and every sub agent works independently without cluttering your primary workspace, assuming you give it the right system prompt so it can do that: "Hey, I want you to store intermediate research results in tmp/research instead of polluting my parent agent's context window." Now, obviously, when you give sub agents autonomy, and keep in mind that autonomy is also given by the parent agent, it's like you're multiplying autonomies just like you're multiplying probabilities. Obviously, safety becomes pretty important, right? And so what I recommend is giving each sub agent different tool access. You need to specifically say: you can only do X, Y, or Z. So your guardrails have to be a lot stronger than, say, the guardrails on some other sort of agent. I'm just going to draw my little bowling ball analogy over here, but it is very much one of those things.

You do need to have some sort of guardrail. I think of it like giving my intern read-only access to my production database, the production database being my live, actual database that people are really using. I've had some issues in the past where people who aren't very skilled come into my organization and start screwing around with databases they probably shouldn't be touching, and then, I don't know, they drop my tables and all of a sudden everything's all crappy. So an SOP that I, and I think a lot of other people, use is: hey, if you're new to my organization, you only get read access to things. You can only look. If you want to make changes, ask me.

Well, sub agents are very, very similar. And this is obviously an architectural pattern we're borrowing from hierarchical organizations. It's called least privilege: you give each agent only the resources it needs for its specific job. If you think about the document sub agent I was telling you about, it only really needs to be able to read the executions. It doesn't need to be able to write them. The only thing it needs to be able to write, which is sort of the really scary part, is the directives. And in that way, we ensure that information only ever flows from executions into directives, not the other way around. I could of course create, say, a hyper-specialized, optimized coding agent with a bunch of context about the best ways to write code; then maybe I give that read access to my directives and write access to my executions, or something.

A couple of other limitations about sub agents that I want to talk about, because I think they're really shiny and fun and everybody likes being at the top of some big organization: they add some overhead, and they also add some latency. Spinning up a sub agent and getting results back takes extra time. It's not instant, unfortunately, because you are literally spinning up a separate entity. So for simple tasks, your main agent will almost always be faster just doing it directly, and for most simple tasks it'll just use the main thread. I'm not going to spin up a sub agent to do my research for me; even though some of that is just built into the way these agents now work, I'm just going to say, "hey, look this up and get me the results." I'm not going to say, "spin up the research sub agent and then feed that into the decision-making sub agent" and so on, because I think that's just kind of BS. So yeah, I don't really use sub agents for most things. The time cost often isn't worth it. I'll only really use them in the context of a hyper-specific framework like directive orchestration execution, Claude skills, and so on.

So let me show you how to actually create one of these sub agents. I'm using sub agents in Claude Code just because Claude Code currently has the most clearly defined sub agent pattern. I could just say, "hey, make me a sub agent," and it'll do it. I want you guys to know that you can build sub agents, or at least things analogous to sub agents, in whatever model structure you want. A sub agent doesn't have a formal definition yet, but I'm going to define it as something that does not have context aside from the input it is given by a parent agent. So, I want to create a reviewer sub agent. To create one, I'm just going to voice-dump my requirements directly in: Hi, I'd like to create a reviewer sub agent. The whole idea behind the reviewer sub agent is it will look at the execution scripts that another agent develops with totally fresh eyes, and determine if this is done as effectively and efficiently as humanly possible. It will then provide instructions to the top-level agent, which can take that guidance and review to improve the quality of the build.

I'm just going to feed all that in directly. It's then going to do some tinkering and some thinking. Then it's going to ask me a bunch of questions. My main goal here is: I want you to be able to call the sub agent as required, so set it up in whatever way allows you to do the calling. I also want you to check everything, all of the above. The output format should just be whatever is most amenable or convenient for you, since you're the one that will be calling it. Okay.

Funnily enough, I ran into a limit earlier when I tried finishing that. So I went and added what's called additional credits, which is pretty easy to do in Claude. Anyway, your current session eventually hits a cap. I'm using the Claude Max plan, so I have a fair amount of usage, but I eventually do run into some sort of limit. So what I did is I enabled the extra usage toggle and said, "hey, just use this to pay for any extra usage whenever I run over." I set a very low spending cap, because I very rarely run into session limits. It's my fault for doing like 20 demos today.

Anyway, after that I had this run on a test. So I said, "Hey, run the reviewer on scrape_cross_nicheoutliers.py." So it's now actually running a test. It's saying: read the directive first, understand the criteria, read the script completely, produce the structured review output specified in the directive, and be ruthlessly honest and specific. This thing only has read functionality. And it has since found me a bunch of information I could use to improve the script: the script is functional but has significant efficiency issues. Excessive API calls, no rate limiting, and potential quota exhaustion. Here they are. Wonderful, wonderful, wonderful. This is really cool. O(n²) string matching for 175 niche terms; the full transcript loaded when only 8K characters are used.
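As an aside, a finding like that many-terms matching one usually has a cheap fix. Here's a hypothetical sketch (the term list is made up, not the course's real 175 terms) of replacing a per-term scan with one precompiled regex alternation:

```python
import re

# Stand-ins for the niche terms; the real list had 175 of them.
niche_terms = ["ai automation", "lead scraping", "cold email"]

def find_terms_naive(text, terms):
    # One full pass over the text per term: slow when terms are many.
    return {t for t in terms if t in text.lower()}

# Compile all terms into a single alternation so the text is scanned once.
pattern = re.compile("|".join(re.escape(t) for t in niche_terms), re.IGNORECASE)

def find_terms_fast(text):
    return {m.group(0).lower() for m in pattern.finditer(text)}

text = "We do AI automation and cold email outreach."
assert find_terms_naive(text, niche_terms) == find_terms_fast(text)
```

The point isn't this particular optimization; it's that a fresh-context reviewer surfaces exactly this class of issue, and the main agent can then apply the fix.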

So now we can basically do a fix. I'll say: great, try this on the create proposal flow. I'm doing this because the create proposal flow is pretty solid, but it's also quite simple, and I actually want to see how this would work doing a review on create proposal. It's now spinning up the base sub agent. Now, the way sub agents work, at least in Claude Code, is there's a defined structure: they live in .claude/commands, and inside the commands folder is the sub agent tool spec. As you can see, we haven't actually done that. There is no reviewer sub agent file here. That's because the model typically defaults to just doing this the directive orchestration execution way, by having a directive that says "hey, you're the reviewer agent." But we want to do this in the Claude format specifically, because the probability of this working is a lot higher with totally fresh roles. So what I'm going to say is: excellent work. Before you proceed, create an actual Claude command for this. Right now you are using a directive to spawn the sub agent, but I instead want you to search through the .claude folder and see how it should be done. After you're done, update the execution script with the reviewer sub agent's thoughts.

This is fantastic. It found a bunch of discordant issues that were probably significantly increasing the error rate. Now we have correct paths. Everything here is much more on board with the directive. And we've even gone as far as actually creating the Claude command. So this is fantastic. What I'll now say is: great, test create_proposal.py with the demo sales call transcript in tmp. It found it. Now what it's doing is generating all of the information. This is the same thing that I ran in an earlier demo, in case you guys are aware. It's going to use a plausible email, create the JSON input, and then test. Cool.

And this actually significantly improved the functioning of create proposal. Previously we had to do some polling. Now what it does is it waits for the document to be ready before returning the link. So we actually have this ready, and we've significantly improved the effectiveness of the script as well. It's a welcome surprise; I wasn't actually expecting to improve this. Looks like the one issue here is it titled the document with the company name, which made it spill over to a second line. I can obviously change that anytime I want. But the rest of this looks pretty solid. I'm not seeing any major issues here, so fantastic work. Hopefully it's clear.

You can use a reviewer sub agent and a document sub agent to significantly increase the effectiveness of not just the DO framework but your agentic workflows in general. And that's that.

Thank you very much for making it through the agentic workflows course. If you guys have made it through the many, many hours of content, you are now in a position where you can use and leverage agentic workflows better than probably 99.9% of the rest of the population. The skill set you guys have is extraordinarily in demand right now, whether you want to use it in your own business, maybe a software business, an agency or service business, an ecom business, or in a consulting business to help other people with their businesses through agentic workflows. So whatever category you're in, take the knowledge you've learned today and use it to produce great things and accelerate the transition to a more efficient economy.

If you guys like this sort of thing and want to learn how to implement agentic workflows in other people's businesses, please check out Maker School. It's my 90-day accountability roadmap that guarantees you your first customer for AI automation or agentic workflow consulting businesses. That means that by the end of the 90-day period, you will have your first customer or I'll give you your money back. More generally, it's just a great community. We have over 2,000 fantastically talented and capable people in there. It'd be great to add another. Aside from that, I want to thank you from the bottom of my heart for making it to the end of the video. Have a lovely rest of the day, and best of luck implementing agentic workflows.
