CLAUDE CODE ADVANCED COURSE — 3 HOURS

By Nick Saraev

Summary

## Key takeaways

- **45x compression via CLAUDE.md**: CLAUDE.md compresses an entire workspace into a succinct summary. A 22-line file replaced an 1100-token app.jsx, achieving 45x compression and dramatically reducing token costs as Anthropic aggressively raises prices. [22:53], [23:16]
- **Fan-out/fan-in cuts costs 60%**: Spawn multiple Sonnet research agents (~$3/1M tokens) for parallel work, then use Opus for synthesis. With Claude Opus at $15/1M tokens versus Sonnet at $3, using cheap models for research and expensive models only for high-value synthesis saves ~60% on input token costs while improving output quality. [54:45], [55:07]
- **Auto research loops compound gains**: Karpathy's auto research pattern, where AI agents iteratively make micro-hypotheses, test changes, and log failures, enabled 53% faster parse-plus-render time on Shopify's entire Liquid codebase with 61% fewer object allocations; running at 1,440 loops per day means even 2% successful changes compounds to 34% daily improvement. [46:48], [46:51]
- **Claude Code is just a harness**: Claude Code is not the intelligence itself; it’s a harness wrapping the LLM that enables tool use and real-world action. Other harnesses exist (Codex, Droid) but vary in security, and some will execute destructive commands like `rm -rf` when policy blocks them, demonstrating that the harness determines both capability and safety. [34:40], [35:16]
- **Skills and sub-agents are identical**: Skills (markdown files with name/description/tools/SOPs) and sub-agents are two flavors of the same thing: different ways to organize context for agents. As the field matures, these organizational approaches will likely merge into unified context management strategies. [19:18], [19:57]
- **Monoculture farming applies to AI**: Relying 100% on Claude Code is like monoculture farming. If it goes down (as happened with Opus 4.6 for an hour), 95% of developer productivity collapses; diversification through Conductor, Codex MCP servers, or parallel agent platforms stabilizes output like index funds stabilize investment portfolios. [08:07], [08:44]

Topics Covered

How CLAUDE.md compresses your entire codebase into instant insight

Full Transcript

Hey, this is the definitive Claude Code course for advanced users. I use Claude Code and AI agents in my own business every day to generate over $4 million a year in profit. I also teach around 2,000 people how to use Claude Code and

other tools to improve their lives, both personally and in business. Okay, so this course is going to assume a foundation of Claude Code experience. It's not for total beginners, but if you are a total beginner and you happen to stumble on

this course, that's okay. Just look over my left shoulder here, click that button, and then I have a 4-hour guide that will walk you through everything you need to get to the point where you understand what I'm about to say.

Assuming you're still here, no fluff.

Here's what we're going to cover. We'll start with an advanced look at CLAUDE.md files and system prompts and learn how to optimize these to actually improve quality, which is simpler than you think.

We'll then cover agent harnesses and how to build larger projects with Claude Code. After that, we'll chat agent teams and other examples of extreme task parallelization. Then we'll do skills, sub-agents, and other forms of organization.

After that, I'll cover Karpathy's auto research approach for improving stuff progressively over time and a few actual use cases you can apply this to, not just fancy demos. We'll then talk browser automation and the major players. We'll do computer use, browser use, and which tools to apply to different use cases depending on what you want.

I'll then cover how to deal with performance fluctuations in Claude Code, because they do happen, as well as some alternatives that you guys could use and ways to bundle multi-agent orchestration into your workflow. We'll then cover workspace organization, so for personal, business, and then even client projects, assuming you're selling this sort of thing as a service.

Then security for larger projects. We'll chat stuff like the recent auto mode. We'll talk a little bit about OAuth. And at the end, I'll finally round it out with a discussion about where I think Claude Code is going and the future of work more generally.

Hopefully, you're as excited as I am to level up your Claude Code skills. Please use the bookmarks and chapter headings as needed to jump around the course. Subscribe to the channel and let's get into it.

So, for most of the course, I'm going to be building directly using the Claude Code extension inside of Antigravity.

That's this over here. If you don't have Antigravity installed, this isn't an installation tutorial, but get that from Google's official antigravity.google

website. Then head over to extensions, click on Claude Code for VS Code, give that an install, and then everywhere you go, you'll have this little Claude logo that you can use to spin things up.

After a brief login, you'll have more or less the exact same layout that I do. I want you to know, though, that the Claude desktop app is also getting better and better by the day. And because Claude is attempting to get you, obviously, on their infrastructure as opposed to on your own, they're just continuously adding new cool features that allow you to do things like mobile development and so on and so forth.

So everything I'm going to show you today works in the Claude Code tab of the Claude desktop app. It also works natively inside of the Claude Code extension within Antigravity or some other, you know, IDE-like thing. So if you're intimidated at all by the way that I've laid things out, what all these different folders mean and how they collaborate in order to improve your workflow, I'm going to cover all that in this course.

First though, we're going to cover CLAUDE.md and other advanced system prompts. Basically, how to set up your system prompts in a very efficient and effective way, both to save you financially and also to improve the quality of your outputs and significantly minimize the amount of time it takes to build anything.

So, what is a CLAUDE.md really? Well, as far as I can tell, it's four things. The first is it's a form of knowledge compression. Okay? And when I say knowledge compression, what I mean is instead of Claude having to read through your entire workspace, you know, file by file, like for instance over here, instead of having to open up every single folder here, every single one here, read through all of the files and so on and so forth to be able to reason and then make high-level declarations about your codebase or folder, what your CLAUDE.md does, okay, is it basically just compresses all of that down into a highly succinct summary of what the heck is going on in your freaking folder.

So that the next time you say, "Hey, what was that file I made a couple of weeks ago about X, Y, and Z?", Claude doesn't have to look through every single file in your codebase. You don't have to spend a tremendous amount on tokens, and you also don't have to wait a long time.

It's just sort of baked into the CLAUDE.md, or at least a reference to where the file lives is baked into the CLAUDE.md, so you can actually reason with it at a superficial level, at a bird's-eye level, as opposed to actually going down through the weeds.

So that's sort of like the very first thing that I'd say, you know, a CLAUDE.md is. The second thing that a CLAUDE.md is, is it's obviously your own preferences as a user. And what you'll find is, you know, more or less every time Anthropic updates Claude Code, you have better and better baked-in native preferences and conventions for things like, you know, delivering you file paths, or how to deal with documentation or debugging, or how to update itself and so on and so forth.

But obviously Claude Code lags behind these preferences a little bit, because they have to see what users are actually using it for, and, you know, they collect that information and figure out ways to make things more effective. So if you're an advanced user, as I am, you'll have a list of these preferences and conventions that improve your user experience. And advanced users will always have just some better preferences that kind of adapt to their own workflow, as well as, you know, programming conventions, ways to organize information, structures, and that sort of thing. Okay, so it's both a form of knowledge compression, but it's also preferences and conventions that are not natively baked in, that you get to decide on.

The third thing that a CLAUDE.md is, is it's a declaration of capabilities. Now, I don't know how many times this has happened, but if you do not have a substantiated enough CLAUDE.md, and then you have, let's say, a skill somewhere in your workspace, or you have just some knowledge that's sort of floating around in a few files, and you say, "Hey, Claude, do XYZ thing for me. So, you know, find some knowledge on XYZ person, or go do some research, or, you know, compile a plan using XYZ framework," half the time, okay, if it's not in your CLAUDE.md, Claude will just look at you, metaphorically, obviously it doesn't have eyes yet, and it will say like, "Oh, I don't have a built-in way to do this. Sorry, what were you referring to? Do you want me to build something from scratch? I'll happily do it."

And this sort of slowdown loop is completely unnecessary. And so what CLAUDE.md allows you to do is it basically allows you just to itemize, okay, you know, everything that your agent can currently do within your workspace. And you can make that really clear. You could say, hey, you currently have access to this functionality. You can do this. Hey, you know, you can build a full step-by-step plan that lasts 10 or 15 minutes and then execute on it autonomously. In fact, that's my preference, or the convention that we're using. You know, you can call this API, you can call this database. You can retrieve all this information. You can act autonomously using browsers and so on and so forth.

The reason why that's important is because as agentic as Claude is, hopefully we're all still on the same page here about this fact, Claude still lacks a lot of agency. Okay? If you ask it to, you know, help you do something, or if you ask it how long it would take to do something, it'll often significantly underestimate or overestimate because it's not really factoring in its own agentic capabilities. Like I asked it the other day, hey, you know, how long would this XYZ thing take to build? And then it was like, about 3 months or so, because you would have to build this, you'd have to build that, you'd have to build that. And it's obviously like, "No, I don't have to build that. I'm asking you to build it. You could build it in 5 seconds. So, why don't you just go ahead and do it?"

Or, you know, you're having it do some API stuff and then it sends you a little command line interface prompt and it's like, "Hey, just pump this into the terminal." It sort of needs reminders that no, I don't have to do this. That's why I'm asking you to do it. So, you can actually do all of this stuff, Claude. Declaring capabilities in this way, whether it's your own personal tooling or workflows or whatever, or it's, you know, Claude understanding that it has the ability to do things that it might not realize at first glance, is pretty important.

And then finally, the fourth thing that a CLAUDE.md is, is it's a log of failures and successes. What I mean by this is as you accumulate various files, as you accumulate, you know, bits of code through your project and stuff like that, every single one of these things is hard-won. You didn't get them for free. Realistically, you spent tokens and then your time, which are soon to be two of the world's most valuable resources.

And so because you spent all this time and energy, it is more efficient for you to take all of the learnings, basically, from every single piece of development or every single action Claude does, and then insert it in its next system prompt, than just have it restart kind of from scratch every time. You know, viewed another way, mathematically, if this is the total space of all of the different possible things that Claude could do when you say, "Hey, do X."

What this log of failures and successes is doing is it's basically carving out big chunks of this theoretical solution space, and it's saying, hey, no, you don't do anything over here because we've already tried all this stuff over here. It kind of looks like a planet, meaning the only things that you can actually try, and the only things that you should try, are kind of the things that exist in between. Okay. So basically what this log of failures and successes does is it just allows you to immediately cross out like 80% of all possible things Claude could do, because it knows it's actually tried that in the past, and then in that way focus its time and effort, your tokens, your money, and your energy on the 20% that actually matters.

So these four will exist in different sections in your CLAUDE.md. They'll also exist at different levels, both global and local. So, what I'm going to do next is run you guys through high-ROI ways to combine these four sort of principles behind system prompts and then apply them both in global and local, and then also give you guys sort of like a little workflow loop that you can use in order to understand how to update this effectively.

And this isn't just going to be some big long system prompt that I'm giving you guys, like I think we've probably all seen floating around various sources on the internet. The reality is, CLAUDE.md files are highly personal devices. But these are going to be a list of short principles that will almost certainly help you design better projects and then get more done, whether economically or otherwise.

So, the way that all this is organized within Claude Code is using two different scopes, global and local. And if you didn't already know, basically there are a variety of different places that Claude Code, upon initialization, will look to get the prompts that are injected at the very top of its context window. Okay, the two big ones for us are the user over here, which is equivalent to your global, and then also the project over here, which is equivalent to your local.

And so, basically what this means is if you have a file called CLAUDE.md, all caps, that exists within this folder on your computer somewhere, it'll load that up on every Claude Code session, whether or not you're working in the same workspace or another one. Now, if you have a CLAUDE.md, capital CLAUDE, lowercase md, located within a .claude folder within your specific repository directory, then it'll also be loaded.

And in this way, you know, you sort of have like a global precedent that's always injected at the top of every single thing, okay, no matter what, and then you also have sort of a smaller little, you know, local CLAUDE.md that's also injected. And collectively, when I say, you know, system prompts from here on out, really what I'm referring to is both of these. I'm not just referring to one of these.

And because global is injected on every single run, there are sort of like different strategies in order to divide the four things that we just talked about. Basically, on your global CLAUDE.md it makes more sense to put high-level reasoning and then your own personal beliefs, and then in local CLAUDE.md it makes more sense to insert local low-level knowledge. So stuff like I just talked about with the workspace itself.

So, you know, if I were just enumerating all of these things up here, okay, you'd put your preferences, like your global preferences. These could be things like, hey, you know, when you return a file, I want you to return the absolute file path so I can click on it, because whatever editor I'm using doesn't really take that into account. You know, it could be things like programming conventions. Hey, I want you to program using, I don't know, object-oriented programming, or hey, I want you to do functional programming in Rust. Hey, when I ask you to develop a new project, I always want it done in Rust as opposed to, you know, Python or something like that.
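The two scopes described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Claude Code is implemented: the function names are hypothetical, and the three file locations are assumptions based on the commonly documented defaults (a user-level file under the home directory, plus a project-level file at the repository root or inside a `.claude` folder). The course only shows the exact folder on screen.

```python
from pathlib import Path


def candidate_memory_files(project_dir: Path, home: Path) -> list[Path]:
    """Return CLAUDE.md candidates, global (user) scope first, then local (project)."""
    return [
        home / ".claude" / "CLAUDE.md",         # assumed user/global location
        project_dir / "CLAUDE.md",              # assumed project location (repo root)
        project_dir / ".claude" / "CLAUDE.md",  # assumed project location (.claude folder)
    ]


def build_system_prompt(project_dir: Path, home: Path) -> str:
    """Concatenate whichever candidate files exist, global scope first."""
    parts = []
    for path in candidate_memory_files(project_dir, home):
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"# from {path}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Show what would be injected for the current folder and the current user.
    print(build_system_prompt(Path.cwd(), Path.home()) or "(no CLAUDE.md files found)")
```

Because the global file is included in every session, anything placed there is paid for on every run, which is the reason for keeping it to short, high-level preferences.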

Alternatively, it could be stuff like, hey, you know, if I ask you to do something using a tool you're unfamiliar with, always go and read the API documentation first before attempting to start, because every other time that you've attempted to do something without the API documentation, you typically run out of loops, you waste XYZ tokens. So, make sure to load the API docs. By the way, if you can't load the API docs through, you know, HTML, then make sure to load up a Chrome DevTools MCP server to go and get that stuff, even if it's dynamically loaded through JavaScript.

Okay, so these are high-level reasoning strategies. These are your own preferences. These are your own conventions. And then also these are going to be just sort of like agency capabilities. So stuff like, hey Claude, you can actually do X, Y, and Z. If you believe that you can't for whatever reason, you're wrong, you can absolutely, you know, go and do whatever you want.

The local low-level knowledge. Okay, this is going to be stuff like /init, which I'll show you guys in a second. So this is going to be like a compressed version of all of the knowledge on your workspace. Instead of Claude having to, in the future, go through every single file, it'll just be able to read the CLAUDE.md and sort of have a loose understanding like, okay, what's where? Why have we built this? What's the purpose of this workspace? And so on and so forth.

Some additional things you can do are things like context about you and your goals and your own reasoning strategies, your own communication styles. So I'm going to give you guys examples of my own CLAUDE.md in a moment, where you guys will see that I actually give it a lot of context about who I am and why I want what I want. I'll run it through, you know, reasoning strategies that I personally use that have, you know, yielded me a lot of success in the past, that may actually not necessarily be the optimal reasoning strategies, but which I tend to understand. And because I'm communicating with this thing every freaking 5 seconds nowadays, I'm better capable of understanding what it's putting across if we use those principles. And then yeah, those high-level preferences and then generally good token conservation strategies.

Whereas with the local, you know, it's a description of the project, where everything is, low-level preferences like specific API docs and usage. If you are using, you know, the GoHighLevel API to do some project or whatever, you can actually just have the whole GoHighLevel API docs existing within your project. That'll minimize the number of tool calls that Claude has to make to, you know, have some sort of research sub-agent go and do the thing for you. Instead it can remain local, reduce total token usage, and then also just be faster and more accurate. And then capabilities within the project.

And then that takes me to workflow. So there's two sort of workflows here that I want to talk about. There's the local workflow and then there's the global workflow. The local workflow is going to be responsible for updating our local CLAUDE.md, and then the global workflow is going to be responsible for updating our global CLAUDE.md. Like, it would be nice if I could just give you on a silver platter a bunch of stuff to put in your CLAUDE.md, right? I think that's what a lot of people want. But you're going to end up a much better developer and a much more productive person if you understand the principles at play here and develop your own.

So initially, to start, anytime you're developing anything in Claude Code or whatnot, obviously you need to plan your feature, and I say feature here loosely. You know, I use Claude Code as basically like my business assistant nowadays. And so I use it to do anything from reading my emails, to grabbing me news summaries in the morning, to communicating with XYZ people, to designing me, you know, websites and so on and so forth. So feature here is really loose.

I'm not just talking about a vibe-coded project. I'm talking about anything. But what you do is you start by planning a feature, right? And then if you think about it logically, what Claude does next is it instantiates the feature. However, over the course of planning and instantiating, okay, it will fail a bunch. It'll also succeed a bunch of other times, and ultimately there'll be a giant list of learnings between, you know, step one and step two.

And so what you do after you instantiate is you actually compile all those learnings, okay, into some efficient, high-information-density thing that doesn't consume a lot of tokens. Then you use that to update the CLAUDE.md. And so this is your local workflow for managing your system prompt, and you basically just do this every time. You plan something, it'll do a bunch of failures along the way, then you'll instantiate it, you'll take all those learnings, update your CLAUDE.md. That way the next time you plan a feature, it'll already have all the benefits of the failures plus, you know, any additional things it learned along the way.

And so the first time around this loop, you know, it might take, I don't know, let's say X time to develop a feature. The second time around this loop, you know, maybe it'll take like 0.9x because now, you know, you've shaved off 10% of the search space and it's a lot faster.
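The "compile the learnings, then update the CLAUDE.md" step of the local workflow can be sketched as a small helper. This is a minimal illustration under stated assumptions, not a feature of Claude Code: the `## Learnings` section name and the `record_learnings` function are hypothetical, and the sketch assumes the learnings section sits at the end of the file so new bullets can simply be appended.

```python
from pathlib import Path

LEARNINGS_HEADING = "## Learnings"  # hypothetical section name, not a Claude Code convention


def record_learnings(claude_md: Path, learnings: list[str]) -> int:
    """Append new one-line learnings to the end of the file; return how many were added."""
    text = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    existing = {line.strip() for line in text.splitlines()}

    # Keep only learnings that are not already logged, so the file stays short.
    new_bullets: list[str] = []
    for item in learnings:
        bullet = f"- {item.strip()}"
        if bullet not in existing and bullet not in new_bullets:
            new_bullets.append(bullet)
    if not new_bullets:
        return 0

    # Create the section the first time a learning is recorded.
    if LEARNINGS_HEADING not in existing:
        separator = "\n\n" if text.strip() else ""
        text = text.rstrip("\n") + separator + LEARNINGS_HEADING + "\n"

    text = text.rstrip("\n") + "\n" + "\n".join(new_bullets) + "\n"
    claude_md.write_text(text, encoding="utf-8")
    return len(new_bullets)
```

Calling something like `record_learnings(Path("CLAUDE.md"), ["Read the API docs before using an unfamiliar tool"])` after each feature keeps the log as one-line, high-density bullets, and the de-duplication keeps repeated failures from inflating the token cost of the file.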

The third time you go, maybe it takes 0.8x. Okay? And so the time will just get faster and faster every time, until eventually you develop things using Claude in a similar way that you would develop if you were not using Claude.

Now here's where it differentiates from the global workflow. What happens is, you know, as you accumulate a variety of failures, successes, and learnings and so on and so forth, your current local CLAUDE.md gets really, really good. After all that's done, what you do is, you know, after hundreds of these runs, okay, you can either pull a /insights feature or you can run that yourself, which I'll show you guys how to do. What this will do is compile, not at a local level but at a global level, all of the things that Claude attempts pretty consistently and then struggles with pretty consistently.

You know, it's like, oh, hey, I noticed that not only on that one project, but also on more or less every project, Claude consistently goes down silly rabbit holes it doesn't need to, and then tries coming up with its own stuff instead of just consulting the docs. And so, you know, after this is done three or four times, obviously there's a trend, right? So, what you can do is you can take that information and then you can pump that into your global.

After that, what I'd recommend is manually reviewing, because Claude is an agent at the end of the day. And the more AI steps you have, the more you compound probabilities and the less likely it becomes that Claude itself is making the right call. You know, if Claude is independently 90% successful on a task, and then you give it to another Claude which is 90% successful on a task, and then you give it to another Claude, you know, what you're really doing mathematically is you're going 0.9 raised to the 3. And if you just do a little bit of math there, that's not 90%, right? 0.9 to the 3 is about 73%.

And so I guess what I'm trying to say is, the more steps you have without a human in the loop here, the lower the likelihood that your total determination will be correct. And because this is your CLAUDE.md, it is your global preference and convention file, it will be applied to every future project.
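The 0.9-cubed arithmetic above can be checked with a short sketch. It assumes each AI step succeeds independently with the same probability, which is a simplification, and the function name is hypothetical.

```python
def chained_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Three chained 90% steps, as in the example above: prints 72.9%
    print(f"{chained_success(0.9, 3):.1%}")
    # The same formula shows how quickly longer unattended chains degrade.
    for n in (1, 3, 5, 10):
        print(n, f"{chained_success(0.9, n):.1%}")
```

Each additional unreviewed step multiplies in another factor below one, which is the argument for spending human review time on the global file specifically.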

Meaning if there is a place you should spend human time on, it is this exact step here. So I'd recommend manually reviewing that. Once you manually review that, then you can add some high-ROI bullet points to your CLAUDE.md and so on and so forth, you know, just like a high-information-density version. And then you can actually update the CLAUDE.md. And then you can repeat this loop a few times if you'd like before finally going back to the local loop.

And so, I mean, it's kind of like, I don't know what you want to call it, like an infinity sign. Okay? Kind of starting here, you're going kind of like this. And then you're kind of looping back, and then you're just doing this over and over and over again.

Obviously, you're going to spend a lot more time in this loop, but eventually you're going to go down to this loop.

And this is how I personally develop using CLAUDE.md. This is why my workspaces are super tight. And then instead of me, you know, using a vanilla version, asking it, hey, go do X, Y, and Z, and then it stumbles around, uses 20,000 of my tokens and God knows how many of my dollars, when I say, hey, I'd like you to do X, Y, and Z, I'd like you to go scrape some leads or whatever, it already has all that stuff baked in, while still being flexible enough that I could change it anytime that I want.

Okay, so next I'm going to show you guys basically my workflow every time I start with a new project, assuming that I've already done a little bit of work in the project, I don't have a CLAUDE.md, and I don't really have any of that advanced tooling or system prompt harness and stuff set up. This is exactly what I would do, step by step.

So, first of all, you need to open up a folder. I was just learning about toatillos earlier. That is sort of embarrassing. But anyway, in Antigravity, just go Open Recent. And then I'm just going to open up something. Why don't I do, you know, Antigravity example right over here.

And, you know, when I'm in this folder right over here, obviously, there are a bunch of different files and, you know, configurations. This one was using Gemini for a while. So, what I'd like to do next is open up Claude Code. And so, I'll click on that button over here. Let's close out the agent window because I'm team Claude, at least for the moment. Thank you, Space Invader.

And really, the first thing that you do is, you know, you develop on your own. I always recommend, just don't try baking any opinions into a CLAUDE.md until you've at least developed without a CLAUDE.md or some sort of advanced system prompt for a little bit. And the reason why is because you'll find Claude's actually really good out of the box. As mentioned, they are incorporating more and more of these features natively within it. And so, like, it's great.

It's not the harness that makes the intelligence. It's obviously the intelligence inside of it that sort of, you know, communicates with your system prompt to get good. But right now it's already fantastic. Anyway, after you've done some developing for a while, and this is obviously some sort of website here, it's like a template using Vite.

Just run /init, just like that. Basically, /init will go through and read every single file in your workspace, which I'm currently doing with fast mode, if you're wondering why this is probably faster than what you're seeing. And then at the end of it, it'll come up with a highly optimized CLAUDE.md file that succinctly and effectively summarizes the placement of everything here. And you can see it just generated one called CLAUDE.md. It comes with the build, dev, and lint commands. Note that no test framework exists. Some architecture overview, key dependencies and their roles. Then some style conventions as well.

So now I'm going to open up this CLAUDE.md. Okay. And why don't we just move this over to the main window so it's a little bit easier to see. You can see that, more or less, it takes every single line in my entire workspace and produces a very high-level summary. It significantly increases the information density at the cost of total comprehensiveness. So what I have now is a summary of everything. What that means is the next time I ask Claude anything about my workspace, I don't have to have it run through every single file. For instance, what I'm going to do here is rename this to, I don't know, XYZ.md, or actually, why don't I just delete this for now. If I had asked this Claude instance something about dark mode, "Hey, what are my opinions on dark mode?", it's going to check its memory for notes on my preferences. It's not going to find anything.

And notice how it's just going to say there's nothing at all.

So, what I could say is "read through whole project and find my preferences."

And now what it'll do is essentially launch some sort of agentic search through READMEs and so on until it finds something about dark mode. In this case, it was in the GEMINI.md. But I want you guys to know that whether you have it in a GEMINI.md or it's just written somewhere, it'll eventually figure it out. Now the issue is, what sort of usage did we just burn in order to get that? If I scroll all the way up here, where I typed /context, the system prompt was 6%, then free space, and messages were 0.9%. So that last message chain, with the tool calls and everything, might realistically have taken five or six thousand tokens. I don't need to do that sort of thing ever again. If I bring that CLAUDE.md back, open up a new instance, and say, "Hey, what are my opinions on dark mode?", obviously it's going to read the CLAUDE.md. Instead of me having to use god knows how many tokens, if I go back to /context, you see that now it only used 0.2%. So I basically saved myself, what's that, like 6,000 tokens. And let me tell you, these Claude tokens ain't free, man.

Anthropic has increased its prices pretty aggressively, especially recently, when they realized 99% of the world is now operating on their infra. So I guess what I'm trying to say is I'm spending literal money, but I'm also spending time. And to me, the bigger thing is time. What are some other things you might ask? Think about deployment. If you have any sort of front-end or full-stack experience, you'll know that usually the flow is you start with a dev server. You run that dev server via npm run dev or equivalent to develop various features and so on.

Then you'll build, you'll do some sort of linting, and once you're done you'll preview it. You'll push to staging, verify that, and then eventually push to production, right? Obviously this is something Claude could have learned just by going through the folder structure, seeing src, public, node_modules, all these things. But I'm listing them out here so that instead of it having to read any of those files or tooling, it can get there in, what, five or six tokens, immediately.

Likewise, I can see where things are laid out. In this case this is obviously a single-page application: the entire app lives in a single component with nav, hero, services, projects, and footer sections. All markup and logic is here. It's evident that that's the case if you click on app.jsx and scroll through. But look at how many more tokens app.jsx is versus that brief little description in CLAUDE.md. If I copy and paste the entirety of this into something like a word counter, you can see it's 827 words, approximately 1,100 tokens. Okay. If I go back to my CLAUDE.md, how long is this? It's 22. So that's, what, a 45x compression ratio. That sort of compression is how you ultimately get a significantly better and more effective Claude, because you're not shoving a tremendous number of tokens in at the beginning of every query. And, as we hopefully know, output quality tends to scale inversely with token length. The more tokens in a context window, not only the more money you're spending, but typically the lower quality the results are. So just avoid all that by initializing and then storing a bunch of information about what the project is.
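To make the compression arithmetic concrete, here is a minimal sketch. It assumes the "22" quoted for the CLAUDE.md description is a token count (the transcript does not state the unit), and it uses the 827-words-to-1,100-tokens figure from the word counter as a rough words-to-tokens heuristic rather than a real tokenizer. The function names are illustrative only.

```python
def estimate_tokens(word_count: int, tokens_per_word: float = 1100 / 827) -> int:
    """Rough token estimate from a word count (a heuristic, not a tokenizer)."""
    return round(word_count * tokens_per_word)


def compression_ratio(full_tokens: int, summary_tokens: int) -> float:
    """How many times smaller the summary is than the full file."""
    if summary_tokens <= 0:
        raise ValueError("summary_tokens must be positive")
    return full_tokens / summary_tokens


full = estimate_tokens(827)          # app.jsx, per the word counter
print(full)                          # 1100
print(compression_ratio(full, 22))   # 50.0 (assumes 22 is a token count)
```

Under that assumption the ratio comes out at about 50x, the same ballpark as the roughly 45x quoted on screen.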

You'll be much happier for that. But /init isn't the only thing that I would do. From here, I'd actually start importing a couple of my preferences and things that it's tried. So, I don't know, let's just say I'm going to remove the GEMINI.md for simplicity. Let's say I'm developing a new feature, and actually, why don't we just type "visualize app, IDK what it looks like." Let me take a look at this thing. So, it'll run the dev server so I can see it in the browser. And immediately I'm thinking, hey, this is actually kind of inefficient. "When I say visualize app, I basically just want you to launch it. So, store in your CLAUDE.md that when I ask you to run the dev server or open the app, I want you to open it in my Chrome instance as well. I don't just want you to run the dev server." Basically, next time I say this, I don't want it to just say, "Hey, the dev server is here. Give it a click," and then I'm like, "Okay, can you just open it, since I'm already here?" I want it to open automatically, right? Okay, cool. And I see it's over here.

So that's nice. Definitely not a fan of the design. I don't like how it scrolls through. I'll just say, "Hey, I want you to significantly improve the perceived visual quality of the application. Go and look up the Apple website and then compare that to our website. Make some changes that improve the perceived visual cohesiveness, quality, etc." Must have been a Gemini website. Anyway, let's see what it does here. It's fetching Apple's site for design reference and reading the current app code in parallel. And now it's doing a bunch of updates, editing the CSS: nav-link, hero-background, and so on. You can see that it's actually updating the site in real time, and it's looking significantly better. It's also picked up some new colors. Now, what's cool is it actually just opened the project up for me a second time, right over here, because earlier on I'd stored my preference that I don't want it to just give me the link; I want it to open stuff up. So that's nice.

So obviously we have better design and stuff like that. But the key part here is when I say, "Okay, great. Nice job. How could you have arrived at these conclusions and done everything I just asked you to do faster?" And now look, we're already at the point where we can optimize a fair amount of this. Instead of 20 Edit calls, which is what it did before, it could have done just one Write call. The reason that took 30 seconds or so is that it was editing the CSS file line by line across 20 sequential tool calls. As it puts it: "I should have read the file, rewritten the whole thing in my head, and done a single write to replace index.css in one shot." So I reply, "Yes, I'd like you to save this in the local CLAUDE.md. Do it as a user preferences section."

So asking it questions like "How could you have arrived at those conclusions and done everything I just asked you to do faster and for fewer tokens?" is pretty powerful. Doing this consistently as you develop and design a project, and keeping a running log of changes in the CLAUDE.md, is also quite valuable. Another thing you can do, and it's personally what I always do, is set a meta prompt in the CLAUDE.md: "When you've made a mistake, I want you to update the CLAUDE.md with a running log of things not to try next time. Essentially, I want this to be almost like a mini experimenter's or researcher's notes that show what a future Claude instance should not do while working on this project. Update the CLAUDE.md to reflect what I just said at the very bottom." Okay, now it has a section called "Lab notes: what not to do." This is going to show a bunch of failures as well as learnings and successes. And we're already, honestly, halfway through the loop.
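If you wanted to keep that running log by script rather than by asking Claude each time, a minimal sketch might look like the following. Everything here is hypothetical: the heading text, the function name, and the dated-bullet format are illustrative choices, not something Claude Code defines. The sketch also assumes the lab-notes section sits at the very bottom of the file, as in the demo, so new entries are simply appended to the end.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical heading; use whatever your own CLAUDE.md calls this section.
LAB_NOTES_HEADING = "## Lab notes: what not to do"


def append_lab_note(claude_md: Path, note: str, today: Optional[date] = None) -> None:
    """Append a dated bullet to the end of CLAUDE.md, adding the heading if missing.

    Assumes the lab-notes section is the last section of the file.
    """
    text = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    if LAB_NOTES_HEADING not in text:
        separator = "\n\n" if text.strip() else ""
        text = text.rstrip("\n") + separator + LAB_NOTES_HEADING + "\n"
    stamp = (today or date.today()).isoformat()
    text = text.rstrip("\n") + f"\n- {stamp}: {note}\n"
    claude_md.write_text(text, encoding="utf-8")
```

Calling `append_lab_note(Path("CLAUDE.md"), "Use one Write instead of 20 sequential Edit calls")` would add one dated line under the heading, so the notes accumulate across sessions.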

Now, this is a very contrived example because I'm literally just building a website. But imagine that, instead of just a website, you're building a workspace that's meant to contain basically your entire business: all of your SOPs, all of the work that you do on a daily basis, your to-dos, and so on. Having information like what I just showed you for this project would be invaluable across more or less all levels of both development and productivity. And that's what you should ultimately be working towards. So anyway, we can make this as complicated as we want, obviously, but hopefully you guys see that loop at work. We plan a feature. We just did this. It was simple enough that we didn't need a dedicated plan mode, but obviously I still one-shotted it. Then it implemented the feature, and along the way it did a few things that realistically it could have done better.

So what do we do after? We take those learnings, we compile them, and then we update the CLAUDE.md. This was sort of a meta example, since I was literally doing it while I was building the CLAUDE.md, but hopefully you guys at least understand conceptually what you do. After four or five of these runs, there's probably a fair amount of stuff here that you can take advantage of, and that's where an /insights run would make sense. So, let me zoom in and delete this so you guys can see. In case you didn't know, /insights is a simple slash command that runs a bunch of subagents across all of your Claude conversation history. The benefit is that now we're not only changing our local CLAUDE.md, we're also evaluating all of the patterns in the communication we've had with Claude over the course of the last, I don't know, could have been a few days, could have been months, could have been years, depending on how late you're watching this video. So just like we optimized our local CLAUDE.md, now we can start optimizing our global one. And while it's chewing away, because /insights does take a fair amount of time, I'm just going to create a new file here and call it globalclaude.md.

And I'm going to give you what I'd consider to be, at least as of this recording, some of the higher-ROI principles to make sure to include. I include these in my own global CLAUDE.md because I think they're very valuable. So I'll say: "Global CLAUDE.md. This is inserted at the beginning of any conversation with Claude across all of the user's workspaces." First I have a profile section. This is a bit about Nick. So, I don't know, it'd be like, "Nick is a 30-year-old INTJ high-performing internet entrepreneur."

"He runs a YouTube channel at 350 (it had better be 350 by the time I publish this video), 350,000 subs, an Instagram channel," and so on. Okay. And I have a bunch more information, which I've taken from a couple of other systems I've built. This one here is: Nick is a 30-year-old INTJ. Here's his revenue, with all the different things that contribute to it. Here's some churn math. Some of the companies that I currently own. Some teams, right? It's me, an editor, a LinkedIn newsletter person, and a bunch of AI agents. A bunch of information on YouTube as well as my goals. And then ultimately some on Instagram as well.

Now you're thinking, "Nick, this is crazy. Why would you insert all this information in your global CLAUDE.md?" Well, the reason is that I want every conversation I have with it to understand who I am and to take that into consideration when discussing things with me. I can't tell you how many times I've had a conversation with Claude in a naive session, with no personal system prompt or context like this in the context window, and I say something along the lines of, "Hey, what's the best solution for X, Y, and Z?" And it says, "Oh, you're going to want to do this solution." And I say, "Why?" And it'll say, "Oh, because it's the cheapest, right? It only costs $0.20, whereas all the other solutions cost $5." And I'm thinking, well, if you knew a little bit about who I am, you'd know that money is not the primary bottleneck right now. I'd prefer you to exchange my money for my time.

So just giving it some high-level principles like that is very important.

Anyway, while I was doing that, the actual shareable insights report became ready. So I'm going to tell it to open it so I can take a look at it with you guys. And now you'll see there's an HTML page that runs through all of the insights across all of the sessions.

Looks like 1,849 messages across 200 sessions. I don't know where this chooses the cutoff. It looks like it's about a month or so. Keep in mind, though, that this is Claude Code specific, and I don't know if it encapsulates all the conversations I've had with it on the desktop app, but pretty good. And you can see here that there's a bunch of context about what I work on, how I use it, and all this stuff.

So the important thing to look at is the "existing features to try" section. You can just copy this into Claude Code and add it to your CLAUDE.md. So for instance: "When using Chrome DevTools MCP or browser automation, always kill stale Chrome processes and clear the profile before starting. If the MCP tools fail twice, stop and ask the user before continuing to retry. Never waste tokens on repeated failing browser calls." This is actually quite valuable given how many times I've tried to have it run Chrome DevTools MCP and it's failed. Same thing over here, with some face swap information and stuff like that. You can copy all this into Claude and it'll set it all up for you, which is pretty valuable. It can even go and build new skills based on things that you consistently ask. So that's more or less what I'm doing here. Anyway, the value with this is basically to copy the entire thing, go back here, paste it in, and say: "This is my Claude insights file. It describes at a high level a few of the obvious design patterns in my thinking and a couple of the issues that I've had communicating with you and other versions of you. I'd like you to distill this into a list of high-information-density snippets that I can paste into a global CLAUDE.md to be both token-conservative and avoid most of the mistakes that you typically make." Then I'll just press enter. It's going to give me some information about that.

Then over here we have the changes, and this is very high information density, right? It basically took a bunch and said: "Don't over-explain, over-engineer, or add unrequested improvements. When making widespread changes to a file, use one Write instead of many sequential Edit calls. Speed matters. Don't fetch well-known websites." Then, again, the one about rerunning browser automation, and so on. Just some high-level stuff. It looks like it just inserted that in here, which is quite nice. So now what do we have? If you remember, we have some context on me in the global CLAUDE.md. We also have some high-level reasoning rules and principles. What we're really missing is some token conservation strategies. You can rewind the video if you'd like more on that, but basically you want three things: context about you, your goals, and your reasoning strategies; some high-level preferences about what it's currently doing wrong that you'd like it to fix; and some good token conservation strategies, like docs first. So what I'm going to do is, underneath interaction rules, I'll also just say... oh, and what's really interesting is that I'm seeing one of my rules directly contradicts some of the other rules.

"Don't fetch well-known sites." Actually, just remove that. That's the human-in-the-loop part, right? Just look to see if any two rules directly contradict each other. Then I'll say: "When a user asks you to use a non-trivial platform, one for which you do not have context, always look up the documentation first. You can do so by searching for 'API documentation' plus the platform name. If for whatever reason you can't access the docs, for JavaScript reasons, launch a Chrome DevTools MCP Chrome instance so that you can still copy and paste all that data. No matter what, if you're working on a project for which API documentation is available, you should always go through the API documentation to avoid 99% of the errors. The tokens we spend reading the docs will save us a lot of tokens trying to use things that don't work." Cool. So, I'm going to copy that. And now I have my global CLAUDE.md. I could obviously just have Claude insert that into the global CLAUDE.md.

I could also just go and find it in Finder. So, I'm going to go to Finder on Mac. You can find your global CLAUDE.md by going to your home folder, in my case Users, then nicksaraev. There's a hidden folder here which you can't see right out of the gate. You should be able to press Shift-Command... I think it's comma or period. There you go: Shift-Command-Period. Once you've done that, you can scroll all the way down to where it says .claude. And in there you'll see there's a CLAUDE.md that lives within that .claude folder. So what I can do now is reveal this folder in Finder, compare it to that folder, and just drag and drop this in. I have my global CLAUDE.md file, and I can remove the existing CLAUDE.md and replace it with this one.
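The same drag-and-drop can be done from a script. This is a minimal sketch, assuming the hidden folder shown in Finder is `.claude` in your home directory and the global file is `CLAUDE.md` inside it. The function names and the `.bak` backup step are my own illustrative additions, not part of Claude Code.

```python
import shutil
from pathlib import Path
from typing import Optional


def global_claude_md(home: Optional[Path] = None) -> Path:
    """Path of the user-level CLAUDE.md (assumed to live in ~/.claude)."""
    return (home or Path.home()) / ".claude" / "CLAUDE.md"


def install_global_claude_md(source: Path, home: Optional[Path] = None) -> Path:
    """Copy `source` over the global CLAUDE.md, keeping a .bak of any old file."""
    target = global_claude_md(home)
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # Keep the previous version next to it so the swap is reversible.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(source, target)
    return target
```

For example, `install_global_claude_md(Path("globalclaude.md"))` would replace the file in your own home folder. Passing a different `home` lets you try it against a scratch directory first.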

Awesome. So now all future conversations that I have with Claude, across all of my workspaces and folders, will include the information I just provided. And hopefully you guys see how simple it is to run that loop. Granted, this is an informal loop. I'm not really showing you a formal, streamlined process. But hopefully you see how easy it would be to build that in, again as a sort of meta Claude.

Let's talk a little bit about agent harnesses. The term has gotten a ton of interest over the last couple of months because it's sort of new and exciting, but very few people actually understand what it refers to. An agent harness, to be clear, is just Claude Code. Claude Code is the harness around the model, Claude, that enables it to do things like call various tools and get actual economically valuable work done.

For those of you who don't know, AI models are just text interfaces, right? It's text in, text out. A harness is what turns something that can only communicate in text into something that's ultimately capable of controlling our computer. So the way I personally think about the question "what is a harness?" is that a harness is everything that wraps around the LLM that is not the LLM itself. In our case, it's Claude Code. It's the system prompt. It's the hooks. It's the tools it has access to, and it's the parameters that control things like when the memory auto-compacts, how many messages you can send in a turn, what the total token limits are, and so on. For the purposes of this demo, let's pretend that this over here is our Claude space invader. Okay? This is the large language model itself. This is actual Claude. And Claude is obviously a galaxy-brain intelligence. It's been trained on god knows how many books and blog posts and encyclopedias and so on.

But Claude sort of exists inside this boundary where it can't actually do anything out in the real world unless it's given the tools and the ability to do so. One example of what Claude has access to is a set of tools. That's things like the ability to use Bash, aka a terminal, or the ability to use grep, which is how it finds things on your computer, and so on. Another thing it has access to, and this has kind of gone back and forth, is some form of memory. It can read things that are stored in this memory, and it can also write, so it can add and update things as needed. There's obviously a variety of other things it has access to as well. And if it didn't have access to all these things, again, it would just be an agent, or a model, sorry, that exists in a box. And so that's really the difference between LLMs and agents. Agents are LLMs plus a harness, whereas LLMs by themselves, large language models, can't really do anything.
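The "LLM plus harness" idea can be sketched in a few lines. This is a toy illustration, not how Claude Code is implemented: `fake_model` is a scripted stand-in for a real model call, and the `TOOL`/`FINAL` text protocol and both memory tools are made up for the example. The point is only that the model emits text, while the loop around it executes tools, feeds results back, and enforces limits such as a turn cap.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)


def fake_model(history: List[Message]) -> str:
    """Scripted stand-in for an LLM: text in, text out."""
    role, text = history[-1]
    if role == "user":
        return "TOOL write_memory prefers dark mode"
    if role == "tool" and text == "saved":
        return "TOOL read_memory"
    return f"FINAL memory says: {text}"


def run_harness(model: Callable[[List[Message]], str],
                tools: Dict[str, Callable[[str], str]],
                user_prompt: str, max_turns: int = 5) -> str:
    """The harness: call the model, run any tool it asks for, loop until done."""
    history: List[Message] = [("user", user_prompt)]
    for _ in range(max_turns):  # harness-level parameter, not a model property
        reply = model(history)
        history.append(("assistant", reply))
        if reply.startswith("FINAL "):
            return reply[len("FINAL "):]
        if reply.startswith("TOOL "):
            name, _, arg = reply[len("TOOL "):].partition(" ")
            tool = tools.get(name)
            result = tool(arg) if tool else f"unknown tool: {name}"
            history.append(("tool", result))
    return "stopped: turn limit reached"


memory: List[str] = []  # the harness owns the memory, not the model


def write_memory(text: str) -> str:
    memory.append(text)
    return "saved"


def read_memory(_: str) -> str:
    return "; ".join(memory)


TOOLS = {"write_memory": write_memory, "read_memory": read_memory}
print(run_harness(fake_model, TOOLS, "Remember that I like dark mode"))
# prints: memory says: prefers dark mode
```

Swap `fake_model` for a real API call and the tools for Bash, grep, and file access, and you have the rough shape of an agent. Without the loop, the model's "TOOL ..." output would just be text that nothing acts on.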

They obviously operate entirely in the domain of knowledge. So, given that it's called a harness, you can kind of think of it as... I'm going to draw a really crappy dog here. Put another way, here's a really crappy rendition of what I initially wanted to be Canadian dog sledding and what ended up looking more like Santa with a big fat beard riding a questionable reindeer. But basically, you can imagine that this right over here is your LLM. This is the actual model intelligence.

And then you, over here, okay, this is your harness. This is the code part of Claude Code that controls it. The LLM wants to go in a bunch of different directions. It wants to do a bunch of things. What the harness does is narrow down its direction. You can almost think of it like the barrel of a gun, right? Back in the day, you might have had cannons, and you'd load those cannons with massive cannonballs and stuff some gunpowder underneath. And those cannons, despite operating off the same fundamental technology, which is gunpowder, might not really be able to shoot very far. I don't know, let's just say 50 m.

Nowadays, obviously, we have (this is my really crappy gun drawing) more or less the exact same technology. You put some sort of bullet in there, right? But because of the technology that surrounds the core thing, which is the gunpowder, the bullet can go a lot farther. So maybe instead of 50 m, now it can go, I don't know, 250 m or so. This is how I think about harnesses, okay? I don't mean to just show you a bunch of silly grade-school analogies, but it's important to realize that that's what Claude Code really is. And because Claude Code is a harness, obviously there are a bunch of other people out there who have tried making their own harnesses as well. Just like we have frameworks like React and Vue, and then Next.js and Nuxt, we also have a bunch of different harnesses that supposedly work on and improve specific aspects.

What are some of those aspects? Things like security, right? Automatic permissions: plan mode versus default mode versus the new auto mode, and then bypass permissions mode.

There are some harnesses out there, okay, some agent SDKs and stuff like that. Not going to name any names, but some of them are probably a little less secure than others, such that if they were to read a Twitter thread that looked like this, maybe they'd actually execute sudo rm -rf and delete your entire hard drive, right? There are a bunch of examples of people screwing around with this. This is an example from Codex, which, being an extraordinarily competent model, I can't really talk down on too much, but this is an actual conversation it had with somebody that I found on Twitter. The model basically tried running something like rm -rf, which, to make a long story short, in case you didn't know, just deletes everything.
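One way to picture the safety side of a harness is as a check that runs before any shell command executes. The sketch below is a deliberately naive, hypothetical deny rule (not the policy of Codex, Claude Code, or any real harness): it refuses command lines that invoke `rm` with both a recursive and a force flag. It also shows the limit of string-level rules, because a script can delete the same directories without ever calling `rm`.

```python
import shlex


def is_blocked(command: str) -> bool:
    """Naive deny rule: refuse command lines that invoke `rm` and carry both
    a recursive flag and a force flag anywhere on the line."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # refuse anything that cannot be parsed
    if "rm" not in [t.rsplit("/", 1)[-1] for t in tokens]:
        return False
    short = "".join(t[1:] for t in tokens
                    if t.startswith("-") and not t.startswith("--"))
    long_flags = {t for t in tokens if t.startswith("--")}
    recursive = "r" in short.lower() or "--recursive" in long_flags
    force = "f" in short or "--force" in long_flags
    return recursive and force


print(is_blocked("sudo rm -rf /"))   # True
print(is_blocked("ls -la"))          # False
# Same destructive effect, but no `rm` on the command line, so the rule misses it:
print(is_blocked("python -c \"import shutil; shutil.rmtree('build')\""))  # False
```

The last line is the interesting one: a rule keyed on the command string catches the obvious case and nothing else, which is why sandboxing and permission prompts tend to matter more than pattern lists.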

And here it says, "Well, the shell policy actually blocked the raw rm -rf. So what I'm doing is removing those generated directories with a Python cleanup instead. Same effect, less policy friction." Right? It sidesteps the shell policy and is just going to end up deleting the entire thing. The harness impacts a model's ability to get things done. It also ultimately impacts the safety, the memory, and so on. So hopefully you guys now at least understand what the harness is before I show you some examples of different versions of it. Obviously, Claude Code is the major harness today.

But there's a great blog post over here by LangChain that more or less describes a way to create different harnesses. The model gets a certain type of context injected into it: prompts, memory, skills, or conversation. Then you also have orchestration, things like Ralph loops, which were really big a while back; that was a different type of harness. There's a certain persistence of data and actions, and then the ability to both observe and verify, say with screenshots. One harness that a lot of people are using now is Droid, which was built by Factory AI. Droid is a publicly available harness that you can download and run today. Pi (pi.dev) is also exploding in popularity. Whereas Claude Code obviously needs to run on Claude infrastructure, right, since Claude is the model underlying Claude Code, this Pi coding agent is sort of the open-source version of it. You can feed in more or less any model you want, including Claude, and have it operate inside this harness. What this does is change the way that we store memories and certain files. It's almost like an alien or bizarro version of Claude Code, insofar as it changes a few of the fundamental constants, like how long before context compaction, how to try different types of solutions, and various other baked-in behaviors of Claude Code.

The reason I'm covering this is that it's something very fundamental to Anthropic. Back on November 26th, 2025, they wrote a big, long blog post called "Effective harnesses for long-running agents," which at the time kind of changed the game, and I'd say this was the kickoff of Claude Code's superiority over most other harnesses. Here it describes various ways to work on long-running coding projects, manage environments, and so on. So obviously this is something that's very fundamentally baked into Claude Code. If you want to understand Claude Code at an advanced level, you can't do better than understanding it at the harness level.

Okay. So obviously this is a Claude Code course, not a course on other harnesses, but you should at least know what agent harnesses are before you proceed to the rest of the course, because the more understanding of harnesses you have, I think the better you'll be able to appreciate, digest, and ultimately execute on what I'm about to show you.

Next, I want to chat a little bit about parallelization, about things like agent teams and subagents, and a couple of other ways of distributing work to minimize the amount of time and effort that goes into things while also increasing the quality of the output.

Okay, so I say agent teams here, but let's start with parallelization. A big

question that I think a lot of people have is, well, first of all, what the heck is parallelization, which is just doing multiple things simultaneously instead of waiting for uh sequential things to finish. And then the second one is like, Nick, why the hell should we paralyze our agents to begin with?

And to that I say, have you ever, you know, sent a longunning task request to cloud code and actually had cloud execute on something for more than a few minutes? For the vast majority of the

minutes? For the vast majority of the time, you're just sitting there twiddling your thumbs. Twiddling your

thumbs is not very economically productive. So if I have ways to not

productive. So if I have ways to not twiddle my thumbs, I will do so. And I

really I guess mean is that autonomous agents just take a long time to finish tasks. You know, when we started with

tasks. You know, when we started with this stuff, or at least when I started with this stuff last year, you know, Claude could realistically work on things for 30 seconds. The other day, I had Claude work on something for over 15

minutes. And so, if all I'm doing is

minutes. And so, if all I'm doing is just sitting there waiting for it to do this 15-minute task, you can imagine that my productivity is basically going to be punctuated by me just sitting around watching it. It does something, I get the result, make some minor changes,

wait for another 15 minutes, and so on and so forth. that's not very efficient.

So, first, parallelization lets us reduce the total amount of time by a factor of at least a few, from 15 minutes to maybe a couple of minutes, because each agent works on smaller, more self-contained things. And second, the output is also just higher quality.

Another thing is that many tasks feature independent steps that can be broken down. For instance, let's say I'm doing some sort of task, and this is how long it would normally take if we go serially. Option A is what most people do: step one, then step two, then step three, then step four. Each of these four steps takes 5 minutes. What's the total amount of time? Well, it's 20 minutes, right? That's the serial way that most people do things.

Well, guess what? It turns out a lot of tasks don't need to be like that. Say I copied all of this over and instead ran a few of these in parallel, three of them simultaneously, and then combined all of them. Now, instead of everything taking 5 minutes, 5 minutes, 5 minutes, and 5 minutes, what I can realistically do is this: step one takes 5 minutes, steps two, three, and four run at the same time and together take 5 minutes, and then the integration step for those three only takes 2 minutes. So I'm converting a task that previously took 20 minutes into one that takes 12 minutes. As a ratio, 12 over 20 equals 3 over 5, so I'm cutting the time by about 40%, down to about 60% of the total. When you have a task that can be broken down this way, aka one that you can expand and run simultaneously through some form of parallelization, it just makes more sense to do all three of these things simultaneously rather than have one parent agent be responsible for everything, doing one, then two, then three, then four. We can take two, three, and four, stack them on top of each other, add an additional step five called a synthesizer that takes in their results, and do the whole thing in a fraction of the time.

Another big reason is that agents are what's called stochastic, aka they don't always return the same answer. If I ran Claude five times on basically the exact same thing, every single time I'd get a slightly different response. Just to show you what I mean by that, I'm going to open up my Claude Code over here, and I'm actually going to open up three different tabs. Let me just visualize this and stick it right in the middle. Okay. Let me make sure all of these are operating the same way. I'm going to say, "I'd like you to determine five ways to improve this codebase."

I'm just going to paste this across all three of these. Now I'm going to run all three of them, and I just want you to notice what's going on here.

Obviously, the first thing it's going to do is try reading the key files. But check out the different solutions it comes up with on all three runs. On the first run: broken image paths, missing title and meta tags, nav links hidden on mobile with no replacement, project cards that aren't actually links, and no keyboard focus styles or skip-to-content link. The second one was broken image paths, missing meta tags, no mobile nav, but now look: placeholder links everywhere, and a typo in the footer. You can see that the more times we run these agents, and the further they get from the beginning, the more they tend to diverge.

There's a statistical reason for that. Say this is sort of like the total answer. At the very beginning, red is pretty similar to black, but eventually it diverges a fair bit. Green's similar to red, and it diverges a fair bit. Blue is similar to all of these, but it diverges a fair bit. It's pretty obvious that these are all a bit different. Let's pick another color, purple, to mark it: over here is sort of the zone of similarity. But after you make it to a certain point, because of the multiplicative nature of how large language models work under the hood, basically multiplying the statistical probabilities of one token after the other, you get massive divergence in the end result.

So this run might go ABC, this one might go BCD, this one might go AB, this one might go ABC, and this one might go ACQ or something like that. What you can do is just run five times. Notice that if I ran this once, I'd only get ABC. But because I ran it another time, I got all the way to D. I ran it another time, I got all the way to E. If you count up all of the different unique answers here, I have A, B, C, D, E, and I even have Q and Z. So I'm getting roughly 2.5 times the number of distinct answers by running things multiple times and then taking all the unique outputs. That's really the principle of stochasticity: because agents don't always return the same answer, if you parallelize them you can run multiple times with the same or similar queries and get answers that live outside the distribution of the average run, which is pretty amazing.
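
The pooling idea above can be sketched in a few lines of Python. This is only an illustration: the letter answers and the five runs are made-up stand-ins for real agent outputs, and `pool_unique` is a hypothetical helper name.

```python
def pool_unique(runs):
    """Return the distinct answers across all runs, in first-seen order."""
    seen = []
    for run in runs:
        for answer in run:
            if answer not in seen:
                seen.append(answer)
    return seen


# Hypothetical outputs from five runs of the same prompt.
runs = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["A", "B", "E"],
    ["A", "B", "C"],
    ["A", "C", "Q"],
]

single_run = pool_unique(runs[:1])  # what one run alone would surface
pooled = pool_unique(runs)          # what all five runs surface together
gain = len(pooled) / len(single_run)
```

With these made-up runs, a single run surfaces three answers and the pool surfaces six, so the gain is 2x. The exact multiple depends entirely on how much the runs happen to diverge.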

So, I'm going to show you how that works specifically with debate and stochastic consensus models. If anybody's seen my agentic AI course on that, you'll know more or less what I mean. I'm also going to show you some fan-out/fan-in research flows as well as some sequential pipeline handoffs.

But really, the fourth and final reason is that model performance degrades as context increases. The shorter and cleaner your context windows are, typically the better the results are as well. Because parallelization typically involves sub-agents, which I'm going to show you a little bit about, you get to avoid the problem where an increasing number of tokens leads to poor performance. If this is more or less the relationship between the number of things in your context window and the performance of the model, we're going to end up almost always staying right around here, which is the zone of good. By the way, I just made that up. It's not actually called the zone of good. Hopefully you understand the distinction, though. When you parallelize and feed tiny chunks of a problem to multiple agents, they can all be in the zone of good. You don't have to go all the way down here, because it's not just one agent doing all the work.

Okay. So what are examples of how to parallelize in the first place? Well, there's a built-in feature in Claude Code now called agent teams, which does a fair amount of this, and I'm going to show you some ways to use it. But first I want to chat more generally, without even going into agent teams, before I show you some demos of the different ways that I personally approach problem solving and that I've seen some of the best and brightest use Claude Code for this sort of parallelization. I'm going to call them common team patterns.

Essentially, there are three main things I want to cover. The first is the ability to fan out and then fan in. That's where you spawn a bunch of different research sub-agents, and then you have a synthesizer sub-agent which takes all of their outputs. Based on the outputs of that synthesizer, you can do either more fan-out/fan-in flows or some form of final synthesis step.

Okay. So what I mean by that is, let's say you have a query: I want to find the absolute best APIs for my feature. Whatever the feature is, call it feature X, some app that generates things. I want to find the best APIs out there for this feature that let me very quickly and easily do the things I want to do. If you were to do this the old-school linear way, Claude Code would spin up and, in the same thread, research site number one, then site number two, then site number three, then site number four, right?

And what would be occurring the entire time we're going through all these different websites? Well, the length of our total context would increase, meaning our performance on average would decrease. In addition, it's taking time: five minutes here, five minutes there, five minutes there, five minutes there, and so on. Then at the end there'd be a final synthesis step, which I'm just going to call S, that combines one, two, three, and four together, which could take maybe another 5 minutes before finally giving you your answer. So if you think about the cost of the answer almost like a line item, it's first of all 25 minutes, which is obviously not preferable to instant, and then a fair amount of tokens spent on poor-quality outputs.

You're probably going to end up spending a similar amount of tokens regardless, but you're spending those tokens on poor-quality outputs because you're down here on the curve as opposed to up here. You're where you don't want to be.

Now, fan-out and fan-in is very similar to what I showed you earlier. You have a research query, say, "find best APIs."

What happens is Claude Code goes in and immediately spawns, let's just say, four research agents. So now we have research agent one, research agent two, research agent three, and research agent four. What we're doing here is fanning out.

These all operate totally independently, accumulating their own context windows. Because they're new agents, they're almost always in the zone of good. Maybe they'll push a little bit beyond that, but they're still pretty good.

Once we're done with that, we do the opposite, which is the fan-in: we feed all of those into a final synthesizer agent. That synthesizer agent has a different prompt. The prompt is not, "hey, go do this research." The prompt is, "hey, here's a bunch of context from a bunch of other models that have already done the research." Meaning the prompt gets to be shorter.

We then apply high-level reasoning strategies and principles to make that synthesizer as smart as possible, and say things like, "we want you to integrate anything that overlaps as well as any outliers, and score them slightly differently." So rather than being all the way over here on the curve with one big context, we're probably somewhere in the middle, which means the performance is going to be a little bit better. And the synthesis step can occur in approximately the same amount of time as the research, while the research itself doesn't get slower as you add agents, because you can spawn almost any number of sub-agents to do research in parallel. So really what you have now is 5 minutes for research and 5 minutes for synthesis. Add these up and it's 10 minutes.
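
The timing arithmetic above is simple enough to write down. This is a minimal sketch in Python: the function names are made up for illustration, and it assumes parallel steps really are independent, so the parallel phase lasts as long as its slowest step.

```python
def serial_minutes(steps):
    """Total time when every step waits for the previous one."""
    return sum(steps)


def fan_out_fan_in_minutes(parallel_steps, synthesis, setup=0):
    """Total time when the parallel steps run at once.

    The parallel phase costs only as much as its slowest step; `setup`
    covers any serial work that must finish before fanning out.
    """
    return setup + max(parallel_steps) + synthesis


# Four 5-minute steps: serial, versus step one followed by three
# parallel steps and a 2-minute integration step.
task_serial = serial_minutes([5, 5, 5, 5])                        # 20
task_parallel = fan_out_fan_in_minutes([5, 5, 5], 2, setup=5)     # 12

# Researching four sites then synthesizing: serial, versus four
# parallel research agents and a 5-minute synthesis step.
research_serial = serial_minutes([5, 5, 5, 5, 5])                 # 25
research_parallel = fan_out_fan_in_minutes([5, 5, 5, 5], 5)       # 10
```

Adding more research agents changes `max(parallel_steps)` only if one of them is slower than the rest, which is why the fan-out can widen without the wall-clock time growing.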

So not only are we significantly faster, we're also a lot higher quality, because now all the data and information is laid out for the synthesis agent. More importantly, different models are better at different things. Within Claude, you have not only your heavy lifter, which is usually the Opus models, but also your Sonnet models. And although not a lot of people use them these days, you also have your Haiku models.

So for the research, which consumes a massive number of tokens but realistically doesn't need a ton of reasoning, since it's more like data extraction, you use something cheap like Haiku or Sonnet. Then for the synthesis you use something like Opus. Because you're applying different models at different steps, it's going to occur much faster, since Sonnet works faster than Opus, so maybe instead of 5 minutes here it's actually, I don't know, 3 minutes. And the cost is going to be a small proportion of the money you normally would have spent, just because of the way pricing on Claude works. Pay attention here to the fact that Claude Opus, in this case 4.6, is five bucks, and Sonnet 4.6 is three. So we're immediately paying only 60% of the price, a 40% saving, and that's just your base input tokens. That's not taking into account the massive difference in output token cost and so on. And obviously things get even better if you go down to Haiku.

You can formalize this as a skill if you'd like. I'm just going to feed it a simple prompt, but this will illustrate what I mean. Let's say I'm right over here in my project. Let me just delete this global CLAUDE.md, because we don't need that anymore. Then let me go back here and copy the actual text.

I say: "Use a fan-out/fan-in researcher-synthesizer approach to research the question: how best should I optimize this codebase? Minimum five sub-agents. Use Sonnet to do the research and individual contemplation, Opus to synthesize."

So now, rather than us just waiting non-stop, what this is going to do is fan out six Sonnet research agents. Each is going to investigate a slightly different optimization axis, so they all focus on slightly different things. Then it's going to synthesize all of those results back together with Opus. If I zoom out, you can actually see all six of them running simultaneously.

Despite the fact that we're not using the agent teams feature, just the sub-agent feature right now, all of these are generated basically immediately.

Their contexts are quite short. In the grand scheme of things, each is a much shorter context than we would ultimately accumulate in our main agent.

All of them are focused on slightly different things, which are autonomously managed by the orchestrator. And finally, these six agents finish in roughly the time of one run rather than a multiple of it. This one just finished the architectural research. It's going to wait for the remaining five agents now.

All right, it looks like it just finished all six research runs. So now it's going to synthesize all the findings with Opus. It can also take advantage of things like its planning features before synthesizing. And here it is. Okay: "High impact, easy fixes."

It gives us a big list. It's also writing the "high to medium impact, easy to medium effort" items. And obviously I'm not just pulling this out of my ass here. Anthropic has done a lot of research on the best way to solve problems, and Opus with a bunch of Sonnet sub-agents massively outperforms Opus alone, both on time and on quality, specifically because of Sonnet's longer context window as well as general usability. That's what I care about: my own usability. I could spend as much money as I want on these things at this point. What I care about is how I can extract the maximum quality in the minimum amount of time, and that's the design pattern you want to use.

So use this anytime you're contemplating problems, and it doesn't have to be specific API or development problems. I use stuff like this anytime I'm designing business systems or process optimizations. I did this the other day when I was doing product differentiation, basically coming up with different ways to price and package products for a company that I now own that does this sort of thing. The opportunities here are basically limitless. You can do this for competitor research. You can do this for whatever the heck you want, and I commonly apply it.

Okay, so that's fan-out and fan-in, where you basically spawn n researchers, usually using a cheaper, dumber model like Sonnet, and then you have a larger synthesizer model that actually combines the results. That's how you get some of the best quality and also the best quantity.
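
As a rough sketch of the fan-out/fan-in shape outside of Claude Code, here is the same flow with Python's `asyncio`. Everything model-related is a placeholder: `run_agent` is a hypothetical stub that just sleeps and echoes its prompt, and the `"sonnet"` and `"opus"` strings are plain labels, not real API calls. You would swap in whatever client you actually use.

```python
import asyncio


async def run_agent(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real sub-agent call."""
    await asyncio.sleep(0.01)  # simulate model latency
    return f"[{model}] findings for: {prompt}"


async def fan_out_fan_in(question: str, angles: list[str]) -> str:
    # Fan out: one cheap research agent per angle, all running concurrently,
    # each with its own short, fresh context.
    research_calls = [
        run_agent("sonnet", f"{question} (focus: {angle})") for angle in angles
    ]
    findings = await asyncio.gather(*research_calls)

    # Fan in: a single stronger model receives only the collected findings.
    synthesis_prompt = "Integrate overlaps and outliers:\n" + "\n".join(findings)
    return await run_agent("opus", synthesis_prompt)


angles = ["architecture", "performance", "accessibility", "testing", "tooling"]
report = asyncio.run(fan_out_fan_in("How should I optimize this codebase?", angles))
```

The researchers never see each other's context, and the synthesizer never sees the raw research process, only the results. That separation is what keeps every agent's context short.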

Next, I want to chat about debate and stochastic consensus, because it's kind of similar, but it's also a little bit different. I use debate and stochastic consensus to hammer out nuanced arguments and nuanced quality discussions. You know how earlier I said we had one agent come up with ABC, another come up with CDE, another come up with ABQ, and so on? Well, with stochastic consensus, and later debate, we're having different sub-agents come up with different lists of solutions, and then we have something else go through and identify the mode, meaning it counts the number of times each solution pops up and finds the most common ones.

So let's say solution A pops up twice. This synthesizer agent would say, "Okay, there are two A's." B pops up twice, so we'd write 2B. C pops up twice, so 2C. D pops up how many times? Once, so we'd write D. Then it counts E, and it also counts Q. In this way you can see that, statistically speaking, a lot of agents think these three are great solutions. One agent thought D was a good solution, another agent thought E was, and another thought Q was. The votes of confidence there are fewer. Then you can use this almost like a weighted average to tell you what approach to take. If it's an equation, my final decision would be something like: decision = 2A + 2B + 2C + D + E + Q. I know this is math, but don't get scared here. The point is not to actually calculate the final solution.
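
The tally described above maps directly onto Python's `collections.Counter`. This is a minimal sketch with made-up agent answers chosen to reproduce the 2A + 2B + 2C + D + E + Q example. The threshold of two votes for "consensus" is an assumption, not a rule from the course.

```python
from collections import Counter

# Hypothetical solution lists from three independent sub-agents.
agent_answers = [
    ["A", "B", "D"],
    ["B", "C", "E"],
    ["A", "C", "Q"],
]

# Count how many agents proposed each idea (set() so one agent votes once).
votes = Counter(idea for answers in agent_answers for idea in set(answers))

# Ideas backed by two or more agents form the consensus; singletons are outliers.
consensus = sorted(idea for idea, n in votes.items() if n >= 2)
outliers = sorted(idea for idea, n in votes.items() if n == 1)
```

Both lists are worth keeping: the consensus list is the statistically safe set, and the outlier list is where the unusual ideas live.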

The reality I'm attempting to convey is that because so many models came up with A, and others came up with E and B and Q and so on, you can quickly determine consensus between a number of agents that come up with ideas. You can also determine which ideas are genuine outliers, insofar as only one out of three models came up with this thing, or one out of 24 models suggested you should do X, Y, and Z. So you get to farm both the statistically most likely solutions and the massive outliers, which can make you quite competent at solving problems in a very short period of time.

This works in a really similar way to what I talked about earlier with the total solution space. There's really a fixed number of ways to solve something, and there's also a certain number of ways not to solve it. What you want to do is cover that ground as quickly as possible. In reality, you could quickly spin up agents to figure out all the ways not to do something, with each sub-agent figuring out, "No, this doesn't work. No, this doesn't work. No, this doesn't work," all simultaneously. What you end up with is this beautiful field of highly differentiated green, which tells you what you can actually do. I understand this is more conceptual, but bear with me. I'll show you an actual example in a moment.

Now, stochastic consensus is cool as a first go, but debate is even cooler, because now you take all of these points and feed them into an open conversation or chat room where all the other models can weigh in on solutions that might not be very obvious. So if I recreate this, we have agent one come up with ABC, agent two come up with BC and, let's just say, E, and agent three come up with ABQ. We divide this into time steps: this is time one, this is time two, this is time three, and this is time four. At every time step, we allow every agent to look at all of the conversations and thoughts that all the other agents have had. As we move through, agent one gets to see agent two's and agent three's responses, so it gets to differentiate. Maybe now it goes A, B, C, E, Z, because it comes up with some additional solution by comparing against two and three. Maybe agent two keeps B and C, but eliminates E because it doesn't think that made much sense, and then comes up with an F. Agent three comes up with, I don't know, two different letters, and also picks up some of the previous solutions but combines them in new ways to come up with better ones.
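
The time-step mechanics can be sketched with a toy simulation in Python. The revision rule here (`adopt_popular`: keep your own ideas and adopt anything at least two peers hold) is an invented placeholder for what would really be a model call. The point is only the structure: every agent revises against a snapshot of the previous round, so all agents in a round can run in parallel.

```python
from collections import Counter


def adopt_popular(own, peer_positions, min_peers=2):
    """Toy revision rule (an assumption): keep own ideas, adopt popular peer ideas."""
    counts = Counter(idea for ideas in peer_positions for idea in ideas)
    adopted = {idea for idea, n in counts.items() if n >= min_peers}
    return set(own) | adopted


def run_debate(positions, rounds, revise=adopt_popular):
    """Return the list of positions at each time step, starting with the input."""
    history = [dict(positions)]
    for _ in range(rounds):
        snapshot = history[-1]  # everyone reads the previous round only
        next_positions = {}
        for name, own in snapshot.items():
            peers = [ideas for other, ideas in snapshot.items() if other != name]
            next_positions[name] = revise(own, peers)
        history.append(next_positions)
    return history


start = {
    "agent1": {"A", "B", "C"},
    "agent2": {"B", "C", "E"},
    "agent3": {"A", "B", "Q"},
}
history = run_debate(start, rounds=2)
final = history[-1]
```

With this particular rule the agents converge on A, B, and C while agent two and agent three each keep their own outlier. A real model in place of `adopt_popular` could also drop ideas or invent new ones, as described above.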

What we do with a debate is not really a debate in the practical sense. It's not like, "hey, your job is to try and convince other people why A, B, and C are the best solutions." It's that every model has access to all of the other models' results. Because they don't have to spend all that time reasoning and can just see the results, they can incorporate them, come up with increasingly nuanced solutions, and ultimately span a large search space in a very short period of time. We can proceed with this all the way down. You can run as many of these steps as you want, until ultimately you have a list of solutions provided by a bunch of different models that are way more complex, way more nuanced, and way more interesting than the initial ones that any one agent might have come up with.

All right, so I'm back on my business workspace here, and we're still doing research on tomatillos, but I thought this is actually a pretty good example.

Why don't we use stochastic multi-agent consensus to come up with all of the different ways you can make a sauce using a tomatillo? "Use stochastic multi-agent consensus to determine all of the different ways that you could make a nice-tasting sauce using tomatillos. I want every agent to come up with at least 10 independent responses, then have them synthesized and turned into just a giant list of all of the possible things you could do."

What this skill, stochastic multi-agent consensus, does, if I open it up, is break a query down into n other queries. That's where it says, "Spawn N agents with the same or a slightly different prompt to independently analyze a problem. Then aggregate results by consensus." You use it for decision-making, ranking options, strategic analysis, or any problem where you want to filter hallucinations and surface what are called high-variance ideas. So anytime I use words like "consensus," "poll agents," "stochastic consensus," "spawn n agents," and so on, it'll go and do the thing.

Scrolling down here, you can see that it read through the skill and spawned 10 agents, all looking at slightly different angles. These are very similar prompts: "Brainstorm all the different ways you can make a nice-tasting sauce using tomatillos" appears in this one, and this one, and this one. But the idea is that one is a conservative, tradition-minded chef, another is an adventurous, boundary-pushing chef, another challenges conventional wisdom, another reasons from first principles, and so on.

Now, because this is a pretty simple and not very intellectually difficult exercise, all 10 agents have already finished, and you can see that I was able to scan a massive search space in a very short period of time, despite the fact that the problem is pretty simple. It's similar to what I showed you earlier with those n researchers and some sort of synthesizer model.

What this is now going to do is deduplicate the outputs and then give me a list of pretty nuanced answers that realistically scanned most of the search space in a very short period of time.
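
The two mechanical pieces of that skill, persona-varied prompts and deduplication, can be sketched in Python. The helper names and the sample ideas are invented for illustration. The dedupe here only catches case and whitespace variants, whereas a real synthesizer model would also merge ideas that are worded differently but mean the same thing.

```python
PERSONAS = [
    "a conservative, tradition-minded chef",
    "an adventurous, boundary-pushing chef",
    "someone who challenges conventional wisdom",
    "someone who reasons from first principles",
]


def build_prompts(task, personas):
    """Same task for every agent, framed through a different persona."""
    return [f"You are {persona}. {task}" for persona in personas]


def normalize(idea):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(idea.lower().split())


def deduplicate(raw_ideas):
    """Keep the first spelling seen for each normalized idea, in order."""
    unique = {}
    for idea in raw_ideas:
        unique.setdefault(normalize(idea), idea.strip())
    return list(unique.values())


task = "Brainstorm all the different ways to make a nice-tasting sauce using tomatillos."
prompts = build_prompts(task, PERSONAS)

# Hypothetical raw ideas collected from several agents.
raw = ["Salsa verde", "salsa  verde ", "Tomatillo avocado crema", "SALSA VERDE", "Aguachile verde"]
ideas = deduplicate(raw)
```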

I'm sure you can imagine you could scale this up if you had some sort of dedicated infrastructure, whether it's a local model or something like that. You could theoretically have stuff like this running all the time, just ideating and coming up with new approaches to solve long-standing problems. I don't know if you guys have seen this, but they're now throwing Opus, GPT, and other models at these big math questions and asking them to solve them. This is essentially how they're doing it under the hood.

So as you guys can see, we've polled 10 agents. There are 119 raw ideas. Accounting for duplication, there are 52 in total that are unique. So what we're going to do is look at this consensus report and then ultimately its answers.

All right we have the consensus report.

Opening it up here, you can see there are 52 total. The first is salsa verde. The next is tomatillo avocado crema. The third is aguachile verde, and so on and so forth. I could work my way all the way down here. A bunch of different types. You know, I could have had one agent come up with all of these. I could. But the probability that I would have been able to, one, come up with a highly differentiated list like this and, two, scan as much of that search space in the same amount of time is very low. And so I'm sure you can imagine you can apply this to any business problem that you guys are currently having, to come up with a bunch of low-hanging-fruit solutions as well as unique and outlier solutions. We even have Indian-influenced sauces, Persian-influenced sauces, Caribbean-Latin fusion sauces, and so on and so forth. An outlier that I'm definitely not trying anytime soon is tomatillo beurre blanc, which is a French butter sauce using the tomatillo's pectin as a natural emulsifier. No thank you. So what would debate look like?

Debate is more or less the exact same idea. In my case, I've just turned this into a skill. It's called model-chat. Basically, what occurs is we spawn five Claude instances in a shared conversation room where they debate, disagree, and converge on solutions. We use round-robin turns with parallel execution within each round, and it triggers on terms like "chat" and so on. So, what I'm going to do here is say: "Great, this looks awesome. I'd like you to rerun this, but with model-chat. Make sure at least 10 agents are having conversations about this. And if any of the sauces just sound insane or terrible or crazy, then obviously have them discuss that as well."

Just like our stochastic multi-agent consensus took advantage of time, basically, and traded it off against total tokens, we're doing the same thing here. It's going to start by extracting from the user's message the topic or problem, the mode, the number of agents, and the number of rounds. It's then going to run an actual script that I've set up that automates the process of having each of the agents look at each of the other agents' responses before finally doing a synthesis. Speaking of which, I just read through a couple of those and I'm actually just going to make some right now. So, I'll be right back.
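The debate loop described above (round-robin rounds, parallel execution within each round, then a synthesis) might look something like this. It is a hedged sketch, not the contents of the model-chat script: `ask` and `synthesize` are hypothetical model calls, the stubs below exist only so it runs offline, and apart from the contrarian the persona names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def debate(topic, personas, rounds, ask, synthesize):
    """Parallel replies within a round; the shared transcript grows across rounds."""
    transcript = [("moderator", topic)]
    for _ in range(rounds):
        snapshot = tuple(transcript)  # every agent in a round sees the same history
        with ThreadPoolExecutor(max_workers=len(personas)) as pool:
            replies = list(pool.map(lambda p: ask(p, snapshot), personas))
        transcript.extend(zip(personas, replies))
    return synthesize(transcript), transcript

def fake_ask(persona, history):
    """Hypothetical stand-in for a model call."""
    return f"{persona} responding to {len(history)} earlier messages"

def fake_synthesize(transcript):
    """Hypothetical stand-in for the final synthesis call."""
    return f"synthesis of {len(transcript) - 1} replies"

summary, transcript = debate(
    "Rank the tomatillo sauces.",
    ["contrarian", "traditionalist", "pragmatist"],
    rounds=2,
    ask=fake_ask,
    synthesize=fake_synthesize,
)
print(summary)
```

Freezing a snapshot per round is the design choice that lets agents run in parallel: nobody in round two sees another round-two reply, only everything from round one.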

So, looking at the conversation over here, after just asking it to show it to me, you can actually see that all the agents are doing some thinking, and the contrarian is starting with 15 ideas. It'll immediately challenge the ideas that deserve it. They're now listing their disagreements. So, does this actually work? Is it a structurally sound technique or a restaurant stunt with an unacceptable failure rate? Is tamarind redundant or complementary? Does tomatillo chocolate belong on the list? If so, where should mole verde be, in tier one or tier two? So they're having discussions on an ongoing basis, which is always really fun to watch, and which we can monitor and then obviously synthesize into an answer.

Okay. And then finally, we have the tomatillo synthesis over here. Tomatillo's pectin content is underappreciated.

Tomatillo husk tea, unfortunately, is not cool. The foundational tier is settled and non-negotiable. And if I actually look at the foundational tier, you can see we have a bunch of different highly recommended sauces. Again, some of these are very nuanced: lacto-fermented tomatillo hot sauce, taqueria squeeze-bottle drizzle, enchilada sauce, tomatillo aguachile, and so on and so forth. And you know, this is just a really shitty example, but hopefully you guys understand that you can take this to more or less anything that you want, whether it's designing a new programming approach to a particular problem, choosing the right framework to tackle a task with, or something else.

Okay, so I just did all of the previous example using a pretty straightforward dietary or chef sort of example. But now I want to use this on an actual app and have all of these different models discussing things, and doing so in a very short period of time. What I have here is an algorithmic art example, and this is actually something that Claude developed. It's part of their algorithmic art base skill, which I think is supplied in the Anthropic skill directory. You can adjust some things like the stroke weight and the damping and so on, and have it come out with very unique designs. So you can then just save the image and then, boom, now you have a cool wallpaper or something like that.

It's kind of neat. But I want to improve this as much as humanly possible. And the reason why I'm doing it like this is because I also want to show you guys how to apply the same approaches that I just showed you to agent teams instead, which are a much more streamlined version of doing the exact same things that I've done so far. It's streamlined in the sense that it's built out of the box to do everything, but it does so at the cost of some tokens. So, I'm just going to go back over here and look at synaptic-drift.html within art. I just need to make sure to remember what folder that's in. Then I'm going to open up another Claude instance. Now, a lot of the advanced stuff, as we know, is actually only available in the terminal.

And I think agent teams are a lot better managed in the terminal. So, I'm just going to open up the terminal. I'm going to full-screen it here as well. Let me delete that and then go full screen. I could do it in here. I could also do it in Ghostty, which is probably my favorite terminal to use with Claude. But for now, I have my agent teams idea. So I'm basically now going to say: "Hey, I'd like you to optimize synaptic-drift.html and turn it into a full-fledged application. However, rather than just do this all naively yourself, I want you to take advantage of stochastic multi-agent consensus. I want you to take that skill and then apply it using the agent teams feature. You'll orchestrate a team of agents that do all of this stuff. Don't just use what's in the skill itself, because that would be running it a little too simply. I actually want you to read through the whole skill and then use that to spawn agent teams."

Okay, so it's going to start by reading the skill definition and then the HTML file itself, which is found in art. It's then going to read through the agent teams tooling and everything that it needs in order to spin this up easily. So it'll start by creating a team for the consensus workflow, spawning 10 analyst agents with different framings, then finally aggregating their recommendations and implementing the winning features.

So the very first thing it's going to do is spawn the analyst agents. And you can see now the UX has changed a little bit. You see down at the bottom where I have these different analysts that are running. So if I go Shift+Down, I can actually see all of their different stochastic multi-agent consensus threads. So now they're all spawning and running in parallel, which is pretty neat. At any point in time, I could press Enter to view the conversations and what they're doing.

And I should note that stochastic multi-agent consensus applied to agent teams basically has the debate built in, because the agents can actually communicate. The team lead can also orchestrate that communication. So it's not actually really independent, which is neat. You could spawn all of these in different windows if you want to.

You can also just continuously hold Shift and then go up and down to select. What I'm doing is just reading through a bunch of different threads and conversations. And it's clear that they all start by reading through synaptic-drift.html. Finally, this is now returning a bunch of agent conclusions back. And more importantly, it's also coming up with consensus, which is nice. All right.

What it's going to do is take all of these now and close them down while also looking at the consensus, the bugs, the divergence, and then ultimately the outliers. So, the consensus recommendation for our next features is high-res exports, a preset system, URL state, and shareable links. The bugs are the race condition in regenerate, download saves mid-render, and PG height not checked. The divergence is that six out of 10 agents suggest debounced regeneration versus a live preview. Then the outliers have also come in: mobile responsive layout, live animation mode, seed history, web worker offload, mouse attractor/repeller, and kill sidebar overlay. So, this is all really cool.

You can see now it's actually just deleting my old tomatillo stuff. I guess we happened to be using the same file or something. Instead, it's coming up with this giant list of different conditions and features that it can build. Okay, now it's shutting down all the agents and implementing it. Just because I want this to go faster, I'll say, "Use agent teams to do the implementation." And you can see it's gone through here and added all of what we needed in order to implement the features that the model suggested. In addition, it's also spawning review agents to see if we can improve the quality of the generated code, spot problems, and stuff like that. So, if I go Shift+Down, I can see all those.

So, we now have reviewer-bugs and reviewer-features. Let's just see what reviewer-bugs says. Okay, it's now sending the review to the team lead. So, it's communicating that back. Taking a look at what the reviewer is saying, and now it's opening it up. You can see we now have a ton more features. We have different presets: Ocean Drift, Ember Storm, Ink Wash, Neon Plasma, Neural Fire. We have the ability to modify colors. We have 1x, 2x, and 4x downloads, which I don't think you guys could see because my face is in the way. But if you look down over here, you'll see that there's significantly more functionality. We can download a PNG at 4x as well. We have simple things like the space bar to reload and change things. We can change the speed and so on. Ultimately, this is just a better app, right? And we did this by basically exchanging a couple of my dollars and tokens for a bunch of different agents, all coming up with their own ideas and then ultimately executing on them. Hopefully you guys can see you can apply the same approach to more or less anything. There are obviously optimal token trade-offs, but when you spawn sub-agents that are a little bit less capable, like Sonnet versus Opus, typically that math works out and you end up being able to do just as much, if not more, in a shorter amount of time for less money.

All right. And then finally, pipeline, which is sequential handoff between specialists. I just showed you guys a little bit of that earlier with the agent team spawning review agents and stuff like that, but basically that's more or less it. You have task A done by some agent which is specialized for task A. You then pass that off to agent B, which is specialized for task B, and then ultimately agent C, which is specialized for task C. Now, you could just have A do all three of these things. The issue with having A do all three is, one, if you guys remember from earlier, this is getting a little messy. We're no longer in the zone of good, because odds are it has tons of context from literally everything that it's done before. It would have started off over here, and that would have been okay, but now it's over here, and then now it's over here. And then two, fast development is often at odds with really in-depth testing. If you think about it conceptually, a developer agent will have different incentives than a testing agent. The developer agent will be incentivized to build things that work really quickly, using whatever is available to it. Whereas the testing agent will be incentivized to try and spot all of the issues. Building new things is sort of at odds with repairing the old things. And in that way, if you try and have one agent do everything, the probability that it will do it as well as specialized agents that are highly tuned for each thing is definitely lower. That's assuming their intelligence is held equal; I'm talking about all Opus calls, not a mix of Opus and Sonnet.

So my recommendation, what I would do, is have a dev agent for A, like I just did. Then I'd have some form of bug fixer for B. Then I'd have some sort of tester, maybe bug and QA. And I'm not going to redo that example because, one, I want to be respectful of your time, but two, I just showed you exactly that with the agent teams example. I guess the meta example here is you combine all three of these patterns and have all of them interacting constantly for best results. You have debate and stochastic consensus to come up with the best ways to improve on a product. Then maybe you do some fan-out/fan-in with researchers to go look at different APIs and different design patterns that you could use to fulfill that, before finally handing that off to some sort of bug reviewer, QA, or tester. But hopefully it's clear that all of these things do not exist in isolation. They all exist together.
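For the pipeline pattern specifically, the sequential handoff between specialists can be sketched like this. It's a minimal illustration under assumptions: `dev`, `bug_fix`, and `qa` are hypothetical stand-ins for separately prompted agents, and each stage receives only the previous stage's output rather than the full history.

```python
def run_pipeline(task, stages):
    """Pass the work through each (name, agent) stage in order."""
    artifact, log = task, []
    for name, agent in stages:
        artifact = agent(artifact)  # each stage sees only the prior stage's output
        log.append((name, artifact))
    return artifact, log

# Hypothetical specialists; in practice each would be its own agent call.
def dev(spec):
    return f"code for [{spec}]"

def bug_fix(code):
    return code + " + fixes"

def qa(code):
    return code + " + QA sign-off"

final, log = run_pipeline(
    "4x PNG export",
    [("dev", dev), ("bug-fix", bug_fix), ("qa", qa)],
)
print(final)
```

Because no stage inherits the previous stage's conversation, each specialist starts with a small context, which is the point made above about one agent accumulating everything it has done before.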

Next, let's talk context management, which, put really simply, is just all of the files, folders, and organizational methods that you put into a workspace to allow Claude Code to effectively manage whatever work you have. Now, I'm seeing a lot of people try to delegate work right now sort of like human companies do, with CEOs, CTOs, CMOs, Claude Code agents as software engineers, and stuff like that. This one's called Paperclip specifically. It's got a pretty interesting repo that you can check out right over here. It's all about running your whole business with an agent team.

I think initially it's really easy to look at these and be like, "Okay, this is stupid." I mean, that's what I did. I made a couple of videos and I talked ad nauseam with a couple of my friends, and I was like, "This is dumb. Why would we try and fit agents, which think very differently than human beings, into the exact same organizational hierarchies we've been using for the last 150 years? It just doesn't make sense. Human brains are different than agent brains." The latter are obviously a lot more spiky: good at certain things while sucking at others. But anyway, as quick as I was to initially dismiss this idea, what I've come to realize is that sub-agents arranged as these org charts and SKILL.md files, which as we know are self-contained SOPs that live in a file named SKILL.md, are actually just two flavors of the exact same thing. They're just different ways of organizing your markdown files.

So in my case, we ran a model-chat skill earlier to show you guys how models debate and stuff like that. We had a SKILL.md within it that stored a bunch of information that was hyper-specific to that skill. We had model-chat.py, which was a tool that this skill could use. Our sub-agents are organized in basically the same way. I guess what I'm trying to say is, if we take sub-agents on the left-hand side, what was one of the main reasons why we like using sub-agents? It's because it's a clear or fresh context window, right? All right. Awesome. So, that's one. How about the fact that it's specialized? Awesome. That's another. How about the fact that the sub-agent is probably more reliable at sub-agent-specific tasks, right?

That's another one. And then how about the fact that it's written in markdown format with tool use? Well, fantastic. That's another one. If we look at how that equates to skills, honestly, the only thing that's missing is the fact that the context window is not entirely clear or fresh. But because skills are so efficiently written, they're basically a form of compression that pushes you towards a shorter context window anyway. And keep in mind that when you instantiate a sub-agent, you're giving it a little prompt, kind of similar to the way a skill works. So the only real difference between the two, if I'm honest, is just the amount of context in the sub-agent versus the skill. I want you guys to know that sub-agents are honestly basically skills, and skills are basically sub-agents. They're just slightly different ways of storing information.

So why am I bringing this up? Just because I'm coming to realize that the two are very similar, and I'm sure in the future they're going to be merged even more into a similar concept. All these two point at are different ways of organizing your context and organizing the way that you get tasks done. One delegates via CEO to CTO, CMO, CTO, all that stuff, right? I don't know why there's two CTOs, now that I'm looking at it. That's kind of weird. Whereas the other one stores things in a SKILL.md. Just going back to Antigravity right over here, right?

I could go to this skills folder and find that model-chat skill. And the way that this is written is basically the exact same schema that a sub-agent is written in. If I go over here to Claude Code's actual documentation page on sub-agents, you have basically the exact same structure. See how here it has the title, code reviewer, then description, prompt, tools, model. And over here, what do we have? We have the name, we have a description, and we also have the tools. The model is sort of baked in here because it's in our main thread. It's going to be Opus 4.6. But hopefully you guys are seeing that skills and sub-agents are actually similar. They're just slightly different ways of organizing information. I'm making this big point because I think that's important to realize as we continue moving forward with Claude Code and other tools and get more and more advanced with them. The shapes of how we're transmitting information to our models will likely end up being quite similar.
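To make the overlap concrete, here is a small sketch that parses the front matter of a skill file and a sub-agent file and compares their fields. Treat it as an illustration under assumptions: the field names follow the name/description/tools/model fields mentioned above, but the two file bodies, their values, and the `allowed-tools` spelling are made up for the example, and the parser only handles flat `key: value` lines.

```python
def front_matter(markdown):
    """Parse flat `key: value` front matter between the leading '---' lines."""
    lines = markdown.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

# Hypothetical skill file (illustrative values only).
SKILL_MD = """---
name: model-chat
description: Spawn agents in a shared room to debate and converge.
allowed-tools: Bash, Read
---
1. Extract the topic, mode, number of agents, and number of rounds.
"""

# Hypothetical sub-agent file (illustrative values only).
SUBAGENT_MD = """---
name: code-reviewer
description: Reviews code for quality and bugs.
tools: Read, Grep
model: sonnet
---
You are a code reviewer.
"""

skill, agent = front_matter(SKILL_MD), front_matter(SUBAGENT_MD)
shared = sorted(skill.keys() & agent.keys())
print(shared)
```

In this sketch both files reduce to a name, a description, a tool list, and a body of instructions; the sub-agent adds a model choice, which the skill inherits from the main thread.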

Whether one person decides to use a Paperclip-style big fleet of agents that does XYZ, which just a couple of months ago I might have looked at, scoffed, and said, "Well, that doesn't do anything," skills are basically the same thing. Model intelligence is growing more and more capable within the harness, which is what allows the development of these really interesting organizational hierarchies. So what are some of these organizational hierarchies? Well, I've already shown you Paperclip here. And the way that Paperclip works, or rather is supposed to work, is that this is a dashboard somebody developed that, I think, preys on (maybe preys isn't the right word, but it uses) people's misunderstandings of how agents work. It anthropomorphizes them, makes them seem really similar to humans, and then puts this in front of you so that you feel like you're running a whole team.

And so in this way, clearly, it's broken down by role, right? Whereas the average skill is not broken down by role. The average skill is broken down by function. Also, skills typically don't delegate to other skills. That's really the main difference. But Paperclip isn't the only one that's like this. Here's another good example: Company Helm. This one is a very similar sort of idea, where you basically have an AI studio. Within the AI studio, you define a bunch of different roles for your agents, and that's ultimately what allows you to manage your projects. This, instead of being left to right, is organized a little bit differently, with a front-end builder, a QA runner, and so on. How about OpenGoat, which is the AI autonomous organization of OpenClaw agents? Again, it's doing this with a CEO, head of sales, customer support-based organization, which I don't really believe is ideal. I don't really think you should have this level of direct reports. I mean, think about it.

Why? All of these could just be Opus 4.6. They could be way smarter. They could pull from some sort of shared context pool. And I think you really wouldn't lose that much. But it is an interesting approach. This one over here is called The System, which is obviously using some sort of AI-generated diagram here, but it's 26 specialized agents that do architecture, design, product development, release, operations, and so on. This one over here, I think, is called Gas Town, which is basically where you have a mayor, which is your AI coordinator, a bunch of different crew members, and then also polecats, or worker agents. You guys may have heard of CrewAI. It's the same sort of idea. It's a fast and flexible multi-agent framework which supposedly delegates things, where you have crews that have different agents within them, each with their own segregated tool calling and stuff like that. It's another way of organizing information. This one over here, SwarmClaw, is CEO-based, with developer and researcher, and again you have delegation.

So all these are different attempts by different groups of people to determine the best organizational hierarchy of agents, and I think pretty much all of them suck right now, to be clear. But I just want you guys to level with me that these are just different ways of organizing information. Skills are highly specific to you; a skill is just a collection of markdown files with names, descriptions, allowed tools, and then SOPs. Sub-agents are basically the exact same thing. So, as the field continues to mature and there are better and more novel context management strategies out there, multi-agent orchestrators essentially, these things will grow more and more differentiated.

Now, in terms of what I would consider to be actually valuable delegation, there are two main design patterns. The first is the parent, researcher, and QA system, where essentially you have a parent model, which is usually your smart one. So this would probably be your Opus model. It communicates with researchers, plural. These will be less capable models like Sonnet that typically do research more economically. And then some QA agents, like Opus, which are basically just tuned to QA and nothing else. The idea here is that this is a good balance against those super-bloated org charts that we saw earlier, while still allowing each type of agent to do the things that it is inherently better at. The parent agent is obviously the orchestrator. Anything that is up at the top you can always consider to be an orchestrator. And then you have multiple Sonnet researchers. This takes advantage of that fan-out idea, where when Opus needs something, it doesn't just do the research itself, because that'll pollute its context window. It delegates a bunch of research, fits huge numbers of tokens into the context windows of these Sonnet agents, then takes summaries of that and uses them to make decisions. And the way that it works, and I'm just going to draw the logic flow, is: Opus will decide to do something. It delegates down here.

Okay, that information comes back to Opus. Opus then builds something, kind of on its own. After it's done building, it gives the product over to the QA agent. The QA agent returns some changes that it suggests. Opus then goes through, makes those changes, and again gives it to the QA agent. The QA agent returns. This loop continues until basically everything is done. If there's research that's necessary, it'll go down, do some research here, and then continue developing. And then finally you have whatever the final product is that you're building, whether it's a business system, a development system, or whatever. In this way, you're maximizing the incentives of each individual agent while also allowing, I want to say, the leanest possible setup that still recognizes that different types of agents are better at different types of tasks.
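The parent/researcher/QA flow just described can be sketched as a loop. This is a rough illustration under assumptions, not a real orchestration API: `research`, `build`, and `review` are hypothetical model calls (for instance Sonnet for research and Opus for building and QA), and the `fake_*` stubs plus the `max_rounds` cap are added here so the sketch runs offline and always terminates.

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(goal, queries, research, build, review, max_rounds=5):
    """Fan out research, build once, then loop build/QA until QA has no feedback."""
    # Researchers read in their own context windows and hand back short summaries.
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        summaries = list(pool.map(research, queries))
    product = build(goal, summaries, feedback=[])
    for round_no in range(1, max_rounds + 1):
        feedback = review(product)  # the QA agent sees only the product
        if not feedback:
            return product, round_no
        product = build(goal, summaries, feedback=feedback)
    return product, max_rounds

# Hypothetical stubs standing in for model calls.
open_bugs = ["race condition", "missing height check"]

def fake_research(query):
    return f"summary of {query}"

def fake_build(goal, summaries, feedback):
    for item in feedback:
        open_bugs.remove(item)
    return {"goal": goal, "notes": summaries, "bugs": list(open_bugs)}

def fake_review(product):
    return product["bugs"][:1]  # report one issue per pass

product, rounds = orchestrate(
    "turn synaptic-drift.html into an app",
    ["canvas export", "URL state"],
    fake_research, fake_build, fake_review,
)
print(rounds, product["bugs"])
```

Only the summaries reach the parent, so the bulk of the research tokens stay in the researchers' context windows rather than the orchestrator's.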

You know, we could make this bigger, of course. We could have a testing agent. We could have a design agent. We could have a development agent. We could have a backend agent. But the more complicated you get with this stuff, as mentioned, typically the worse it gets. If you want to go even leaner than that, the second system is developer and QA, where you literally just have a smart parent and a smart QA, and you go back and forth between the two. Every time that you want to test something, you have a CLAUDE.md, or just a prompt baked into your parent, that says, "Hey, after you're done with every development, run it through a new QA agent." The idea here is the QA has literally no prompt other than, "You're a QA agent with no context. Read this code and apply the following design principles to it." And since this QA agent doesn't know what the project is about, it's not going to be biased like the parent agent will be in the development of the feature. The parent agent will have feedback from the QA agent, so it'll be able to incorporate it into its own thread and take advantage of the pre-existing list of failures and successes and things it's tried, but the QA agent is new, and it's newly spawned every time.

So typically the way it'll work is the parent agent will go and develop a feature, and at the end of the development there'll be something in the CLAUDE.md or system prompt that says, "Okay, now that you're done, make sure to check it with the QA agent." So we'll spawn a QA agent. The QA agent will then give feedback. The parent will design; feedback. The parent will design; feedback. The parent will design; no feedback, because it's now good. The parent's done, and so now we have the final product. Obviously, because the parent has to do its own research and stuff like that, I personally think this is not as ideal, but it is even simpler. And keep in mind that there is always a time cost every time you spin up a sub-agent. It's a fixed time cost, but there are also compounding probabilities you're multiplying, because you're having an agent delegate something to another agent, and there's no human in the loop.

The more independent steps that an agent has to do without a human being in the loop the higher the probability that it will diverge from its sort of intended um goal or intended task. So when your

parent agent in the previous example generates you know a bunch of research queries to the you know sonnet sub aents and goes and does them there's no guarantee that the research the sonnet sub aents are doing is actually 100%

faithful to what your initial query was every step along the chain that is further from you typically the results and the quality is a little bit more diluted. So yeah I mean like it'll be

diluted. So yeah I mean like it'll be it'll be either one of these for me developer Q&A or some sort of parent researcher Q&A. That would basically be

researcher Q&A. That would basically be it though. Um, personally, I find right

it though. Um, personally, I find right now with all the org charts and stuff like that, we're just we're just going a little bit too much. We definitely don't need uh I don't know 700 layers of CEOs and customer success agents and lead

engineer agents and stuff like that.
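To put a rough number on that dilution, here is a minimal sketch. It assumes, purely for illustration, that every unsupervised hand-off stays faithful to your intent with the same fixed probability and that the hand-offs are independent. Neither the 90% figure nor the function name comes from the course; they are placeholders.

```python
def chain_fidelity(per_step_fidelity: float, steps: int) -> float:
    """Chance that every step in a delegation chain stays on task.

    Assumes each step is independent and equally reliable, which is a
    simplification made only for this illustration.
    """
    if not 0.0 <= per_step_fidelity <= 1.0:
        raise ValueError("per_step_fidelity must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_fidelity ** steps


if __name__ == "__main__":
    # 0.9 is a made-up per-step figure, not a measurement.
    for steps in (1, 2, 3, 5):
        chance = chain_fidelity(0.9, steps)
        print(f"{steps} unsupervised step(s): {chance:.0%} chance of staying on task")
```

Under those assumptions, every extra layer in the org chart multiplies in another factor below one, which is the argument for keeping the hierarchy shallow.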

Now, I want to talk about something that's gotten a lot of attention recently and does genuinely have the potential to significantly improve many business and programming functions. It's called auto research. Essentially, what I have in front of me is a research lab that I've spun up to improve the load speed of one of my websites.

Now, the way that you gauge whether or not a website is loading quickly is based off of three main metrics. The first is called LCP, largest contentful paint. FCP, first contentful paint. Then there's TBT. I don't know what that stands for. And then finally, there's performance score. And so this is a standardized assessment called the Google Lighthouse score that you've probably seen before. And basically, it measures, you know, when I type in 1 second copier and I press the enter button, how fast does literally everything on the page load? It also checks for very minor things like, when I load this website, does the content on the page shift around?

So, my website here, leftclick.ai, is just one of many that I own. And essentially, it's just a little bit too slow right now. And it's slow for a variety of reasons. We got this cool glassmorphism animation on the page. There's stuff moving around and lots of images of my team and so on and so forth.

So what I've decided to do is basically take all of the load of making this website faster off of me and just give it all to that fleet of AI agents instead. Auto research is basically perfect for use cases just like this, where we have a very defined goal, in my case to decrease or increase a couple of metrics; a very defined change method, which is how you actually make the impact, so in my case just modifying the website code; and then a very standardized assessment, which in my case is that Lighthouse score.

In case you have never seen this before, basically Andrej Karpathy, who is one of the founding members of OpenAI and was also the head of AI at Tesla for quite a while, was doing a bunch of research on his own for one of the models that he was running, and he's just like, you know, do I have to do this stuff anymore?

I feel like I'm at the point where I could have AI actually run most of my research for me. Let me make a quick hypothesis: if I just gave all of my changes to AI, would it be able to do the same thing that I do while I slept, such that when I wake up, I'll have a big list of improvements? And it turns out he can. And it's not that AI agents are better than human beings at determining these research changes, but it's actually quite standardized to do conceptually.

You're basically just looking over a bunch of different possible things you could do, making one tiny change, and then just evaluating, hey, did that actually improve my score? Did that make things better? If so, I keep it, and I just move on to the next thing. And I go over and over and over again until finally, you know, you make it hundreds of iterations later. So in my case, I just reran the test because I want to start this from scratch to show you guys how this works. Well, it's actually fairly straightforward. And what I'll do next is I'll run you guys through the original way that auto research works, and then how to download the repo and set it up on your end for whatever use cases you particularly have.

So, it all started when Andrej Karpathy, who was a researcher (he used to work at Tesla, I think he was the head of AI at Tesla, and he was also one of the founding members of OpenAI), asked himself, you know, all this work that I'm doing, all this research stuff that I'm doing, is there any way to automate it? And he found that if he just broke down step by step what it is that he actually had to do, it more or less always went like this.

He just had a little loop set up where he would make a hypothesis, and the hypothesis would be like, hey, if I change X, Y, and Z, I think my system will run faster. Then he'd actually execute the change. So he'd actually go and adjust X, Y, Z. Then finally he'd assess. If the assessment was good, aka it made an improvement, then he would just go back to the start and make another one. If the assessment was bad, aka it failed, then he would just get rid of it, not change anything, and kind of start from scratch.

And all along the way, what he would do is update this little document, which you and I could just call a research log. Basically the first change would be like, oh, this worked, it was great. Second change, oh no, it didn't work, and here's why. Third change, okay, it worked and it was great. And eventually, over time, you end up with this massive log of all the different possible things you could do to whatever your task is, and all the things that you have tried in the past that didn't really change anything.
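The loop described above can be sketched in a few lines. This is not the code from the autoresearch repo; it is a toy version under stated assumptions: `propose` stands in for the agent making one small change, `assess` stands in for whatever test you run, lower scores are better, and all names and numbers are hypothetical.

```python
import random
from typing import Callable


def auto_research(
    state: dict,
    propose: Callable[[dict], dict],
    assess: Callable[[dict], float],
    iterations: int,
) -> tuple[dict, list[dict]]:
    """Hypothesize, change, assess, then keep or discard. Lower score wins."""
    best_score = assess(state)
    research_log = []
    for i in range(1, iterations + 1):
        candidate = propose(state)   # hypothesis plus the change itself
        score = assess(candidate)    # standardized assessment
        kept = score < best_score
        research_log.append({"iteration": i, "score": score, "kept": kept})
        if kept:
            state, best_score = candidate, score
        # A failed change is simply dropped; the next try starts from the
        # last known-good state, and the log still records the failure.
    return state, research_log


if __name__ == "__main__":
    rng = random.Random(0)
    start = {"knob_a": 5.0, "knob_b": 5.0}  # made-up stand-in for a real system

    def propose(current: dict) -> dict:
        changed = dict(current)
        key = rng.choice(list(changed))
        changed[key] = max(0.0, changed[key] + rng.uniform(-1.0, 1.0))
        return changed

    def assess(current: dict) -> float:
        return current["knob_a"] + current["knob_b"]  # pretend "load time"

    final, log = auto_research(start, propose, assess, iterations=50)
    kept = sum(entry["kept"] for entry in log)
    print(f"kept {kept} of {len(log)} changes: {assess(start):.2f} -> {assess(final):.2f}")
```

In the real setup the agent writes the log entries in prose (what it tried and why it failed), which is what lets later iterations avoid repeating dead ends.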

Okay, so this is made of three files. There's a prepare.py, which in our case is kind of pointless. Then there's a train.py and then a program.md. The reason why the prepare.py is pointless is because it's just about AI research specifically. It's fixed constants, downloading the training data, training a BPE (byte pair encoding) tokenizer, and a bunch of other stuff that just isn't really relevant.

The stuff for us, though, is obviously that we want to improve our programs. We want to improve our websites. We want to improve some of our business functions. These two files here, train.py and program.md, basically underscore how the entire thing works.

Okay, so the super important one here is called program.md. What you do is you basically just tell it what you want it to do. So, for instance: hey, here's what you can do as an AI agent. Modify this file. Every time you do, I want you to print a summary of the scores and then log it to this file. And that's literally it. It just goes through that loop over and over and over again.

Then the actual train.py, in this case, is just the website itself. Sorry, the AI model setup itself, with all the layers and stuff like that. In our case, the example that I was just showing you a moment ago, that's just my website, basically. And so it just has a loop set up in its prompt. You tell it what it can change and what it can't change. You give it some sort of log file that it dumps everything to, so you have a big list of changes in progress. And then after that you're basically done, honestly. You just fire it off and let it go.

And when you do, you can make some pretty cool changes. So I just reran the thing and we're already seeing some pretty substantial improvements. Not all these improvements are the same ones I was showing you guys before. With this research lab, I'm just resetting it over and over again to see if I could find anything more interesting.

Okay, so hopefully that's pretty straightforward. Simplest and easiest way to do that is just head over to github.com/karpathy/autoresearch. And then what you do is you just copy this link. Okay, so how do we actually do this? Just open up Antigravity. I'll click open folder. I'll just make a new one called autoresearch test. Okay. And then I'm going to open it and I'm going to click on Claude Code.

Zoom way in so you guys can see, and actually just paste this and say: clone this into our current folder, autoresearch test, just so that it doesn't do this in my root folder, which it's done a couple times. All right. So, it's going to start saying, "Hey, I want you to clone this." So, it's going to give it a quick try and it's just going to dump all the files in here. So, now we basically have the exact same thing we had before, right? We have the program.md, prepare.py, train.py, the progress, and even a README that explains everything.

So, now all we need to do if we want to, I don't know, train this on a site or something is, well, first of all, why don't we just make a quick site? "Hey, build me a simple one-page portfolio site for Nick." And obviously, it doesn't know what my name is. So, it's now going to build a simple one-page portfolio site. I just wanted to do it here, so it's going to do this inside of this folder. First, it's going to ask me some questions. "Just add demo information for everything."

And my goal is I just want to build a brief little website here for us, and then I just want to run auto research on it to show you guys how easy it is to optimize things. Now, in our case, we're going to do a website. There are a million different things you can apply auto research to. I'll run you guys through a quick and easy framework, but first I'm just going to show you guys what you need in order to actually set this up.

All right. Now, what I'm going to say is: "Excellent. I'd like you to create a dashboard for auto research and then set up the auto research framework to optimize the Google Lighthouse page score for index.html. I want you to run this on a local loop and basically just make index.html as fast as possible across LCP, FCP, TBT, and then also performance score. Then give me some sort of live dashboard view so I could watch it actually work in reality."
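For reference, the four numbers in that prompt map onto fields in the JSON report that the Lighthouse command-line tool writes (TBT is Total Blocking Time). The sketch below is one way an assessment step could read them. It assumes the `lighthouse` CLI and Chrome are installed locally; the helper names `extract_metrics` and `run_lighthouse` are my own, and this is not necessarily how the agent in the demo wired it up.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def extract_metrics(report: dict) -> dict:
    """Pull the four headline numbers out of a parsed Lighthouse JSON report."""
    audits = report["audits"]
    return {
        # Lighthouse stores the category score as 0..1; scale to 0..100.
        "performance_score": report["categories"]["performance"]["score"] * 100,
        "fcp_ms": audits["first-contentful-paint"]["numericValue"],
        "lcp_ms": audits["largest-contentful-paint"]["numericValue"],
        "tbt_ms": audits["total-blocking-time"]["numericValue"],
    }


def run_lighthouse(url: str) -> dict:
    """Run Lighthouse once against `url` and return the metrics.

    Assumes the `lighthouse` CLI is on PATH and Chrome is available.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "report.json"
        subprocess.run(
            [
                "lighthouse", url,
                "--output=json", f"--output-path={out}",
                "--only-categories=performance",
                "--quiet", "--chrome-flags=--headless",
            ],
            check=True,
        )
        return extract_metrics(json.loads(out.read_text()))
```

Single Lighthouse runs are noisy, so a loop built on this would probably want to take the median of a few runs before deciding whether to keep a change.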

Cool. And then I'm just going to press enter. And basically what it's going to do is read through all these files right over here, and then it's going to use all of the information here in order to set up the dashboard for me. And while it's working, I just wanted to explain a little bit about where we are and where we're going.

The initial stage of AI in coding was sort of vibe coding. This is 2024, 2025 stuff, where a human being, aka us, prompts, then an AI writes some code, and then a human being reviews. So in this way our roles were basically relegated to writing. We would write the prompts, we would make minor changes where necessary, and in that way we'd build a website or something.

Well, nowadays most of us do agentic engineering, and this is sort of what the advanced part of our course deals with. This is where, instead of just dealing with one AI, we're actually orchestrating agents, and these agents are doing multiple things for us all the time and then returning the results so that we can see, assess, and make slight recommended changes. So in this way our role is more of a director.

But auto research represents the next jump, from agentic engineering to full independent research, where we're no longer even directing the AI agents. We let them handle their own direction. What we do is we just say, "Hey, I have a goal and I'd like you to achieve this goal. Here's how you can modify X, Y, and Z, and here's an assessment." And so in this way, we set the direction. The agent just runs completely autonomously. And then what we are is basically a principal investigator, like a researcher at a lab somewhere. We just say, "Hey, I want you to do XYZ," and then we farm it out to a bunch of research assistants, RA monkeys, to go and do the experiments and so on and so forth for us. And so this is along a spectrum of decreasing human involvement.

And I'm not really sure what comes next after independent research, but I do not imagine it will require human beings in the loop essentially at all. This is the same sort of thing that big research labs are currently using to optimize their setup. So Anthropic is almost certainly doing this all day long for Claude Code to make things faster, to make things more performant. OpenAI is probably doing this behind the scenes to make Codex not only better, but even to adjust the architecture of the AI models and so on and so forth. They're probably doing it across all their web properties, right?

Anyone that's really worth their salt at this point has probably been doing something like what I'm showing you guys with auto research for at least a little while. It's just that auto research is Karpathy's way to democratize that and allow people to do this even with paid providers like Anthropic's Claude.

Okay. So, if I go back here, you can see this has actually set up the auto research loop and it's actually doing the research, which is not exactly what I wanted it to do. I want to actually see the dashboard. So, what I'll say is "show me the dashboard," because I actually want to watch it work live. And then it's just paused the optimization loop. Now, it's going to show me said dashboard. It's restarted that. And then I guess it's going to actually show it to me now in a second. Cool. We have it right here. Awesome.

So, here is our dashboard and we are running multiple experiments. Obviously, this looks a little bit different from the dashboard I showed you guys earlier for my leftclick auto research, but that's okay. I don't want this to look the same. And I want to show you guys you can apply this to whatever you want.

Our very first experiment had an FCP of 464,752 and a size of 12.9. What we ended up doing is minifying the CSS, making a bunch of changes to the code basically, and it took it from 12.9 down to 10, which technically makes our website even faster, but in reality doesn't actually influence things, because our scores are basically the same, at least speed-wise. Okay, so this is just going to continue operating.

Just say continue. Now, in my case, what this is doing is it's currently occupying the main thread, right? So, this is why it's going to be writing and making changes and stuff like that. At any point in time, I could say, "Hey, just go run this in the background." Or, "Hey, I just want you to run this in a loop using the Anthropic agent SDK or something like that." I'd supply my API key and then it would go. And what it's doing now is it's actually making the changes.

I guess I should probably also open the website itself. That would probably make more sense. Let me actually take a look at what that looks like. Right. So, here's the actual website itself. And you can see that for the most part, it's very basic and simple. But what we're doing is we're just optimizing it. We're making it faster and faster and faster. This may break the website in some cases.

Sometimes some minor changes like this do. But as you can see here, we've actually improved it by a whole whopping 2 milliseconds, right? Whatever change we made that made this a little bit slower has now been fixed, and we're a little bit faster. Then it's just keeping each of these. So these things will go down very, very slightly. They'll increase very, very slightly. But if you let it go for enough loops, then eventually you can get to the point where you're legitimately making pretty large improvements to the largest contentful paint, first contentful paint, and so on and so forth.

And just know that we can discard any runs that don't actually do anything. So in my case, the one requirement I had for my leftclick perf auto research run was that you can't visually change the website at all. So it should take a screenshot, and it should be pixel perfect compared to the initial one, which is why it's not adjusting the font or whatever. But it can make more or less any other change aside from that. And it is doing so, which is pretty neat.

Okay, so now you're probably wondering, Nick, how the hell do I actually use auto research in my own business, aside from the demo that I just showed you? And what else could I apply it to? My rule for auto research is that in order for you to meaningfully make any changes, you need to have three things.

The first is you need to have a metric that you want to optimize for. So in my example, what is the metric that I am optimizing for? Well, I'm obviously optimizing for my Lighthouse score. It's a very standardized metric. It's really simple and it's very objective. There's no real negotiation about what a Lighthouse score is. Google invented it. It is what it is. That's what I'm looking to assess.

The second thing that you need is a way to change that metric. So you need a way you can influence an outcome that modifies the metric itself. If you think about it in terms of Lighthouse page score, the direct way to modify your Lighthouse score is just to change your website. And the direct way to do that is just to alter the code a little bit. So in my case, not only do I have the metric, which is a Lighthouse score, I have a direct way I can immediately change the metric.

And then the third thing that you need on top of that is a way to assess what it is that you just did. And it's kind of in the name, right? This is sort of a contrived example, but the Lighthouse score has a Lighthouse test, and the Lighthouse test just tells you what your Lighthouse score is. So, I have the thing I'm trying to improve, which is all the metrics that I just showed you guys. I have a way to improve it, which is modifying the website. And then I have a way to assess that, which is my Lighthouse page score, which I can run in a loop basically immediately after the changes. It takes me just a few seconds.

And so, those are the three things that you need. If I were to formalize this, and I will because I just want everybody to know and be able to visualize it, the three things you need in order to do auto research are: number one, a metric; number two, a way to influence it, or, I don't know, a change method, let's call it, which allows you to influence the metric; and then three, some sort of assessment.

And with the change method and the assessment, the most important thing, at least in my view, is that you can do both of these things pretty fast. If your change method takes a really long time to do, like an hour or whatever, and then your assessment takes another hour, your experiment will only be able to run about once every two hours. And that's still light years ahead of a human experimenter.

But if you really want to see those crazy vertical lines in the graph as things just get better and better, sort of recursive self-improvement, you need to have a pretty short change method. So ideally, this would take, I don't know, let's say 30 seconds or so. Why am I drawing like that? I could just do this. Maybe 30 seconds or so. And ideally, the assessment would also take maybe 30 seconds or so as well, because combined what we have here is a loop that can run 60 times per hour.

Or if you multiply that out, what's 24 × 60? A lot. 1,440 times a day. If you could run an experiment 1,440 times a day, even if only 2% of these are actually good, that's, I don't know, about 30 changes that improve things. And if every change improves things by 1%, what you've just done, to be clear, is you've gone 1.01 raised to the 30, which is about a 34% improvement per day, at least in the first day. If you had, I don't know, let's say 90 of these changes be good, then this math ends up mathing way better for you. It's 2.4x. If you had 180 of these changes, you'd be 6x, and so on and so forth. This is going to go basically as high as you let it.
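That back-of-the-envelope math is easy to check. The sketch below uses the same assumed numbers from the walkthrough (30 seconds to change, 30 seconds to assess, a 2% hit rate, 1% gain per kept change); the function names are just for illustration, and real gains will not compound this cleanly.

```python
def loops_per_day(change_seconds: float, assess_seconds: float) -> int:
    """How many full change-plus-assess cycles fit into 24 hours."""
    return int(24 * 60 * 60 // (change_seconds + assess_seconds))


def compounded_gain(successful_changes: int, gain_per_change: float = 0.01) -> float:
    """Overall multiplier if every kept change improves things by the same fraction."""
    return (1 + gain_per_change) ** successful_changes


if __name__ == "__main__":
    runs = loops_per_day(30, 30)
    wins = round(runs * 0.02)  # assumed 2% of experiments are keepers
    print(f"{runs} loops per day, roughly {wins} keepers")
    for kept in (30, 90, 180):
        print(f"{kept} kept changes at 1% each -> {compounded_gain(kept):.2f}x")
    print(f"hour-long change plus hour-long test: {loops_per_day(3600, 3600)} loops per day")
```

The last line is the slow case from a moment ago: an hour to change plus an hour to assess leaves only a dozen experiments a day, which is why the speed of the loop matters so much.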

Anyway, so going back to my Antigravity here, just seeing a couple of the changes. It looks like the biggest change that it has made that has actually and actively improved things was this jump between 45 and 627. So it made some change here: content-visibility: auto, remove scroll-behavior: smooth.

That actually significantly improved the load speed. And so that's what it did here. And we've gone from 646 at the top to a first contentful paint here of, at the lowest, 619. It looks like the largest contentful paint did not change at all, meaning that this currently loads in, I think, 600 milliseconds or so, which is pretty dang good.

Now, kind of a contrived example, since I just had AI build me the simplest website ever. But you can see that with a more complex website, one that I built for the most part, at least initially, one that AI didn't really have a lot of time to optimize and that was a lot more complex with those animations and stuff, we've actually improved that by 20%.

To give you guys some more context, there are some people out there that have applied this to projects and improved metrics by like 50%. So Tobi Lütke pointed this autonomous AI research system over at... By the way, this is the founder of Shopify, right? Big guy. Or CEO of Shopify, I should say. He ran auto research on the entire Shopify Liquid codebase.

Now, that's responsible for running more or less everything about Shopify. It's their templating language, the Liquid syntax thing. It's a lot of freaking code. And he found that after running this for however many times, he had 53% faster combined parse-plus-render time, which is his main metric, and 61% fewer object allocations, another metric. And things are just freaking printing for him.

I mean, what's that, like twice as fast, essentially? To think that you could just point this at something and go twice as fast in, I don't know, like 30 runs or something like that is nuts to think about. I don't know how long this took. Maybe it was an evening. Maybe he went to bed, woke up the next morning, and his whole freaking code library was twice as fast. I don't know. But yeah, the fact that he has done this and he can do this is obviously very impressive to anybody that has any sort of software that they want to optimize.

So what are the practical takeaways? You can optimize basically anything you want. So in my case, optimizing a website. How about you guys make a SaaS app? Well, you can actually optimize that SaaS app. You can optimize not only the front end of the SaaS app, you can optimize the back end. You can say, "Hey, here's your server. Here's the whole setup. I want you to make this load as fast as possible. I want the request to come in instantly. Do whatever the heck it takes to do it. Here's a quick little test method: we time how long it takes for one request to come in when you click a button." You could just tell it that. Even if you just gave it literally the exact transcript that I just gave you a moment ago, it would probably do a pretty good job, so long as you have the auto research framework.
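A "quick little test method" like that can be very small. Here is a minimal sketch, assuming you can wrap one request to your own backend in a zero-argument function; `median_latency_ms` and the fake request are hypothetical stand-ins, not part of any framework mentioned here.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(send_request: Callable[[], object], runs: int = 5) -> float:
    """Time `send_request` several times and return the median in milliseconds.

    The median is used so that one slow outlier does not decide
    whether a change gets kept or thrown away.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in for a real call to your own endpoint.
    def fake_request() -> None:
        time.sleep(0.01)

    print(f"median latency: {median_latency_ms(fake_request):.1f} ms")
```

That single number becomes the metric, editing the server code is the change method, and this function is the assessment.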

You could optimize random tiny things in your business. There are probably some interfaces, random little modules and stuff like that in your company that could be way faster and way better. You can actually optimize that. You could optimize things like customer support queries. You could, I don't know, have a prompt that an AI agent uses in order to handle customer support, and maybe you're running some big enterprise, or you're plugged into a big enterprise, and you have the ability to collect this data. You could just test modifying the prompt, then wait, I don't know, an hour, and then see the changes. It's an hour, which is kind of a long loop, but it's still 24 changes a day. You could meaningfully modify that and move it in the direction of your goal.

You could do cold email. That's personally what I'm using this for. Cold email is kind of a special case because, again, you need a fair amount more time, but I'm still capable of doing something like six to 10 tests a day at 500 to 1,000 emails per test. It's pretty dang good. You could optimize a bunch of other things as well. You could optimize your ad creative. You could optimize your copy. You could optimize your conversion rate by making minor changes to a page. You could really have agents optimize whatever the heck you want, as long as you have the volume of data necessary to construct the test. So hopefully I made it really clear how all this stuff works.

So, all you really have to do is just head over to that Karpathy auto research library or repo over here, okay? And then just copy that puppy in, clone it inside of your repo, and then just run it on whatever task you have. The simplest and easiest ones for you guys to see how things work are obviously the website ones. But yeah, just know that you can apply this to more or less anything as long as you have those three points that I mentioned. You need a metric to optimize. You need a change method, or a way to influence that metric. And then ultimately, you need an assessment.

Next, I'd like to talk about automation, specifically automating things on the internet. We're going to start with HTTP requests. Then we're going to move up to browser automation. And then finally, we're going to round it off with computer automation. And I'll talk about a bunch of different platforms you could use and ways to do more or less all of these things.

So HTTP requests are probably the simplest and easiest form of internet automation. And Claude Code does this natively. In case you guys didn't know, HTTP stands for Hypertext Transfer Protocol. And essentially, every time I send a request to a website, basically every time I try and load one, what I'm doing is sending an HTTP GET request to the server upon which my website is located. And then my browser will take the response and then mark it up and make it look all pretty.
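To make that request-and-response round trip concrete, here is a small self-contained sketch using only the Python standard library. It starts a throwaway server on your own machine and then plays the client. The page content, `PageHandler`, `fetch`, and `demo` are all made up for the illustration; a real site's server is obviously more involved.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Answers every GET request by sending back the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url: str) -> tuple[int, str]:
    """Send one HTTP GET and return (status code, raw HTML text)."""
    # Skip any system proxy so the local demo talks to the local server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8")


def demo() -> tuple[int, str]:
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, html = demo()
    print(status)
    print(html)
```

What comes back is plain HTML text, not a rendered page. Turning that text into something pretty is the browser's job, which is the part an agent making raw HTTP requests skips.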

pretty. So for instance, let's just like rerun that one more time. my browser,

the client, decides it wants to access leftclick.ai on account of I just typed it into my freaking page. The second I press enter, what we're doing is we're actually sending a request over to their server, okay, which is located at some

IP address. And that server is

IP address. And that server is configured to automatically respond to a request of that kind by just dumping the whole website and giving it to you. And

so then my browser takes that whole website and then it like marks it up and now I can see it. Right? Now, you might be wondering what exactly is it marking up? Well, if you view the source of the

up? Well, if you view the source of the website, which is pretty easy to do. You

can go to any website, just rightclick, press view page source, and you'll see all the HTML, you can see that what a website is actually sending and receiving is not like the pretty images and stuff like that. It's it's usually just sending references to those images.

And this is actually the content of the website. My browser just has mechanisms inside of it that know how to turn this into that. Okay, so case in point: "The definitive AI growth partner for fast-moving B2B companies." This didn't just come out of nowhere. It's not like this is an image. This is actual text on a page, right? If I search for "the definitive", you can see that it's actually represented in the code of the page that is sent from the server every time I make an HTTP GET request: "The definitive AI growth partner for fast-moving B2B companies."

All right. So why is this relevant to us?

Well, because the first aspect of doing things on the internet, I should say not browser automation, but automating network tasks, is this Hypertext Transfer Protocol. Claude and other AI models now have the ability to use web tools to basically make HTTP requests of the kind that I just showed you. And that allows it to do a tremendous number of things. Not all things, but a tremendous number of things if you know how to use it, right?

So, the simplest and easiest way for me to demonstrate that is you can actually just scrape any website you want now with Claude or any other agent. Hopefully, it's pretty clear and obvious how. What we do is we just take the URL, we go back to our agent, which in my case is this auto research one. Then I'm just going to say: retrieve contents of this, just the text.

What this is going to do next is send the HTTP request using the web fetch tool over to https://leftclick.ai.
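To make "send a GET request and keep just the text" concrete, here is a rough sketch using only the Python standard library. It is a stand-in for illustration, not how Claude's web fetch tool works internally; the names `TextExtractor`, `html_to_text`, and `fetch_text` and the User-Agent string are made up for the example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects visible text and skips script/style blocks."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip the markup and return one line per text fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Send an HTTP GET request and return only the page text."""
    req = Request(url, headers={"User-Agent": "example-fetcher/0.1"})  # hypothetical UA
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


if __name__ == "__main__":
    sample = "<html><body><h1>Case studies</h1><script>var x = 1;</script><p>Let's talk</p></body></html>"
    print(html_to_text(sample))
```

Calling `fetch_text("https://leftclick.ai")` would do the network round trip; the demo at the bottom runs on an inline HTML string so it works offline.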

And now what it will have gotten back is exactly what I just showed you a moment ago, which is all of this. Now, because I said just the text, if I go back here, you can see that it has stripped out all of the code and it's returning basically just the stuff that I could actually see. So, what did it say? Navigation: Case Studies, About, Services, Reviews, Let's Talk. "The definitive AI growth partner for fast-moving B2B companies." See that right over here: worked with Anthropic, Notion, Wix, HeyGen, V, Litrix, Durbal, and so on and so forth, right?

So I guess what I'm trying to say is this is a simple way that I can get data. And so one of the first and most elementary uses of any sort of coding agent is that you can automate website scraping really easily. So I could give it a simple list of tasks and I could say, hey, I want you to scrape 400 different websites. I could literally just give it a big array, top to bottom. It could go and it could do the scraping.

Now the issue is, a lot of the time you want to go further than just scraping, than just reading a website. What you want to do is dynamically interact with the website and change things. So for instance, let's say I'm getting a big list of all of the agencies out there, the AI agencies like LeftClick, and I want to send them all messages. Well, I could just scrape every single website to see if there's an email address, right? But in my case, maybe there's no email address.

So what do I want to do? I want to take that next step. The way that I do so is usually through some sort of form or whatever. How do I automate the clicking of a specific button? It's kind of difficult to do, right? I can't just automate the clicking of a specific button through an HTTP request, because this is something more than HTTP. It's kind of JavaScript. I could try, and on some websites I'll be able to hack this: "Hey, extract the cal.com link for me and then open it in Chrome."

Now, going one step further. Okay, we're going to open this link in Chrome. So, we actually have this link available. And there are some services out there where you can just send an HTTP request to book a meeting on a page. You might think that in order to do that, I actually have to click on this button and then type this in and then enter a bunch of information and so on and so forth. Turns out I can actually just use HTTP requests. So, I'm just going to say: book a meeting for 3:30 p.m. tomorrow. First name Test, last name Test, email nick@test.com.

And without any more information, it's going to go and find the API documentation. It's going to check the availability using the API documentation, and then finally it's going to ask to book. So I'm going to say 3:30 p.m., March 30th. Then it's going to go and actually do the booking. But do you notice how many issues and errors there are with this? This obviously isn't perfect.

Now I could theoretically figure out the exact schema and format that I need to use in order to send requests like this every single time I try to book a cal.com, but the reality is not everybody's going to have a cal.com.
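To show what "figuring out the exact schema and format" means in practice, here is a sketch of a booking call as a single HTTP POST. The endpoint path, the field names, the date, and the `booking.example.com` host are all hypothetical. This is not Cal.com's real API, which you would have to read from its documentation. The request is only built here, never sent.

```python
import json
from urllib.request import Request


def build_booking_request(
    base_url: str,
    start_iso: str,
    first_name: str,
    last_name: str,
    email: str,
) -> Request:
    """Build (but do not send) a POST request for a made-up booking API."""
    payload = {
        "start": start_iso,  # hypothetical field names throughout
        "attendee": {
            "firstName": first_name,
            "lastName": last_name,
            "email": email,
        },
    }
    return Request(
        url=f"{base_url}/bookings",  # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_booking_request(
        "https://booking.example.com/api",
        "2030-03-30T15:30:00",  # placeholder date and time
        "Test",
        "Test",
        "nick@test.com",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Once the real schema is known, sending this with `urllib.request.urlopen(req)` is one round trip, which is why this tier is fast. It is also why it is narrow: a different booking provider means a different URL and different fields.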

What I'm doing here is building a very particular solution that solves my one particular problem with the HTTP request. And even then, there's just going to be some back and forth. It's not going to be perfect. And this is taking forever. I mean, I've been sitting here for like 10 or 15 minutes. It's trying its best. It's trying to book through a variety of different means. And I don't know, who knows? Maybe it'll actually go and do the booking.

Okay, there we go. We actually did end up doing the booking, thank goodness. That said, that took forever and it was obviously a very fragile solution that only works with particular cal.com pages, right?

And so that's where we move to the next level of automation. Most services out there will have some sort of API, an application programming interface, that you can communicate with through simple HTTP requests. But they're super fragile. They require very particular formats and, as you can see, they can take a really long time to get working, and then they're very narrow. That's where we move from the first level of automation, HTTP requests, all the way to full-scale browser automation, which is where Claude actually fully controls your browser.

There are a couple of built-in tools for this now. But typically the best way to do this is using one of two tools, at least as of the time of this recording: Chrome DevTools MCP, or there's also the Browser Use platform, which is pretty new, pretty recent, but it costs a fair amount of money. And so instead of just sending HTTP requests under the hood, what this does is load up a whole browser for you and then go through the process of doing a booking.

So you saw how hard it was for me to do this sort of simple task of booking a meeting on a calendar, even though I gave it the exact time, the exact information, and so on and so forth. That might have taken a human being 1 second. It took me something like 5 minutes of back and forth and probably like 40 bucks of tokens. So meanwhile, I can open up a page that has Chrome DevTools MCP. And I could basically say, "Go here, book a 30-minute meeting for, I don't know, March 30th at 3:00 p.m. Nick Test, nick.com. Answer a bunch of demo stuff for any booking Qs."

Okay? And I just want you to look at what's going on. I was just using Chrome somewhere else, so it's going to kill the pre-existing instance, but now it's actually going to open up a new one. I want you to notice that this is actually opening up an instance of my browser, and then it's scrolling through and clicking on buttons and navigating through the page for me. It's literally doing this by running brief little JavaScript commands against the page in order to communicate and go through things.

So, it's filling out the phone number, "What made you want to contact Nick's team?", "What's the project budget?", "Please share anything that will help us prepare", and so on and so forth. I think the project budget in this case might not actually be five, or I don't even think that's an option, because we don't go that cheap. As you can see here, it's finding the options for the budget, selecting 25 to 50K, and then it actually goes through and does it.

So, what are we learning from this experience? This is much more general. Okay, it works way better for a much wider variety of use cases, but it's also a lot slower. Right? This is something that previously I could have done by sending one HTTP request once I knew the format, and then I would have booked after like 0.2 seconds, right? But now we're going through the page one step at a time. Every single one of these actions realistically takes almost the same amount of time that a single HTTP request would take.

Now, what it's doing is deleting my numbers and trying to reformulate them in order to make a valid phone number. And after a little bit of finagling, it actually ended up finishing, which is nice. So, it went through, it confirmed it. It then went through the booking process and so on and so forth. And it took screenshots the whole way through the process.

So, why am I showing you this now? Because basically this is a gradient, where it takes more setup time to do any sort of automation via HTTP requests, but it's faster and usually cheaper.
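One way to reason about that gradient is a simple break-even calculation: HTTP automation costs more to set up but less per run, so it wins once you run the task enough times. The sketch below is just that arithmetic. The function name and every number in the demo are invented for illustration and are not measurements from the video.

```python
import math
from typing import Optional


def break_even_runs(
    setup_http: float,
    per_run_http: float,
    setup_browser: float,
    per_run_browser: float,
) -> Optional[int]:
    """Smallest number of runs at which HTTP's total cost is no higher than the browser's.

    Total cost is modeled as setup + runs * per_run. Returns None if HTTP never catches up.
    """
    extra_setup = setup_http - setup_browser
    saving_per_run = per_run_browser - per_run_http
    if extra_setup <= 0:
        return 0      # HTTP is no more expensive to set up, so it wins from the start
    if saving_per_run <= 0:
        return None   # HTTP saves nothing per run, so the extra setup never pays off
    return math.ceil(extra_setup / saving_per_run)


if __name__ == "__main__":
    # Invented numbers: $40 to work out the HTTP format vs $1 to prompt a browser agent,
    # then $0.01 per HTTP run vs $0.50 per browser-automation run.
    print(break_even_runs(40.0, 0.01, 1.0, 0.50))
```

The same function works if you plug in seconds instead of dollars.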

So there's a spectrum where we go from more setup time but faster and cheaper, to basically always works but more expensive and slower. And so what does that mean? That means for prototyping any sort of business application in a browser, I typically use browser automation, or even computer automation, which I'll talk about. And then once I've sorted out that it works, I'll go and see, hey, can we do this via HTTP requests? Because if so, it'll be way cheaper, and then we can just run a bunch of HTTP requests in the background.

It's important because most of the time the cool stuff that you can do with Claude is actually just automation, right? So understanding this trade-off between pure HTTP requests, which typically function off of hidden APIs or whatever, and then browser automation and full computer automation will let you control a lot of things much better.

So that's just one example of browser automation. I could use browser automation for anything. "Hey, I'm considering renting in Vancouver, BC, looking for $3,000 a month max, one-bedroom rentals somewhere in the downtown core, in buildings that have cool amenities like pools and stuff. And then the bottom two are sort of like our budget options." I could stick that puppy in there and then it'll actually go through and navigate to some rentals.ca page. I couldn't do this via HTTP requests without spending a lot of time sorting all this stuff out. Even then, it would be very fragile, because these websites explicitly try to be anti-automation. They make it really, really difficult to do anything.

But in this case, what can I do? I can actually just open it up. I can change a couple of filters and I can go and zoom in on the page. It can do whatever the heck. It can use the stuff on the right-hand side. It could use the stuff in the middle. It can thumb through things. It can get me a big list of apartments and so on and so forth. And the trade-off here is this is going to take a fair amount of time, right? As you see, it's like one action every 5 seconds or so. But it's so general that I could just give it a task and it'll go and do it.

If I were to try to do this by saying, "Hey, go scrape the rentals.ca web page" or whatever, that would take so much time to build to the point where it doesn't just error out. And then most websites are also very anti-HTTP-request automation, because it's the simplest and easiest kind. So you end up just getting error, error, error, error. This actually uses my browser, which is kind of neat, right?

Anyway, I'm just going to let all this stuff go and in the meantime talk a little bit about Browser Use, which I think is probably the next level up.

It's just called Browser Use: "the way the AI uses the internet." I don't know how long this is going to end up being the way to go, but basically this is the next level up from Chrome DevTools MCP, where you give it some very simple instructions like "fill out my loan application" and it'll actually go through the form using something very similar to what we did. Maybe it uses Chrome DevTools MCP under the hood, I don't know. And you pay for it with a one-time payment of 100 bucks plus pay-as-you-go via credits. I'm not affiliated with this company at all, to be clear, so I'm not going to touch on it too much, but obviously it's a pretty cool product.

The big draw, I would say, for most people here is that HTTP requests can be blocked, because platforms are being scraped all the time, so they try to stop you. So too can Chrome DevTools MCP be blocked in some instances. With this platform, basically the whole point, just to cut past the pricing page and all that stuff, is that 99.9% of the reason you would want to use it is because it is basically undetectable. You could make HTTP requests the old-school way and then try proxies and stuff, and maybe that'll work, but maybe it also won't. But if you go Chrome DevTools MCP and that doesn't work, this is what you do, and it's basically 99.9% perfect.

It does this with fingerprinting, aka it gives every one of your AI-controlled browser instances this hyper-custom profile. So it seems like a request that's made by a real person, and in that way it just obfuscates it all. So, for most purposes, I still use Chrome DevTools MCP, and that is my main pick. But I use this if I have anything that I need to do in sort of a sneaky way. And when I say sneaky way here, I mean this is great for stuff like social media. So, if you want to do Facebook scraping or Instagram scraping, or if you actually want to interact with and leave posts and comments and stuff, that's pretty tough to do right out of the box with a plain version of Chrome DevTools MCP. But this is really, really good at posting, sending DMs, X connect requests, whatever the heck you want to do. So yeah, not affiliated with that company at all, but it is pretty sweet and I think they're probably going to remain the market leader there.

But anyway, so HTTP requests had a lot of setup time, but they were faster and cheaper once you set them up. Browser automation is kind of a good middle ground, where it actually has some basic browser functionality built in and it's pretty obvious how to click a button or whatever. Computer automation is on the far end of the spectrum, where basically no matter what you throw at it, it will always work. The downside is it's very expensive. It takes a tremendous number of tokens, at least right now, and it is very, very slow. And the way it does this is, you know, whereas HTTP requests manipulate APIs and curl requests (curl is actually lowercase), and browser automation manipulates JavaScript and, I don't know, page clicks, button clicks, computer automation literally controls your mouse and your keyboard.

And because it controls your mouse and your keyboard, you can do more or less whatever the heck you want. It could literally take my mouse and go all the way up here and close that tab. It could move this all the way to the left. It could close that tab. Basically it can do anything on the computer that I can do.

Now, the way you do this right now is you've got to use the Claude desktop app. So I'm going to head over to Claude. I'm then going to open that up. And then I think it's currently available in both Code and Cowork. I'll just move over to the Cowork tab and I'll say: have computer use scan through my downloads, find the image called maker school 26 or something, and then rename it to weekly community call picture.

And the reason why I'm doing this is because every dang week I have a weekly community call, and then I always just lose where the image is that I use as the thumbnail. And what it's going to do to start is whip up computer use. So it's going to request access to my Finder. And now, as you can see here, it's actually whipped up a computer use thing. So now it's going to go through and type in my downloads folder or whatever, navigate over there, and it's just going to start typing a bunch of different things like maker school and maker school 26, and probably try multiple variations of maker school, maker school underscore, and so on and so forth. Because it's using my mouse and my keyboard, I can actually scroll through and do things.

Now, this is local automation. This is literally exactly what I want, which is nice. I could have done this in like 30 seconds, but it's nice that it's figuring this out. It's using local automation here to click through, scroll down, and stuff like that. If at any point in time I want to change it, I'll say, "No, you had it. It's the cover 26." I'll press that in just so that it knows what it's doing.

All right, I just went to grab a coffee and I got back, and it has now found the Maker School icon 26 and renamed it to exactly what I wanted. And yeah, I guess I screwed up on the name, but that was what I wanted, which is pretty cool. So hopefully you guys could see it's pretty straightforward to use computer automation. It takes a lot longer. It also consumes a lot more tokens, because it is literally controlling my mouse as it moves across the page, taking screenshots of everything as it does so. And the amount of fidelity that it requires in order to do that is pretty high. But yeah, eventually, put on a loop, this sort of thing will work. It might just take a tremendous amount of time. Just give it a task, say keep going until you solve it, and it will do it.

It will just probably burn a hole through your wallet while it does. So, realistically, the core play that I repeatedly fall back on, as somebody who designs these systems for real businesses that earn hundreds of thousands to millions of dollars a month, is that I will start with some form of browser automation for the most part, since we're usually just doing this in a browser. I'll usually try Chrome DevTools MCP first. If that doesn't work, because it's a stealth application or it's something that requires social media access, I'll do Browser Use. Once I have that flow down, unless it's a Facebook or something like that, because those are just notoriously difficult to HTTP-automate as well, what I'll do is look to have Claude Code build a custom utility based off of the data that it gets from Chrome DevTools MCP, because it'll have access to network requests. I can actually see the requests that are being sent and received. Once we have all that, I now have the API internally. I write a bunch of docs and have Claude Code embed that within my workspace, and then the next time around I can just use HTTP requests.

Although, keep in mind that when you do it this way, simply because of the volume that you're able to hit and the fact that HTTP is typically a lot more regulated than browser automation, there are some risks to that as well. You could get rate limited, you could get throttled, you could also get shadowbanned.

When it comes to economically valuable knowledge work through Claude, it's really just HTTP requests, browser automation, or computer automation. Whatever way you decide, just know that doing that sort of automation is against the terms of service of a lot of the platforms that you work with. So, I'm not condoning this. I can't really explicitly recommend it. Just making sure that you guys understand what's available and what other people are doing as well.
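The hand-off described a moment ago, going from a recorded browser session to plain HTTP requests, can be sketched like this. It assumes the captured traffic is available as a HAR file, the format Chrome's Network panel can export. Whether Chrome DevTools MCP hands you the data in exactly that shape is an assumption here, and the function name and hosts are made up.

```python
from urllib.parse import urlsplit

# Methods that change state on the server; these are the calls worth replaying.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def replayable_calls(har: dict, host: str) -> list[dict]:
    """Pick out the state-changing requests sent to `host` from a HAR capture."""
    calls = []
    for entry in har.get("log", {}).get("entries", []):
        req = entry.get("request", {})
        url = req.get("url", "")
        if urlsplit(url).hostname != host:
            continue  # skip analytics, CDNs, and other third-party traffic
        if req.get("method") not in WRITE_METHODS:
            continue  # skip page loads and static assets
        calls.append(
            {
                "method": req["method"],
                "url": url,
                "body": req.get("postData", {}).get("text"),
            }
        )
    return calls


if __name__ == "__main__":
    # A tiny hand-written capture standing in for a real exported HAR file.
    sample_har = {
        "log": {
            "entries": [
                {"request": {"method": "GET", "url": "https://booking.example.com/static/app.js"}},
                {
                    "request": {
                        "method": "POST",
                        "url": "https://booking.example.com/api/bookings",
                        "postData": {"mimeType": "application/json", "text": '{"start": "x"}'},
                    }
                },
                {"request": {"method": "POST", "url": "https://analytics.example.net/collect"}},
            ]
        }
    }
    for call in replayable_calls(sample_har, "booking.example.com"):
        print(call["method"], call["url"], call["body"])
```

The output of a filter like this is the raw material for the internal docs: the method, URL, and body of each call the page made while the browser agent clicked through the flow.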

Next up, I want to talk about Claude Code performance fluctuations and what to do if and when this ends up happening. I don't know if you guys have ever watched that movie Interstellar, the one with Matthew McConaughey. It's one of my favorite movies ever. And in it there is a major problem that has plagued the world and set all the events in the movie in motion. And that's basically this idea of the blight.

Now, what the blight is, is some disease that started affecting a bunch of plants. And as a result, something like 90% of all of the food in the world is now just corn. It's a specific type of corn. That's why they've got these big corn fields and stuff. And the main character's family just does corn farming all day. So in history, this idea is referred to as monoculture farming, essentially. And it's where one particular crop is just so damn good, just so freaking productive, right? It has the highest yields and so on and so forth, so that over the generations the farmers learn: well, this is the best crop ever. Why don't I just replace all my crops with this crop? Then I can grow a bunch of it, and then I'll just trade this crop for other crops as necessary.

Every time that happens, usually productivity or yields will go up. And they'll go up for sometimes a long period of time, sometimes literally generations. And then all of a sudden, there ends up being a problem with that crop. The problem is either in the soil, or maybe it's a bug that has developed that really screws with that crop specifically, or something else. And because all of the farmers' eggs were in that one basket with that one crop, this blight or this disease or this circumstance ends up destroying all of their crops at once. That's led to some of the biggest famines throughout history, I believe. And it's one of the reasons why farmers nowadays do a bunch of things, namely crop rotation. They have multiple different crops that occupy the same piece of land. They usually don't do just one crop. They have multiple crops going, whatever types of crop they are, just so that if a harvest of one type fails, they'll at least get something from something else. Well, the reason why I'm bringing up this analogy, and I

think I've really hammered it home here, is because I think this applies to Claude Code. Claude Code is really good. I don't think there's a better coding harness out there. I don't think there really is anything better than Claude Code, at least as of the time of this recording, and I don't know if there ever will be. This is me just being honest with you guys. I think at a certain point with AI, an agent's ability to program the next model just gets better and better and better. And so the people that have the better agents, if they apply their resources effectively, just end up with this impossible advantage due to exponential growth. So what that logically means is that it's the best crop ever, right? It gives you the biggest yields ever. Because it's so productive, and because it makes you productive, you're probably just going to want to use it all the time.

The downside to that is there are a lot of things here outside of our control in terms of Claude Code performance. Sometimes Claude Code performance goes up and down, and other times it's just completely gone. So the reality is we're probably all going to be using Claude Code a lot, because Claude Code, as mentioned, is freaking awesome. But if you grow too reliant on it, to the point where Claude Code is basically a monoculture crop, you end up with situations like this, which actually just happened yesterday, just one of many occurrences. To make a long story

short, Claude went down. There was a big issue with Opus 4.6, and I think it lasted maybe an hour or so. And basically 95% of developer productivity plummeted the second that Claude was gone. The reason why is because Claude was everything. They stored all their files on the Claude desktop app, with simple skills that were made in Claude's format and nobody else's. The second that Claude was down, all the prompts that they had saved and specific points and stuff like that were very difficult to access, and they weren't good to use with other models. Whole codebases that had been designed by Claude were not interpretable at all. There was no commenting. So they tried using other models and other agents, and that didn't really work. And then ultimately Claude is the best. The intelligences of these other agents just don't work the same. So it just led to a bunch of issues, essentially. This isn't the first time

that this has happened. This has actually happened a number of times. This is Adam from earlier today, talking about major outages with Claude and how some platforms are operational whereas other ones aren't. There's also been a bunch of Claude Code performance degradations. I just looked up an old post from, I think it was Turk here, who's one of the lead guys on Claude Code. He drops Claude Code updates and stuff all the time. Well, anyway, there were degradations historically. This is December 17, 2025, for Opus 4.5 in Claude Code, where basically, because of some runaway garbage collection or some sort of memory issue, Opus just got worse and worse every day for a certain period of time, which led to massive performance decreases, probably across planet Earth, at least in knowledge work. So, okay, hopefully at least this

point I've convinced you guys why Claude is nowadays probably already pretty monoculture-y, and as it continues to dominate, likely to just become more and more monoculture-y over time. The question obviously is what the hell can we do about it? And so there are a couple of solutions, and most of them revolve around this idea of diversification, where instead of just putting all of your eggs in the Claude basket (this is my cute little basket), sticking it chock-full of nice Claude eggs, what we do is, instead of putting all 10 of our productivity eggs in this Claude basket, we put seven, eight, or maybe nine in it. Okay? So maybe seven out of 10 in Claude. And then what you do with your other three out of 10 is you just distribute them, such that, I don't know, one out of the 10 is in Codex, another one out of 10 (my god, I'm going to get really good at drawing these) is in, I don't know, Antigravity with Gemini, right? And maybe one out of 10 is in some other type of coding harness, like Pi or something, that maybe also uses some form of local models or whatever. The point that I'm making is, obviously we're being pragmatic here. You should probably predominantly use the best model out there, because it's not a linear thing.

If a model is 1% better than another model, that 1%, once you get smart enough, is the difference of a gulf, right? Einstein is like 1% smarter than a normal human being or something like that, and he was able to come up with the theory of relativity. Obviously don't take me at face value there. I'm sure his IQ is through the roof. But the point I'm making is that when you get to this point with these weird galactic intelligences, even a small increase in the intelligence of the model might lead to a big downstream difference, right?

So if you have the ability to use the best model, just use the best model. But don't put all your eggs in that basket, because if you do, what'll basically happen is that as the performance of Claude goes up over time (assuming Claude is orange), your total productivity, in blue here, will also go up, basically in lockstep.

If the performance of Claude goes down, so too does your entire productivity. If the performance of Claude goes up, so too does your entire productivity.

Instead, diversify. Instead of just this yellow one, which is Claude, maybe you have a green one here, which is Codex. And Codex maybe is a little bit more like this. So what ends up happening is the performances of both of these sort of average out, and instead of being super reliant on Claude, this black line, which is you, ends up being a lot more stable.

It's the same thing in investing. Have you guys ever invested in an ETF or some sort of index fund? Basically, the way stocks work is there'll be a stock that does this, another stock that does that, another that does this, and another that does this. Okay, that stock probably doesn't go back. Do you see how volatile all these different stocks are? Well, rather than tie your literal life savings to any one of these stocks, you tie them to all of them simultaneously, such that over time maybe your line slowly goes up, and that's a lot more reliable and dependable.

So the way that you diversify your models in practice is you use platforms that have the built-in ability to orchestrate or juggle multiple different types of agents, or you use things like MCP servers that allow you to do that sort of thing within Claude Code or within some other coding agent.
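To put rough numbers on the eggs-and-baskets argument, here is a minimal Python sketch. The 7/1/1/1 split comes from the example above; the provider labels, the helper names, and the assumption that every provider is equally noisy and that their ups and downs are independent are illustrative assumptions, not measurements.

```python
import math

def capacity_during_outage(allocation: dict[str, float], down: str) -> float:
    """Share of your work that can continue while one provider is unavailable."""
    total = sum(allocation.values())
    return (total - allocation.get(down, 0.0)) / total

def blended_volatility(allocation: dict[str, float], volatility: dict[str, float]) -> float:
    """Standard deviation of the weighted blend.

    Assumes provider fluctuations are independent (an illustrative assumption):
    the variance of a weighted sum is then the sum of (weight * sigma) squared.
    """
    total = sum(allocation.values())
    return math.sqrt(sum((share / total * volatility[name]) ** 2
                         for name, share in allocation.items()))

# Hypothetical allocations: all ten "eggs" in one basket vs. the 7/1/1/1 split.
all_in = {"claude": 10}
spread = {"claude": 7, "codex": 1, "gemini": 1, "pi": 1}
same_noise = {name: 1.0 for name in spread}  # assume every provider is equally noisy

print(capacity_during_outage(all_in, "claude"))           # 0.0
print(capacity_during_outage(spread, "claude"))           # 0.3
print(round(blended_volatility(spread, same_noise), 3))   # 0.721
```

Under these assumptions the blended line is only modestly steadier than going all-in, which matches the pragmatic framing: most of the benefit is that an outage leaves you with some capacity instead of none.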

Right now, if I'm just being pragmatic with you, there's Claude Code, and that's sort of the big boy, and it's fantastic. Then there's Codex. Some people will swear on their mother's life that Codex is way better than Claude Code, but I don't really think so. And then there's, well, Gemini isn't really the right term; it's more like the agent chat within Antigravity. This is sort of my little personal tier list, but basically, use other models in conjunction with the harnesses and stuff that you might have set up in Claude Code for best results.

So, anyway, there are two main ways of doing this right now. The first is using a platform like Conductor. If you've never seen a platform like Conductor, what it does right now is allow you to create a bunch of parallel Codex and Claude Code agents inside of isolated workspaces on your computer. Just like with Antigravity or the Claude Code desktop app, you can see how they're performing and what they're doing in real time. And because you are just the conductor up at the top, if the Claude Code chunk of these don't end up working but the Codex ones do, that's perfectly fine. It doesn't really change anything for you. You're just going to momentarily allocate most of your time and energy to the Codex ones. It's all in the exact same interface. It's very straightforward. You do it all through this Conductor interface. Super easy.

This is used by a lot of real big people all over the place to average out minor statistical fluctuations in models, and to take advantage of the parts of different models that are slightly better or slightly worse than each other at things. For instance, a lot of people think Codex is actually quite cracked at the sort of deep contemplation required to make big backends, and that it's better than Claude Code. I don't know if I entirely agree with that, and I think even if that were correct today, it probably would not be correct in a few weeks, because things change so quickly. But this allows them to take advantage of Codex's ability to build the most cracked backend ever and then have Claude Code do some other thing that Claude Code is great at.

So Conductor is pretty sweet. I'm not going to worry too much about setting it all up. It's actually quite self-explanatory, and I don't want to make a 700-hour YouTube video of me setting up a bunch of different platforms. There's no real value in that. These guys lay out the documentation really plainly and intelligently. You can just click that download button, set it up, and you'll be good to go. So that's number one.

Number two is you can use something like MCP servers to distribute your load across multiple different models. For instance, there's this Codex MCP server, which technically lives in Claude Code. So if Claude Code itself goes down, you won't necessarily be able to use it. Keep that in mind. But if it's just one of the Claude models, it's a little bit different. Basically, you download an MCP server that allows you to communicate back and forth with Codex. That one's very straightforward and easy. There's a Git repository right over here. All you do is install the Codex CLI using `npm i -g @openai/codex`, give it your OpenAI API key, and add it to Claude Code. Then you can just have a conversation with it.

For simplicity's sake, I'm just going to have the agent do it, because that's a lot faster. I'm going to go back to my Antigravity instance, which is right over here. You can see I got a search back a little while ago from something I was working on. I'm going to open this up and say: "Install this. Keys are in .env, don't share. This is a demo. Let me know when done so I can restart." And it'll go and install the Codex MCP server. Then I can go here and say, "Hey, ask Codex how it's going." So now, rather than just operating on its own thread, it literally runs through a thing pinging Codex and saying, "Hey man, what's going on?" It echoed back the message successfully. Okay: "I want to chat with Codex." Yes. And let's hear what it has to say. So, Codex CLI: this is just a ping, I guess, to make sure it's online. This one is now saying, "Hey, I'm running on Codex, on GPT-5, in your local coding workspace. I can do all this stuff. The file system's currently restricted," and so on and so forth.

So this will work in the cases where you want Claude to orchestrate a conversation with Codex without actually having to go into Codex. That can be quite good when you don't want to upset your local workflow: you still want to work within Claude Code and do everything you're normally doing, but for whatever reason Claude Code's performance has been degraded. But I should note that if Claude Code itself goes down, let's say there's some widespread Anthropic outage, your next best bet is to literally go and download the Codex desktop app here, download it for macOS, and either get a subscription or at least know how to get a subscription and know how to use the app, such that if there are major issues with any one of these platforms at any point in time, you can just jump right over.
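One way to confirm ahead of time that you really can jump to another agent is to check which agent CLIs are installed on your machine. This is a minimal sketch: the executable names `claude`, `codex`, and `gemini` are assumptions about a typical install, and the helper names are hypothetical.

```python
import shutil
from typing import Callable, Optional

# Executable names are assumptions about a typical install; adjust for your machine.
AGENT_CLIS = {"Claude Code": "claude", "Codex": "codex", "Gemini": "gemini"}

def installed_agents(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, bool]:
    """Report which agent CLIs can be found on PATH."""
    return {label: which(binary) is not None for label, binary in AGENT_CLIS.items()}

def fallback_ready(status: dict[str, bool], primary: str = "Claude Code") -> bool:
    """True if at least one agent other than the primary one is installed."""
    return any(ok for label, ok in status.items() if label != primary)

if __name__ == "__main__":
    status = installed_agents()
    for label, ok in status.items():
        print(f"{label}: {'installed' if ok else 'missing'}")
    print("Fallback available:", fallback_ready(status))
```

Being installed is only half of it: as noted above, you also need an active subscription or API key and some familiarity with the tool before an outage hits.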

So that's personally what I do. I actually have Codex up and running. I know how to use Codex. I'm very familiar with Codex. The way I set up my workflow is that not only do I have a .claude with the skills and so on and so forth, but at any point in time I can duplicate this whole workspace such that it's generally accessible by any agent. I can go over here and say: "Hey, for whatever reason Claude Code is down, so I'd like you to duplicate this whole business workspace. Change anything that is Claude-specific, like the .claude, the CLAUDE.md, etc., to the usual agent specification. You can find all that at agents.md. And in general, just make sure all of this stuff works for Codex." Then you can either run some sort of synchronization flow, or you could just manually do this every now and then, and send that off to Codex however necessary.

Cool. Now it's going through the process of syncing the workspace to the exact same type of folder, /business-codex. Then it's just changing my AGENTS.md and stuff. What you could also do, inside of the same workspace, is duplicate this and make it AGENTS.md or whatever. You could have this go all caps, AGENTS. You'd probably just need some line in your CLAUDE.md that says, "Hey, when you update your CLAUDE.md, also update your AGENTS.md." The whole purpose of this workspace is to work with anything. In my case, this is just very Claude-specific, and I'm making courses on Claude, so I can't really mess this up, and I don't want the workspace to get any messier than it already is. But hopefully you guys see how easy it would realistically be to do some form of diversification.

So just to make it super clear, there were three main forms I was recommending here. The first is downloading and installing a tool like Conductor. Conductor allows you to run a team of different coding agents right off the bat using the native CLIs for Codex and Claude Code. So you actually have multiple agents operating in parallel. They're just doing so in one workspace that is not branded or tied to any individual model provider.

The second is using something like the Codex MCP server, which is great when Claude Code is up but individual Claude models are degraded, or there's some issue preventing it from operating the way you want it to. That way you can still take advantage of whatever Claude model you do have access to, and also your own Claude interface, let's say in Claude Code's desktop app or an Antigravity Claude Code extension setup like I have.

And the third is just operating in an entirely different agent platform. My recommendation, at least as of right now, is to use Codex, because in every test I've run, Gemini is nowhere near as good at anything except front-end design. Perhaps their new model will come out and be way better, but I'm not going to hold my breath, because as mentioned, I think Claude is really the dominant player as of right now.

All of this is because we do not want the monoculture crop. We do not want all of our eggs in one basket. We can have most of our eggs in the Claude basket for sure, but if you put all of them in, you're going to suffer the exact same situation that guy from earlier did, where the second Claude went down, he just couldn't do anything. So hopefully that makes sense. I personally am about 70% Claude Code and maybe 30% spread across Codex and a couple of open-source models. And I use agnostic coding harnesses like Pi in conjunction with things like Conductor to make sure that I'm good to go.
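The "when you update your CLAUDE.md, also update your AGENTS.md" idea can also be done mechanically instead of by prompting. Here is a minimal sketch, assuming the simplest possible policy of copying the file verbatim; the function name is hypothetical, and anything Claude-specific inside the text (for example references to .claude/skills) would still need the manual or agent-driven rewrite described above.

```python
from pathlib import Path

# File names other agents look for; GEMINI.md is included as an extra mirror.
MIRRORS = ("AGENTS.md", "GEMINI.md")

def sync_instruction_files(workspace: Path, source: str = "CLAUDE.md") -> list[Path]:
    """Copy the source instruction file over each mirror that is missing or stale.

    Returns the mirror files that were actually (re)written.
    """
    text = (workspace / source).read_text(encoding="utf-8")
    written = []
    for name in MIRRORS:
        target = workspace / name
        if not target.exists() or target.read_text(encoding="utf-8") != text:
            target.write_text(text, encoding="utf-8")
            written.append(target)
    return written
```

Calling `sync_instruction_files(Path("business"))` (a hypothetical workspace path) after each edit, or from a scheduled job, keeps the three files identical so that any harness picks up the same system prompt.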

All right, now let's chat workspace organization. I'm going to show you guys the way I personally organize my workspaces, discuss a couple of alternative ways, and then also talk about the hierarchy of information and how to maintain a really clean root file space. So this is the structure I basically have set up, and I'm going to run you through my actual Antigravity setup in a second. I actually just had AI generate a bunch of diagrams for this, so that's pretty meta.

To make a long story short, I store all of my business stuff in a business workspace. Now, my business workspace includes a bunch of additional folders that you don't really need in order to have my structure; they're very specific to the platforms I use and whatnot. Let me cross out all the stuff you probably don't actually need. You probably don't need this either: some people have virtual environments, some don't.

The stuff you actually do need is going to be a .claude, which is where you store all of your Claude-specific files: your skills, your agents, etc. An active or temporary folder, or whatever the heck you want to call it, which basically stores everything else, so all the generated files and so on. A .env, where you put your keys: API keys, credentials, anything like that. And finally your local CLAUDE.md, which is just your local system prompt.

And if you guys remember, we store the global system prompt in a ~/.claude folder, where the rest of your global stuff is. This is somewhere else; it's usually in your home folder, wherever that is. On a Mac, in my case, it's nicksaraev. So if I go into my nicksaraev folder and show hidden files, I can see the .claude folder and click into it. If you're on Windows or whatever, it's going to be different, so you're going to have to look for it.
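The four pieces just listed can be scaffolded in a few lines. This is a minimal sketch, assuming the folder names used in the walkthrough (`.claude/skills`, `.claude/agents`, `active`, `.env`, `CLAUDE.md`); the function names are hypothetical, and `Path.home()` is used so the global folder resolves on Windows as well as macOS.

```python
from pathlib import Path

def scaffold_workspace(root: Path) -> None:
    """Create the minimal workspace layout without overwriting anything."""
    for folder in (".claude/skills", ".claude/agents", "active"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in (".env", "CLAUDE.md"):
        (root / name).touch(exist_ok=True)  # empty placeholder if missing

def global_claude_dir() -> Path:
    """Location of the global ~/.claude folder for the current user."""
    return Path.home() / ".claude"
```

Running `scaffold_workspace(Path("business"))` twice is safe: existing folders and files are left untouched.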

Okay. So mine obviously looks a little bit different from that, but I want you to keep in mind the .claude, the active, the .env, and the CLAUDE.md. That structure I showed you a moment ago is the one I'm going to assume you're building.

So I separate things into a business workspace. (I also have a personal version of this, but for now we're just going to stick with business.) I literally have a folder on my computer, nicksaraev/business, and it's within this business folder that I do all of my work. So what do you have inside of business? You have your .env. You have your .claude/skills, which is sort of the intellectual capital you accumulate over time as you do various SOP-able things. You have your CLAUDE.md. And you also have your active folder.

The way I personally organize this, as somebody who not only uses Claude Code and other agents in my day-to-day life but also sells clients on the implementation of these sorts of things, and is then responsible for using Claude Code to fulfill that implementation, is to separate it such that all of my own stuff is in this business folder, and anything I do on behalf of my clients lives in specific client folders. So let's say I have a client called Client A. Client A has their own .env with the client's API keys. They have a .claude/skills with the project skills: skills that are highly specific to the needs of that particular project. If I work with some sort of digital marketing agency and I have a skill I use on their behalf to connect to some service they use to print out a report, I would put that skill inside of the client folder. Then I also have a CLAUDE.md, which I essentially just generate by running /init, and which also describes a little bit about the client.

In the same way, as I showed you guys earlier, I have my own CLAUDE.md that describes a bunch of stuff about me. Who am I? Nick Saraev. I'm 30 years old. I'm an ENTJ. I currently live in XYZ area. Here are all my businesses. Here's how much money I make. Here's all this highly relevant contextual information. I have similar contextual information for my clients, for their businesses, and for anybody on their team, so that if I say, "Hey, send a message over to Jane letting her know XYZ," it's literally one message and then it's sent. I duplicate that across my whole client base. Client A, client B, client C: however many clients I have, that's how many project folders I have.

The key here, and the reason I think this is the most solid organizational scheme I've stumbled on after several years of working with this stuff, is that you can call client skills while still being in the business workspace. It's not exactly the same, because you're not technically loading them into the context. If I run /context here, you only get the ones that are local. But you can still call skills that are not local simply by putting a one-line thing in your CLAUDE.md that says, "Hey, there are some skills we reference that don't live inside the .claude/skills folder. These are client-specific skills. If you want to reference those, you have to go inside the client folder I'm referencing and pull them out that way." In my case, the business workspace is top level and the client workspaces sit underneath.

So what's up with this? "Don't pollute the root. Always store in active or a subdirectory." Earlier I said I have an active folder.

The reason is that if you start polluting your root, it ends up being a total nuclear bomb waiting to happen. You just have so many files, all stored across one giant folder. Not only is it visually insane to look at, because this is essentially always open and it pushes everything all the way down to the bottom, but it's also a little disorganized for your agent as well.

It's better instead to specify the locations that you dump files to, using the skill spec itself. For instance, inside of model-chat, if I go over to my skill, you'll see that it actually specifies where to put the model chat output. It literally says to dump it inside of active/model-chat and name it in a particular way. So this model-chat skill is hooked up to this model-chat conversation folder over here, and I can open that up and see the conversations we've been having.
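The convention of "each skill writes into its own folder under active, with a predictable file name" can be sketched as a small helper. The naming scheme below (date plus a slug of the title) is an assumption for illustration; the actual pattern in the model-chat skill isn't shown, and the function name is hypothetical.

```python
from datetime import datetime
from pathlib import Path

def output_path(workspace: Path, skill: str, title: str,
                when: datetime, suffix: str = ".md") -> Path:
    """Build (and create the folder for) a skill's output file under active/<skill>/.

    The date-slug file name is an assumed convention, not the one from the skill spec.
    """
    slug = "-".join(title.lower().split())
    folder = workspace / "active" / skill
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{when:%Y-%m-%d}-{slug}{suffix}"
```

Because the destination is derived from the skill name, nothing has to search for where an output went, which is the token-saving point made here.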

It's also much more organized for the skill, because I'm not just dumping everything in the same place. It's super easy to do, and then I don't have to do any sort of agentic search or agentic lookup, which I think is pretty valuable, because agentic lookups are just more things that consume tokens. So what I'm trying to say is I store everything inside of a folder I can toggle called /active, and then I store any specific information about where these things go inside the skills themselves. So there's a bunch of leads from my own CRM; that's where they live. There are some config files for other things; this is where they'd live. If I do research, this is where it lives. And so on and so forth.

I would never store random scripts directly in root, nor temp files or data files. If you want temp files, files that are only going to be used for a short period of time or in the course of a process being executed, personally I store these in active/.tmp, a hidden tmp folder, so they don't even mess up my active. And you're probably thinking, well, won't I lose stuff if everything's super nested?

No, you won't lose anything nowadays. You're trading off the time it would take you to scroll through your root for the time it takes to ask your agent, "Hey, can you find XYZ?" And you'll find that if you allow the agent to organize your workspace, it tends to do so in a pretty consistent and reliable way, so long as you expressly give it a structure, like, "Hey, make sure to always put stuff in active."

And remember earlier I talked about diversifying away from just Claude Code? What's really cool is that when you run a business workspace like this, with your client workspaces underneath it, you can really easily duplicate your CLAUDE.md into an AGENTS.md and a GEMINI.md, and have all of these in all of your workspaces simultaneously. That way, if at any point you want to use, I don't know, Cursor for something, or open it in Antigravity, or do it directly in Claude Code, you never really run out of the system prompt design pattern. If you have the same thing written in CLAUDE.md, AGENTS.md, and GEMINI.md, you basically have that available 24/7. Now, I haven't needed to do that personally in quite a while, and I've been very lucky not to have been affected by some of the recent outages. But I remember, about a month and a half ago, I had a specific line that said, "Hey, I want you to synchronize the CLAUDE.md with the AGENTS.md and the GEMINI.md all the time, just in case we have an outage and I need to drop this into a different coding platform."

Another thing that'll happen reasonably often is that, because we're not dumping stuff into our root, we end up dumping a lot of stuff into active. So I have a bunch of stuff here: dub video links, yay dentist, auto research, Hindi source from when I was dubbing my stuff, a bunch of different screenshots, and stuff like that. You want to periodically clean up this workspace. So periodically you want to say something along the lines of: "Hey, clean up my active/ folder. Anything inside of subfolders is fine, but anything that's just loose in the folder, like any .txt files, .py files, JPEGs, and related, I want you to clean up by deciding if it's necessary. If it's just a temp file, get rid of it. Otherwise, store it in a folder that makes sense." You're going to want to run something like this reasonably often.

The reason is that you don't want to have to scroll through a quadrillion different things again. You also want to make sure that any future model that comes around can look at a logical organizational hierarchy and make decisions based off of that. So that's what's going on here with all these docs, right? It's deciding what to do. It's going to move them into different folders. It's also going to get rid of a couple of files, like: hey, this file is an incomplete download; this is a bunch of unnamed temp snapshots. And what you'll find is that within like 2 seconds, it does the whole thing.
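The cleanup rule in that prompt (leave subfolders alone, delete obvious temp files, file everything else somewhere sensible) can be approximated without an agent. This is a minimal dry-run sketch: the suffix lists and bucket names are assumptions chosen for illustration, and unlike the agent it decides purely by file extension rather than by reading the contents.

```python
from pathlib import Path
from typing import Optional

# Illustrative rules only: which suffixes count as temp files, and where others go.
TEMP_SUFFIXES = {".tmp", ".part", ".crdownload"}
BUCKETS = {".txt": "notes", ".md": "notes", ".py": "scripts",
           ".jpg": "images", ".jpeg": "images", ".png": "images"}

def plan_cleanup(active: Path) -> list[tuple[str, Path, Optional[Path]]]:
    """Return a dry-run plan of (action, file, destination) for loose files only."""
    plan = []
    for item in sorted(active.iterdir()):
        if item.is_dir():
            continue  # anything already inside a subfolder is left alone
        suffix = item.suffix.lower()
        if suffix in TEMP_SUFFIXES:
            plan.append(("delete", item, None))
        else:
            bucket = BUCKETS.get(suffix, "misc")
            plan.append(("move", item, active / bucket / item.name))
    return plan
```

The function only returns a plan and never touches the files, so you can review it before applying anything.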

Now my active folder is much, much cleaner, and I don't have to worry about this sort of thing again, which is nice. In my case I also have a couple of these web design projects: "Enumerate all the web design projects in active. These are things like Volta or Aura and so on. Find similar projects and store all of them within a web-design folder." You might be thinking, Nick, why the hell are you spending time and energy doing this? If your workspace is clean, the work you do within that workspace tends to be a lot cleaner as well. In my case, I just found what, 11 or so different things, and I've just sorted all these out.

"Now, anything here that is more personal than business, let me know and I'll upload it into the personal workspace instead." I just let that go, but I obviously don't want to show you, because there are some personal things in there. And that takes me to the next point of workspace organization: everything I just talked to you about, organizing with a business at the top level and then having various client folders inside, you can do the exact same thing with personal. I don't just have a business workspace set up. Claude has now gone beyond being just my business partner. It also assists me with a lot of personal stuff. And when I say personal stuff, I'm not referring to, I don't know, relationship troubles. I'm talking for the most part about my health, my citizenship paperwork, important documentation relating to my identity, personal projects like learning piano, that sort of thing.

So I have a business one over here. But just because I want this to be really, really clean, I'm also going to show you a personal version of this, which is basically the exact same thing. Obviously it's a personal project, not a client project anymore, so you can't really organize it by client. Instead of doing things based off of clients, I recommend doing things based off of domain, or a particular field of your life. I haven't found the best way to organize this yet, but for instance, I have one right now on citizenship, because I'm currently proving my citizenship to a particular country in Europe. As a result, I'll be able to be an EU citizen. It's going to be pretty fun.

Likewise, I have a sub one called health. This contains a couple of skills

health. This contains a couple of skills that I use to like visualize my genetic libraries and stuff like that. And

hopefully you guys are seeing the point.

What you do is you just sort of you enumerate the clients of your personal life which tend to be projects like citizenship, you know, your health, uh I don't know, your skincare and whatnot.

And then you contact or or or list those underneath your personal workspace. Then

you also have skills related to your personal workspace like hey you know can you clear out all of my I don't know like personal emails for X Y and Z. In

this way you have a good separation at least in my mind between business uh life, your personal life and then also just logical grouping of each of the different things that you can do within them. So I also have as mentioned you

them. So I also have as mentioned you know that personal folder and I can open that personal folder anytime I want. Uh

it was just right back up here and that'll just contain you know specific personal conversations I've had with uh you know Claude and Anti-Gravity to do things. And I'm happy to like pay token

things. And I'm happy to like pay token costs, stuff like that to absorb that because my personal life isn't like personal personal. It's just stuff that

personal personal. It's just stuff that is not business, right? If I can improve the productivity of that, might as well.

One more thing you'll notice is that when I opened up this personal folder, the colors were a little bit different. I do that on purpose. If I'm working on business stuff, I want that to be very clearly visible to my monkey brain. I instantly want to know I'm in my business folder, whereas when I'm in my personal folder, that's different. So what I've done is I've made the outline of this one green. I do that by creating this .vscode settings folder, and then I just have a config that VS Code reads at the beginning of every run to actually change the header bar.

This isn't a super big unlock or anything, but I do find that just having a slightly different color will always make my brain go, "Hey, this is my personal folder, so I have access to personal information here, so I can actually have a conversation about whatever." I don't need to reprompt it with a bunch of stuff. You'll also notice that this doesn't have the Netlify or a bunch of those other sections, because this personal folder only stores stuff that is for me. It's not for Netlify.
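For anyone who wants to reproduce the color trick, here is a minimal sketch, assuming the config in question is a workspace-level `.vscode/settings.json` that sets `workbench.colorCustomizations`. The function name and the hex value are illustrative, not taken from the video, and on some platforms the title bar color may only apply when VS Code's custom title bar style is in use.

```python
import json
from pathlib import Path


def write_workspace_colors(workspace: Path, title_bar_hex: str) -> Path:
    """Write a per-workspace VS Code color override and return the settings path."""
    settings_path = workspace / ".vscode" / "settings.json"
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    # Keep settings that are already there. This assumes plain JSON;
    # a settings file that contains comments would need a JSONC parser.
    existing = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    colors = existing.setdefault("workbench.colorCustomizations", {})
    colors["titleBar.activeBackground"] = title_bar_hex
    colors["titleBar.inactiveBackground"] = title_bar_hex
    settings_path.write_text(json.dumps(existing, indent=2) + "\n")
    return settings_path
```

Calling `write_workspace_colors(Path("personal"), "#1f6f43")` on a hypothetical `personal` folder would give that workspace a green title bar while leaving the business workspace on the default theme.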

Okay, so hopefully that gave you some insight into at least how I organize my workspace, but this is by no means the only way to do so. There are a bunch of other ways to do it as well. One candidate way is, instead of having a business workspace, you just enumerate all the projects in your business. So, I don't know, you might have a project that's like "website overhaul." You have a top-level folder, okay? Your top-level folder might be "business," or it might be the name of your company, LeftClick Incorporated. Inside, you have a projects folder. Then underneath your projects folder, you have website design, you have conversion rate optimization, you have lead generation, and so on and so forth.

If you're running a business, you can actually now have your CRM entirely within Claude Code as a JSON file. Then, on a daily basis, you can synchronize it using some sort of cron job or something like that with, I don't know, events that are pulled in from your calendar. You could store stuff that way. I've seen people host everything on GitHub as well: do some sort of daily download or clone from GitHub, and then some sort of nightly push, so that they always have all their information stored in the cloud. You can do that in conjunction with the previous system I told you about, the business/personal/client one that I talked about initially.

You can also just ask Claude to set it up however you like. If you guys don't like the way that I set up my workspace for whatever reason, despite the fact that I do think it is probably top 10, by all means, you can just ask Claude, "Hey, I want to have information for this. I want to have information for this. Can you build me a strong naming scheme or system that'll enable me to do that better?" Okay, hopefully you guys liked this and it made a lot of sense to you. If you guys have any questions on that, let me know. But let's move on to the next module.
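The JSON-file CRM idea from this module can be sketched in a few lines. This is only an illustration under assumptions: the file layout (a list of records keyed by an `email` field) and the function name are hypothetical, and the daily sync would be whatever scheduler you already use, for example a cron entry that runs a script calling this function.

```python
import json
from pathlib import Path


def upsert_contact(crm_path: Path, contact: dict) -> list[dict]:
    """Insert or update one contact, keyed by email, in a JSON-file CRM."""
    records = json.loads(crm_path.read_text()) if crm_path.exists() else []
    by_email = {record["email"]: record for record in records}
    # Merge new fields over the old record so a sync never drops existing notes.
    merged = {**by_email.get(contact["email"], {}), **contact}
    by_email[contact["email"]] = merged
    records = sorted(by_email.values(), key=lambda record: record["email"])
    crm_path.write_text(json.dumps(records, indent=2) + "\n")
    return records
```

Keeping the records sorted means the file produces small, readable diffs if you also follow the nightly GitHub push approach.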

Now, on to a topic that I think a lot of people don't like: security. And bear with me. Most of the time when people talk about security, it's sort of divided into two camps. On the left-hand side, you have the accelerationists that are like, "Claude Code for everything, baby. I just gave it my DNA and a USB stick with all of my personal private information and passwords. Let's do this thing." Then on the other side, you have grubby old folk that used to, you know, program computers by punch cards. So obviously there are some irreconcilable differences there. They're like, "What the heck? Why would you even make something web accessible, man? You should do everything on bare metal." And then the other folk are like, "Well, you should just have Claude Code do everything." Now the reality, like most things, is nuanced. And in my opinion the best case is somewhere in between.

So this module and the next are going to be a lot of talking and a little bit of demoing. But it's important for you guys to understand, as Claude Code becomes more of the predominant generator of productivity in your life, that there are a few small security changes you can make to Claude Code that solve 90ish% of all of the possible downsides, and there's basically no reason not to do them.

Okay, so I have this Google Doc over here that I'm just going to walk you guys through. Really, the first point I want to make is that everything on planet Earth is hackable. It's always just a question of how hackable. Your front door is hackable. Technically speaking, the Department of Defense is hackable. Everything is hackable. It's just a question of the risk and reward involved in securing it to the point where you repel 90ish% of attackers. So the way that I see things, you should 80/20 security. Cover most of the low-hanging fruit, and then just accept that there's always going to be some small percentage of people that are going to hack you, or try to hack you, anyway. And depending on how big your vibe-coded app or agentically engineered flow ends up getting, obviously your attack surface is going to increase one to one with that.

Just for reference, when I was first starting on YouTube I had like one login attempt per month, and it was always me. Well, now I get probably 30 to 40 login attempts per day. It's just a bunch of people that are constantly trying to hack my ass. Back in the day I had nothing to lose, so it wasn't a very big deal. Now it's obviously a lot bigger. And you find this as you go up the chain. If you become a public figure or whatever, obviously you're more likely to get that. I can't imagine what Chris Hemsworth's freaking OpenClaw probably looks like, but that's beside the point. Just know that everything is relative, and in your shoes, you should just cover the 80/20.

Okay. So we're just going to get to a point where hacking our app or setup takes more time and effort than it's worth. Anybody could theoretically break into your house right now. Most people don't, because there's just a little bit more effort required to break into your house versus if you left your front door unlocked and somebody could walk right in. So what we're going to do is put the equivalent of a fence and a camera up, eliminate most of these, and then we should be good to go.

Okay. So, let's just cover some low-hanging fruit right off the bat. At the end, I'm actually going to give you guys a simple security audit that you can copy and paste for any sort of app, system, website, or web property that you have, to basically minimize the probability of this occurring.

The first thing to know, which I think most people don't, is that you actually leak API keys every time you paste them in plain text into a chat with Claude. Maybe they'll fix this in a future version, but right now they haven't. All Claude Code conversations are actually stored in this folder right here on your computer, ~/.claude. The tilde just stands for your home folder, then slash, and then the dot is a hidden-file convention on Mac and Linux: if you have a dot in front of something, you just can't see it unless you specifically enable the hidden folder view. What that means is you probably have a long-running log of API tokens that are hard-coded there, outside of a .env or whatever.

Just to show you, I'm going to head over to my Antigravity instance. This one is the same auto research repo that we were doing other stuff on. I'm just going to say, "Hey, I want you to remember the word..." Well, let's not even do that. I'm just going to say, "Hey, what are your opinions on Quetzalcoatls?" I don't know. There's some sort of animal, I think, called a Quetzalcoatl. "That's outside my wheelhouse. I'm a coding assistant, so I don't really have opinions on Mesoamerican feathered serpents." Interesting. So, hopefully I didn't absolutely butcher this. Is it Quetzalcoatl? Ah, okay. It's this right over here.

Okay. So, I'm just going to insert this into the chat history. The reason why is because I want to open this up and then say, "Search through .claude in the tilde folder for any conversation mentioning Quetzalcoatls." What you'll see is there's actually a long-running log of all conversations right here in this folder. In my case, it's / user/nextra. That's my home folder. And now it's going to actually pull up the conversation files and give them to me word for word. "Give them to me line by line, whole convos."
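The same search can be done without an agent. Below is a rough sketch, assuming the conversation logs are JSONL files (one JSON object per line) somewhere under `~/.claude`; the function name is hypothetical and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path


def find_mentions(log_root: Path, term: str) -> list[tuple[Path, int]]:
    """Return (file, line_number) for every JSONL record that mentions term."""
    hits = []
    needle = term.lower()
    for path in sorted(log_root.rglob("*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank or partially written lines
                # Re-serialize so the search covers every nested field.
                if needle in json.dumps(record, ensure_ascii=False).lower():
                    hits.append((path, line_no))
    return hits
```

Something like `find_mentions(Path.home() / ".claude", "quetzalcoatl")` would list where the word appears. The point of the demo is that anyone with read access to that folder can run the same kind of search for pasted secrets.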

And so essentially, if we actually dive into the output there, the way that this information is stored is in JSONL files, which are like JSON files that go line by line by line. You can see how they're returned just by doing a search here. I mean, I could obviously open one up, but I probably have API tokens and stuff like that in there, so I don't really want to. You can see that they're organized into a big JSON sort of structure, right? And if it pulls it out, you now have the transcript, which says user, title, assistant, user, assistant. This is the exact same chat that we just had back here.

So I'm sure you can imagine you're going to have a bunch of API keys that you've pasted in plain text also available here. That's not the end of the world. Obviously we need to store our API keys somewhere. But a very low-hanging fruit in security is just minimizing the number of places where you have the same sensitive information spread out. If you have the same sensitive information, aka an API key to your Anthropic account or whatever, stored in five different places, the probability somebody stumbles across it at some point, if they're hacking you or if it's just some sort of routine data check, is not just five times higher, it's something like 500 times higher. And I think a lot of attackers are now realizing that a good place to look for this sort of thing is the conversation history.

So you can't avoid having some API keys stored around. But a really simple and easy way to avoid this is, well, I'm just going to make a fake .env here, and then I'm going to start a new conversation. Instead of me just saying, "Hey, axolotl," what I'm going to do is store this as ANIMAL_NAME, and then we'll put axolotl right over here. Then we say, "Hey, I just inserted an animal name in a .env for a future task. Very important we do not leak this name." Okay. Now it's just going to clarify with me. It can use this in some sort of function or whatever the heck it wants. And then if I go through, see how it says "never read or display the contents of a .env file" or "commit .env files to Git." That's another pretty low-hanging fruit. If you have API keys stored in places that are not your .env, a lot of people will mistakenly push that to GitHub. And if you're pushing it to GitHub, now it's on the internet as well, right? Which is even worse.

Now if I go over here and say, "Hey, can you find me conversations about axolotl in my .claude," it's going to search all damn day long looking for this thing, and it's not going to be able to find the value, because we never actually typed axolotl into a chat. In fact, what's pretty interesting is the only conversation it found was the one where I specifically asked, "Hey, can you find me an axolotl?" So it's going to look and see whether or not it can find it in other directories. It's not going to be able to, but hopefully you guys get my point. Okay: minimize the attack surface in a really simple way. Just have all of your API keys in a .env. So, that's number one.
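To make the .env pattern concrete, here is a small sketch of code that uses a secret without ever printing it. It assumes a simple `KEY=VALUE` file format; `load_env` and `describe` are hypothetical helper names, and a real project would more likely use an existing dotenv library.

```python
import os
from pathlib import Path


def load_env(path: Path) -> list[str]:
    """Load KEY=VALUE pairs into os.environ and return only the key names."""
    loaded = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # ignore blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip('"').strip("'")
        loaded.append(key.strip())
    return loaded


def describe(key: str) -> str:
    """Report whether a secret is set without ever echoing its value."""
    return f"{key}: {'set' if os.environ.get(key) else 'missing'}"
```

With a `.env` containing `ANIMAL_NAME=axolotl`, the code can pass `os.environ["ANIMAL_NAME"]` to whatever needs it, while logs and chat output only ever see `ANIMAL_NAME: set`.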

Number two low-hanging fruit is that AI models often hallucinate package names. In case you guys didn't know, packages are just dependencies that you have to pull in to use pretty much any project nowadays, like libraries and stuff like that. There's npm, which is typically the big package manager here. I'm just going to make this a little bit more visible for you guys. That stands for Node Package Manager. Basically, if you just type npm install, okay, geez, I don't even know, what are some popular libraries? Anthropic. Maybe I'll just do an npm search anthropic. Okay. I don't know. npm install @composio/anthropic.

Basically, what occurs every time you launch a new project, or you have AI design something for you, is you go through this online resource, this big package manager, and it automatically installs all of the packages it thinks it needs. That's usually not that big of a problem, right? Because npm is pretty well vetted. But it's a package manager, and it manages hundreds of thousands, millions of different packages, and every now and then one of these packages gets compromised.

Now, the way this increases the attack surface is that AI models often hallucinate a package name. They won't always get it right the first time. Let's say you want a specific dependency or package called acorn. Okay. Sometimes Claude, just because of the way the tokens were baked into its various encoding schemes and stuff like that, will actually invent a dependency with an extra letter, like acorns, or acorn with an e or something. A lot of people that are sneaky and terrible and super evil and malicious have known about this for a while, because of various encoding issues and the statistical probability of adding additional letters and stuff. So what they've done is they've actually created new packages with small misspellings of the main package, and they've made those packages contain malware, things that literally say, "Hey, I want you to go through all of their ~/.claude conversation logs and then send them over to me. Muahaha." The idea there is it'll exfiltrate anything that is important to you, and then it'll gain basically full control over your account. It's a form of, I don't know, prompt injection almost.

If you're making any sort of live project, or one that ties to API keys with any sort of unlimited usage, be careful. There are going to be some out there where you just turn the unlimited extra usage thing on, and then an attacker theoretically has access to billing tens of thousands of dollars for a service. Be very careful with that. You should audit your dependency list for any unfamiliar packages. You should actually ask Claude, "Hey, are there any unfamiliar packages here that you don't actively use all the time?" Or, "Hey, before you instantiate this the first time, I want you to take a look at the npm install and ensure that the only packages here are legitimate packages that have verified histories and are not inserting malware. I'm kind of concerned." I'll give you guys a whole security audit you can use for stuff like that in a moment. But the point I'm making is that this is another attack vector. A lot of people don't realize this, but in addition to leaking API keys all over the place, AI models also hallucinate package names.

The third main thing has to do with databases. This is going to apply mostly to people that are creating full-stack apps, or apps that need to call some sort of external data store. A lot of the time nowadays, to be honest, I just store everything as JSON files directly on my computer. It's a lot easier and simpler for me, because I'm not really developing full-stack, end-to-end apps as much these days. I'm for the most part just designing flows for myself or internal tools for my team. But anyway, assuming that you want to go a little bit further than that and actually develop full-stack software apps, the simplest and easiest way to ensure that 90% of all noted database breaches do not occur on your app is to use this one little button called row-level security. It's very straightforward, and basically nobody does it, which sucks.

Supabase, which most of you are probably going to be using for any sort of vibe-coded app, does not enable RLS by default. They'll probably do so at some point, but for now, what that means is this: if somebody signs up to your app, typically they're given a key by which they can access their own database table. Well, if they have a public key on a database that does not have RLS enabled, they can read, write, and delete every other row in your database.
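Under the hood, that "one little button" corresponds to a couple of Postgres statements. The sketch below only builds the SQL as strings so you can review it before running it in your database's SQL editor. The table name `profiles` and the `user_id` owner column are hypothetical, and `auth.uid()` is the helper Supabase provides for the signed-in user's ID; on plain Postgres you would substitute your own way of identifying the current user.

```python
def rls_statements(table: str, owner_column: str = "user_id") -> list[str]:
    """Build SQL that enables RLS and limits each user to their own rows."""
    # Identifiers are interpolated directly, so only accept plain names.
    if not (table.isidentifier() and owner_column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f'CREATE POLICY "{table}_owner_only" ON {table} '
        f"FOR ALL USING (auth.uid() = {owner_column}) "
        f"WITH CHECK (auth.uid() = {owner_column});",
    ]
```

`print("\n".join(rls_statements("profiles")))` shows the two statements. With RLS enabled and no policy at all, a public key sees no rows; the policy then opens up exactly the rows whose owner column matches the caller.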

So you have a lot of cases like this. There was a database for Moltbook, which was supposedly Facebook for AI agents. That was just a few months ago. Everybody was like, "My god, this is the revolutionary whatever." And then the most elementary security audit, done by some cybersecurity fellow, showed that they did not have RLS, row-level security, enabled on their database. So he just went in and read literally every single AI agent that had ever been created on the platform in like 2 seconds. Then, because he also had write access, he created like 100,000 fake AI agent profiles in like 2 seconds. Funny enough, Meta, Facebook, actually ended up buying them, and hopefully they understood that a big chunk of those profiles were fake, but who knows, maybe they didn't. The point that I'm trying to make is that this is very, very low-hanging fruit. It takes like two seconds to do, and once you're done with that, you can move on.

Okay. Be wary anytime you're exposing a system like OpenClaw, or your little OpenClaw package, to the web. Let's say you have some open URL. Let's say this is my OpenClaw, okay, and it's nickhappyfuntime.com. I'm kind of curious: if I click on this, is there anybody at nickhappyfuntime.com? Okay. Thank god there's nobody at nickhappyfuntime.com, because I'd probably have to sanitize my eyes after that. Anyway, imagine you have your Clawdbot or Moltbot or whatever the heck it's called now on nickhappyfuntime.com. Well, odds are, if you have a URL, and it's a short, straightforward URL, and it's on an IP range that is owned by, I don't know, some virtual private server hosting provider, you are going to be queried constantly by people that are looking for vulnerabilities.

They will be scanning all over the place for every single port that's currently open on your machine. There are huge bot farms, for instance, in China, in the Philippines, in Indonesia, and obviously in the West as well. I'm not just trying to point a finger over there, but that's predominantly where a lot of these attacks come from. There are huge bot farms that people set up a long time ago whose whole job is literally to send tens of thousands of requests per second to every URL, constantly scanning to see, "Hey, have they patched this one thing? Hey, do they have this security vulnerability? Hey, do they do this?" And the second even one of those things allows them access, they now have full access to your freaking machine and box, basically, and they can do whatever the heck they want with it.

So I want you to know: if you set up some sort of public-facing server using a VPS-based approach on Hostinger or whatever, one of these major hosting providers, it is constantly going to be tested. And if you're raw-dogging this, you're wild-westing this, and you don't understand some pretty foundational things about firewalls and RLS and so on and so forth, people will find vulnerabilities and your stuff will be hacked. So the idea is just to make sure that whatever you are putting in there is not super extraordinarily sensitive. Don't give your OpenClaw agent your social insurance number or a picture of your passport or whatever. That to me is way too accelerationist. And I'm not being the old grubby person yelling at clouds in the sky, being like, "Back in my day, we used to punch card stuff." I'm just trying to be reasonable here, right? There's just no need to do stuff like that.

For the most part, if you have a local Claude instance running that's authenticated through Telegram, and you're using, I don't know, the Claude channels feature or whatever, the probability that a hack will occur there is much, much lower, because you're running it locally and you're not actually connecting through an open thing. You're connecting through a vetted Telegram connector or plugin. But if you're just OpenClaw raw dog, yeah, be very careful with that stuff. By the way, this isn't just me ragging on OpenClaw for the 4,000th time. I'm trying to be reasonable about this. I think decentralized autonomous agents are obviously the future at some point, but most of what we've seen so far has literally just pissed away people's API keys and credit card information.

Speaking of credit card information, never touch a credit card number. If you guys are designing systems that interface with any sort of credit card whatsoever, don't actually store that data. Don't actually read that data. If that data gets read at any point by an AI agent, hell, even your own AI agent, guess what's going to happen? Same thing as with the API keys. You're going to leak it. You're going to stick it in your conversation history. And then any hacker, or you at any future point in time if you misconfigure stuff, push stuff to GitHub, or, I don't know, trade in your computer or whatever, will now have a big log of all of that information in plain text, which is easy to search. A lot of people will just regex over your entire computer looking for things like credit cards if they get access.

And what's a credit card? Well, usually it's, was it 16 or 20 characters or something? I'd have to check my credit card now, but it's very, very stereotypical, right? You find 16 or 20 characters all connected together, maybe with a space in between. Boom, you've got yourself a freaking credit card. Or maybe you don't even need that. They just look for that length, then they check whether or not it matches a Visa pattern. If it does, you're screwed.

So anyway, I guess what I'm trying to say is: don't put that liability on yourself by storing other people's credit cards if you're running some sort of business thing, and don't put that liability on your own card by storing your own card here. Use services like Stripe. They do everything for you. They are PCI compliant and all this stuff. They have teams that just focus on making sure that stuff stored on their servers never gets screwed up, and then you never actually have to deal with the compliance and regulatory aspect of touching credit cards.
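To see how cheap that regex scan is, here is a sketch you could run over your own logs to check whether a card number ever slipped into them. It assumes card numbers are 13 to 19 digits, optionally separated by spaces or dashes, and uses the standard Luhn checksum to filter out random digit runs; the function names are illustrative.

```python
import re

# 13 to 19 digits, each optionally followed by a single space or dash.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like valid card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            hits.append(digits)
    return hits
```

If a dozen lines of code can pull card numbers out of plain text, an attacker with access to your conversation history can do the same, which is the argument for letting a payment processor hold that data instead.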

All right. Now, once you're done understanding this, which should be now, because hopefully nothing here is super complicated, although I understand some of these concepts are advanced, all you need to do is run anything public facing through some form of security audit for the other 80/20. This is a security breakdown that I created for a vibe coding course where I was showing people how to make full-stack apps. Pretty cool, using Gemini, in case you guys are interested. I guess this is Gemini ink code. You can find that on my channel if you want. Just type "Nick Saraev vibe coding" or something. Essentially, down here at the bottom I have a big security audit prompt that you can just feed into Claude and have it point out all of the security issues with whatever your flow is.

So what I'm going to do is go back here to Antigravity. I don't really have anything that's public facing here, but I'm still going to run it on auto research. I'm going to create a new conversation and say, "Apply this to our auto research flow, the one optimizing LeftClick. Once done with the security audit, return me everything we need to fix. I know nothing is web accessible ATM." Okay. This is just a big prompt that I developed in conjunction with a bunch of AI agents. I had to read a bunch of security blogs and so on and so forth to look for the biggest low-hanging fruit and the simplest minor configuration changes I could make. What it's going to do is just go top to bottom and apply this.

The reason why I'm spinning up a totally new conversation is because I do not want any sort of conversation context to bias what's going on here. I don't want the same agent I used to develop my tool to also run the audit, because odds are it's going to be biased, and it's going to make specific errors because it's going to think its own work is better than it is. Do you see here how it's searching for sk_live, sk_test, sk-, Bearer, and so on and so forth? These are all API token headers. Basically, these are like the titles of API tokens. What it just did there, other people are going to do at any point in time if they gain access to your system. Same thing here with model weights, and same thing here with bash scripts and stuff like that. Okay.
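The prefix search the audit agent runs is easy to approximate yourself. This is a heuristic sketch, not the audit prompt from the video: the prefixes in the pattern are examples of common token formats, the file suffixes are an arbitrary starting set, and the function reports only counts per file so the scan itself does not copy secrets anywhere new.

```python
import re
from pathlib import Path

# Example prefixes only; extend the alternation for the services you use.
SECRET_PATTERN = re.compile(r"\b(?:sk_live_|sk_test_|sk-)[A-Za-z0-9_-]{8,}")


def scan_for_secrets(
    root: Path,
    suffixes: tuple[str, ...] = (".jsonl", ".json", ".md", ".py", ".txt"),
) -> dict[str, int]:
    """Count token-like strings per file without returning the tokens."""
    counts = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        found = len(SECRET_PATTERN.findall(text))
        if found:
            counts[str(path)] = found
    return counts
```

Expect some false positives, since any word that happens to start with `sk-` and runs long enough will match. Treat a hit as a reason to open the file, and rotate any real key you find.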

Anyhow, so we're just going to read this top to bottom. The architecture summary gives me some brief details about what's going on. It's not a web app. It's a local single-GPU ML training pipeline. That's easy. No hard-coded secrets, but the .gitignore does not include .env, .env.local, and so on and so forth. Okay.

All the stuff that actually applies here is going to be filled in. So in this case, this is an actual failure, but in this case, it's not applicable because it's not an actual web app. Then you can see that there are also some sections where it fails. So, finding number one: supply chain, low-popularity package, right? Supply chain issue. Let's see over here. It's failed on some machine-learning-specific risks. And it's sort of pointing that out. It's funny that it's using the term vibing. I like that.
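The .gitignore finding is the kind of check you can also script yourself. The sketch below is a simplified assumption of how such a check might look: it only does exact line matching and does not interpret glob patterns such as `.env*`, and the list of required entries is taken from the two files named in the audit.

```python
# Assumed list, based on the files named in the audit output.
REQUIRED_ENTRIES = (".env", ".env.local")


def missing_ignore_entries(gitignore_text, required=REQUIRED_ENTRIES):
    """Return required entries that do not appear as their own line in .gitignore."""
    present = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments; keep everything else as a literal entry.
        if line and not line.startswith("#"):
            present.add(line)
    return [entry for entry in required if entry not in present]


# A .gitignore that covers caches and model weights but not env files.
print(missing_ignore_entries("__pycache__/\n*.pt\n"))
```

An empty list means both entries are present; anything else is a file that could end up committed with secrets in it.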

Anyway, so I'm not really going to go through everything with you, but basically what you do is you finish this and then you just say, "Okay, great. Fix according to your suggestions." Okay. And then once it's done, and I'm just going to pretend it's done now even though it obviously isn't. This might take you three or four minutes if you're running on something that isn't fast mode, like I typically run stuff on.

What you do is you go through and then you actually implement it. And just like I showed you a moment ago, to use something that is not biased by the conversation history, you spin up another agent to take the recommendations and actually go through and do it, because you also don't want that implementing agent to be biased by the overly constrained nature of the security audit. So in that case you can use a sub-agent or some other model, like Codex, Gemini, or whatever. And then ultimately you can have it reviewed by Claude, because I think Claude is the best model.

But in this way you're basically diversifying, similar to how we were diversifying by putting seven out of 10 of our eggs in the Claude basket but three out of the 10 spread across other models. You're diversifying against any sort of inherent risk or bias that Claude has toward work that is generated by other Claudes versus Codex or Gemini or whatnot. So the best solution would actually involve multiple runs through all of them. Okay, hopefully that makes sense.

I didn't want this to be a big deal. Obviously, security, as mentioned, is only as big of a deal as you are willing to make it, because of pre-existing assets and what you have to risk. So, if you understood what I talked about right here, and if you get a security prompt like what I showed you here, you should be good. Just pass something like that through an AI agent after you're done with a project and it'll cover most of the low-hanging fruit.

And by the way, if you want that security audit, then definitely check out that vibe coding full course. Really easy. Just type "Nick Saraev vibe coding." I actually give you guys all that information for free there. You can also watch it if you want to learn how to develop things with other models.
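The audit, implement, review flow from this section can be sketched as a small pipeline in which each stage only ever sees the previous stage's output, never the earlier conversation. Everything here is a hypothetical stand-in: the `Agent` type, the role names, and the stub agents are illustrative and do not correspond to any real Claude Code, Codex, or Gemini API.

```python
from typing import Callable, Dict

# An "agent" is any callable that takes a prompt string and returns text.
# Hypothetical stand-in; swap in real model calls yourself.
Agent = Callable[[str], str]


def run_pipeline(project_summary: str, agents: Dict[str, Agent]) -> Dict[str, str]:
    """Run audit -> implement -> review, passing only the previous artifact forward."""
    # Each stage gets a fresh prompt built from one artifact, never the chat history.
    report = agents["auditor"]("Audit this project for security issues:\n" + project_summary)
    patch = agents["implementer"]("Apply these recommendations:\n" + report)
    verdict = agents["reviewer"]("Review this change:\n" + patch)
    return {"report": report, "patch": patch, "verdict": verdict}


def make_stub(name: str) -> Agent:
    """Build a fake agent that labels its output, so the data flow is visible."""
    return lambda prompt: f"[{name}] saw {len(prompt)} chars"


stubs = {role: make_stub(role) for role in ("auditor", "implementer", "reviewer")}
for stage, output in run_pipeline("local single-GPU ML training pipeline", stubs).items():
    print(stage, "->", output)
```

The design choice is that the three roles are separate entries in the dictionary, so you can point the implementer at a different model from the auditor and reviewer without changing the pipeline itself.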

Congratulations, you made it to essentially the end of the informational Claude technical content of the course. And now I just wanted to reserve maybe 10 or 15 minutes to chat a little bit about what I consider to be the future of Claude. Not just the future of Claude Code, but the future of Claude the model, as well as the future of agentic engineering in general.

And the reason why I talk about this is because it's a topic that's very close to my heart. I've been considering this for probably the last 10 or so years. As a kid that grew up on science fiction, you know, Foundation from Asimov, tons of Arthur C. Clarke books, and Heinlein and so on, I've thought a lot about what the far future would look like in an environment that is controlled by agents like Claude Code. And I've also thought about some of the intervening steps we need to get there.

And now that it's sort of being thrust in my face, I think there's a lot that you could realistically learn from even just fictional representations of this, which most people who haven't stuck their heads so far into the science fiction bubble would, I think, find value in hearing. In addition, I obviously have a lot of exposure to both mid-market and enterprise here, not to mention all the small businesses that I work with through LeftClick, and I think that gives me sort of an edge to at least give you guys some sort of plausible future that has more than a 10% chance of being true. I mean, things are changing so quickly I obviously can't be 100% sure what is going to occur, but these are some things that I consider to be pretty low-risk bets that, if you make them, you'll probably have some form of alpha.

Okay, so the first main one is this trend of decreasing human involvement. Do you guys remember earlier when I showed you that diagram where it went from vibe coding to agentic engineering to basically research-based direction, with auto research and frameworks like that coming up? Well, this is still something we are creating, right? It's not necessarily open-sourced, but it's something that the community is sort of working on. But all of these approaches are soon to be quite formalized, and it is very likely, in my opinion, that we are going to continue decreasing human involvement in tasks.

This auto research thing is a great example of ways to democratize little improvements. I've kept this auto researcher running, by the way, if you remember from a couple of modules ago, and we're now actually at almost 8,000 millisecond load time from a baseline of 1802. Imagine if you had this running 3,000 days in a row or whatever, or if you had this running at inference capacities 100x this, right? Which we are obviously getting to. I mean, remember how slow GPT-3 was back in the day? If anybody here is an old head that used that, well, GPT 5.4 fast or instant or whatever is way faster. And imagine if you had a model that's 100 times that fast with the same level of intelligence. You can make some major updates to basically anything.

And so the idea is, we're probably not going to increase the level of human involvement in direct coding and stuff like that, which is fine. I'm not making a value judgment or a normative judgment here. But I imagine you, as a developer or business person or whatever, will actually grow less involved in the day-to-day work of your company, your research lab, your app, whatever. And so my take is that in the future we're going to move towards this thing that a lot of frameworks have tried to formalize, which is that we're each going to be the CEO of our own company. Whether it's an actual company in practice or some sort of organization that's like a company, all of us will basically be the chief executive officer running teams or fleets of agents that are constantly doing things on our behalf, and that have some sort of formalized framework that also helps them optimize and get better.

And so the way that this works, I imagine, is we would go from the old-school Wright brothers flying the plane ourselves to modern aircraft, where there's somebody in the cockpit but for the most part an autopilot is taking over the vast majority of the work. Even in takeoffs and landings now, there are obviously so many SOPs and so much process and framework that you can imagine how a system much less developed than ours, much less capable of deep thinking, could probably just execute it entirely at this point.

That said, will we ever get rid of a human in the loop in some capacity? There are just so many regulatory blocks and, I think, ethical issues with that, that we will probably always have some person manning a ship. It's just that the number of ships a person will man, the number of discrete agents, will continue increasing until, rather than have 100 people do a task in some specific company like we used to, we might have one person do 100 tasks. Leverage will go up.

Now, a good example of this is Claude's recent auto mode. I don't know if you guys have seen their recent development where you now have the ability to run some sort of autonomous mode instead of choosing bypass permissions down here, or ask before edits, or edit automatically, and so on. Well, now we basically have an additional one, auto mode, which I just can't see here right now because I'm using a slightly older version of Claude Code. I don't have that yet. But basically, instead of you actually having to go through this whole process of changing the security, changing the access that it has, Claude just does that for you. So that's a pretty good example of something that used to require a person, and now it's just like, well, Claude's going to get it right 99.9% of the time. Screw it. I'll give it to them.

Okay, so that's a very small microcosm, but imagine the rest of the loop, like the planning loop. Right now, typically, you have Claude develop a plan for you and you implement on that plan. That whole thing is just being internalized.

Like we're not actually doing most of the plan development now. We will not continue to do most of the plan development in the future. Realistically, Claude's going to do both the planning and the implementation.

Then the QA: right now we're sort of in the loop. We're clicking the buttons, running it. Well, they're developing automated testing procedures where Claude actually also does the QA for you and then delivers you the whole thing. And so some people hate this because they're like, well, they're taking my job and whatnot. And I think there's a fair point to that. Human beings' leverage will continue to increase, but it depends on how much work there really is to do. How many software products are there really to develop? Are we even going to have the demand for that sort of thing? I think that's a reasonable conversation to have. And unfortunately, I don't know the answer.

My take is that eventually we're probably going to have to move to some sort of different economic system, because most of the world would be unemployed otherwise. But that's me getting all political.

That's number one. Okay. So, the trend of decreasing human involvement is very likely to continue with Claude Code. They're now at the point where they're developing this so rapidly that AI is helping AI design products, and auto mode is just the beginning of, I think, a massive suite of rollouts that will significantly improve your experience but make you more hands-off.

My second one is more of an economic consideration, which is that for software products and tools, the quality of the things that you build will no longer be the moat. In the past, in the good old days, back when I was on the come-up, how good your software was, think Windows, think Mac OS, how good that operating system was might have been the only thing that distinguished it from another operating system. And if it was really, really good, then obviously it would be much more popular and it would get a bunch of inherent interest because of its capabilities, and you'd obviously use it.

The issue with that nowadays is you can make Netflix in 5 minutes. Netflix before was this innovative streaming model where, wow, you can just load the thing and the video loads for you on demand, and it's incredible, and the streaming and latency and uptime and all that stuff is super proprietary technology. Well, now it's like, I can code Netflix in 5 minutes with three or four agents on fast mode. So what is the value of Netflix? What is the moat that differentiates Netflix, as this old-school medieval castle, from all of the attackers that could actually take it down?

Well, the moat now, and this has been the case for at least a couple of years, is no longer the software. It is the distribution. In a world where everybody has, I don't know, a nuclear weapon, is the nuclear weapon the differentiator? No, the differentiator moves to other things, like the political framework, the wellness of the populace, and stuff like that. What I'm trying to say is that software engineering ability is no longer going to be the moat. Instead, the moat is going to move to the connections that a company has to its consumers, the reputation that the company has in the market, and the distribution that it has with a bunch of vendors: hard-won relationships and connections that they realistically built over the course of many years.

Netflix now has a bunch of patents and rights and licenses to air specific shows. It's seen this coming and so has tried to diversify accordingly. But you're going to see that in basically every software platform. The moat will probably move more to the distribution and the legal and compliance aspects than to how good the software is. Which means you're going to have these cracked 14- or 15-year-old kids designing the most incredible, amazing software ever. That software will be able to reproduce anything that a major business would do in a hundredth of the time, but because they don't have the compliance certifications or whatever, it'll probably be more difficult for them to actually go to market with something like that, despite it being objectively superior.

The way that I see it, we already have AI models that are at the limit of human reasoning capability. They can run hundreds of times faster than our brains, soon to be thousands of times faster, on basic tasks. So even if they're not better than us at software design individually, if you run a thousand 90 IQ models against one 100 IQ human, those will eventually figure out the things that the 100 IQ human would do. And not only will you develop more software quality, you'll also develop more software quantity.

And so with software as just a market thing, supply and demand, economically, the supply will be so damn high that the demand for any sort of purchasable software gets a lot lower, which means I personally don't think a SaaS product is really the play here. I don't think there's going to be any sort of life cycle for subscription-based products. I think you'll have a short window of time where you could monetize a one-time-buy product, and then most people will just say, "Well, should I spend $199 on the product, or should I just spend $19 in tokens plus 30 minutes of my time and design it for myself?" And I think that's going to change the way that we do software more generally.

So, I'm not very bullish on developing software-as-a-service apps and stuff like that. I have a lot of people say, "Nick, you know all this stuff, you know how to design all the software. Why aren't you making a software app, and why aren't you monetizing your community, let's say, through software?" And I'm like, I'd only really be able to do that for a short period of time. And even if I were to, where's the value in that? If anybody could just make it, I'm just saving them 20 minutes and a couple bucks in tokens, right? It's not that big of a deal. So I would move accordingly, I guess. Okay.

The third thing that I'm like 99.9% sure of is that the pace of change is not slowing down anytime soon. It will continue to accelerate. Technology has helped us increase the pace of change throughout our history, with things like the printing press and developments in communication like the telegraph. These things don't just improve the quality of life of the average person; they improve the research and development arm of the technologists who work on that exact thing. And so because of that, the pace of change is basically just going up. If I had to graph where we are now, and I will because I freaking love graphs, right? Just the best.

And if I were to graph the intelligence, which is a very loose term here and obviously means different things to different people, but the intelligence of a model over time, basically I'd go like this. Okay? And so this back here was sort of linear growth from maybe the 1970s and '80s, with Minsky and the first few neural nets and stuff like that. Then this right over here is probably, I don't know, 2010, when models started actually doing stuff, right? Then this over here is 2020. This over here is 2025, and then this over here is 2026. Do you see how high this is going? How quickly?

And then a point that I want to make is basically that this right here is the intelligence of maybe a chimpanzee. Okay. This right here is the intelligence of an average human. And then this right here is maybe the intelligence of Einstein.

And what we have now is, we're right over here. These models, I say, are as smart as a chimpanzee. Not to diminish chimpanzees; their brains are extraordinarily advanced and developed. They have these cerebella, these sections of their brains that are responsible for calculating millions of movements every minute. It's a very complicated thing to replicate the distributed intelligence of an organism. And you don't capture all of that just by asking, hey, can it write? Hey, can it reason and do math? Have you ever seen a chimpanzee's memory? Have you seen its ability to move around on a page and figure out symbols and count numbers up, and their motor neurons? Anyway, this is not a course on chimpanzees, so I'll stop talking. God, that's my nerdy side showing.

But the point I'm making is that the gap between the intelligence of a chimpanzee, if you just count up all the neurons in its brain, the intelligence of a human, if you count up all the neurons in its brain, and the intelligence of Einstein, they're actually very close together.

They're very clustered. And I'd say we're basically right over here right now. So guess what's going to happen in the next few years? This is going to go up here, and it's going to be like, wow, these things are so dumb. They're dumb. Oh wow, they can do things that a chimpanzee can do. And then in like 6 months it's, oh okay, these things are now freaking galaxy-brain intelligences that can do everything and anything for us.

And imagine what happens when, you know, all of this so far is just humans working on stuff, and then eventually it gets to the point where it can actually use human-level intelligence, which is right now, to improve its own rate of growth. This thing is just vertical. I mean, this thing would go so vertical it would go through my roof in 2 seconds. So that's my take on it personally. I think we're getting really, really close to super fast paces of change.

And if you guys have been monitoring the Claude, or even Claude Code, X page recently, or seeing YouTube, there are new updates coming out every day. This would have been unfathomable just three or four years ago, to make this level of development and this level of small additions to a software product while also making sure they're testable and reliable. It's just because intelligence is making intelligence more intelligent now.

And then the last thing I'm going to say is that the people that will, not necessarily control, but have the most power and ability over the course of the next years are people that learn to use this technology. You're part of a very privileged minority.

And I don't say that in the political sense of the term, because I think that's all muddled up, but you're part of a minority of people right now that actually use this technology. Do you know how few people even understand what an agent harness is? We're talking sub-1% of the population of Earth. The percentage of people that know how to use an agent harness like you are doing right now is even less. It's a vanishingly small percentage.

I don't know if everybody that watches this is old enough to remember, but there were some protests back in the day on Wall Street. And the point is that they were like, "We are the 99%," or whatever. They were protesting the massive wealth divide in specific parts of America between really, really wealthy people that work on Wall Street and the populace, the rest of the people that, I don't know, manage the service industry and hospitality and basically do everything else. And they're like, why do you guys get to have thousands of times more money than us?

You are the 1% right now. You are that group of people that I'm sure in the future other people will be shaking their fists at, because you have an enormous capability to use models like this for just cents on the dollar to do incredibly economically valuable things. It would take that other group of 99% months to do what you could realistically do in a day. It's insane.

You know, you can talk all day about the wealth divide, but you can also talk about the productivity divide, and wealth improves the likelihood that you will be in the positive chunk of the productivity divide. You, right now, even if you don't have a lot of money, have access to insane technology and leverage simply because you're in it. So that's going to increase.

Now, William Gibson, one of my favorite authors, said it best: the future is here, it's just unevenly distributed. Meaning that we have access to insane technology.

It's just that not all of us do at the same rate. There are small pockets of people like yourself that understand how to use these tools far better than others. And in doing so, you have the ability to reap asymmetric rewards over a small chunk of time. And my take is that as the economy shifts to accommodate smarter-than-human intelligences, the people that understand things like agent harnesses and coding harnesses, the people that understand how to use the best models in the world, like Claude Opus or Mythos or whatever the heck we're at now, the people that know how to turn these into economically valuable things, are ultimately the people that are going to win their share of the future, whatever small percentage it is.

Because given the massive unbounded upside here, like, we're talking solar panels orbiting the freaking sun in a few years. I mean, we have solar panels, but the point that I'm making is the massive potential upside if everything goes right with this technology, if things don't go super wrong. If you own even 0.1% of that potential future because of some decisions that you made today to upskill and start this productivity kickoff, the abundance of your own personal life would be huge.

Okay, so I guess that's it. We made it to the end of the course, and that's really all I had to say on that. Hopefully you guys appreciated learning everything that I had to give on Claude Code, and you've learned some advanced concepts here, whether it's about initial system prompts and CLAUDE.md files, or some of the more obscure and esoteric things like security or the future, like I just talked about.

If you guys like this sort of thing, you'd be doing me a big solid to subscribe to the channel. For whatever reason, something like 70% of my regular viewers are not subscribed. I think it's just how YouTube works. Most people don't sub, but you could sub, and that would really help me out. I want to get this sort of message out to more people and obviously help them be in that small little chunk. If you'd do me a solid, leave a comment down below with a video idea or something that you want me to cover. I actually get most of my ideas directly from my audience now, so I'd really appreciate that. If there's anything that I didn't cover here, or didn't touch on that you would like me to touch on, or anything that I personally made a mistake on, I'd love to hear it, because I'm trying to improve my ability to use these tools.

Finally, I also help other companies implement this sort of thing in their own businesses, whether you are a small to mid-size business, mid-market, or enterprise. So if you want to chat with my team, just check down below, somewhere at the top of the description; there'll be a link.

Thank you for making it all the way to the end of the video. I'll see you all soon.