I Studied Stripe's AI Agents... Vibe Coding Is Already Dead

By IndyDevDan

Summary

Topics Covered

Agentic Engineering Predicts System Behavior
Dev Boxes Enable Agent Parallelization
Blueprints Merge Code and Agents
Tool Shed Scales Tool Selection

Full Transcript

Are you vibe coding or are you agentic engineering? The difference is massive.

Keep that question in mind as you look at one of the best engineering teams on the planet to determine if they're vibe coding or agentic engineering. Stripe engineers are shipping 1,300 pull requests every single week. Get this: there is zero human-written code, and they're doing it right.

Imagine what will happen to their insane numbers of 1.9 trillion in total volume, up 34%, which is the equivalent to 1.6% of global GDP. Stripe's doing 1 billion this year and they power all of the best companies that you and I use, and you yourself might be running on Stripe as well. What happens when Stripe multiplies all this with agents? And not just agents, what happens when they multiply it with their custom end-to-end solution they're calling minions, fully unattended coding agents that start from a Slack message and end in a production-ready PR. This is Stripe's one-shot, end-to-end coding agent.

To me, the minions aren't even the interesting part here. The interesting stat here to me is that their agents operate a codebase with millions of lines of code, operating an uncommon stack with a number of homegrown libraries that are unique to Stripe and therefore unknown to LLMs. On top of that, the stakes that Stripe operates in are extremely high. The code they write moves over 1 trillion per year of payment volume. They have a number of real-world dependencies, regulatory and compliance obligations that their code must honor.

Now, here's a simple, important question for you. Do you think Stripe can afford to vibe code? I personally have written millions of lines of code with agents and without agents. I've been building with agents since it was first possible, way, way back in the day when we were using GPT-3.5 Turbo. Many engineers don't even know that model exists or once existed.

So allow me to clarify these terms a little bit. Agentic engineering is knowing what will happen in your system so well you don't need to look. Vibe coding is not knowing and not looking.

It's very clear Stripe engineers are agentic engineering. And in this video we'll break down Stripe's agentic layer so you can take the best pieces and add them to your agentic systems. Vibe coding is the lowest hanging fruit. When you agentic engineer systems just like Stripe has, from the prompt to your skills to your custom agents to your agent harness all the way up through your tech stack, you capitalize on the greatest opportunity for engineers to ever exist: agents.

Let's look at their agentic system at a high level so that we can analyze the key pieces of their system. If these components interest you, definitely stick around. We're going to be breaking down Stripe's key components. And as we do this, you'll see what you have and what you're missing.

All right. So, the first thing is the API layer. They have a way to communicate to their agents. As you'll see, they have many ways to do this. Then they have a warm devbox pool. What is this? This is an agent sandbox, a space to place their agent. Fantastic. They then have the agent harness. Stripe built their agent harness. They forked it from a tool we'll cover in a second here. And then they have this blueprint engine, the marriage of the old world and the new world, code and agents. This is super, super important. This single piece has given Stripe a massive edge. You'll see why in a second. All right.

Then we have the rules file. How did they manage the context problem? Agents cannot read their 100-million-line codebase. So how do they solve that problem? We'll then talk about the meta layer of their tool shed. You can imagine they have hundreds of tools and tens of services that they want their agents to operate with. How do they solve that problem? They built a tool shed. All right.

Then of course they have a way to validate all their agents' work. This is a critical validation layer that they can use to give their agents feedback and to validate that they're not breaking existing working features that are helping them generate and maintain that movement of that $1 trillion. All right, so we're talking about real stakes that the Stripe engineers are facing. All right, this is not a greenfield rapid prototype application. All right, these are serious stakes with real-world consequences.

And then of course you need a place to review your agents' work. They're using GitHub PRs. Everyone's using GitHub PRs. This is the standard. Nothing new here. All right. But these are the critical pieces. We're going to walk through these piece by piece and understand how they put these together to build their agentic layer. Let's go ahead and start with their minions. So what is Stripe's take on agentic coding?

Let's find out. Agentic coding has gone from new and exciting to table stakes. Unattended coding agents have gone from possibility to reality. You know Stripe engineers know what they're talking about because this is true. If you are not agentic coding, the gap between you and the agentic coding team within a week, within a month is going to be astronomical. Okay? It's going to be exponential. This is the last moment to hop on the train.

Stripe minions are Stripe's homegrown coding agents. They're fully unattended, built to one-shot tasks. Thousands of pull requests merge each week. So one week goes by, Stripe engineers merge a thousand pull requests. Let's just really understand that scale. All right. And as I mentioned, they contain no human-written code. They realize you have to stop coding to get the real scale, to get the real power out of these agents. You work on the agents, not the application. Right? This is a weird mindset shift that you need to make if you're going to be building with agents.

Now, interesting to note here, our developers can still plan and collaborate with traditional agentic coding tools, Claude and Cursor. But in a world where one of our most constrained resources is developer attention, the agents allow for parallelization of tasks. This is super, super, super critical. All right, they realize that the most important resource, and really any software company's most important resource, is your developers' time. It's your developer attention. And when you maximize the leverage your developers get, you can do crazy things like this. They see their engineers spinning up multiple minions in parallel and able to solve multiple problems at the same time in different conditions.

All right, so this is fantastic. So the first thing we need to figure out is why they built the minions in the first place. Why did they build it themselves? What's the point of this? Isn't Claude Code good enough?

Vibe coding a prototype from scratch is fundamentally different from contributing to Stripe's codebase. Okay, interesting. Say more. Stripe's codebase encompasses hundreds of millions of lines of code across a few large repositories. Okay. Written in Ruby, uncommon stack, homegrown libraries, LLMs don't have it baked in, right? It's not in the model's training data. Stakes are high. Stripe moves over 1 trillion per year in payment volume. As mentioned, they have real-world dependencies and compliance obligations.

LLM agents are really great at building from scratch when there are no constraints on the system. However, iterating on any codebase of scale, complexity, and maturity is inherently much harder. Very, very true. Engineers build sophisticated models to make changes inside their large repo. This is huge. And we talk about this on the channel all the time. Specialization is how you win. When you're building a great product, it is literally a specialized solution to a specialized problem. So, why would you stop at your tooling? Your tooling and your code must also be specialized. So, this is why they built their own custom agent. It's because they're solving specific problems in specific ways better than anyone. And again, this is a theme we talk about on the channel all the time.

Specialization is your advantage. And in last week's video, we talked about the Pi coding agent, because there are many coding agents, but this one is mine. We emphasized this very idea. You can customize your prompt. You can customize your skills. You can customize your custom agents, and you can specialize your agent harness. The more you're specializing, the more you're building specific solutions to specific problems, the bigger your edge is. And the more you distance yourself from the out-of-the-box experiences that a lot of agentic coding tools are driving everyone toward, the better off you're going to be. So, Stripe built minions to solve their specific problem and to operate their large codebase better than anyone. Makes sense, right?

Big shout out to everyone who shared that video. That one went absolutely viral, and for a good reason. Engineers are realizing that we don't want to be super locked in to a single tool like Claude Code or Cursor or Codex or whatever. Every tool is going to have a problem, but the tool that won't have a problem is the one you customize to solve your specific problems better than anyone. There are many coding agents, but this one is mine. I love this slogan. Let's see how Stripe customized their minions.

So, what is it like to use a minion?

Right away, we jump into another critical idea. There are several entry points for minions. They're designed to integrate as ergonomically as possible where Stripe engineers are. All right, so they use a CLI, a web interface, and they have Slack. Already they have three points of contact for kicking off their API, I assume, right? They have a separate application which kicks off their agents, right? Their pool of agents, but they have multiple ways to interface with that primary service.
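To make the "many entry points, one primary service" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `MinionTask` shape, the adapter functions, and the `MinionService` class are hypothetical names, not Stripe's actual API. The only detail taken from the video is that a CLI, a web interface, and Slack all kick off the same agent service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinionTask:
    """One normalized request, no matter where it came from (hypothetical shape)."""
    prompt: str
    requested_by: str
    source: str  # "cli", "web", or "slack"


def from_cli(argv: list[str], user: str) -> MinionTask:
    # CLI adapter: the prompt is whatever follows the command name.
    return MinionTask(prompt=" ".join(argv).strip(), requested_by=user, source="cli")


def from_slack(message_text: str, user: str, mention: str = "@devbox") -> MinionTask:
    # Slack adapter: drop the bot mention, keep the rest as the prompt.
    prompt = message_text.replace(mention, "", 1).strip()
    return MinionTask(prompt=prompt, requested_by=user, source="slack")


def from_web(form: dict[str, str], user: str) -> MinionTask:
    # Web adapter: the prompt arrives as a form field.
    return MinionTask(prompt=form["prompt"].strip(), requested_by=user, source="web")


class MinionService:
    """Stand-in for the single primary service that every entry point talks to."""

    def __init__(self) -> None:
        self.queue: list[MinionTask] = []

    def submit(self, task: MinionTask) -> int:
        if not task.prompt:
            raise ValueError("empty prompt")
        self.queue.append(task)
        return len(self.queue) - 1  # position in the queue doubles as a task id


service = MinionService()
task_id = service.submit(from_slack("@devbox fix the flaky ledger test", user="alice"))
```

The design point is that each interface stays a thin adapter, so adding a fourth entry point does not touch the agent system itself.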

Very important. And so we can see here, you know, here's a clear example of an engineer in Stripe using that at symbol, @devbox, and then they write their prompt to the agent, right? Makes sense. Nothing new there. Okay.

And so we can see here they have a custom UI that they built, right? They have an interface to allow them to interface with their custom agent. So, you know, on the left you can see kind of a typical view. We have that log of tools and the thought process that their agents go through. And then on the right we can see that they have all the modified files. So they can see very, very quickly what's going on with that agent. And then of course in the top right here they have their actions. All right, so create pull request. And I'm sure they have some prompt interface here as well. So nice and simple, very concise. You can see that they're just surfacing the most important information. And this hints at another key aspect of your agents and your agentic system. You need to be able to observe what's going on.

Once a task has been completed, a minion will create a branch, push it to CI, and prepare a pull request following Stripe's PR template. And then they're going to request another review from a Stripe engineer. And they can also iterate. So this is a classic end-of-process setup.

When you're agentic coding, you show up at the beginning and the end, during planning and during review, and ideally not once in the middle. All right? And that is what creates an outloop agent coding system. You just write the prompt and you just do the review. There's inloop agent coding and then there's outloop agent coding. All right? We'll circle back to that idea in a second.

So how do their minions actually work? A minion starts in an isolated developer environment, or a dev box. Fascinating. So, this is a concept we've talked about on the channel. They're giving their agents their own environment to operate in, right? Which is the same type of machine that Stripe's engineers write on. This is a simple yet powerful idea. If you want your agent to do what you can, you must give it the tools and the environment that you have. So, Stripe realizes this. They reuse their developer setup for their agents. They give them everything that the engineer has. Super, super powerful idea here.

Dev boxes are pre-warmed, so one can be spun up in 10 seconds. Love that. Uh, not very fast, but for the machine that they're booting up, which I think they'll mention in a moment, that is very fast. They're booting up full-on AWS EC2 instances. All right, with Stripe's code and services preloaded, they're isolated. This is a safe space to place their agent. And they do this so that they can run minions on dev boxes without human permission checks.
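The pre-warming idea above can be sketched as a small pool that does the slow provisioning ahead of time, so that handing a box to an agent is just a dequeue. This is a toy model under stated assumptions: `DevBox`, `WarmDevBoxPool`, and the synchronous `refill` are hypothetical, and a real system would provision cloud machines in the background rather than inline.

```python
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class DevBox:
    box_id: int
    repo_ready: bool = True  # stands in for "code and services preloaded"


class WarmDevBoxPool:
    """Keeps `target_size` boxes provisioned ahead of demand (illustrative only)."""

    def __init__(self, target_size: int) -> None:
        self._ids = itertools.count(1)
        self._target = target_size
        self._warm: deque[DevBox] = deque()
        self.refill()

    def _provision(self) -> DevBox:
        # Placeholder for the slow part: boot a machine, sync the repo, start services.
        return DevBox(box_id=next(self._ids))

    def refill(self) -> None:
        # Top the pool back up to its target size.
        while len(self._warm) < self._target:
            self._warm.append(self._provision())

    @property
    def warm_count(self) -> int:
        return len(self._warm)

    def acquire(self) -> DevBox:
        # Fast path: hand out a box that is already warm; fall back to a cold start.
        box = self._warm.popleft() if self._warm else self._provision()
        self.refill()  # a real pool would do this asynchronously
        return box


pool = WarmDevBoxPool(target_size=3)
box = pool.acquire()  # returns immediately because the work was done up front
```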

Of course, this also gives you parallelization without the overhead of something like Git worktrees, which just fall apart at certain scales. All right, after some time, the Git worktrees just fall apart. You're going to need your own dedicated device. I have a Mac Mini here as a local, personal, kind of private device. But recently, I also just said, "Screw it. I'm going to need more scale." And I started spinning up entire dev boxes for my agents on, you know, use your favorite cloud hosting tool, GCP, AWS, and some of the other ephemeral agent sandbox tools like E2B, Modal, so on and so forth. But this is a really big idea, right? Um, the more autonomy you give your agents and the more you set up their environment to be yours, the more they can act and perform as you would.

The core agent loop runs on a fork of Block's coding agent, Goose.

One of the first widely used coding agents, which they forked early on. So shout out Goose. They took this and they customized the orchestration flow in an opinionated way to interleave agent loops and deterministic code. Huge, huge, huge idea here, and they're going to expand on this even more in a moment here with one of the big ideas they talk about later, which is their blueprint engine. Okay, so this is a huge, huge, huge idea. Let me just emphasize this. You want to be interleaving the agent loop with deterministic code. And what type of operations? Right, we're talking Git, linters, and most importantly testing. Okay, this lets your agents, your system, operate with feedback. And this gives you the best of both worlds. You get the deterministic world and the non-deterministic reasoning, creativity world. And they explicitly say that here: they run a mix of creativity of the agent with assurances that they'll always complete Stripe-specific steps like linters. So here we have Stripe agentic engineering: determinism with agents. All right. So a couple additional things to note here.

They're connected to MCP. They use Cursor and Claude Code in some conditions. They operate agent rule files. We'll talk about that more in a second. This solves the large context problem for Stripe. All agent rules are conditionally applied based on subdirectories. Super important. They have MCP, as I mentioned.
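Applying rules conditionally by subdirectory can be sketched as a lookup that only loads the rule text for directories that are ancestors of the file being edited. The rule contents, directory names, and the `rules_for` helper below are all made up for illustration; the video only states that rules are scoped to subdirectories so the agent never has to load guidance for the whole codebase.

```python
from pathlib import PurePosixPath

# Hypothetical rule text keyed by the directory it applies to ("" is the repo root).
RULES: dict[str, str] = {
    "": "Follow the repository-wide style guide.",
    "payments": "Never log raw card numbers.",
    "payments/ledger": "All amounts are integer minor units.",
    "web": "Use the shared component library.",
}


def rules_for(file_path: str, rules: dict[str, str] = RULES) -> list[str]:
    """Return only the rules whose directory contains `file_path`, broadest first."""
    path = PurePosixPath(file_path)
    # Every ancestor directory of the file; "." means the repository root.
    ancestors = {"" if str(parent) == "." else str(parent) for parent in path.parents}
    matched = [directory for directory in rules if directory in ancestors]
    # Ancestors of one path are nested prefixes, so shorter means broader.
    matched.sort(key=len)
    return [rules[directory] for directory in matched]


context = rules_for("payments/ledger/entry.rb")
```

An edit under `web/` never sees the payments rules, which is how the context stays small even when the full rule set is large.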

They have this tool shed idea, which is basically a meta tool to help them select one or more of their 400 MCP tools.
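A meta tool for tool selection can be sketched as a function that takes the task description and returns a short list of relevant tools, so the agent is never handed the full catalog. The video does not say how Stripe's tool shed ranks tools, so the keyword-overlap scoring here is purely an assumption, as are the tool names and the `select_tools` helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# A tiny hypothetical catalog; the real one is described as roughly 400 tools.
CATALOG = [
    Tool("search_code", "search source code in the repository"),
    Tool("read_ticket", "read an issue ticket and its comments"),
    Tool("query_docs", "search internal documentation pages"),
    Tool("run_tests", "run unit tests for a file or package"),
]

STOPWORDS = {"the", "a", "an", "for", "in", "and", "or", "its"}


def _keywords(text: str) -> set[str]:
    return {word for word in text.lower().split() if word not in STOPWORDS}


def select_tools(task: str, catalog: list[Tool] = CATALOG, limit: int = 2) -> list[Tool]:
    """Rank tools by keyword overlap with the task (an assumed, simplistic scoring)."""
    task_words = _keywords(task)
    scored = []
    for tool in catalog:
        tool_words = _keywords(tool.description + " " + tool.name.replace("_", " "))
        score = len(task_words & tool_words)
        if score > 0:
            scored.append((score, tool))
    # sorted() is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: -pair[0])
    return [tool for _, tool in scored[:limit]]


chosen = select_tools("run the unit tests for the ledger package")
```

A production version would more likely use embeddings or a model call for ranking; the shape that matters is "task in, small tool subset out".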

A really big piece of why this blog post is so incredible, you know, shout out to the Stripe engineers, shout out to Alistair Gray, is the fact that they're operating at such a massive scale, at such success, and they're still gaining massive value from their agents and from their agentic layer that they're building.

All right, minions are built with a goal of one-shotting, but if they don't, the key is to give them feedback. Key, key idea. Two more ideas here. We seek to shift feedback left when thinking about developer productivity. The best thing for humans and agents is basically you want the issues to happen earlier rather than later, right? On the engineer's device, on the agent's device, as early in the process as possible. All right? And then if local testing doesn't catch anything, they have a whole suite of tests, over 3 million tests, that run upon push. Key idea here: they figured out a way to selectively run tests on push. All right? And they're choosing from many of 3 million tests. Okay? And this is going to, as you can imagine, offer feedback to their agentic system.

Now, here's something that I would critique Stripe on a little bit here. Due to the cost constraints, they only let their minion run at most two rounds of CI. All right, so you can imagine at this scale that they have to just limit this for it to be cost-efficient. This is where I would push back a little bit. We'll talk about that later, but this is an interesting thing here, right? So, they basically limited the rounds of feedback for their minion to just two.

This is part one of their blog. Let's look at part two and dig into some of the details of some of these key nodes, right? Specifically, their agent sandbox and their powerful blueprint engine, because their blueprint engine sits at the center of how they operate their Stripe minions at scale.
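Before moving on to part two, the feedback loop from part one (cheap local checks first, then a capped number of CI rounds) can be sketched like this. The function name, the callback signatures, and the status strings are assumptions for illustration; the cap of two CI rounds is the one detail taken from the discussion above.

```python
from typing import Callable

MAX_CI_ROUNDS = 2  # the cap mentioned in the video


def run_with_feedback(
    local_checks: Callable[[], list[str]],
    run_ci: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
    max_ci_rounds: int = MAX_CI_ROUNDS,
) -> str:
    """Shift feedback left, then allow a bounded number of CI rounds."""
    # Cheapest feedback first: problems found locally never cost a CI run.
    local_failures = local_checks()
    if local_failures:
        fix(local_failures)

    for round_index in range(max_ci_rounds):
        failures = run_ci()
        if not failures:
            return "ready_for_review"
        # Only attempt a fix if another CI round is still available to verify it.
        if round_index < max_ci_rounds - 1:
            fix(failures)

    # Out of budget: stop and hand the branch to a human.
    return "needs_human"


ci_results = iter([["test_ledger failed"], []])
fixes: list[list[str]] = []
status = run_with_feedback(lambda: [], lambda: next(ci_results), fixes.append)
```

Raising `max_ci_rounds` is the knob the critique above is about: more rounds means more chances to self-correct, at a higher compute cost.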

So, here's part two: dev boxes, hot and ready. So for maximum effectiveness, their minion agents require a cloud developer environment that's parallelizable, predictable, and isolated. So this is very clearly an agent sandbox. Okay, it gives them a place to operate at scale with full autonomy. And if something goes wrong, if they destroy something, the agent can't cause as much damage as they could if they were operating your device or, god forbid, a device connected to the production system. And I completely agree. Containerization, Git worktrees, they're great, but they have hard limits and it's hard to really, really scale without giving each agent their own device, right? Again, if you want your agent to perform like you, give them the tools that you have.

All right. What else can we learn about Stripe's devbox here? So, very cool. Stripe's devbox is a full-on computer, right? It's an EC2 instance and it contains their source code and services under development. Very, very cool. Many engineers use one dev box per task, and this means that every engineer might have half a dozen running at a time. Check out how awesome this is, right? They're allowing their engineers to scale their impact by allowing parallelization of their agents, and every agent has their own sandbox.
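The "one dev box per task, half a dozen at a time" pattern can be sketched with a thread pool where every task gets its own isolated working directory. This is only a local stand-in: a temporary directory plays the role of a dev box, and `run_minion` is a hypothetical placeholder for an actual agent run.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_minion(task: str) -> dict[str, str]:
    """Run one task inside its own throwaway directory (stand-in for a dev box)."""
    with tempfile.TemporaryDirectory(prefix="devbox-") as workdir:
        # Placeholder for the real work: the agent would edit code and run checks here.
        result_file = Path(workdir) / "result.txt"
        result_file.write_text(f"done: {task}")
        return {"task": task, "sandbox": workdir, "output": result_file.read_text()}


def run_in_parallel(tasks: list[str], max_boxes: int = 6) -> list[dict[str, str]]:
    # One sandbox per task, at most `max_boxes` running at the same time.
    with ThreadPoolExecutor(max_workers=max_boxes) as pool:
        return list(pool.map(run_minion, tasks))


results = run_in_parallel(["fix flaky test", "bump dependency", "add logging"])
```

Because no two tasks share a directory, one run cannot trample another's files, which is the isolation property the dev boxes provide at machine scale.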

Okay, so a question I would ask them is, do their minions have access to additional minion sub agents, or not even sub agents, other primary agents that are specialized across their codebase? Very cool stuff here. This is again part of their agentic system, right? It's giving and servicing scale very, very quickly so that engineers can knock out more problems than ever before. All right, we want it to feel effortless to spin up new dev boxes. Right, ready in 10 seconds. Hot and ready. So fantastic. The raw pieces of engineering should feel effortless. You want to be building systems that allow you to move at the agentic speed, the speed of agents.

Something kind of funny happened to me the other day while I was reading this blog. Actually, I'll throw the image on the screen. I had to save it. The agentic speed is just insane. Your agents can process information much, much faster than you can. I was reading through this blog, you know, took me maybe, you know, 20 minutes to read through part one and part two, take notes on this. I also spun up a Claude Code agent to read the blog alongside me. It read the whole thing, of course, in, what was it, 5 seconds. And so I just had like a really funny interaction where, uh, you know, I was shocked, and then, you know, I said something and the agent literally said nothing. It was the first time I've ever had my agent respond with nothing. It was just a really interesting interaction point.

And this is the agentic speed, right? It's this multiplied by every single agent you can spin up. Your agents can read, they can code, they can engineer at agentic speed. So you need to build the system that allows you to tap into that.

And you can see here Stripe is doing that with their powerful dev boxes that spin up in just 10 seconds, and it somehow sets up their entire gigantic repository. Millions of lines of code, tens of thousands, probably hundreds of thousands of files. All right. And you know, props to Stripe. We built out dev boxes for the needs of human engineers long before LLM coding agents. As it turns out, parallelism, predictability, isolation were very, very good properties for engineers as well as agents. Fantastic.

We're almost at the blueprint, which is a really, really big idea. But let's talk about their agent a little bit more, right? So, they built this on their own. They forked Goose. Let me be clear about that. They forked Goose and then they customized it to work within Stripe's LLM infrastructure. Okay? So, you can imagine they have custom prompts, custom skills, custom agents, and then they customize the agent harness. All right? And again, this was the big idea we talked about in last week's video. I'll leave that linked in the description for you if you're interested. Customizing your agentic harness gives you a massive edge. You can do it your way. You can build it to fit the needs of your specific problem. Okay, once again, I want to beat this idea over the head.

Specialization is the advantage of every engineer. Now, you can build specialized solutions, specialized developer tools to help you solve your problems at the agentic speed. Okay, the speed of agents, not the speed of humans. And so they focus their use on the needs of minions rather than human-supervised tools. And this is another big idea we need to double click into. That's the use case well filled by third-party tools such as Cursor and Claude Code.

Okay, which are made readily available for our engineers. So a couple things here. They're not limiting, they're not forcing their engineers to use any specific tooling. That's a terrible idea in general. But what they are doing is building two types of agent coding tools: inloop and outloop. I've talked about this on the channel before. This is a critical idea to get right if you want to do more with your agentic engineering.

When you are inloop agentic coding, your butt's in the seat at your desk and you're prompting back and forth and back and forth and back and forth. This is great for highly specialized work. This is great for when you're building the system that builds the system, but this is bad for everything else. Okay? Uh, as a general rule, I recommend to engineers now that you spend more than 50% of your time building the system of agents that build your application for you. That's inloop, right? And that's really the value prop of inloop. You get full control. You can see everything. It's very manual, but it is very slow and expensive. You're using human engineering time.

Then there's outloop agent coding. And this is what Stripe's minions offer Stripe. Okay, they are building an outloop system that operates at scale, in parallel, in the dev box, right? In dedicated agent sandboxes. This is a big, big idea. Why is that? It's because now, instead of having one engineer with one terminal or one engineer with three terminals, you can have one engineer with six agent sandboxes operating and solving problems at scale in parallel, right? And six is just the beginning of this. The whole idea here is that you should be handing off more work over time to your outloop system. If you're building a great agentic layer, if you're building a great system that has agents operating your services for you, you should slowly be handing off more work to them. Okay? And that saves you from the expensive time that you'll spend. And you know, never forget your time is your most important resource. It is constantly running out. Okay? Let me just be super clear about that. But your agentic systems, you can clone, you can dupe, you can parallelize these as far as your system allows you to. All right? And that's the lever that agentic engineering unlocks. If you build the system that builds the system, you get massive, massive reasoning at scale. You get access to intelligence that engineers your way and, at some point, better than your way. But, uh, that's key.

So I just wanted to really dial into that. Minions give Stripe engineers access to outloop agentic coding. Very, very powerful. And so they talk about specialization a little bit more. And you know, they're really hitting on this idea I just mentioned there. Off-the-shelf local coding agents are usually optimized for workflows where the engineer is sitting looking over his shoulder, right? And I just call this babysitting the agent. Minions are fully unattended, and so their agent harness can't use human-facing features. Okay, they built the minion to be fully autonomous, right? They're built so that humans cannot interject. That's not the point, right? The point is that they operate on their own.

Again, inloop agent coding, outloop agentic coding. Claude Code, minions. Okay? And just to emphasize it once again, you know, Claude Code has the ability, Cursor has the Cursor CLI. And of course, there are great tools we've covered on the channel like pi.dev, or you can programmatically inject these into your outloop systems, right? You can deploy an agent outside the loop and have them run on a cron job, have them run via an API request, so on and so forth. All right, that is where all agentic engineers must move to get massive leverage. You can see Stripe's engineers using minions to do just that.

All right, so, uh, they talk about permissions. Uh, let's focus on the big idea here. Right next to dev boxes, the next most important thing here for sure is their blueprint engine.

So let's talk about this thing. So what is this? So they talk about workflows versus agents. They talk about loops. They talk about, you know, series of steps, which is like the workflow. This is what a lot of prompts and skills actually are. They're just steps that you want to work through. Sometimes you have an agent that's actually doing some intelligent reasoning, right? Loop with tools. But you can do much better than that. You can push a lot further. And that's exactly what Stripe has done.

Minions are orchestrated with a primitive we call blueprints. Blueprints are workflows designed in code that direct a minion run. Okay. And then they go on to say blueprints combine the determinism of workflows with agents' flexibility in dealing with the unknown.

What is this? Every tactical agent coding member knows this as an ADW, an AI developer workflow. This is the past and the future. This is code plus your agent. Okay, the highest leverage point of agent coding is when you put these two together. You have step-by-step workflows that have determinism and non-determinism put together. In essence, a blueprint is like a collection of agent skills interwoven with deterministic code so that particular subtasks can be handled most appropriately. Okay, there are some things, like a linter for instance, or like a git commit, or a whole number of things, right? Running tests, creating certain structures, creating certain templates, certain reusable pieces, certain hard deterministic code pathways. There are certain pieces that an agent would perform worse in. Adding an agent to specific steps actually makes the whole system worse, more brittle, and more expensive, frankly.

So, for these steps, why would you throw an agent at that problem? Right? The real advantage that Stripe has completely identified here with blueprints is the fact that agents plus code beats agents alone, and agents plus code beats code alone. That's the big idea here.

So, Alistair goes on to break this concept down here. You have the agent call here, implement the task, fix the CI failures, whatever. But you also have the actual nodes, run configuration linters, push changes, which are fully deterministic. Okay, they don't invoke an LLM at all. They just run code. So imagine, you know, some top-to-bottom process where you have agent running, and then you have code running, and then you have agent running, and then you have code running, right? So on and so forth. This is what you want to build, right? It's the combination of both the agent and your code. Okay, because not everything needs an agent and not everything needs code. Okay, very, very powerful idea here.

Another advantage of creating these blueprints, of combining code plus agents, is that their blueprint machinery makes context engineering with sub agents easy. Why is that? It's because they're operating at a specific step.

And so at that step, you might constrain the tools, you might constrain the system prompt, right? Or you might modify the conversation required by the subtask at hand. Okay? And again, we're hitting on this idea of specialization.
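
The blueprint idea described above, deterministic nodes interleaved with agent nodes that each get a narrowed system prompt and tool list, can be sketched roughly as follows. This is a minimal illustration and not Stripe's implementation: `fake_agent`, `CodeNode`, `AgentNode`, `run_blueprint`, and all tool names are hypothetical, and the agent call is stubbed out so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


def fake_agent(system_prompt: str, tools: list[str], task: str) -> str:
    """Hypothetical stand-in for a real agent call (a model looping over tools)."""
    return f"agent[{','.join(tools)}] did: {task}"


@dataclass
class CodeNode:
    """Deterministic step: plain code, no model call."""
    name: str
    run: Callable[[dict], None]


@dataclass
class AgentNode:
    """Agentic step with a system prompt and tool list narrowed to its subtask."""
    name: str
    system_prompt: str
    tools: list[str]
    task: str

    def run(self, state: dict) -> None:
        state["log"].append(fake_agent(self.system_prompt, self.tools, self.task))


def run_linters(state: dict) -> None:
    state["log"].append("code: linters ran")


def push_changes(state: dict) -> None:
    state["log"].append("code: changes pushed")


def run_blueprint(nodes: list, state: dict) -> dict:
    """Walk the nodes top to bottom, running agent and code steps as declared."""
    for node in nodes:
        node.run(state)
    return state


# Assumed example blueprint: agent, code, agent, code.
blueprint = [
    AgentNode("implement", "You implement the task.",
              ["read_file", "edit_file"], "implement the task"),
    CodeNode("lint", run_linters),
    AgentNode("fix_ci", "You only fix CI failures.",
              ["read_file", "edit_file", "read_ci_log"], "fix the CI failures"),
    CodeNode("push", push_changes),
]

result = run_blueprint(blueprint, {"log": []})
for line in result["log"]:
    print(line)
```

The point of the shape is that the linter and push steps never touch a model, while each agent step only sees the tools and instructions its subtask needs.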

There are specific steps in your engineering, in your product, in your tool that you've uniquely implemented. When you can break that down step by step into a deterministic or an agentic process, this allows you to specialize. And once again, what are we doing? We're back at foundational engineering. If you're trying to tackle a big problem, chunk it up into small pieces. Every big problem is just a few small problems put together. Then chunk those problems into types, and give each one to code or to agents. That's what their blueprint system is effectively doing.

To me, this is the highest leverage point. This is what makes their agentic layer, their agentic system, so powerful. It's the combination of code and agents inside of a repeatable format for success. Because guess what they can do? They can now deploy meta-agentics. They can effectively create an agent that builds their blueprint in just the right way, and then they can validate it. I wouldn't be surprised if they had a blueprint for creating blueprints.

All right. Anyway, let's move into context.

So, they use a rule files setup that is much like CLAUDE.md or AGENTS.md. Due to the size of the repository, they can't have unconditional global rules, so they need a specific solution. They're using a standardized rule format much like Cursor's. The format looks like this: you have your primary directory, which whatever tool you're using tries to claim the name of, then you have /rules, and then you have some markdown files. The interesting part is that each markdown file has some front matter. MDC files are probably the most popular format for this, and for good reason. In these rules you can specify the glob pattern in which to activate this context, or a specific subset of this context. And then there's the rule anatomy: you can apply a rule intelligently, or you can apply it only when specific files are being accessed. This gives you more control over the context that's loaded as you're accessing different directories throughout your codebase.

This is the structure that Stripe's minions use, and the big line is right here: we almost exclusively give minions context from files that are scoped to specific subdirectories or patterns, automatically attached as the agent traverses the file system. They're using Cursor-style rules to do that, and they've combined it with a format from Claude Code. Once again, you can see that they're building customized agentic solutions that best solve the problems they're facing, and they're combining the best of the industry. I'm not saying that how Cursor agents or Claude Code agents do things is wrong. That's not the point.
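
To make the glob-scoped rule idea above concrete, here is a small sketch of how a rule file with front matter could be matched against the path an agent is touching. It assumes a front matter block with a single `globs:` line and uses a hand-rolled parser. The rule texts, paths, and function names are invented for illustration and are not Stripe's or Cursor's actual loader.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    globs: list[str]
    body: str


def parse_rule(text: str) -> Rule:
    """Parse a markdown rule whose front matter has a `globs:` line.

    Minimal parser for this sketch; real rule files carry more keys.
    """
    _, front, body = text.split("---", 2)
    globs: list[str] = []
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "globs":
            globs = [g.strip() for g in value.split(",") if g.strip()]
    return Rule(globs=globs, body=body.strip())


def rules_for(path: str, rules: list[Rule]) -> list[str]:
    """Return only the rule bodies whose glob patterns match the accessed path."""
    return [r.body for r in rules if any(fnmatchcase(path, g) for g in r.globs)]


# Hypothetical rule files, written inline instead of read from a rules directory.
rules = [
    parse_rule("---\nglobs: payments/*\n---\nUse the Money type for all amounts."),
    parse_rule("---\nglobs: *_test.py, tests/*\n---\nPrefer table-driven tests."),
]

print(rules_for("payments/api/charge.py", rules))
print(rules_for("payments/api/charge_test.py", rules))
print(rules_for("README.md", rules))
```

Note that `fnmatchcase` lets `*` match across `/`, so `payments/*` covers nested paths here. A real loader would likely use stricter glob semantics.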

There are many ways to do things. The question is what's the best way for you, and how do you get the most leverage out of what's available? We can see Stripe engineers doing exactly that.

The last important idea to mention here is how Stripe gathers MCPs. How does Stripe put together the tools? As we all know, tools are an essential element of the core four: context, model, prompt, tools. Tools are what created agentic coding. It's the only reason that any of this is possible, because our agents can now use tools to take actions as we can. So how does Stripe handle their 500 MCP tools?

Won't this immediately cause a token explosion? Absolutely. It totally would. What they've done here is build a tool shed. They built a centralized internal MCP server called a tool shed, which makes it easy for Stripe engineers to make new tools that are automatically discoverable in their agentic systems. Very, very powerful stuff here. All their agentic systems are able to use the tool shed.

I want to be super clear about this. We're talking about meta-agentics. This is something that keeps coming up over and over. You build prompts that create prompts. You make agents that build agents. You have skills that build skills. You have tools that allow you to select tools. The tool shed is a tool that unlocks tools for their agents. These are called meta-agentics, and they're a powerful way to solve a class of problems, to solve repeat problems in the space of agents. And to be clear, this is not new at all. OG engineers watching, you've heard of things like metaprogramming, right?
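
The "tool that selects tools" idea above can be sketched as a small registry that hands an agent only the few tools relevant to its task, rather than loading hundreds of tool definitions into context. This is an assumption-heavy illustration: `ToolShed`, its methods, the keyword-overlap scoring, and the tool names are all hypothetical, and nothing here reflects how Stripe's MCP server actually ranks or serves tools.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[..., str]


class ToolShed:
    """Hypothetical central registry: register once, select per task."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, description: str, fn: Callable[..., str]) -> None:
        self._tools[name] = Tool(name, description, fn)

    def select(self, task: str, limit: int = 3) -> list[Tool]:
        """Return up to `limit` tools whose descriptions share words with the task."""
        words = set(task.lower().split())
        scored = []
        for tool in self._tools.values():
            score = len(words & set(tool.description.lower().split()))
            if score > 0:
                scored.append((score, tool.name, tool))
        # Highest overlap first; ties broken by name for a stable order.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [tool for _, _, tool in scored[:limit]]


shed = ToolShed()
shed.register("search_docs", "search internal docs", lambda q: f"docs for {q}")
shed.register("read_ci_log", "read ci build log", lambda run: f"log for {run}")
shed.register("open_ticket", "open a support ticket", lambda t: f"ticket: {t}")

selected = shed.select("fix the failing ci build", limit=2)
print([tool.name for tool in selected])
```

Keyword overlap is only the simplest possible selector. The design point is that functions are stored as values and handed out on demand, which is the same move as passing functions into functions.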

Passing functions into functions. This is not a new phenomenon. But what is new, and what's really important for you and me to focus on when we're building out these powerful agentic layers, is to think about when we need to build the thing that builds the thing. So Stripe uses a tool shed to create and connect to nearly 500 MCP tools. Very, very powerful. You can imagine they have all types of internal and external services that they want to connect to, and the tool shed lets them do that. This was completely net new to me. I had not seen a concept like this before. I think this is really cool: a tool shed, a centralized location to load specific tools. So big shout out to the team for building something like this.

And then lastly, one of the big ideas they talk about, and one that's just super critical for engineering: you just iterate. This is just great engineering. All this stuff is so new. All this stuff is moving so quickly. You and I, Stripe engineers, it doesn't really matter who you are. It's not about what you can do anymore. It's about what you can teach your agents to do for you. This is a big idea.

It's one of the central theses we talk about in Tactical Agentic Coding, in addition to building agentic layers: handing off work and thinking about your agents as tools that you're templating your engineering into. That's the name of the game. Teach your agents how to build like you would so you can scale them to the moon.

All right, so what else do we have here? A lot of really great ideas. I'm curious what you think if you've operated in codebases with more than 10,000 files. Comment down below: what would you rank Stripe's agentic layer, based on everything we've gone through here and our high-level understanding of their system? They have multiple CI entry points. They have EC2 agent sandboxes that mirror developer environments. They have their own custom agent harness. They have a customizable blueprint engine that lets them combine code and agents together to outperform either. They have rule files for context engineering. They have the tool shed for selecting one or many of 500 tools. They of course have CI for self-validation, and they have GitHub PRs to review the work their agents have done on their dedicated agent sandboxes.

All right, so rank this. I'm super curious what you think. Rank Stripe's agentic layer out of 10.

I'm going to go ahead and give them a score, but again, I want to hear from those of you who've worked on codebases that are larger than 10,000 files. No offense, guys, but I don't want to hear a vibe coder's opinion on Stripe's end-to-end system. But for mid to senior level plus engineers, I'd love to hear what you think. I'm going to give Stripe an eight out of 10. A very, very powerful agentic layer.

And let me be super clear here. I have no ego in this. Let me say it this way: I cannot solve Stripe's programmable financial infrastructure problems better than any one of the engineers on their team could. They own that problem and that problem space. So that's not what I'm saying at all. They are the experts there for sure. My expertise is in agentic engineering. It's in building agentic layers. And so I only have two notes of feedback for them that I would pitch as potential improvements.

The first thing is this, and they identify it right away: why only two rounds of feedback in their CI for their agents? They say, you know, speed, completeness, cost, time, compute, blah blah blah.

These are fair constraints and reasons to only run two rounds. But I think this is a mistake, frankly. Think about yourself as an engineer. Has anyone ever said to you, "Solve this problem. You have two attempts. You just have two shots at this"? No. No one has said that. It often takes us tens or hundreds of tries to get something right. So I think limiting their minions to just two shots is potentially going to cost them more developer time, and it also delays the next learning about how to improve their agentic system, which they would get by letting their agents run more, right?
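
The trade-off being argued here comes down to one number: the cap on CI feedback rounds. A minimal sketch of such a bounded loop, with the cap as a parameter, might look like this. The function names and the simulated task are hypothetical. The only detail taken from the discussion above is that the cap under debate is two rounds.

```python
from typing import Callable


def run_with_ci_feedback(
    attempt_fix: Callable[[int], None],
    ci_passes: Callable[[], bool],
    max_rounds: int = 2,
) -> tuple[bool, int]:
    """Run up to `max_rounds` fix attempts; return (passed, rounds_used)."""
    for round_number in range(1, max_rounds + 1):
        attempt_fix(round_number)
        if ci_passes():
            return True, round_number
    return False, max_rounds


def make_flaky_task(passes_on_attempt: int):
    """Simulated task (an assumption for the demo): CI goes green on the Nth attempt."""
    attempts = {"count": 0}

    def attempt_fix(round_number: int) -> None:
        attempts["count"] += 1

    def ci_passes() -> bool:
        return attempts["count"] >= passes_on_attempt

    return attempt_fix, ci_passes


# The same task that needs three attempts, under two different caps.
capped_at_two = run_with_ci_feedback(*make_flaky_task(3), max_rounds=2)
capped_at_five = run_with_ci_feedback(*make_flaky_task(3), max_rounds=5)
print(capped_at_two, capped_at_five)
```

With the cap at two, the run ends unfinished and goes back to a human. With the cap at five, the same task completes on round three, at the price of more compute per run.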

Like, I think the learnings you get from running five rounds of your agent are going to be a lot more informative than running just two. But I could be totally wrong. Again, they know their system better than we do. That's my first note.

My last note is on the language around their minions. They call these end-to-end agents, but you might have noticed they have a prompt step and they have a review step. That's two steps. End to end is this: you take out the review. It's prompt to production, P2P. This is something we've talked about inside of Tactical Agentic Coding. This is the north star for all agentic engineers. It's a concept called ZTE, zero touch engineering: prompt to production, no review, no human in the loop.

I want to be a little critical about their language here. I know that this is industry standard, and of course they're operating at a scale most of us engineers will never get to, but that's what I would push Stripe to think about next. What are the low-level, simple tasks, maybe some lower-risk tasks, developer tools stuff, some non-user-facing stuff, and even some user-facing stuff, that they could actually ship end to end? And the value isn't in doing it. It's in answering the question: what would it take for you to run a prompt and trust that your agentic system can deliver it to production without human oversight? The value is in the journey of the question.

So that's another area where I would really try to push the Stripe engineers to that next level. I made a prediction on this at the end of last year in our 2026 top 2% engineering video. I think in 2026 we're going to see a blog post very similar to this from an engineer operating at serious scale, we're talking tens of millions in revenue, where they break down their agentic layer and talk about how they ship from prompt to production with ZTE, zero touch engineering.

So those are the only two notes I have. Again, Stripe has some of the most cracked engineers on the planet. This is just a note on the agentic system and not a note on any of their true core domain problems, because if you operate in a specific domain for years and years, no one knows how to solve it better than you do.

Big shout out to the Stripe engineering team and Alistair Gray for writing this up. This is a great post. It really caught my eye, and I thought it would be valuable to share with you because it emphasizes the point that building a powerful agentic layer really comes down to owning all the pieces, bottom to top.

Now, there is a point at which you want to start owning your agentic technology. If you're creating a brand new, net new product, you probably don't need to do that for a while. You just don't have the scale for it. Out-of-the-box tooling is going to work for you for a while. But then there's going to be a point where you're going to need a specific solution, a customized solution to solve a specific problem. And you want to boil that all the way down. Just like your application is a detailed, edge-case-covering solution, your agent should reflect that too. That's why we covered the Pi coding agent. There are many, but this one is mine.

The whole idea here that I want to connect with you on is that specialization goes all the way up the chain, all the way into the agent harness, all the way to the stack of technology that you operate. So anyway, big shout out to Stripe engineers. This was really fun. I like blogs like this.

You know, frankly, I'm getting a bit tired of everyone hyperfixating on models and prompts and skills. Let's uplevel this and talk about the systems that contain agents, that contain code, and that contain modern engineering technology that puts it all together to generate real value for you, your team, your company, and ultimately your users and customers. Because that's where the value really is. That's what makes all this stuff actually matter at all. All right?

If you're still watching, first off, big thanks to you. I hope these ideas make sense. You really want to be thinking about the agentic layer as a whole, not just your coding tool, not just the models. Let's ease up on the obsession with models, who's winning, and which gen AI company is ahead. Let's just focus on solving problems by building agentic layers with the key pieces. Stripe has outlined a lot of them. Every agentic layer, every product, is going to run into the problems that each one of these nodes is a solution to. So let's pay attention to them. Let's think about how these are pieces of the puzzle of building at scale with agents.

And this is just one interpretation. No one has all the answers right now. But it's about collecting the right context to solve the problem of agentic engineering, and pushing what you can do further before the industry, before the mainstream, catches up. Everything we do in engineering represents an asymmetry of information, and then technology, and then results with your product, with your tool, with your team, and so on. So you want to be pushing forward on this stuff. Don't let up on the gas. Stay focused on valuable information like this blog and, me being biased, this channel. I really try to focus in on concrete signal in the industry, not hype, not slop. There's going to be a lot of both of those as we move week after week. But I want this to be a place where you, the engineer, can come to focus and get some serious insight on how you can continue to win in the age of agents.

If you made it to the end and you like this content, definitely feel free to check out Tactical Agentic Coding. This is my take on how to scale far beyond AI coding and vibe coding with advanced agentic engineering so powerful your codebase runs itself. As you can imagine, a lot of the ideas detailed in this blog, in the architecture of how Stripe built their agentic coding tool, have been detailed in here. I'll be honest, I'm not gloating or anything. I've been early to this. This is what happens when you're a first mover, when you bet big on an emerging technology.

Everything you're going to see over the next year, I have in Tactical Agentic Coding and Agentic Horizon, the second part of this course, detailed here. So, if you're interested, I'm going to leave a link to this. You can see all the ideas are really set in stone here, and thousands of engineers, some of your favorite engineers mind you, are inside of this course, have taken this course, have gotten massive value, and are moving ahead of the curve. So I'm going to leave this link in the description for you. Of course, I'm also going to link the minions post.

Definitely give this a look, and I'll bet that if we search hiring... Yeah, so Stripe is hiring. If you're an agent that's interested in this, you can tell them that IndyDevDan sent you if you want. And again, just big shout out to the Stripe team. This is really great stuff, really great engineering in the age of agents. No matter what, stay focused and keep building.
