I Studied Stripe's AI Agents... Vibe Coding Is Already Dead
By IndyDevDan
Summary
Topics Covered
- Agentic Engineering Predicts System Behavior
- Dev Boxes Enable Agent Parallelization
- Blueprints Merge Code and Agents
- Tool Shed Scales Tool Selection
Full Transcript
Are you vibe coding or are you agentic engineering? The difference is massive. Keep that question in mind as you look at one of the best engineering teams on the planet to determine if they're vibe coding or agentic engineering. Stripe engineers are shipping 1,300 pull requests every single week. Get this.
There is zero human-written code, and they're doing it right. Imagine what will happen to their insane numbers: $1.9 trillion in total volume, up 34%, which is equivalent to 1.6% of global GDP. Stripe's doing 1 billion this year, and they power all of the best companies that you and I use. You yourself might be running on Stripe as well. What happens when Stripe multiplies all this with agents? And not just agents: what happens when they multiply it with their custom end-to-end solution they're calling minions, fully unattended coding agents that start from a Slack message and end in a production-ready PR? This is Stripe's one-shot, end-to-end coding agent.
To me, the minions aren't even the interesting part here. The interesting stat to me is that their agents operate a codebase with millions of lines of code, on an uncommon stack with a number of homegrown libraries that are unique to Stripe and therefore unknown to LLMs. On top of that, the stakes that Stripe operates in are extremely high. The code they write moves over $1 trillion per year of payment volume. They have a number of real-world dependencies and regulatory and compliance obligations that their code must honor.
Now, here's a simple, important question for you. Do you think Stripe can afford to vibe code? I personally have written millions of lines of code with agents and without agents. I've been building with agents since it was first possible, way, way back in the day when we were using GPT-3.5 Turbo. Many engineers don't even know that model exists or once existed.
So allow me to clarify these terms a little bit. Agentic engineering is knowing what will happen in your system so well you don't need to look. Vibe coding is not knowing and not looking.
It's very clear Stripe engineers are agentic engineering. And in this video we'll break down Stripe's agentic layer so you can take the best pieces and add them to your agentic systems. Vibe coding is the lowest-hanging fruit. When you agentic engineer systems just like Stripe has, from the prompt to your skills to your custom agents to your agent harness, all the way up through your tech stack, you capitalize on the greatest opportunity for engineers to ever exist: agents.
Let's look at their agentic system at a high level so that we can analyze the key pieces of their system. If these components interest you, definitely stick around. We're going to be breaking down Stripe's key components. And as we do this, you'll see what you have and what you're missing.
All right. So, the first thing is the API layer. They have a way to communicate to their agents. As you'll see, they have many ways to do this. Then they have a warm devbox pool. What is this? This is an agent sandbox, a space to place their agent. Fantastic.
They then have the agent harness. Stripe built their agent harness. They forked it from a tool we'll cover in a second here. And then they have this blueprint engine, the marriage of the old world and the new world, code and agents. This is super, super important. This single piece has given Stripe a massive edge. You'll see why in a second. All right.
Then we have the rules file. How did they manage the context problem? Agents cannot read their 100-million-line codebase. So how do they solve that problem? We'll then talk about the meta layer of their tool shed. You can imagine they have hundreds of tools and tens of services that they want their agents to operate with. How do they solve that problem? They built a tool shed. All right. Then of course they have a way to validate all their agents' work. This is a critical validation layer that they can use to give their agents feedback and to validate that they're not breaking existing working features that are helping them generate and maintain that movement of that $1 trillion. All right, so we're talking about real stakes that the Stripe engineers are facing. This is not a greenfield rapid-prototype application. These are serious stakes with real-world consequences. And then of course you need a place to review your agents' work.
They're using GitHub PRs. Everyone's using GitHub PRs. This is the standard. Nothing new here. All right. But these are the critical pieces. We're going to walk through these piece by piece and understand how they put these together to build their agentic layer. Let's go ahead and start with their minions. So what is Stripe's take on agentic coding?
Let's find out. Agentic coding has gone from new and exciting to table stakes. Unattended coding agents have gone from possibility to reality. You know Stripe engineers know what they're talking about, because this is true. If you are not agentic coding, the gap between you and the agentic coding team within a week, within a month, is going to be astronomical. Okay? It's going to be exponential. This is the last moment to hop on the train. Stripe minions are Stripe's homegrown coding agents. They're fully unattended, built to one-shot tasks. Thousands of pull requests merge each week. So one week goes by, Stripe engineers merge a thousand pull requests. Let's just really understand that scale. All right.
And as I mentioned, they contain no human-written code. They realize you have to stop coding to get the real scale, to get the real power out of these agents. You work on the agents, not the application. Right? This is a weird mindset shift that you need to make if you're going to be building with agents. Now, interesting to note here: our developers can still plan and collaborate with traditional agentic coding tools, Claude and Cursor. But in a world where one of our most constrained resources is developer attention, the agents allow for parallelization of tasks. This is super, super, super critical. All right, they realize that the most important resource, and really any software company's most important resource, is your developers' time. It's your developer attention. And when you maximize the leverage your developers get, you can do crazy things like this. They see their engineers spinning up multiple minions in parallel and able to solve multiple problems at the same time in different conditions. All right, so this is fantastic. So the first thing we need to figure out is why they built the minions in the first place. Why did they build it themselves?
What's the point of this? Isn't Claude Code good enough? Vibe coding a prototype from scratch is fundamentally different from contributing to Stripe's codebase. Okay, interesting. Say more.
Stripe's codebase encompasses hundreds of millions of lines of code across a few large repositories. Okay. Written in Ruby, uncommon stack, homegrown libraries. LLMs don't have it baked in, right? It's not in the models' training data. Stakes are high. Stripe moves over $1 trillion per year in payment volume. As mentioned, they have real-world dependencies and compliance obligations. LLM agents are really great at building from scratch when there are no constraints on the system. However, iterating on any codebase of scale, complexity, and maturity is inherently much harder. Very, very true.
Engineers build sophisticated models to make changes inside their large repo. This is huge. And we talk about this on the channel all the time. Specialization is how you win. When you're building a great product, it is literally a specialized solution to a specialized problem. So why would you stop at your tooling? Your tooling and your code must also be specialized. So this is why they built their own custom agent. It's because they're solving specific problems in specific ways better than anyone. And again, this is a theme we talk about on the channel all the time.
Specialization is your advantage. And in last week's video, we talked about the Pi coding agent, because there are many coding agents, but this one is mine. We emphasized this very idea. You can customize your prompt. You can customize your skills. You can customize your custom agents, and you can specialize your agent harness. The more you're specializing, the more you're building specific solutions to specific problems, the bigger your edge is. And the more you distance yourself from the out-of-the-box experiences that a lot of agentic coding tools are driving everyone toward, the better off you're going to be. So, Stripe built minions to solve their specific problem and to operate their large codebase better than anyone. Makes sense, right?
Big shout out to everyone who shared that video. That one went absolutely viral, and for a good reason. Engineers are realizing that we don't want to be super locked in to a single tool like Claude Code or Cursor or Codex or whatever. Every tool is going to have a problem, but the tool that won't have a problem is the one you customize to solve your specific problems better than anyone. There are many coding agents, but this one is mine. I love this slogan. Let's see how Stripe customized their minions.
So, what is it like to use a minion?
Right away, we jump into another critical idea. There are several entry points for minions. They're designed to integrate as ergonomically as possible where Stripe engineers are. All right, so they use a CLI, a web interface, and they have Slack. Already they have three points of contact for kicking off their API, I assume, right? They have a separate application which kicks off their agents, their pool of agents, but they have multiple ways to interface with that primary service.
Very important. And so we can see here a clear example of an engineer at Stripe using that at symbol, @devbox, and then they write their prompt to the agent, right? Makes sense. Nothing new there. Okay. And so we can see here they have a custom UI that they built, right? They have an interface to allow them to interface with their custom agent. On the left you can see kind of a typical view. We have that log of tools and the thought process that their agents go through. And then on the right we can see that they have all the modified files. So they can see very, very quickly what's going on with that agent. And then of course in the top right here they have their actions. All right, so create pull request. And I'm sure they have some prompt interface here as well. So nice and simple, very concise. You can see that they're just surfacing the most important information. And this hints at another key aspect of your agents and your agentic system: you need to be able to observe what's going on.
Once a task has been completed, a minion will create a branch, push it to CI, and prepare a pull request following Stripe's PR template. And then they're going to request a review from a Stripe engineer. And they can also iterate. So this is a classic end-of-process setup.
When you're agentic coding, you show up at the beginning and the end, during planning and during review, and ideally not once in the middle. All right? And that is what creates an outloop agentic coding system. You just write the prompt and you just do the review. There's inloop agentic coding and then there's outloop agentic coding. All right? We'll circle back to that idea in a second.
So how do their minions actually work? A minion starts in an isolated developer environment, or a dev box. Fascinating. So, this is a concept we've talked about on the channel. They're giving their agents their own environment to operate in, right? Which is the same type of machine that Stripe's engineers write on. This is a simple yet powerful idea. If you want your agent to do what you can, you must give it the tools and the environment that you have. So, Stripe realizes this. They reuse their developer setup for their agents. They give them everything that the engineer has. Super, super powerful idea here.
Dev boxes are pre-warmed, so one can be spun up in 10 seconds. Love that. Not very fast in absolute terms, but for the machine that they're booting up, which I think they'll mention in a moment, that is very fast. They're booting up full-on AWS EC2 instances. All right, with Stripe's code and services preloaded, they're isolated. This is a safe space to place their agent. And they do this so that they can run minions on dev boxes without human permission checks.
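To make the pre-warming idea concrete, here is a minimal sketch of a warm pool, assuming a simple in-memory model. The `DevBox` and `WarmPool` names, and everything inside them, are hypothetical illustrations and not Stripe's implementation; real provisioning (booting an instance, loading code and services) is replaced by a stub.

```python
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class DevBox:
    """Hypothetical stand-in for an isolated, preloaded cloud machine."""
    box_id: int
    warm: bool = False


class WarmPool:
    """Keeps `target_size` boxes provisioned ahead of demand."""

    def __init__(self, target_size: int) -> None:
        self._ids = itertools.count(1)
        self._target = target_size
        self._ready = deque()
        self.refill()

    def _provision(self) -> DevBox:
        # Stub for the slow part in a real system (boot, clone, start services).
        return DevBox(box_id=next(self._ids), warm=True)

    def refill(self) -> None:
        # Top the pool back up to its target size.
        while len(self._ready) < self._target:
            self._ready.append(self._provision())

    def acquire(self) -> DevBox:
        # Fast path: hand out a pre-warmed box; fall back to a cold start.
        box = self._ready.popleft() if self._ready else self._provision()
        self.refill()
        return box

    def ready_count(self) -> int:
        return len(self._ready)


pool = WarmPool(target_size=3)
boxes = [pool.acquire() for _ in range(6)]  # e.g. six parallel agent runs
```

The point of the design is that the expensive work happens before anyone asks for a box, so the request path only pops from a queue. In a real system the refill would run in the background rather than inline.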
Of course, this also gives you parallelization without the overhead of something like git worktrees, which just fall apart at certain scales. All right, after some time, git worktrees just fall apart. You're going to need your own dedicated device. I have a Mac Mini here as a local, personal, kind of private device. But recently, I also just said, "Screw it. I'm going to need more scale." And I started spinning up entire dev boxes for my agents on, you know, use your favorite cloud hosting tool, GCP, AWS, and some of the other ephemeral agent sandbox tools like E2B, Modal, so on and so forth. But this is a really big idea, right? The more autonomy you give your agents and the more you set up their environment to be yours, the more they can act and perform as you would.
The core agent loop runs on a fork of Block's coding agent, Goose, one of the first widely used coding agents, which they forked early on. So shout out Goose. They took this and they customized the orchestration flow in an opinionated way to interleave agent loops and deterministic code. Huge, huge, huge idea here, and they're going to expand on this even more in a moment with one of the big ideas they talk about later, which is their blueprint engine. Let me just emphasize this. You want to be interleaving the agent loop with deterministic code. And what type of operations? We're talking git, linters, and most importantly testing. Okay, this lets your agents, your system, operate with feedback. And this gives you the best of both worlds. You get the deterministic world and the non-deterministic reasoning, creativity world. And they explicitly say that here: they run a mix of creativity of the agent with assurances that they'll always complete Stripe-specific steps like linters. So here we have Stripe agentic engineering: determinism with agents. All right. So a couple additional things to note here.
Connected to MCP. They use Cursor and Claude Code in some conditions. They operate agent rule files. We'll talk about that more in a second. This solves the large context problem for Stripe. All agent rules are conditionally applied based on subdirectories. Super important. They have MCP, as I mentioned.
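As an illustration of rules that are conditionally applied based on subdirectories, here is a minimal sketch. The `RULES` table, its directory names, and the rule texts are all made up for the example, and the lookup logic is an assumption about how such scoping could work, not a description of Stripe's rule format.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: directory prefix -> rule text for that subtree.
# The empty prefix stands for the repository root.
RULES = {
    "": "Follow the repository style guide.",
    "payments": "Never log card numbers.",
    "payments/ledger": "Use the ledger money type for amounts.",
    "docs": "Keep examples runnable.",
}


def rules_for(file_path: str) -> list:
    """Return the rules whose directory contains file_path, most general first."""
    parents = PurePosixPath(file_path).parents  # nearest directory first
    dirs = ["" if str(p) == "." else str(p) for p in parents]
    return [RULES[d] for d in reversed(dirs) if d in RULES]


# Only the rules on the path to the file are loaded into the agent's context.
ledger_rules = rules_for("payments/ledger/entry.rb")
docs_rules = rules_for("docs/intro.md")
```

The agent editing a file under `payments/ledger` sees three rules, while one editing `docs/intro.md` sees two, so context stays proportional to where the work happens rather than to the size of the repository.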
They have this tool shed idea, which is basically a meta tool to help them select one or more of their 400 MCP tools. Okay. A really big piece of why this blog post is so incredible, shout out to the Stripe engineers, shout out to Alistar Gray, is the fact that they're operating at such a massive scale, with such success, and they're still gaining massive value from their agents and from the agentic layer that they're building. All right, minions are built with a goal of one-shotting, but if they don't, the key is to give them feedback. Key, key idea.
Two more ideas here. We seek to shift feedback left when thinking about developer productivity. The best thing for humans and agents is basically that you want the issues to happen earlier rather than later, right? On the engineer's device, on the agent's device, as early in the process as possible. All right? And then if local testing doesn't catch anything, they have a whole suite of tests, over 3 million tests, that run upon push. Key idea here: they figured out a way to selectively run tests on push. All right? And they're choosing from among 3 million tests. Okay? And this is going to, as you can imagine, offer feedback to their agentic system.
Now, here's something that I would critique Stripe on a little bit. Due to cost constraints, they only let their minion run at most two rounds of CI. All right, so you can imagine at this scale that they have to just limit this for it to be cost-efficient. This is where I would push back a little bit. We'll talk about that later, but this is an interesting thing here, right? So, they basically limited the rounds of feedback for their minion to just two. This is part one of their blog. Let's look at part two and dig into some of the details of some of these key nodes, right? Specifically, their agent sandbox and their powerful blueprint engine, because their blueprint engine sits at the center of how they operate their Stripe minions at scale.
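The feedback policy from part one (catch problems locally first, then allow at most two rounds of CI) can be sketched as a bounded loop. This is an assumption-laden illustration: `run_with_feedback`, its callback parameters, and the outcome strings are hypothetical, and only the cap of two CI rounds comes from the discussion above.

```python
from typing import Callable

MAX_CI_ROUNDS = 2  # the cap on CI rounds mentioned above


def run_with_feedback(
    local_checks: Callable[[], list],
    ci_checks: Callable[[], list],
    fix: Callable[[list], None],
) -> str:
    """Fix cheap local failures first, then allow a bounded number of CI rounds."""
    failures = local_checks()
    if failures:
        fix(failures)  # shift left: fail on the box, before any push
    for round_no in range(1, MAX_CI_ROUNDS + 1):
        failures = ci_checks()
        if not failures:
            return "ready_for_review"
        if round_no < MAX_CI_ROUNDS:
            fix(failures)  # feed CI output back to the agent and retry
    return "needs_human"


# Simulated run: CI fails once, then passes on the second round.
ci_results = iter([["test_refund failed"], []])
fixes = []
outcome = run_with_feedback(
    local_checks=lambda: ["lint: unused import"],
    ci_checks=lambda: next(ci_results),
    fix=fixes.append,
)
```

Raising `MAX_CI_ROUNDS` trades cost for more chances to self-correct, which is exactly the knob being debated here.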
So, here's part two: dev boxes, hot and ready. So for maximum effectiveness, their minion agents require a cloud developer environment that's parallelizable, predictable, and isolated. So this is very clearly an agent sandbox. Okay, it gives them a place to operate at scale with full autonomy. And if something goes wrong, if they destroy something, the agent can't cause as much damage as it could if it were operating your device or, god forbid, a device connected to the production system. And I completely agree.
Containerization, git worktrees, they're great, but they have hard limits, and it's hard to really, really scale without giving each agent their own device, right? Again, if you want your agent to perform like you, give them the tools that you have. All right. What else can we learn about Stripe's devbox here? So, very cool. Stripe's devbox is a full-on computer, right? It's an EC2 instance, and it contains their source code and services under development. Very, very cool. Many engineers use one dev box per task, and this means that every engineer might have half a dozen running at a time. Check out how awesome this is, right? They're allowing their engineers to scale their impact by allowing parallelization of their agents, and every agent has their own sandbox. Okay, so a question I would ask them is, do their minions have access to additional minion sub-agents, or not even sub-agents, other primary agents that are specialized across their codebase? Very cool stuff here. This is again part of their agentic system, right? It's giving and servicing scale very, very quickly so that engineers can knock out more problems than ever before. All right, we want it to feel effortless to spin up new dev boxes. Right, ready in 10 seconds. Hot and ready. So fantastic.
The raw pieces of engineering should feel effortless. You want to be building systems that allow you to move at the agentic speed, the speed of agents. Something kind of funny happened to me the other day while I was reading this blog. Actually, I'll throw the image on the screen. I had to save it. The agentic speed is just insane. Your agents can process information much, much faster than you can. I was reading through this blog, and it took me maybe 20 minutes to read through part one and part two and take notes on this. I also spun up a Claude Code agent to read the blog alongside me. It read the whole thing, of course, in, what was it, 5 seconds. And so I just had a really funny interaction where I was shocked, and then I said something, and the agent literally said nothing. It was the first time I've ever had my agent respond with nothing. It was just a really interesting interaction point. And this is the agentic speed, right? It's this multiplied by every single agent you can spin up. Your agents can read, they can code, they can engineer at agentic speed. So you need to build the system that allows you to tap into that.
And you can see here Stripe is doing that with their powerful dev boxes that spin up in just 10 seconds and somehow set up their entire gigantic repository. Millions of lines of code, tens of thousands, probably hundreds of thousands of files. All right. And, you know, props to Stripe. We built out dev boxes for the needs of human engineers long before LLM coding agents. As it turns out, parallelism, predictability, and isolation were very, very good properties for engineers as well as agents. Fantastic.
We're almost at the blueprint, which is a really, really big idea. But let's talk about their agent a little bit more, right? So, they built this on their own. They forked Goose. Let me be clear about that. They forked Goose and then they customized it to work within Stripe's LLM infrastructure. Okay? So, you can imagine they have custom prompts, custom skills, custom agents, and then they customize the agent harness. All right? And again, this was the big idea we talked about in last week's video. I'll leave that linked in the description for you if you're interested. Customizing your agentic harness gives you a massive edge. You can do it your way. You can build it to fit the needs of your specific problem. Okay, once again, I want to beat this idea over the head.
Specialization is the advantage of every engineer. Now you can build specialized solutions, specialized developer tools to help you solve your problems at the agentic speed. Okay, the speed of agents, not the speed of humans. And so they focus their use on the needs of minions rather than human-supervised tools. And this is another big idea we need to double-click into. That's the use case well filled by third-party tools such as Cursor and Claude Code, which are made readily available for our engineers. So a couple things here. They're not limiting, they're not forcing their engineers to use any specific tooling. That's a terrible idea in general. But what they are doing is building two types of agentic coding tools: inloop and outloop. I've talked about this on the channel before. This is a critical idea to get right if you want to do more with your agentic engineering.
When you are inloop agentic coding, your butt's in the seat at your desk and you're prompting back and forth and back and forth. This is great for highly specialized work. This is great for when you're building the system that builds the system, but this is bad for everything else. Okay? As a general rule, I recommend to engineers now that you spend more than 50% of your time building the system of agents that build your application for you. That's inloop, right? And that's really the value prop of inloop. You get full control. You can see everything. It's very manual, but it is very slow and expensive. You're using human engineering time.
Then there's outloop agentic coding. And this is what Stripe's minions offer Stripe. Okay, they are building an outloop system that operates at scale, in parallel, in the dev box, right? In dedicated agent sandboxes. This is a big, big idea. Why is that? It's because now, instead of having one engineer with one terminal or one engineer with three terminals, you can have one engineer with six agent sandboxes operating and solving problems at scale in parallel, right? And six is just the beginning of this. The whole idea here is that you should be handing off more work over time to your outloop system. If you're building a great agentic layer, if you're building a great system that has agents operating your services for you, you should slowly be handing off more work to them. Okay? And that saves you from the expensive time that you'll spend. And, you know, never forget your time is your most important resource. It is constantly running out. Okay? Let me just be super clear about that. But your agentic systems, you can clone, you can dupe, you can parallelize these as far as your system allows you to. All right? And that's the lever that agentic engineering unlocks. If you build the system that builds the system, you get massive, massive reasoning at scale. You get access to intelligence that engineers your way and, at some point, better than your way. But that's key.
So I just wanted to really dial into that. Minions give Stripe engineers access to outloop agentic coding. Very, very powerful. And so they talk about specialization a little bit more. And, you know, they're really hitting on this idea I just mentioned there. Off-the-shelf local coding agents are usually optimized for workflows where the engineer is sitting looking over its shoulder, right? And I just call this babysitting the agent. Minions are fully unattended, and so their agent harness can't use human-facing features. Okay, they built the minion to be fully autonomous, right? They're built so that humans cannot interject. That's not the point, right? The point is that they operate on their own. Again, inloop agentic coding, outloop agentic coding.
Claude Code, minions. Okay? And just to emphasize it once again, Claude Code has the ability, Cursor has the Cursor CLI. And of course, there are great tools we've covered on the channel like pi.dev, or you can programmatically inject these into your outloop systems, right? You can deploy an agent outside the loop and have them run on a cron job, have them run via an API request, so on and so forth. All right, that is where all agentic engineers must move to get massive leverage. You can see Stripe's engineers using minions to do just that. All right, so they talk about permissions. Let's focus on the big idea here. Right next to dev boxes, the next most important thing here for sure is their blueprint engine. So let's talk about this thing. So what is this? So they talk about workflows versus agents. They talk about loops.
They talk about, you know, a series of steps, which is like the workflow. This is what a lot of prompts and skills actually are. They're just steps that you want to work through. Sometimes you have an agent that's actually doing some intelligent reasoning, right? Loop with tools. But you can do much better than that. You can push a lot further. And that's exactly what Stripe has done.
Minions are orchestrated with a primitive we call blueprints. Blueprints are workflows designed in code that direct a minion run. Okay. And then they go on to say blueprints combine the determinism of workflows with agents' flexibility in dealing with the unknown.
What is this? Every Tactical Agentic Coding member knows this as an ADW, an AI developer workflow. This is the past and the future. This is code plus your agent. Okay, the highest leverage point of agentic coding is when you put these two together. You have step-by-step workflows that have determinism and non-determinism put together. In essence, a blueprint is like a collection of agent skills interwoven with deterministic code so that particular subtasks can be handled most appropriately. Okay, there are some things, like a linter for instance, or a git commit, or a whole number of things, right? Running tests, creating certain structures, creating certain templates, certain reusable pieces, certain hard deterministic code pathways. There are certain pieces that an agent would perform worse in. Adding an agent to specific steps actually makes the whole system worse, more brittle, and more expensive, frankly.
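The idea of agent skills interwoven with deterministic code can be sketched as an ordered list of nodes, each marked as either an agent call or plain code. Everything here is a hypothetical illustration: the `Node` and `RunState` types and the node names are assumptions, and the agent and code steps are placeholders that only record what would run.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    task: str
    log: list = field(default_factory=list)


@dataclass
class Node:
    name: str
    run: Callable[[RunState], None]
    uses_llm: bool  # False means plain code, no model call


def agent_step(label: str) -> Callable[[RunState], None]:
    # Placeholder for a real agent loop (model plus tools); here it only logs.
    def run(state: RunState) -> None:
        state.log.append(f"agent:{label}")
    return run


def code_step(label: str) -> Callable[[RunState], None]:
    # Placeholder for deterministic work such as running a linter or a push.
    def run(state: RunState) -> None:
        state.log.append(f"code:{label}")
    return run


# Hypothetical blueprint: agent and code nodes alternate in a fixed order.
BLUEPRINT = [
    Node("implement_task", agent_step("implement"), uses_llm=True),
    Node("run_linters", code_step("lint"), uses_llm=False),
    Node("push_changes", code_step("push"), uses_llm=False),
    Node("fix_ci_failures", agent_step("fix_ci"), uses_llm=True),
]


def run_blueprint(task: str) -> RunState:
    state = RunState(task=task)
    for node in BLUEPRINT:
        node.run(state)  # the order is fixed by code, not chosen by the model
    return state


state = run_blueprint("rename a config flag")
```

Because the sequence lives in code, the deterministic steps always run, and the model is only consulted at the nodes where reasoning is actually needed.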
So, for these steps, why would you throw an agent at that problem? Right? The real advantage that Stripe has completely identified here with blueprints is the fact that agents plus code beats agents alone, and agents plus code beats code alone. That's the big idea here. So, Alistar goes on to break this concept down here. You have the agent calls here, implement the task, fix the CI failures, whatever, but you also have the actual nodes, run configured linters, push changes, which are fully deterministic. Okay, they don't invoke an LLM at all. They just run code. So imagine some top-to-bottom process where you have agent running and then you have code running and then you have agent running and then you have code running, right? So on and so forth. This is what you want to build, right? It's the combination of both the agent and your code. Okay, because not everything needs an agent and not everything needs code. Okay, very, very powerful idea here.
Another advantage of creating these blueprints, of combining code plus agents, is that their blueprint machinery makes context engineering with sub-agents easy. Why is that? It's because they're operating at a specific step.
And so at that step, you might constrain the tools, you might constrain the system prompt, right? Or you might modify the conversation required by the subtask at hand. Okay? And again, we're hitting on this idea of specialization.
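To make the code-plus-agent idea concrete, here is a minimal sketch of what a blueprint runner could look like. This is not Stripe's implementation: the `CodeNode`, `AgentNode`, and `run_blueprint` names, the tool names, and the stubbed `fake_agent` are all assumptions made up for illustration. The point is only that deterministic nodes run plain code, while agent nodes carry their own constrained prompt and tool list.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only; not Stripe's blueprint API.
State = dict


@dataclass
class CodeNode:
    """Deterministic step: runs plain code and never calls an LLM."""
    name: str
    run: Callable[[State], State]


@dataclass
class AgentNode:
    """Agentic step: a system prompt plus the only tools this step may use."""
    name: str
    system_prompt: str
    allowed_tools: list[str]


def run_blueprint(nodes, state, call_agent):
    """Walk the nodes top to bottom, dispatching each one to code or to the agent."""
    for node in nodes:
        if isinstance(node, CodeNode):
            state = node.run(state)
        else:
            state = call_agent(node, state)
        # Record the order of execution so the run can be inspected afterwards.
        state.setdefault("trace", []).append(node.name)
    return state


def fake_agent(node, state):
    """Stand-in for a real model call so the sketch runs offline."""
    return {**state, "last_agent_tools": node.allowed_tools}


blueprint = [
    AgentNode("implement_task", "Implement the requested change.",
              ["read_file", "edit_file"]),
    CodeNode("run_linters", lambda s: {**s, "lint_ok": True}),
    AgentNode("fix_ci_failures", "Fix failing checks only.",
              ["read_file", "edit_file", "read_ci_log"]),
    CodeNode("push_changes", lambda s: {**s, "pushed": True}),
]

result = run_blueprint(blueprint, {"task": "rename a config flag"}, fake_agent)
print(result["trace"])
```

Each `AgentNode` only sees the tools and prompt for its own step, which is the specialization being described: the context is constrained per step rather than shared across the whole run.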
There are specific steps in your engineering, in your product, in your tool that you've uniquely implemented.
Okay? And so when you can break that down into a deterministic or an agentic process step by step, this allows you to specialize, right? And so, you know, once again, what are we doing? We're back at foundational engineering. If you're trying to tackle a big problem, chunk it up into small pieces. Every big problem is just, you know, a few small problems put together. And then chunk those problems into types and then give it to code or give it to agents. Okay, that's what their blueprint system is effectively doing. To me, this is the highest leverage point. This is what makes their agentic layer, their agentic system, so powerful. It's the combination of code and agents inside of a repeatable format for success. Okay?
Because guess what they can do? They can now deploy meta-agentics. They can effectively create an agent that builds their blueprint just in the right way and then they can validate it, right? I wouldn't be surprised if they had a blueprint for creating blueprints. All right. Anyway, let's move into context.
So, they use a rule files setup that, you know, is much like claude.md or agents.md. Due to the size of the repository, they can't have unconditional global rules. So, they need a specific solution to do this. They're using a standardized rule format much like Cursor's. All right. So this is a rule format that looks like this. You have your primary directory, whatever tool you're using, you know, tries to claim that name, and then you have /rules, and then you have some markdown files, right? But the interesting part is that you have a markdown file with some front matter. All right, front matter is going to be, you know, MDC files are like the most popular file format, and for good reason, right? So they have these rules here where you can specify the glob pattern in which to activate this context or, you know, a specific subset of this context. And then they have rule anatomy. You can apply intelligently, or you can apply only when specific files are being accessed. All right. And so this gives you more control over the context that's loaded as you're accessing different directories throughout your codebase. Okay. And so this is the structure that Stripe minions use, right? And the big line is right here: we almost exclusively give minions context from files that are scoped to specific subdirectories or patterns, automatically attached as the agent traverses the file system. And they're using the, you know, kind of Cursor rules to do that. And so they've combined it with a format from Claude Code. Once again here you can see that they're building customized agentic solutions that best solve the problems they're facing. Okay, and they're combining the best of the industry. I'm not saying that, you know, Cursor agents or Claude Code agents or how they do things is wrong. That's not the point.
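The glob-scoped rule files described above can be pictured with a small sketch. Everything here is an assumption for illustration: the rule contents, the paths, and the tiny front matter parser that only understands a single comma-separated `globs` key. It also uses Python's `fnmatch`, whose `*` matches across path separators, so the matching is looser than what a real rules engine may do.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Rule:
    globs: list[str]
    body: str


def parse_rule(text: str) -> Rule:
    """Split a '---' delimited front matter block from the rule body.

    Only a single comma-separated 'globs:' key is understood in this sketch.
    """
    _, front, body = text.split("---", 2)
    globs = []
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "globs":
            globs = [g.strip() for g in value.split(",") if g.strip()]
    return Rule(globs=globs, body=body.strip())


def rules_for(path: str, rules: list[Rule]) -> list[str]:
    """Return the bodies of every rule with at least one glob matching the path."""
    return [r.body for r in rules if any(fnmatch(path, g) for g in r.globs)]


# Hypothetical rule files; the contents are invented for the example.
PAYMENTS_RULE = """---
globs: services/payments/*
---
Use the internal money type; never use floats for amounts."""

DOCS_RULE = """---
globs: docs/*.md, README.md
---
Keep headings in sentence case."""

rules = [parse_rule(PAYMENTS_RULE), parse_rule(DOCS_RULE)]
print(rules_for("services/payments/refunds/api.py", rules))
print(rules_for("docs/setup.md", rules))
```

The design choice worth noticing is that nothing is loaded unconditionally: a file outside every glob gets no extra context at all, which is what keeps a very large repository from flooding the context window.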
There are many ways to do things. The
question is what's the best way for you and how do you get the most leverage out of what's available? We can see Stripe engineers doing exactly that. Last important idea to mention here is Stripe's gathering of MCPs. Right. So what are they, and how does Stripe put together the tools? So as we all know, tools are an essential element of the core four: context, model, prompt, tools. Tools are what created agentic coding, right? It's the only reason that any of this is possible, because our agents can now use tools to take actions as we can. So how
does Stripe handle their 500 MCP tools?
Won't this immediately cause a token explosion? Absolutely right. It totally would. What they've done here is they've built a tool shed. They built a centralized internal MCP server called the tool shed, which makes it easy for Stripe engineers to make new tools, and they're automatically discoverable in their agentic systems. Very, very powerful stuff here. Okay. All their agentic systems are able to use the tool shed. I want to be super clear about this. We're talking about meta-agentics. This is something that keeps coming up over and over. You build prompts that create prompts. You make agents that build agents. You have skills that build skills. You have tools that allow you to select tools. Okay. The tool shed is a tool that unlocks tools for their agents. Okay, so these are called meta-agentics and they're a powerful way to solve the class of problems, right?
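As a rough picture of "a tool that selects tools", here is a minimal sketch of a central registry that hands an agent only a small, relevant subset instead of every tool description. The source only says the tool shed is a centralized internal MCP server where tools are discoverable; the `ToolShed` class, the tag-based scoring, and the tool names below are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    tags: frozenset


class ToolShed:
    """Hypothetical central registry; a stand-in for an internal MCP server."""

    def __init__(self):
        self._tools = {}

    def register(self, name, description, tags):
        """Add a tool once so every agentic system can discover it."""
        self._tools[name] = Tool(name, description, frozenset(tags))

    def select(self, wanted_tags, limit=5):
        """Return at most `limit` tools sharing a tag with the request.

        Tools with more overlapping tags come first; ties are broken by name.
        """
        wanted = set(wanted_tags)
        scored = [(len(t.tags & wanted), t) for t in self._tools.values()]
        matches = [(score, tool) for score, tool in scored if score > 0]
        matches.sort(key=lambda pair: (-pair[0], pair[1].name))
        return [tool for _, tool in matches[:limit]]


shed = ToolShed()
shed.register("search_docs", "Search internal documentation", {"docs", "search"})
shed.register("read_ci_log", "Fetch the CI log for a build", {"ci"})
shed.register("rerun_ci", "Re-run a CI job", {"ci", "write"})
shed.register("lookup_ticket", "Fetch a ticket by id", {"tickets"})

picked = shed.select({"ci"}, limit=2)
print([tool.name for tool in picked])
```

Only the selected tools' descriptions would be placed in the agent's context, which is how a registry of hundreds of tools avoids the token explosion mentioned above.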
To solve repeat problems in the space of agents. And you know, to be clear, this is not new at all, right? OG engineers watching, you've heard of things like metaprogramming, right? Passing functions into functions. This is not a new phenomenon, but what is new and what's really important for you and I to focus on when we're building out these powerful agentic layers is to think about when we need to build the thing that builds the thing, right? So, Stripe uses a tool shed to create and connect to over 500, or nearly 500, MCP tools. Okay? Very, very powerful. And you can imagine they have all types of internal and external services that they want to connect to. And the tool shed lets them do that. This was completely net new to me. I had not seen a concept like this before. I think this is really cool. A tool shed: a centralized location to load specific tools. So, you know, big shout out to the team for building something like this. And then lastly, you know, one of the big ideas they talk about, and that's just super critical for engineering. Like, this is just great engineering. You just iterate, right? All this stuff is so new. All this stuff is moving so quickly. You and I, Stripe engineers, doesn't really matter who you are. It's not about what you can do anymore. It's about what you can teach your agents to do for you. Okay? This is a big idea.
It's one of the central theses we talk about in Tactical Agentic Coding, in addition to building agentic layers: handing off work and thinking about your agents as tools that you're templating into and templating your engineering into. That's the name of the game, right? Teach your agents how to build like you would so you can scale them to the moon. All right, so what else do we have here? A lot of really great ideas.
I'm curious what you think if you've operated in codebases with more than 10,000 files. Comment down below: what would you rank Stripe's agentic layer, based on everything we've gone through here and, you know, our high-level understanding of their system? Right, they have multiple CI entry points, they have EC2 agent sandboxes that mirror developer environments, they have their own custom agent harness, they have a customizable blueprint engine that lets them combine code and agents together to outperform either, they have rules files for context engineering, they have the tool shed for selecting one of 500 tools, or many of 500 tools. They of course have CI for self-validation, and they have GitHub PRs to review the work their agents have done on their dedicated agent sandboxes. All right, so rank this. I'm super curious what you think.
Rank Stripe's agentic layer out of 10.
I'm going to go ahead and give them... and, you know, again, this is if you've worked on codebases that are larger than 10,000 files. No offense, guys, but I don't want to hear a vibe coder's opinion on Stripe's end-to-end system. But for mid to senior level plus engineers, I'd love to hear what you think. I'm going to give Stripe an eight out of 10. Okay, so a very, very, very powerful agentic layer. And let me be super clear here. I have no ego in this. Let me say it this way. I cannot solve Stripe's programmable financial infrastructure problems better than any one of the engineers on their team could. They own that problem in this problem space. So that's not what I'm saying at all. They are the experts there for sure. My expertise is in agentic engineering. It's in building agentic layers. And so I only have two notes of feedback for them here that I would pitch to them as potential improvements. The first thing is this.
You know, they identify this right away. Why only two rounds of feedback in their CI for their agents? Okay. And so they say, you know, speed, completeness, cost, time, compute, blah blah blah.
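To see what a round cap trades off, here is a minimal sketch of a bounded CI feedback loop. It assumes a "round" means one agent attempt followed by one CI run, which this section does not spell out; `run_with_ci_feedback`, `make_flaky_ci`, and the `max_rounds` parameter are hypothetical names, and the CI is simulated so the sketch runs on its own.

```python
def run_with_ci_feedback(attempt_fix, run_ci, max_rounds=2):
    """Let the agent react to CI failures for at most `max_rounds` rounds.

    Returns a (passed, rounds_used) tuple.
    """
    feedback = None
    for round_number in range(1, max_rounds + 1):
        attempt_fix(feedback)          # agent step: uses the last failure, if any
        passed, feedback = run_ci()    # deterministic step: run the checks
        if passed:
            return True, round_number
    return False, max_rounds


def make_flaky_ci(passes_on):
    """Simulated CI that fails until its `passes_on`-th run."""
    calls = {"n": 0}

    def run_ci():
        calls["n"] += 1
        ok = calls["n"] >= passes_on
        return ok, None if ok else f"failure log from run {calls['n']}"

    return run_ci


def noop_agent(feedback):
    """Stand-in for the agent; a real one would edit code based on feedback."""
    return None


capped = run_with_ci_feedback(noop_agent, make_flaky_ci(3), max_rounds=2)
uncapped = run_with_ci_feedback(noop_agent, make_flaky_ci(3), max_rounds=5)
print(capped, uncapped)
```

With a change that needs three rounds, the cap of two hands the work back to a human, while a cap of five lets the run finish on its own at the cost of one more CI run. That extra compute versus saved developer time is the trade-off being argued about here.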
These are fair constraints and reasons to only run two rounds. But I think this is a mistake, frankly. Think about
yourself as an engineer. Has anyone ever said to you, "Solve this problem. You
have two attempts."
All right? You just have two shots at this. Uh, no. No one said that. Right?
It often takes us tens and hundreds of times to get something right. So, I
think limiting their minions to just two shots is potentially going to cost them more developer time and also increase the gap between the next learning of how
to improve their agentic system by letting their agents run more, right?
Like, I think the learnings you get from running five rounds of your agent is going to be a lot more informative than running just two. All right, but I could be totally wrong. Again, they know their system better than we do. All right, but
that's my first note. And my last note here is on the language of their minions. So, you know, they call these end-to-end agents, but you might have noticed they have a prompt step and they have a review step. Okay, that's two steps. End to end is this, right? And you take out the review, right? It's prompt to production, P2P. Okay? And this is something we've talked about inside of Tactical Agentic Coding. This is the north star for all agentic engineers. This is an idea, a concept called ZTE, zero touch engineering: prompt to production. No review, no human in the loop. I want to be a little critical about their language here. I know that this is industry standard, and of course, again, they're operating at a scale most of us engineers will never get to, but that's what I would push Stripe to think about next. What are the low-level simple tasks? Maybe some lower-risk tasks, developer tools stuff, you know, some non-user-facing stuff, and even some user-facing stuff that they could ship actually end to end. And the value isn't in doing it. It's in answering the question: what would it take for you to run a prompt and trust that your agentic system can deliver this to production without human oversight, right? The value is in the journey of the question.
So, that's another area where I would just really try to push the Stripe engineers to that next level. I made a prediction on this at the end of last year in our 2026 top 2% engineering video. I think in 2026 we're going to see a blog post very similar to this where an engineer operating at serious scale, we're talking tens of millions in revenue... I predict we're going to see a blog post where they break down their agentic layer and talk about how they ship from prompt to production with ZTE, zero touch engineering. So those are the only two notes I have. Again, you know, I'm not trying to... Stripe has some of the most cracked engineers on the planet. This is just a note on the agentic system and not a note on any of their true core domain problems, because again, if you operate in a specific domain for years and years, no one knows how to solve it better than you do. All right, so those are my two notes. Big shout out to the Stripe engineering team and, you know, Alistair Gray for writing this up. This is a great post. This really caught my eye and I thought it would be valuable to share with you here because it really emphasizes that point that building a powerful agentic layer really comes down to owning all the pieces, bottom to top.
Now there is a point at which you want to start owning your agentic technology, right? And again, if you're creating a brand new, net new product, you probably don't need to do that for a while. You just don't have the scale for it. Out of the box, it's going to work for you for a while. But then there's going to be a point where you're going to need a specific solution, right? A customized solution to solve a specific problem. And you want to boil that all the way down. Just like your application is, you know, a detailed, edge-case-covering solution, your agent should reflect that too. That's why we covered the PI coding agent. There are many, but this one is mine. And the whole idea here that I want to, you know, connect with you on is that specialization goes all the way up the chain, all the way into the agent harness, all the way to the stack of technology that you operate. So anyway, big shout out to Stripe engineers. This was really fun. I like blogs like this.
You know, frankly, I'm getting a bit tired of everyone hyperfixating on models and prompts and skills. Like, let's uplevel this and talk about the systems that have agents inside of them, that contain agents and that contain code and that contain, you know, modern engineering technology that puts it all together to generate real value for you, your team, your company, and ultimately your users and customers, right? Because that's where the value really is. That's what makes all this stuff actually matter at all. All right?
If you're still watching, first off, you know, big thanks to you. I hope these ideas make sense. You really want to be thinking about the agentic layer as a whole, not just your coding tool, not just the models. Let's ease up on the obsession with these, you know, models and who's winning and what gen AI company is more... just, let's focus on solving problems by building agentic layers with the key pieces. All right?
And Stripe has outlined a lot of them, right? Like every agentic layer, every product is going to run into the problems that each one of these nodes is a solution to. So let's pay attention to them, right? Let's think about how these are pieces to the puzzle of building at scale with agents. All right? And this is just one interpretation. No one has all the answers right now. But it's about collecting the right context for solving the problem of agentic engineering. Right? And pushing what you can do further beyond, before the industry, before the mainstream catches up. All right. Everything we do in engineering represents an asymmetry of information and then technology and then results with your product, with your tool, with your team, so on and so forth. All right? So, you want to be pushing forward on this stuff. Don't let up on the gas. Stay focused on valuable information like this blog and, you know, me being biased, but like this channel. I really try to focus in on concrete signal in the industry, not hype, not slop. There's going to be a lot of both of those as we move week after week. But I want this to be a place where you, the engineer, can come to focus and get some serious insight on how you can continue to win in the age of agents. If you made it to the end and you like this content, definitely feel free to check out Tactical Agentic Coding. This is my take on how to scale
far beyond AI coding and vibe coding with advanced agentic engineering so powerful your codebase runs itself. As you can imagine, a lot of the ideas detailed in this blog, detailed in the architecture of how Stripe built their agentic coding tool, have been detailed in here. All right, I'll be honest, I'm not gloating or anything. I've been early to this. This is what happens when you're a first mover, when you bet big on an emerging technology.
Everything you're going to see over the next year, I have in Tactical Agentic Coding and Agentic Horizon, the second part of this course, detailed here. So, if you're interested, I'm going to leave a link to this. You can see all the ideas are really in stone here, and thousands of engineers, some of your favorite engineers mind you, are inside of this course, have taken this course and have gotten massive value and are moving ahead of the curve. So, I'm going to leave this in here, link in the description for you. Of course, I'm also going to link the minions post.
Definitely give this a look, and I'll bet that if we search hiring... Yeah, so Stripe is hiring. If you're an agent that's interested in this, you can tell them that IndyDevDan sent you if you want. And again, just big shout out to the Stripe team. This is really great stuff. Really great engineering in the age of agents. No matter what, stay focused and keep building.