Advanced Context Engineering for Agents
By YC Root Access
## Summary

### Key takeaways

- **Naive AI prompting fails in complex systems**: Simply back-and-forth prompting with AI agents is ineffective for complex tasks, leading to wasted effort and potential slowdowns, akin to committing compiled code and discarding the source. [01:16], [04:08]
- **Spec-first development is key for AI coding**: Transitioning to spec-first development, where detailed specifications guide the AI, allows teams to stay aligned and enables reviewers to trust the output without reading every line of code. [03:13], [03:24]
- **Intentional compaction optimizes context**: Instead of relying on generic compaction, intentionally curate what is committed to the file system and agent's memory to manage context windows effectively, avoiding tools that return large JSON blobs. [04:48], [05:32]
- **Subagents enhance context control**: Subagents are not just about role-playing; they are crucial for context control, helping to locate information across components and reducing the context burden on the parent agent. [07:28], [08:18]
- **Workflow built around context management**: The most effective approach involves building the entire development workflow around frequent, intentional context management, aiming to keep context utilization under 40% through research, planning, and implementation phases. [08:53], [09:04]
- **Reviewing plans ensures alignment**: Code review's most critical function is mental alignment within the team; reviewing concise implementation plans, rather than lengthy code changes, effectively communicates system evolution and catches problems early. [10:38], [10:55]
## Topics Covered
- Why AI-generated code still leads to rework.
- Adopt spec-first development for AI coding agents.
- Context is the ultimate lever for LLM output.
- Master frequent intentional context compaction.
- Bad research causes thousands of bad code lines.
## Full Transcript
Hi everybody, my name is Dex. I'm the founder of a company called HumanLayer. Apparently we're all YC founders on stage today; I was in the Fall 2024 batch.
I'll give you a little bit of history on context engineering as a term. Long before Tobi and Andrej and Walden were tweeting about it in June, back on April 22nd we wrote a weird little manifesto called 12-factor agents: principles of reliable LLM applications. And then on June 4th (shout out to swyx; I did not know he was going to be here, but he's getting a shout out anyways) the title of the talk was changed to context engineering, and he gave us a shout out for that.
So everyone's been asking me what's next. We did the context engineering thing; we talked about how to build good agents. I'll point out my two favorite talks from AI Engineer this year, which are, incidentally, the only two talks with more views than 12-factor agents. Number one is Sean Grove's "The New Code." He talked about how we're all vibe coding wrong: the idea of sitting and talking to an agent for two hours, figuring out and exactly specifying what you want, and then throwing away all the prompts and committing the code is basically equivalent to being a Java developer who spends six hours writing a bunch of Java code, compiles the jar, checks in the compiled asset, and throws away the source. In a future where AI is writing more and more of our code, the specs, the description of what we want from our software, are the important thing.
Then we had the Stanford study, which was a super interesting talk. They ingested data from 100,000 developers, from giant enterprises down to small startups, and they found that AI in software engineering leads to a lot of rework. Even if you get benefits, you're throwing half of them away because the output is kind of sloppy sometimes. It just doesn't work for complex tasks or brownfield tasks: old code bases, legacy code bases, things like that. And especially for complex brownfield tasks it can be counterproductive; not just that it doesn't help much, it can actually slow people down.
That matched my experience talking to lots of smart founders: yeah, coding agents are good for prototypes. Even Amjad from Replit was on a podcast six months ago saying, yeah, our product managers use this to build prototypes, and once we figure it out we give it to the engineers and they build the production version. It doesn't work in big repos, doesn't work for complex systems. Maybe someday, when the models get smarter, we'll be able to have AI write all of our code. But that is what context engineering is all about: how do we get the most out of today's models?
So I'm going to tell you a story about the journey we've been on over the last couple of months, learning to do better context engineering with AI-generated code. I was working with one of the best AI coders I've ever met. They were shipping every couple of days; I would get a 20,000-line PR of Go code, and this was not a CRUD app or a Next.js API. This was complex systems code with race conditions and shutdown ordering and all this crazy stuff. And I just couldn't review it. I was like, I hope you know I'm not going to read this next 2,000 lines of Go code. So we were forced to adopt spec-first development, because it was the only way for everyone to stay on the same page. And I actually learned to let go. I still read all the tests, but I no longer read every line of code, because I read the specs and I know they're right. It took a long time and it was very uncomfortable, but over eight weeks or so we made this transformation, and now we're flying. We love it. So I'm going to talk about a couple of things we learned in this process. I know it works because I shipped six PRs last Thursday, and I haven't opened a non-markdown file in an editor in almost two months.
So, the goals. I didn't set these goals; I was forced to adopt them. But the goals are: works well in big, complex code bases; solves big, complex problems; no slop, we're shipping production code; and everyone stays on the same page. Oh, and spend as many tokens as possible.
This is advanced context engineering for coding agents. I want to start with the most naive way to use a coding agent, which is to shout back and forth with it until you run out of context, or you give up, or you cry: "No, do this. No, stop, you're doing it wrong." You can be a little bit smarter about this. Basically, if you notice the agent is off track (a lot of people have done this; I've seen people from OpenAI post about it, it's pretty common), and it's really screwing up, you just stop and start over, and you say, "Okay, try again, but make sure not to try that, because that doesn't work." If you're wondering when you should consider starting over with a fresh context: if you see this, it's probably time to start over and try again.
We can be smarter still, though, and this is what I call intentional compaction. It's not just "start over and I'll tell you something different, put the same prompt in with a little steering." Even if we're on the right track, if we're starting to run out of context, be very intentional about what you commit to the file system and to the agent's memory. I think /compact is trash; I never use it. Instead, we have the agent write out a progress file, very specifically structured, which is my vibe of what I've found works really well for these things, and then we use that to onboard the next agent into whatever we were working on. What are we compacting, and why? How did I get to this? Lots of people have instincts about what works here. So the question is: what takes up space in the context window? Looking for files, understanding the flow, making edits, doing the work. If you have MCP tools that return big blobs of JSON, that's going to flood your context window with a bunch of nonsense. So what should we compact? We'll get to exactly what goes in there, but it looks something like this, and I'll talk about the structure of this file a little bit more.
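To make that concrete, here is a minimal sketch of intentional compaction as code. This is not our actual tooling: the `ask_agent` callable, the file name, and the section headings are placeholders for whatever agent harness you use. The point is that you, not a generic /compact, decide what survives into the next context window.

```python
# Minimal sketch of intentional compaction (placeholders, not HumanLayer's tooling).
COMPACTION_PROMPT = """
Write a progress file for the next agent that will pick up this work.
Include only:
- Goal: the original task, in one or two sentences.
- Current state: what is done, what is in flight, what is not started.
- Key files: exact paths (and line numbers) that matter for the next step.
- Learnings: dead ends already tried, constraints discovered, decisions made.
- Next step: the single next action to take.
Do NOT paste raw tool output, large diffs, or full file contents.
"""

def compact_and_handoff(ask_agent, task: str, progress_path: str = "PROGRESS.md") -> str:
    """Ask the current (nearly full) agent to distill its context into a small
    progress file, then build the prompt that seeds a brand-new session."""
    progress = ask_agent(COMPACTION_PROMPT)   # the current session writes the summary
    with open(progress_path, "w") as f:
        f.write(progress)                     # committed to the file system on purpose

    onboarding = (
        f"You are continuing an in-progress task: {task}\n"
        f"Read {progress_path} below and continue from 'Next step'.\n\n{progress}"
    )
    return onboarding                         # feed this to a fresh context window
```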
Why are we obsessed with context? Because LLMs are pure functions. I think Jake said a lot of interesting things about this. Other than training your own models or messing with the temperature, the only thing that improves the quality of your outputs is the quality of what you put in, which is your context window. In a coding agent, the agent is constantly looping over deciding the right next tool to call and the right next edit to make, and the only thing that determines its ability to do that well is what is in the context window going in. So we'll throw this one in: everything is context engineering. Everything that makes agents good is context engineering. We're optimizing for correctness, completeness, size, and trajectory. I'm not going to talk much about trajectory because it's very vibes-based right now. To invert that: the worst thing to have in your context window is bad information, the second worst is missing information, and then just too much noise. And if you wanted an equation, we made a dumb equation for it.
Jeff figured this out. Well, lots of people are figuring this out, but Geoffrey Huntley works on Sourcegraph's Amp, and I know Beyang was supposed to be speaking tonight; I hope he'll appreciate this talk. In the spirit of what they've been talking about: you've got about 170,000 tokens, and the fewer of them you use to do the work, the better results you will get. Geoff wrote a piece called "Ralph Wiggum as a software engineer," in which he says, hey, this is the dumbest way to use coding agents and it works really, really well: just run the same prompt in a loop overnight for 12 hours while he's asleep in Australia, and put it on a live stream. I actually think he's being humble. It's a very, very smart way to use coding agents if you understand LLMs and context windows.
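The core of the idea is small enough to sketch. This is not Geoff's actual setup: the agent command below is a placeholder for whatever coding agent CLI you run, and the only state that carries between iterations is whatever the agent left behind in the repo.

```python
# Rough sketch of the "same prompt in a loop" idea; AGENT_CMD is hypothetical.
import subprocess
import time

AGENT_CMD = ["your-coding-agent", "--prompt-file", "PROMPT.md"]  # placeholder CLI

def ralph_loop(hours: float = 12.0, pause_seconds: int = 30) -> None:
    """Run the same prompt over and over in fresh processes for `hours`."""
    deadline = time.time() + hours * 3600
    while time.time() < deadline:
        # Fresh process = fresh context window. PROMPT.md never changes; the repo does.
        subprocess.run(AGENT_CMD, check=False)
        time.sleep(pause_seconds)

if __name__ == "__main__":
    ralph_loop()
```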
I'll link that article as well, and I'll put up a QR code at the end with everything. The next step is that you can do inline compaction with subagents.
A lot of people saw Claude Code subagents and jumped in and said, "Okay, cool, I'm going to have my product manager and my data scientist and my front-end engineer," and maybe that works. But subagents are really about context control. A really common task people use subagents for with these high-level coding agents is finding where something happens: you want to understand how information flows across multiple components in a codebase. Maybe you steer the model to use a subagent, or, since a lot of models have it in their system prompts, it uses one automatically. You say, "Hey, go find where this happens"; the parent model calls a tool that hands that message to a subagent; the subagent goes and finds where the file is and returns it to the parent agent. The parent agent can get right to work without carrying the context burden of all of that reading and searching.
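To see why this is about context control rather than role-play, here's a minimal sketch. The `spawn_subagent` callable is a placeholder for whatever "hand a task to a fresh agent" primitive your harness exposes; the key property is that the parent only ever receives the short answer, never the subagent's search transcript.

```python
# Sketch of subagents as context control (not any particular harness's API).
FIND_PROMPT = """
Find where {question} in this repository.
Return ONLY:
- the relevant file paths with line numbers
- one sentence per location explaining its role in the flow
Do not include file contents or your search steps.
"""

def locate(spawn_subagent, question: str) -> str:
    """Burn the searching/reading tokens in a throwaway context window and
    hand the parent agent a short, ready-to-use answer."""
    return spawn_subagent(FIND_PROMPT.format(question=question))

# Example: the parent agent's context grows by ~10 lines instead of ~10 files.
# findings = locate(spawn_subagent, "websocket reconnection state is updated")
```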
The ideal subagent response looks something like this, and I'm not going to talk yet about how we made it or where it comes from. There's a lot to be said about subagents. The challenge is that you're playing telephone: you care about the thing that comes back from the subagent, so how do you prompt the parent model to prompt the child model about how it should return its information? If you've ever seen this thing: it's a deterministic system and it still gets chaotic. Imagine that with nondeterministic systems. So what works even better than subagents, and the thing we're doing every day now, is what I call frequent intentional compaction: building your entire development workflow around context management.
Our goal, all the time, is to keep context utilization under 40%, and we have three phases: research, plan, and implement. Research means understanding how the system works, all the files that matter, and perhaps where a problem is located. This is our research prompt; it's really long, it's open source, you can go find it. And this is the output of our research prompt: it has file names and line numbers, so the agent reading this research knows exactly where to look. It doesn't have to go search a hundred files to figure out how things work.
The planning step is really just: tell me every single change you're going to make. Not line by line, but include the files and the snippets of what you're going to change, and be very explicit about how we're going to test and verify at every step. This is our planning prompt, and this is one of our plans.
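Our real plans are markdown documents, but purely as an illustration of the shape, here's a hypothetical skeleton of what "every change, with files, snippets, and verification steps" can look like as data; every name and path in it is made up.

```python
# Hypothetical skeleton of an implementation plan; field names and paths are invented.
from dataclasses import dataclass, field

@dataclass
class PlannedChange:
    file: str            # exact path the agent will touch
    summary: str         # what changes, roughly at the snippet level (not line by line)
    snippet: str = ""    # optional sketch of the new code

@dataclass
class Phase:
    name: str
    changes: list[PlannedChange] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)  # explicit test/verify steps

plan = [
    Phase(
        name="Add shutdown ordering to the worker pool",
        changes=[
            PlannedChange(
                file="internal/pool/pool.go",
                summary="drain in-flight jobs before closing the results channel",
            ),
        ],
        verification=["go test ./internal/pool/ -race", "manual: ctrl-C during load test"],
    ),
]
```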
Then we implement and go write the code. And honestly, if the plan is good, I'm never shouting at Claude Code anymore; if I am shouting at Claude, it's because the plan was bad. The plan is always much shorter than the code changes, most of the time. As we implement, we keep context under 40%, so we constantly update the plan: we say "this is done, on to the next phase," and each phase gets a new context window. This is our implement prompt. These are all open source; I'll tell you where to find them. This is not magic: you have to actually read these documents, or it will not work. So we build the workflow around intentional human review steps, because a research file is a lot easier to read than a 2,000-line PR, and you can stop problems early. This is our linear workflow for how we move this stuff through the process.
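Pulling the three phases together, here's a rough sketch of the loop as code. The `run_agent` and `context_utilization` hooks and the file paths are placeholders, not our actual harness; the only ideas it's meant to capture are the research, plan, implement order, the human review between phases, and the habit of staying under roughly 40% context utilization.

```python
# Sketch of research -> plan -> implement built around frequent, intentional
# compaction. All names and paths are placeholders for your own harness.
UTILIZATION_BUDGET = 0.40  # keep the context window under ~40% at all times

def phase(run_agent, context_utilization, prompt_file: str, output_file: str) -> None:
    """Run one phase in a fresh context window, writing its artifact to disk.
    The artifact (research doc, plan, updated plan) is what carries over,
    not the conversation itself."""
    run_agent(prompt_file=prompt_file, output_file=output_file)
    assert context_utilization() < UTILIZATION_BUDGET, "compact or split the phase"

def ship_a_change(run_agent, context_utilization) -> None:
    # 1. Research: how the system works, which files matter, where the problem lives.
    phase(run_agent, context_utilization, "prompts/research.md", "thoughts/research.md")
    # >>> a human reads and corrects thoughts/research.md here <<<

    # 2. Plan: every change, with files, snippets, and verification steps.
    phase(run_agent, context_utilization, "prompts/plan.md", "thoughts/plan.md")
    # >>> a human reads and corrects thoughts/plan.md here <<<

    # 3. Implement: one plan phase per context window, updating the plan as you go.
    phase(run_agent, context_utilization, "prompts/implement.md", "thoughts/plan.md")
```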
And I want to stop here: does anyone know what code review is for? Anybody? Yeah, me neither. Code review is about a lot of things, but the most important part is mental alignment: keeping the people on the team aware of how the system is changing, and why, as it evolves over time. I can't read 2,000 lines of Go every day, but I can sure as heck read 200 lines of an implementation plan. And if the plans are good, that's enough, because we can catch problems early and we can maintain a shared understanding of what's happening in our code.
So, putting this into practice: I do a podcast with another YC founder named Vaibhav. He built BAML. I don't know, has anyone here used BAML before? All right, we've got a couple of BAML guys. I decided, and I didn't tell Vaibhav I was doing this, to see if we could one-shot a fix to a 300,000-line Rust codebase. The episode is 75 minutes, and we go through the whole process of all the things we tried, what worked, what didn't work, and what we learned. I'm not going to go into it here; I'll give you a link. But we did get it merged. The PR was so good that the CTO did not know I was doing it as a bit, and he had merged it by the time we were recording the episode. So: confirmed, it works in brownfield code bases, and no slop, it got merged. I also wanted to see if it could solve a complex problem. So I sat down with the Boundary CEO, and over 7 hours we shipped 35,000 lines of code. A little bit of it was generated, but we wrote a lot of code that day, and he estimated it was roughly one to two weeks of work. So it can solve complex problems: you can add WASM support to a programming language.
The biggest insight I'd ask you to take away from this is that a bad line of code is a bad line of code. A bad part of a plan can be hundreds of bad lines of code. And a bad line of research, a misunderstanding of how the system works, how data flows, and where things happen, can be thousands of bad lines of code. So you have this hierarchy of where you spend your time. Yes, the code is important and it has to be correct, but you can get a lot more for your time by focusing on specifying the right problem, on what you actually want, and on making sure that when you launch the coding agent, it knows how the system works. And of course our CLAUDE.md and our slash commands: we basically test those for weeks before anyone's allowed to change them. So we review the research and the plans, and we have mental alignment.
I don't have time to talk about this one, because I think I'm already over. But how did we do? We hit the goals. I didn't ask for these goals, but they were thrust upon me, and we solved them. We spent a whole lot of tokens; this is a team of three in a month, and these are credits, by the way. But I don't think we're switching to the max plan, because this is working well enough that it's worth the spend: it saves us a lot of time as engineers. Our intern Sam is here somewhere. He shipped two PRs on his first day; on his eighth day, he shipped something like ten in a day. This works. We did the BAML thing. And again, I don't look at code anymore. I just read specs.
So, what's next? I think coding agents are maybe going to get a little bit commoditized, but the team and workflow transformation will be the hard part. Getting your team to embrace new ways of communicating and structuring how you work is going to be really, really hard and uncomfortable for a lot of teams. People are figuring this out, and you should try to figure it out too, because otherwise you're going to have a bad time. We're trying to help people figure this out; we're working with everybody from six-person YC startups to thousand-person public companies. Oh, and we're doing an event tomorrow on hyperengineering. It is very, very close to capacity, but if you come find me after this and give me a good pitch, there are a couple of spots left. There's also a link to the video where we talk about this for 90 minutes and Vaibhav and I bust each other's balls for a while. That is advanced context engineering for coding agents. Thank you.
[Music]