
Advanced Context Engineering for Agents

By YC Root Access

Summary

## Key takeaways

- **Naive AI prompting fails in complex systems**: Simply back-and-forth prompting with AI agents is ineffective for complex tasks, leading to wasted effort and potential slowdowns, akin to committing compiled code and discarding the source. [01:16], [04:08]
- **Spec-first development is key for AI coding**: Transitioning to spec-first development, where detailed specifications guide the AI, allows teams to stay aligned and enables reviewers to trust the output without reading every line of code. [03:13], [03:24]
- **Intentional compaction optimizes context**: Instead of relying on generic compaction, intentionally curate what is committed to the file system and the agent's memory to manage context windows effectively, avoiding tools that return large JSON blobs. [04:48], [05:32]
- **Subagents enhance context control**: Subagents are not just about role-playing; they are crucial for context control, helping to locate information across components and reducing the context burden on the parent agent. [07:28], [08:18]
- **Workflow built around context management**: The most effective approach involves building the entire development workflow around frequent, intentional context management, aiming to keep context utilization under 40% through research, planning, and implementation phases. [08:53], [09:04]
- **Reviewing plans ensures alignment**: Code review's most critical function is mental alignment within the team; reviewing concise implementation plans, rather than lengthy code changes, effectively communicates system evolution and catches problems early. [10:38], [10:55]

Topics Covered

  • Why AI-generated code still leads to rework.
  • Adopt spec-first development for AI coding agents.
  • Context is the ultimate lever for LLM output.
  • Master frequent intentional context compaction.
  • Bad research causes thousands of bad code lines.

Full Transcript

Hi everybody. My name is Dex. I'm the founder of a company called Human Layer. Apparently we're all YC founders on stage today; I was in the fall 2024 batch.

I'll give you a little bit of history of context engineering as a term. Long before Tobi and Andrej and Walden were tweeting about this in June, back on April 22nd we wrote a weird little manifesto called 12-factor agents: principles of reliable LLM applications. And then on June 4th (shouts out, I did not know Swix was going to be here, but he's getting a shout out anyway) he changed the title of the talk to "context engineering," to give us a shout out for that.

So everyone's been asking me what's next. We did the context engineering thing; we talked about how to build good agents. I will point out my two favorite talks from AI Engineer this year, which are, incidentally, the only two talks with more views than 12-factor agents. Number one is Sean Grove's "The New Code." He talked about how we're all vibe coding wrong: the idea of sitting and talking to an agent for two hours, figuring out and exactly specifying what you want, and then throwing away all the prompts and committing the code is basically equivalent to a Java developer spending six hours writing a bunch of Java code, compiling the jar, checking in the compiled asset, and throwing away the source. In a future where AI is writing more and more of our code, the specs, the description of what we want from our software, are the important thing.

And then we had the Stanford study, which was a super interesting talk. They ingested data from 100,000 developers, everywhere from giant enterprises down to small startups, and they found that AI-assisted software engineering leads to a lot of rework. So even if you get benefits, you're actually throwing half of them away because the output is kind of sloppy sometimes. And it just doesn't work for complex tasks or brownfield tasks: old code bases, legacy code bases, things like that. Especially for complex brownfield tasks it can be counterproductive; not just that it doesn't help much, it can actually slow people down.

This matched my experience talking to lots of smart founders: yeah, coding agents are good for prototypes. Even Amjad from Replit was on a podcast six months ago saying, yeah, our product managers use this to build prototypes, and once we figure it out we give it to the engineers and they build the production version. It doesn't work in big repos, doesn't work for complex systems. Maybe someday, when the models get smarter, we'll be able to have AI write all of our code. But that is what context engineering is all about: how do we get the most out of today's models?

So I'm going to tell you a story about the journey we've been on for the last couple of months, learning to do better context engineering with AI-generated code. I was working with one of the best AI coders I've ever met. They were shipping every couple of days: I would get a 20,000-line PR of Go code, and this was not a CRUD app or a Next.js API. This was complex systems code with race conditions and shutdown ordering and all this crazy stuff. And I just couldn't review it. I was like, I hope you know I'm not going to read this next 2,000 lines of Go code. So we were forced to adopt spec-first development, because it was the only way for everyone to stay on the same page. And I actually learned to let go. I still read all the tests, but I no longer read every line of code, because I read the specs and I know they're right. It took a long time and it was very uncomfortable, but over eight weeks or so we made this transformation, and now we're flying. We love it.

So I'm going to talk about a couple of things we learned in this process. I know it works because I shipped six PRs last Thursday, and I haven't opened a non-markdown file in an editor in almost two months. As for the goals: I didn't set these goals, I was forced to adopt them, but they are: works well in big, complex code bases; solves big, complex problems; no slop, we're shipping production code; and everyone stays on the same page. Oh, and spend as many tokens as possible.

This is advanced context engineering for coding agents. I want to start with the most naive way to use a coding agent, which is to shout back and forth with it until you run out of context, or you give up, or you cry: "No, do this. No, stop, you're doing it wrong." You can be a little bit smarter about this. Basically, if you notice the agent is off track (a lot of people have done this; I've seen some people from OpenAI post about it, it's pretty common), if it's really screwing up, you just stop, start over, and say, "Okay, try again, but make sure not to try that, because that doesn't work." If you're wondering when you should consider starting over with a fresh context: if you see this, it's probably time to start over and try again.
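
To make that concrete, here is a rough sketch of what a fresh-context restart with a little steering might look like. The wording and the file path are my own illustration, not a prompt from the talk:

```
Fresh session, same task as before: fix the flaky shutdown test.

Notes from the last attempt:
- Adding a global mutex around shutdown deadlocked the signal handler. Do not try that again.
- Start by reading internal/worker/shutdown.go before proposing any changes.
- Keep the diff small; only touch the shutdown path.
```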

We can be smarter about this, though. This is what I call intentional compaction. It's not just "start over and tell it something different," putting the same prompt back in with a little bit of steering. Even if we're on the right track, if we're starting to run out of context, be very intentional about what you commit to the file system and to the agent's memory. I think /compact is trash; I never use it. Instead, we have the agent write out a progress file, very specifically structured around what I've found works really well for these things, and then we use that file to onboard the next agent into whatever we were working on.

What are we compacting, and why? How did I get here? Lots of people have instincts about what works. The question is: what takes up space in the context window? Looking for files, understanding the flow, making edits, doing the work. And if you have MCP tools that return big blobs of JSON, that's going to flood your context window with a bunch of nonsense. So what should we compact? We'll get to exactly what goes in there, but it looks something like this, and I'll talk about the structure of this file a little bit more.
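
As an illustration of that structure, a progress file in this spirit might look roughly like the following; the headings and file paths are assumptions for the example, not the actual file from the slide:

```
# Progress: worker shutdown refactor

## Current state
- Phase 1 (extract a shutdown coordinator) is merged; Phase 2 is in progress.

## Key files
- internal/worker/pool.go:142 - shutdown ordering happens here
- internal/worker/coordinator.go - coordinator added in Phase 1

## Learnings and constraints
- Do not add locks around the drain loop; it deadlocks the signal handler.
- `go test ./internal/worker/...` must stay green after every phase.

## Next steps
- Wire the coordinator into cmd/server/main.go, then update the plan.
```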

Why are we obsessed with context? Because LLMs are pure functions. I think Jake said a lot of interesting things about this. Other than training your own models and messing with the temperature, the only thing that improves the quality of your outputs is the quality of what you put in, which is your context window. In a coding agent, your agent is constantly looping over determining the right next tool to call and the right next edit to make, and the only thing that determines its ability to do that well is what is in the context window going in. And we'll throw this one in: everything is context engineering. Everything that makes agents good is context engineering. So we're going to optimize for correctness, completeness, size, and trajectory. I'm not going to talk a lot about trajectory because it's very vibes-based right now. But to invert that: the worst thing to have in your context window is bad info, the second worst thing is missing info, and after that it's just too much noise. And if you wanted an equation, we made this dumb equation.

Geoff figured this out. Well, lots of people are figuring this out, but Geoff Huntley works on Sourcegraph's Amp. I know Beyang was supposed to be speaking tonight; I hope he will appreciate this talk, in the spirit of what they've been talking about. You've got about 170,000 tokens, and the fewer of them you use to do the work, the better results you will get. Geoff wrote this thing called "Ralph Wiggum as a software engineer," and he talks about how this is the dumbest way to use coding agents and it works really, really well: just run the same prompt in a loop overnight for 12 hours while he's asleep in Australia and put it on a live stream. I actually think he's being humble. It's a very, very smart way to use coding agents if you understand LLMs and context windows. I'll link that article as well; I'll put up a QR code at the end with everything.

The next step is that you can do inline compaction with subagents. A lot of people saw Claude Code subagents and jumped in and said, "Okay, cool, I'm going to have my product manager and my data scientist and my front-end engineer," and maybe that works. But subagents are really about context control. A really common task people use subagents for in this kind of high-level coding-agent work is finding where something happens: you want to understand how information flows across multiple components in a codebase. Maybe you'll steer the agent to use a subagent (a lot of models have instructions in their system prompts to use one automatically), and you say, "Hey, go find where this happens." The parent model calls a tool that passes that message to a subagent, the subagent goes and finds where the file is and returns it to the parent agent, and the parent agent can get right to work without carrying the context burden of all that reading and searching.

And the ideal subagent response looks something like this. I'm not going to talk about how we made this or where it comes from yet.
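
Going by that description, a subagent response in this style might read roughly as follows; the feature and file paths are invented for illustration:

```
Where request retries are configured:

- src/client/http.ts:88 - withRetry() wraps every outbound call
- src/client/config.ts:31 - default of 3 attempts with exponential backoff
- src/queue/consumer.ts:210 - consumer overrides retries to 1 for webhooks

Data flow: config.ts -> http.ts -> consumer.ts. No other component reads the
retry settings, so a change in config.ts is sufficient.
```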

There's a lot to be said about subagents. The challenge is that you're playing telephone: you care about the thing that comes back from the subagent, so how do you prompt the parent model to prompt the child model about how it should return its information? If you've ever seen this thing: this is a deterministic system, and it still gets chaotic. Now imagine that with nondeterministic systems.

So what works even better than subagents, and the thing that we're doing every day now, is what I call frequent intentional compaction: building your entire development workflow around context management. Our goal, all the time, is to keep context utilization under 40%, and we have three phases: research, plan, and implement. Research really means: understand how the system works, all the files that matter, and perhaps where a problem is located. This is our research prompt. It's really long, it's open source, and you can go find it. And this is the output of our research prompt: it's got file names and line numbers, so the agent reading the research knows exactly where to look. It doesn't have to go search 100 files to figure out how things work.
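
A hedged sketch of what a research document in that shape might contain (the system and paths are made up for the example; the real open-source template may differ):

```
# Research: how config reloads propagate

## Overview
Config is loaded once at startup and pushed to workers over a channel; there
is no file-watch path today.

## Code references
- cmd/server/main.go:52 - initial config load
- internal/config/loader.go:118 - validation and defaulting
- internal/worker/pool.go:203 - workers copy config at spawn and never refresh

## Open questions
- Should a reload drain in-flight jobs, or apply only to new ones?
```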

The planning step is really just: tell me every single change you're going to make. Not line by line, but include the files and the snippets of what you're going to change, and be very explicit about how we're going to test and verify at every step.
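
For illustration, a bare-bones plan in that shape might look like this, with a verification step per phase; the feature and commands are assumptions, not one of the open-source templates:

```
# Plan: add config hot-reload

## Phase 1: watch the config file
- internal/config/loader.go: add a watcher that emits a new Config on change
- Verify: go test ./internal/config/...; editing the file emits exactly one event

## Phase 2: propagate to workers
- internal/worker/pool.go: subscribe to the watcher, swap config between jobs
- Verify: integration test that an in-flight job finishes with the old config

## Phase 3: rollout
- Ship behind a --hot-reload flag, off by default; update the README
- Verify: full test suite, then one canary deployment
```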

So this is our planning prompt, and this is one of our plans. And then we implement: we go write the code. Honestly, if the plan is good, I'm never shouting at Claude anymore; if I'm shouting at Claude, it's because the plan was bad. And the plan is always much shorter than the code changes. Sometimes. Most of the time. As you're implementing, we keep the context under 40%, so we constantly update the plan: we say "this phase is done," and then it's on to the next phase in a new context window. This is our implement prompt. These are all open source, and I'll tell you where to find them. But this is not magic. You have to read this; it will not work otherwise. So we build the workflow around intentional human review steps, because a research file is a lot easier to read than a 2,000-line PR, and you can stop problems early. This is our linear workflow for how we move this stuff through the process.

And I want to stop here: does anyone know what code review is for? Anybody? Yeah, me neither. Code review is about a lot of things, but the most important part is mental alignment: keeping the people on the team aware of how the system is changing, and why, as it evolves over time. I can't read 2,000 lines of Go every day, but I can sure as heck read 200 lines of an implementation plan. And if the plans are good, that's enough, because we can catch problems early and we can maintain a shared understanding of what's happening in our code.

So, putting this into practice: I do a podcast with another YC founder named Vaibhav. He built BAML. I don't know, has anyone here used BAML before? All right, we've got a couple of BAML guys. I decided (I didn't tell Vaibhav I was doing this) that we'd see if we could one-shot a fix to a 300,000-line Rust codebase. The episode is 75 minutes, and we go through the whole process: all the things we tried, what worked, what didn't work, and what we learned. I'm not going to go into it here; I'll give you a link. But we did get it merged. The PR was so good that the CTO did not know I was doing it as a bit, and he had merged it by the time we were recording the episode. So: confirmed, it works in brownfield code bases, and no slop, because it got merged.

And I wanted to see if it could solve a complex problem. So I sat down with the Boundary CEO, and over seven hours we shipped 35,000 lines of code. A little bit of it was generated, but we wrote a lot of code that day, and he estimated that was roughly one to two weeks of work. So it can solve complex problems; you can add WASM support to a programming language.

And so the biggest insight here, the one I would ask you to take away, is that a bad line of code is a bad line of code. A bad part of a plan can be hundreds of bad lines of code. And a bad line of research, a misunderstanding of how the system works, how data flows, and where things happen, can be thousands of bad lines of code. So you have this hierarchy of where to spend your time. Yes, the code is important and it has to be correct. But you get a lot more for your time by focusing on specifying the right problem and what you want, and by making sure that when you launch the coding agent, it knows how the system works. And of course, our CLAUDE.md and our slash commands: we basically test those for weeks before anyone's allowed to change them. So we review the research and the plans, and we have mental alignment.

I don't have time to talk about this one, because I think I'm already over. But how did we do? We hit the goals. I didn't ask for these goals; they were thrust upon me, and we solved them. We spent a whole lot of tokens; this is a team of three in a month (these are credits, by the way). But I don't think we're switching to the Max plan, because this is working well enough that it's worth spending: it saves us a lot of time as engineers. Our intern Sam is here somewhere. He shipped two PRs on his first day; on his eighth day, he shipped like ten in a day. This works. We did the BAML thing. And again, I don't look at code anymore. I just read specs.

So, what's next? I think coding agents are maybe going to get a little bit commoditized, but the team and workflow transformation will be the hard part. Getting your team to embrace new ways of communicating and structuring how you work is going to be really, really hard and uncomfortable for a lot of teams. People are figuring this out; you should try to figure it out too, because otherwise you're going to have a bad time. We're trying to help people figure this out. We're working with everybody from six-person YC startups to thousand-person public companies. Oh, and we're doing an event tomorrow on hyperengineering. It is very, very close to capacity, but if you come find me after this and give me a good pitch, there are a couple of spots left. And there's a link to the video where we talk about this for 90 minutes and me and Vaibhav bust each other's balls for a while. That is advanced context engineering for coding agents. Thank you.

