
How Agents Use Context Engineering

By LangChain

Summary

## Key takeaways

- **Task Length Doubles Every Seven Months**: A nice result from METR shows that task length is doubling around every seven months, meaning AI agents are handling increasingly longer tasks with more tool calls; the average Manus task is over 50 tool calls, and production agents can run hundreds of turns. [00:30], [00:51]
- **Context Rot Degrades Agent Performance**: As agents take on longer tasks, accumulating tool results bloats the context window, causing cost and latency to blow up and performance to degrade, as discussed in Chroma's report on context rot with respect to context length. [00:40], [01:07]
- **Offload Context to File Systems**: Giving agents access to a file system lets them save and recall information during long-running tasks, persisting plans and memories across invocations, as seen in Anthropic's multi-agent researcher writing plans to files and Claude Code using a global CLAUDE.md file. [03:38], [04:42]
- **Offload Actions to Scripts with Few Tools**: Keep the function-calling layer lightweight with a few general tools like bash and file manipulation, pushing actions out to scripts in a file system to expand the action space without bloating tool descriptions or confusing the agent; Manus uses fewer than 20 tools to execute many scripts. [06:03], [07:17]
- **Progressive Disclosure Saves Tokens**: Anthropic's Claude Skills load only brief headers into the system prompt initially, reading full SKILL.md files and executing scripts only when needed using the bash tool, progressively disclosing actions without binding many tools or loading all instructions upfront. [08:23], [09:04]
- **Reduce Context via Compaction and Summarization**: Manus compacts stale tool results by saving them to files and referencing them in history to reduce tokens, then applies summarization to the entire message history when nearing the context limit, while Claude Code summarizes at 95% utilization and deep agents uses middleware at 170,000 tokens. [11:41], [12:32]

Topics Covered

  • Why do AI agents' costs explode with longer tasks?
  • How does offloading context to files prevent forgetting?
  • Can few tools enable hundreds of actions via scripts?
  • What is progressive disclosure of actions?
  • Why isolate tasks in sub-agents' fresh contexts?

Full Transcript

Hey, this is Lance from LangChain.

I want to talk through a few general context engineering principles and how they show up in popular agents like Manus and Claude Code, and also in our recently released deep agents package and CLI.

So first, an agent can simply be thought of as an LLM calling tools in a loop.

The LLM makes a tool call.

The tool is executed. The observation from the tool goes back to the LLM, and this continues until some termination condition.
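
To make that loop concrete, here's a minimal sketch in Python. `call_llm` is a hypothetical stand-in for whatever model API you're using, and the shape of the response dict is illustrative, not any particular provider's format.

```python
# A minimal agent loop: the LLM proposes a tool call, the tool is executed,
# and the observation is appended to the message history until the model
# signals it is done. `call_llm` is a hypothetical model-API stand-in.

def run_agent(call_llm, tools: dict, messages: list[dict]) -> list[dict]:
    while True:
        response = call_llm(messages, tools=tools)
        if response["type"] == "final_answer":        # termination condition
            messages.append({"role": "assistant", "content": response["content"]})
            return messages
        # Execute the requested tool and feed the observation back in.
        observation = tools[response["tool"]](**response["args"])
        messages.append({"role": "tool", "content": str(observation)})
```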

Now the length of tasks that AI agents can perform is getting longer. A nice result from METR shows that task length is doubling around every seven months.

Now the challenge with this is that as agents take on longer tasks you accumulate more tool results.

For example, Manus mentioned that the average Manus task is over 50 tool calls.

Likewise, Anthropic has mentioned that production agents can often run for hundreds of turns. The problem is that as you populate the context window with results from all these different tool calls, you're passing all those prior tool results back through the model at every turn.

And so the cost and latency associated with running your agent can really blow up. And not only that, performance can degrade. Chroma has a nice report on context rot that discusses how performance degrades with respect to context length.

And so, what we've seen is that agents are increasingly being designed with a few different principles to help address this.

Of course, agents have a few common primitives, a model, prompting, tools, and often hooks.

Take Claude Code as an example, which uses the Claude series of models.

The system prompt is actually available.

You can look at it at this link here. I'll make sure that this document is in the video description.

It has around a dozen native tools, and it does allow for hooks, which are basically scripts that can be programmatically run at different points in the agent life cycle.

For example, before each tool call or after each tool call.
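
As a rough illustration of the pattern (this is not Claude Code's actual hook configuration, which lives in its settings files; just a sketch of the idea), a harness might run a list of callbacks before each tool call:

```python
# Hypothetical pre-tool-call hooks: callbacks the harness runs before
# executing any tool, e.g. for logging or vetoing dangerous commands.

def log_tool_call(tool_name: str, args: dict) -> None:
    print(f"about to run {tool_name} with {args}")

def block_dangerous_bash(tool_name: str, args: dict) -> None:
    if tool_name == "bash" and "rm -rf" in args.get("command", ""):
        raise PermissionError("blocked by pre-tool-call hook")

PRE_TOOL_HOOKS = [log_tool_call, block_dangerous_bash]

def execute_tool(tool_name: str, args: dict, tools: dict):
    for hook in PRE_TOOL_HOOKS:   # each hook runs before the tool itself
        hook(tool_name, args)
    return tools[tool_name](**args)
```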

Now, our deep agents package and our deep agent CLI are similarly set up with these primitives.

The package allows for any model provider.

The CLI currently uses Anthropic models. You can see the prompts.

It's all open source.

It's using eight native tools for the package and 11 native tools for the CLI.

I'll show those later in detail. And we also allow for hooks at various points in the agent life cycle.

Now, with these primitives that make up what we call an agent harness in mind, what are the common techniques we see across different agents for managing the problem of context rot and of accumulating tokens from many turns of tool calls? Well, context engineering is the broad term that captures many of these principles.

Karpathy outlines it very nicely here.

It's the delicate art and science of filling the context with just the right information for the next step, which is very applicable to agents. You're trying to steer the agent to make the right next tool call along a trajectory of actions.

And the three common principles I like to distill are offload, reduce, and isolate.

So offloading is moving context from the LLM context window to something external, like a file system, where it can be selectively retrieved later as needed.

Reducing is simply shrinking the size of the context passed at each turn, and there are a bunch of different techniques to do that.

And finally, isolating context: using separate context windows or separate sub-agents for individual tasks. I share some references here.

I've talked about this on the Latent Space podcast. I had a webinar with Manus where we talked through these principles and how Manus uses them.

I'm going to review them here and also talk about how the deep agents package and CLI employ these ideas. So first, offloading context. A trend that we've seen repeatedly is that giving agents access to a file system is very useful.

It lets agents save and recall information during long-running tasks.

And this is pretty intuitive. I share a link here from Anthropic's multi-agent researcher, where they basically have the researcher write a plan to a file, go do a bunch of work, and then retrieve that plan after a bunch of sub-agents have done work, to make sure that everything's been addressed.

So you can just write to a file and read it back into context when you need to kind of reinforce the plan that was laid out.

And this is very useful to ensure that you actually don't forget specific steps in the plan. By externalizing it to a file and reading it back into context, you ensure that it's persisted and that the agent can be more easily steered.

Since you're selectively pulling it back into the context window as needed to help keep the agent on track.
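
A minimal sketch of that pattern, with an illustrative file name: the full plan lives on disk, only a short confirmation stays in the message history, and the plan is re-read when the agent needs to re-anchor.

```python
from pathlib import Path

PLAN_FILE = Path("plan.md")   # illustrative location

def save_plan(plan: str) -> str:
    """Write the plan to disk; only this short reference enters the context."""
    PLAN_FILE.write_text(plan)
    return f"Plan saved to {PLAN_FILE}"

def recall_plan() -> str:
    """Pull the full plan back into the context window only when needed."""
    return PLAN_FILE.read_text()
```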

Now another interesting thing about the file system is often times it's persistent across different agent invocations.

For example, if you're running your agent locally on your laptop with Claude Code, it can always reference a CLAUDE.md file, which can live at various levels.

It can live at the project level, and there's also a global CLAUDE.md. This CLAUDE.md can store information that you want to persist across all your different interactions with Claude Code, as an example.

So Manus uses these same ideas.

Of course, Manus runs remotely.

So it uses a sandbox environment, which contains a file system and gives the agent access to a computer, and it supports user memory. Now, the deep agents package allows for different backends.

So you can use the LangGraph state object, which is just in memory, or you can use a file system backend, for example your local machine. And the deep agent CLI is a lot like Claude Code running on your laptop, where it will just use your local file system as a backend.

The deep agent CLI also supports memory, using a memories directory as well as an AGENTS.md file.

The principle here, which we've seen repeatedly, is that giving agents the ability to offload context to a file system has a lot of benefits. You can persist information during long-running trajectories, and you can persist context across different invocations of the agent in things like a CLAUDE.md file, an AGENTS.md file, or, in the case of the deep agent CLI, a memories directory.

Now another benefit of the file system is that you can actually offload actions from tools to just scripts. Now what do I mean by this? We want agents to perform actions.

Let's say we want to give an agent 10 different actions.

Often you can think about that as okay, for every action I'm defining a unique tool.

I'll bind all those tools to the agent.

So I have an agent with 10 different tools.

Now the LLM in that agent has to determine when to use each of those 10 tools. And you also have to load all those tool instructions into the system prompt. So there are two problems there.

One is confusion in terms of what tool to use.

And two, you're also bloating your instructions with a bunch of tool descriptions.

Now look, with three or four or even 10 tools, it's not a big issue. But if we're talking about hundreds of tools, this can be significant tokens spent on tool descriptions alone.

So one principle, and in the webinar with Manus we cover this in depth, is keeping the function calling layer very lightweight. So give the agent only a few functions to call, but make sure they are very general, atomic functions that can do lots of things, and push a lot of the actions out to something like scripts in a file system.

So for example, Manus gives the agent a bash tool and file system manipulation tools.

And with those two things, it can search a directory of scripts, navigate the file system, and execute any script using the bash tool. So with three or four simple tools for file manipulation and code execution, it can perform a very large number of actions as specified by the scripts that you give it. And that's a way to expand the action space of the agent significantly while only giving it access to a small number of tools.
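
Here's a sketch of that idea under assumed names (this is not Manus's actual implementation): just two general tools, with the action space defined by whatever scripts live in a directory.

```python
import subprocess
from pathlib import Path

def list_scripts(directory: str = "scripts") -> str:
    """Tool 1: let the agent discover the available actions."""
    return "\n".join(str(p) for p in sorted(Path(directory).glob("*.py")))

def bash(command: str, timeout: int = 60) -> str:
    """Tool 2: a general executor; any script becomes an action."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

# The agent might then discover and run an action like:
#   bash("python scripts/generate_report.py --format pdf")
```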

And we see this principle repeatedly. If you look at Claude Code: Boris Cherny and Cat Wu, the engineering and product leads of Claude Code, were recently on a great podcast.

I have the link here, where they mention that Claude Code is only using around a dozen tools.

And when you're using it, you can kind of see it uses glob and grep. It uses bash.

It uses fetch to grab URLs, but it's not using that many tools. It's only about a dozen.

Manus is using fewer than 20 tools.

With deep agents, we actually only have eight native tools.

And with the deep agent CLI, we have 11 native tools.

I'll show those below.

Now, a related idea is progressive disclosure of actions.

Anthropic talks about this specifically in its recent release of Skills, and there's an interesting quote from a nice blog post that I link here.

Claude Skills are, very simply, a skills folder with a bunch of subfolders, each of which is a specific skill. Each subfolder has a SKILL.md file, a markdown file with a header.

The header just explains in very brief language what that skill does.

The header is the only thing that's loaded into Claude Code initially, and you can see in this diagram that's exactly what they show here.

So there's a brief snippet about each skill available.

Now, in the case of Claude Skills, if Claude wants to use any given skill, it can then selectively read the full SKILL.md file. So again, just the header is read into the system prompt by default.

If Claude wants to actually execute a skill, it'll read the full SKILL.md file. Now, that SKILL.md file can reference any other files in the same skill directory. So it could contain scripts.

It could contain other files that contain more context.

And so, what's really nice is that Claude, with only its bash tool as an example, can just go ahead and read the full SKILL.md file and then, if needed, execute any other scripts in that same skill directory or read any other files in. So it's just a nice way of progressively disclosing actions to Claude without loading all of that into the system prompt ahead of time, and importantly without binding all those different capabilities or skills as tools.

Remember, you're only using, in the simplest case, the bash tool to read the SKILL.md file and then to execute any scripts in that skill folder or read any other files in that folder. So I think about this as a very simple way to give agents access to different actions in a way that saves tokens, because they're progressively disclosed only if, in this case, Claude needs the skill.
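
As a sketch of progressive disclosure (the directory layout follows the skills-folder idea above, but the parsing is illustrative; real Claude skills describe themselves with YAML frontmatter in SKILL.md, which I'm simplifying to a one-line header here):

```python
from pathlib import Path

SKILLS_DIR = Path("skills")   # one subfolder per skill, each with a SKILL.md

def skill_headers() -> str:
    """Loaded into the system prompt up front: one brief line per skill."""
    lines = []
    for skill_md in sorted(SKILLS_DIR.glob("*/SKILL.md")):
        header = skill_md.read_text().splitlines()[0]   # brief description only
        lines.append(f"{skill_md.parent.name}: {header}")
    return "\n".join(lines)

def load_skill(name: str) -> str:
    """Read the full instructions only when the model chooses this skill."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()
```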

And it's only using simple built-in tools like the bash tool and maybe some file manipulation tools. So, Manus is using a very similar principle.

The Manus agent has access to a large number of different scripts, and it can discover those scripts using its native file search as well as bash tools.

Now, we don't yet have this notion of skills in the deep agent CLI, but I'm actually working on adding that right now because I think it's a very nice way to give an agent access to lots of actions without bloating its context window with instructions and without having to bind additional tools.

Now, I do just want to briefly make it even more crisp which specific tools are in the deep agents package, just to highlight the point that we're often seeing agents ship with small numbers of general, atomic tools.

So the deep agents package only has basic tools for file manipulation, a task tool for creating subtasks with sub-agents, and a to-dos tool to generate to-dos.

The CLI extends this slightly with some search tools and a bash tool. Now let's talk about reducing context.

There's three interesting ideas here.

Compaction, summarization, and filtering.

So first, I'll talk a little bit about what Manus does.

So Manus uses this idea of compaction.

So this on the left is showing a trajectory of tool calls and tool results.

And of course, tool results can be quite token-heavy. Now, what they do is compact old tool results by saving the full result to a file and just referencing that file in the message history.

Now, they only do this with what you might call stale tool results that have already been acted on, but it's a very nice way to reduce tokens in the message history.

And so, this is kind of a neat diagram that they showed. Imagine your agent's running.

It's performing many turns.

So, after some number of turns, you get very close to the context window of the LLM.

And that's when they apply this compaction.

So, they take all the historical tool results.

They're all bloating that message history and they compact them all down, offload them to the file system and that brings down the overall context utilization significantly.

The agent keeps running and this progressively starts to saturate and then they apply summarization.

So summarization looks at the entire message history, which includes the full tool result messages, and summarizes it all down to a much more compact, distilled summary, which the agent can then use going forward.

One interesting point is that this compaction step is actually reversible because you can always go back and look at the raw tool results which are saved to these files.

That's another benefit of using the file system.

Summarization though is not. So that is a step that needs to be carefully thought through because when you do summarization you necessarily lose information.
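
Here's a sketch of how compaction and summarization might fit together, in the spirit of the diagram described above. The thresholds, and the `count_tokens` and `summarize` helpers, are hypothetical; the key contrast is that compaction is reversible (full results live on disk) while summarization is lossy.

```python
from pathlib import Path

COMPACT_AT = 0.80      # illustrative fractions of the context window
SUMMARIZE_AT = 0.95

def compact(messages: list[dict], outdir: Path = Path("tool_results")) -> None:
    """Reversible: move stale tool results to disk, leave short references."""
    outdir.mkdir(exist_ok=True)
    for i, msg in enumerate(messages[:-1]):   # skip the most recent message
        if msg["role"] == "tool" and len(msg["content"]) > 1000:
            path = outdir / f"result_{i}.txt"
            path.write_text(msg["content"])
            msg["content"] = f"[full tool result saved to {path}]"

def manage_context(messages, count_tokens, summarize, window: int):
    used = count_tokens(messages) / window
    if used > SUMMARIZE_AT:
        # Lossy: the raw history is distilled into a compact summary.
        return [{"role": "user", "content": summarize(messages)}]
    if used > COMPACT_AT:
        compact(messages)
    return messages
```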

Now, you see these ideas employed by Anthropic as well.

So, Anthropic recently shipped context editing, which prunes the message history of old tool results in a configurable manner, and Claude Code applies summarization when you hit around 95% of the context window.

Now, the deep agents package applies summarization with summarization middleware, which automatically kicks off after some threshold, 170,000 tokens, and preserves some number of messages.

Of course, it's all open source and configurable.

Now, one of the other things employed in the deep agents package and CLI is that the file system middleware will actually filter large tool results, which is a nice way to prevent excessively large tool results from being passed directly to the LLM. Now, finally, let's talk about context isolation. This is a technique that we've seen employed repeatedly.

And this is a pretty simple idea.

Many tasks performed by an agent can be assigned to a sub agent.

That sub-agent has its own context window.

And so it can start fresh on a particular task, particularly if that task is nicely self-contained, execute that task and just return the output back to the parent agent.

And that's the first pattern shown here. And this was discussed by Manus as well.

This is the communication pattern. You have a parent or main agent. It wants to spawn a sub-agent to do some task.

It passes some instructions to that sub-agent.

The sub-agent churns along and passes its result back to the main agent.

That's a very common pattern. Now there is some nuance here.

Sometimes you want to actually share more context with that sub-agent, and Manus actually allows for sharing the full message history that the parent has with the sub-agent.

Similarly, with deep agents and with the deep agent CLI, the sub-agent actually has access to the same file system as the parent.

So there is some shared context between them.
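
Building on the agent loop sketched earlier, a sub-agent in this pattern might look like the following: a fresh message history containing only the task instructions, the same tools and file system as the parent, and only the final output returned.

```python
def spawn_subagent(call_llm, tools: dict, task_instructions: str) -> str:
    # Fresh context window: none of the parent's message history carries over.
    messages = [{"role": "user", "content": task_instructions}]
    finished = run_agent(call_llm, tools, messages)   # shared tools/file system
    # Only the final output flows back; the sub-agent's full trajectory of
    # tool calls and results never enters the parent's context window.
    return finished[-1]["content"]
```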

So, just to summarize: agent harnesses typically employ at least three principles for managing context: offloading, reducing, and isolating.

So some of the most common ideas in context offloading include using the file system.

We see that across the board.

Claude Code, Manus, and the deep agent CLI all support use of the file system.

Enabling user memories.

This is intuitively the ability to remember information across agent invocations.

Claude Code enables it with CLAUDE.md.

The deep agent CLI has a memories directory as well as AGENTS.md.

Manus also supports cross-session memory.

Use minimal tools. This can significantly save tokens in terms of tool descriptions and minimize the number of decisions that the agent has to make across different tools.

Claude Code uses only around a dozen tools.

Manus uses fewer than 20.

The deep agent CLI uses 11. Give the agent a computer, i.e., a bash tool.

All these agent harnesses do that.

Progressive disclosure of actions: Claude Code does this with Skills. Manus does this by basically giving the agent access to a directory with a whole bunch of different scripts and letting it peruse that directory on an as-needed basis using its existing file system and bash tools.

Skills for the deep agent CLI are a work in progress. Now, this idea of compaction is basically pruning old tool messages.

Manus for sure does it.

The Claude SDK does support it with this idea they call context editing.

I assume it's being done in Claude Code, but I'm not positive. So I should probably flag this as yellow, because I'm not entirely sure, but I imagine it is being done.

We know for sure that Claude Code does summarization once you hit around 95% of the context window.

Manus does this. The deep agent CLI does this.

And all three support sub agents for isolating different tasks to unique context windows.

Now the deep agent CLI is open source.

Contributions are welcome, and it's fun to try to employ these ideas in an open source harness that can be used with many different models. So hopefully this was a useful overview of how these principles operate across different popular agent harnesses and how they're being used in the deep agent CLI.

And any questions or contributions are very welcome.

Thanks.
