Prompting for Agents | Code w/ Claude
By Anthropic
Summary
## Key takeaways

- **Agents: Models using tools in a loop**: Agents are defined as models that utilize tools in a continuous loop, taking a task and independently working towards its completion by updating decisions based on tool call feedback. [01:29]
- **Agent Use Cases: Complex, Valuable Tasks**: Agents are best suited for complex and valuable tasks, not for every scenario, as using them inappropriately can lead to wasted resources and suboptimal results. [02:22]
- **Think Like Your Agent**: To effectively prompt agents, it's crucial to develop a mental model of their environment and actions, simulating their process to identify potential confusion or errors. [07:28]
- **Reasonable Heuristics for Agents**: Provide agents with clear, reasonable heuristics, such as 'stop searching when the answer is found' or setting budgets for tool calls, to guide their behavior and prevent unnecessary actions. [08:11]
- **Iterative Prompt Development**: Prompt development for agents should start simple and iterate based on observed failures or edge cases, gradually adding instructions and examples as needed for production consistency. [26:37]
- **LLMs as Judges for Evaluation**: Leveraging LLMs with a clear rubric as judges can effectively evaluate agent outputs, offering robustness to variations in structure and content, which is crucial for complex agentic tasks. [22:40]
Topics Covered
- Don't Deploy Agents Everywhere: When Are They Truly Valuable?
- Master Agent Prompting: Think Like Your Agent, Not Just Words.
- Agents Are Unpredictable: Guide Thinking, Anticipate Side Effects.
- Extend Agent Context: Use Compaction, Files, and Sub-Agents.
- Evaluating Agents: Start Small, Use LLMs as Judges.
Full Transcript
All right, thank you. Thank you everyone for joining us. Uh, so we're picking up with prompting
for agents. Um, hopefully you were here for prompting 101, or maybe you're just joining us, but I'll give a little intro. My name is Hannah. I'm part of the applied AI team at Anthropic. Hi,
I'm Jeremy. I'm on our applied AI team as well and I'm a product engineer. Uh, so we're going
to talk about prompting for agents. So, we're going to switch gears a little bit, move on from
the basics of prompting, um, and talk about how we do this for agents like playing Pokemon. Uh,
so hopefully you were here, uh, for prompting 101 or maybe you have some familiarity with
basic prompting. So, we're not going to go over the basics of console prompting or interacting with Claude on the desktop today. But just as a refresher, we think about prompt engineering as kind of programming in natural language. You're thinking about what your agent
or your model is going to be doing, what kind of tasks it's accomplishing. You're trying to
clearly communicate to the agent, give examples where necessary, um, and give guidelines. Uh,
we do, you know, follow kind of a very specific structure for console prompting. I want you to
remove this from your mind because it could look very different for an agent. So, for an agent,
you may not be laying out this type of very structured prompt. Uh, it's actually going to
look a lot different. We're going to allow a lot of different things to come in. So,
I'm going to talk about what agents are, and then I'll turn it over to
Jeremy to talk about how we do this for agents. So, hopefully you have a sense in your mind of
what an agent is. At Anthropic, we like to say that agents are models using tools in a loop. So,
we give the agent a task and we allow it to work continuously and use tools as it thinks fit. Um,
update its decisions based on the information that it's getting back from its tool calls and
continue working independently until it completes the task. We keep it as simple as that. The environment is where the agent is working, the tools are what the agent has access to, and the system prompt is just where we tell the agent what it should be doing or what it should be
accomplishing. And we typically find the simpler you can keep this the better. Allow the agent to
do its work. Allow the model to be the model and kind of work through this task.
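As a rough sketch of what "a model using tools in a loop" can look like in code, here is a minimal loop using the Anthropic Python SDK; the tool, model name, and `run_tool` helper are illustrative, not code from the talk.

```python
# Minimal sketch of "a model using tools in a loop" with the Anthropic
# Python SDK. The tool, model name, and run_tool() are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "web_search",
    "description": "Search the web and return a short list of results.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]


def run_tool(name: str, tool_input: dict) -> str:
    """Execute a tool call against the agent's environment (stubbed here)."""
    raise NotImplementedError


def run_agent(task: str, system_prompt: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # check current model names
            max_tokens=2048,
            system=system_prompt,
            tools=TOOLS,
            messages=messages,
        )
        # If the model didn't ask for a tool, it's done: return its text.
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        # Otherwise run each requested tool and feed the results back in.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
```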
So when do you use agents? You do not always need to use an agent. In fact, there are many scenarios in which you won't
actually want to use an agent. There are other approaches that would be more appropriate. Um,
agents are really best for complex and valuable tasks. It's not something you
should deploy in every possible scenario. You will not get the results that you want. Um,
and you'll spend a lot more resources than you maybe need to. So, we'll talk a little
bit about a checklist, or ways of thinking about when you should be using an agent and when maybe
you don't want to be using an agent. So, is the task complex? Is this a task that you, a human,
can think through a step-by-step process to complete? If so, you probably don't need an
agent. You want to use an agent where it's not clear to you how you'll go about accomplishing the
task. You might know where you want to go, but you don't know exactly how you're going to get there,
what tools, and what information you might need to arrive at the end state. Is a task valuable?
Are you going to get a lot of value out of the agent accomplishing this task? Or is this a kind
of a low value task? In that case, a workflow might be better. You don't really want to be using the resources of an agent unless it's something that's highly leveraged: it's maybe revenue generating, it's something that's really valuable to your user and, again, something that's complex. The next piece is: are the parts of the task doable? So,
when you think about the task that has to occur, would you be able to give the agent the tools
that it needs in order to accomplish this task? If you can't define the tools or if you can't
give the agent access to the information or the tool that it would need, you may want to
scope the task down. Um, if you can define and give to the agent the tools that it would want,
that's a better use case for an agent. The last thing you might want to think about is the cost of
errors, or how easy it is to discover errors. So, if it's really difficult to correct or detect an error, that is maybe not a place where you want the agent to be working independently. You might want to have a human in the loop in that case. If the error is something that
you can recover from or if it's not too costly to have an error occurring, then you might continue
to allow the agent to work independently. So to make this a little bit more real, uh we'll talk
about a few examples. I'm not going to go through each single one of these, but let's pick out a few
that will be pretty clear or intuitive for most of us. So coding, obviously, um all of you are very
familiar with using agents and coding. Uh coding is a great use case. We can think about something
uh like a design document. And although you know where you want to get to, which is raising a PR,
you don't know exactly how you're going to get there. It's not clear to you what you'll build
first, how you'll iterate on that, what changes you might make along the way depending on what
you find. Um, this is high value. You're all very skilled. Okay, this is more like what the Midway is like at night; I feel more at home now. Claude is great at coding, and this is a high value use case, right? If your agent is
actually able to go from a design document to a PR, that's a lot of time that you, a highly
skilled engineer, are saved and you're able to then spend your time on something else that's
higher leverage. So, great use case for agents. A couple other examples I'll mention here. Um,
maybe we'll talk about the cost of error. So, search: if we make an error in the search,
there's ways that we can correct that, right? So we can use citations, we can use other methods of
double-checking the results. So if the agent makes a mistake in the search process, this is something
we can recover from and it's probably not too costly. Computer use, um, this is also a place
where we can recover from errors. We might just go back, we might try clicking again. It's not, uh,
too difficult to allow Claude just to click a few times until it's able to use the tool properly.
Um, data analysis, I think, is another interesting example, kind of analogous to coding. We might
know uh the end result that we want to get to. We know a set of insights that we want to gather
out of data or a visualization that we want to produce from data. We don't know exactly what the
data might look like. So the data could have different formats. It could have errors in it. It could have granularity issues that we're not sure how to disaggregate. We
don't know the exact process that we're going to take in analyzing that data, but we know where we
want to get in the end. Um so this is another example of a great use case for agents. Uh,
so hopefully these make sense to you and I'm going to turn it over to Jeremy now. He has some really
rich experience building agents and he's going to share some best practices for actually prompting
them well and how to structure a great prompt for an agent. Thanks Hannah. Hi all. Um, yeah,
so prompting for agents. Um, I think some things that we think about here, I'll go over a few of them. We've learned these mostly from building agents ourselves. So some agents that you can try from Anthropic are Claude Code, which works in your terminal and agentically browses your files and uses the bash tool to really accomplish tasks in coding. Similarly, we have our new advanced Research feature in claude.ai, and this allows you to do hours of research. For example, you can find hundreds of startups building agents, or you can find hundreds of potential prospects for your company. And this allows the model to do research across your tools, your Google Drive, web search, and stuff like that. And so in the process of building these products, one thing
that we learned is that you need to think like your agents. This is maybe the most important
principle. Um the idea is that essentially you need to understand and develop a mental model
of what your agent is doing and what it's like to be in that environment. So the environment for the
agent is a set of tools and the responses it gets back from those tools. In the context of Claude Code, the way you might do this is by actually simulating the process and just imagining, if you were in Claude Code's shoes, given the exact tool descriptions and tool schemas it has, would you be confused, or would you be able to do the task that it's doing? If a human can't
understand what your agent should be doing, then an AI will not be able to either. And so this is
really important for thinking about tool design and thinking about prompting: simulate and go through the agent's environment. Another principle is that you need to give your agents reasonable heuristics.
And so, you know, Hannah mentioned that prompt engineering is conceptual engineering. What does
that really mean? It's one of the reasons why prompt engineering is not going away and why I
personally expect prompting to get more important, not less important as models get smarter. This is
because prompting is not just about text. It's not just about the words that you give the model. It's
about deciding what concepts the model should have and what behaviors it should follow to perform
well in a specific environment. So for example, Claude Code has the concept of irreversibility.
It should not take irreversible actions that might harm the user or harm their environment.
So it will avoid these kinds of harmful actions or anything that might cause irreversible damage
to your environment or to your code or anything like that. So that concept of irreversibility is
something that you need to instill in the model and be very clear about and think about the edge
cases. How might the model misinterpret this concept? How might it not know what it
means? For example, if you want the model to be very eager and you want it to be very agentic,
well, it might go over the top a little bit. It might misinterpret what you're saying and do more
than what you expect. And so, you have to be very crisp and clear about the concepts you're giving
the models. Um, here are some examples of these reasonable heuristics that we've learned. One is that while we were building Research, we noticed that the model would often do a ton of web searches when it was unnecessary. For example, it would find the actual answer it needed, like maybe it would find a list of scaleups in the United States, and then it would keep going even though it already had the answer. That's because we hadn't told the model explicitly: when you find the answer, you can stop; you no longer need to keep searching. Similarly, we had to give the model budgets to think about. For example, we told it that for simple queries it should use under five tool calls, but for more complex queries it might use up to 10 or 15. These kinds of heuristics that you might assume the model already understands, you really have to articulate clearly. A good way to think
about this is that if you're managing maybe a new intern who's fresh out of college and has not had
a job before, how would you articulate to them how to get around all the problems they might
run into in their first job? And how would you be very crisp and clear with them about how to
accomplish that? That's often how you should think about giving heuristics to your agents, which are just general principles that it should follow. They may not be strict rules, but they're, you know, sort of practices.
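As an illustration, here is roughly what those heuristics might look like written into a search agent's system prompt; the wording and budgets below are a sketch, not Anthropic's production prompt.

```python
# Illustrative heuristics for a search agent's system prompt. The wording and
# budgets are made up for this sketch, not Anthropic's production prompt.
SEARCH_HEURISTICS = """
When researching:
- Before searching, decide how complex the query is and how many tool calls
  it deserves: simple facts should take fewer than 5 tool calls; complex,
  multi-part questions may take 10-15.
- Stop searching as soon as you have the answer. Do not keep issuing
  searches to "confirm" information you already have.
- If no perfect source exists, that's okay: report the best available
  sources after a few tool calls rather than searching indefinitely.
"""
```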
Another point is that tool selection is key. As models get more powerful, they're able to handle more and more tools. Sonnet 4 and Opus 4 can handle, you know, up to a hundred tools, even more than that if you have great prompting. But in order to use these tools well, you have to be clear about which tools the model should use for different tasks.
So for example for research we can give the model access to Google Drive. We can give it access to
MCP tools like Sentry or Datadog or GitHub. It can search across all these tools, but the model doesn't already know which tools are important for which tasks, especially in your specific
company context. For example, if your company uses Slack a lot, maybe it should default to searching
Slack for company related information. All these questions about how the model should use tools,
you have to give it explicit principles about when to use which tools and in which contexts. Um,
and this is really important and it's often something I see where people don't prompt the
agent at all about which tool to use and they just give the model some tools with some very
short descriptions and then they wonder like why isn't the model using the right tool? Well,
it's likely because the model doesn't know what it should be doing in that context.
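One way this guidance can look in practice is to put "when to use this" advice directly into the tool descriptions and the system prompt. The tools and company context below are hypothetical.

```python
# Hypothetical tool definitions with explicit "when to use" guidance baked in.
TOOLS = [
    {
        "name": "search_slack",
        "description": (
            "Search internal Slack messages. Prefer this for questions about "
            "company decisions, internal discussions, or recent team context."
        ),
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": ["query"]},
    },
    {
        "name": "search_web",
        "description": (
            "Search the public web. Use this for external facts, news, and "
            "anything not specific to the company."
        ),
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": ["query"]},
    },
]

# Routing guidance also goes in the system prompt, not just the tool schemas.
SYSTEM_PROMPT = (
    "For company-internal questions, default to search_slack first and only "
    "fall back to search_web if Slack has nothing relevant."
)
```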
Another point here is that you can guide the thinking process. People often turn extended thinking on, then let their agents run, and assume they will get better out-of-the-box performance. Actually, that assumption is true: most of the time you will get better out-of-the-box performance, but you can
squeeze even more performance out of it if you just prompt the agent to use its thinking well.
So for example, for search, what we do is tell the model to plan out its search process. So in
advance, it should decide how complicated is this query? How many tool calls should I use here? What
sources should I look for? How will I know when I'm successful? We tell it to plan out all these
exact things in its first thinking block. And then a new capability that the Claude 4 models have is the ability to use interleaved thinking between tool calls. So after getting results from the web, we
often find that models assume that all web search results are true, right? We haven't told them explicitly that this isn't the case, and so they might take these web results and run with them immediately. So, one thing we prompted our models to do is to use this interleaved thinking to really reflect on the quality of the search results and decide if they
need to verify them, if they need to get more information, or if they should add a disclaimer
about how the results might not be accurate.
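A sketch of how you might wire this up: enable extended thinking and tell the model how to use it, both for the upfront plan and for reflecting on tool results. The model name, token budgets, and prompt text are illustrative; check the current docs for thinking and interleaved-thinking support on the model you're using.

```python
# Sketch: turn extended thinking on and prompt the model on how to use it.
# Model name, budgets, and guidance text are illustrative.
import anthropic

client = anthropic.Anthropic()

THINKING_GUIDANCE = """
In your first thinking block, plan the research before calling any tools:
how complex is the query, how many tool calls will you budget, which sources
will you look for, and how will you know you're done?
After each tool result, use your thinking to assess source quality before
relying on it: decide whether to verify, search more, or caveat the answer.
"""

web_search_tool = {
    "name": "web_search",
    "description": "Search the web and return a short list of results.",
    "input_schema": {"type": "object",
                     "properties": {"query": {"type": "string"}},
                     "required": ["query"]},
}

response = client.messages.create(
    model="claude-sonnet-4-20250514",                     # check model names
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 4000},  # extended thinking
    system=THINKING_GUIDANCE,
    tools=[web_search_tool],
    messages=[{"role": "user",
               "content": "How many bananas fit in a Rivian R1S?"}],
)
```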
Um, another point when prompting agents is that agents are more unpredictable than workflows or just, you know, classification-type prompts.
Most changes will have unintended side effects. This is because agents will operate in a loop
autonomously. And so for example, if you tell the agent, you know, keep searching until you find the
correct answer, you know, find the highest quality possible source and always keep searching until
you find that source. What you might run into is the unintended side effect of the agent just not
finding any sources. Maybe this perfect source doesn't exist for the query. And so it
will just keep searching until it hits its context window. And that's actually what we ran into as
well. And so you have to tell the agent if you don't find the perfect source, that's okay. You
can stop after a few tool calls. Um, so just be aware that your prompts may have unintended side
effects and you may have to roll those back. Another point is to help the agent manage its
context window. The Claude 4 models have a 200k token context window. Um, this is long enough for a lot of long-running tasks, but when you're using an agent to do work autonomously, you may hit this
context window and there are several strategies you can use to sort of extend the effective
context window. One that we use for Claude Code is called compaction. This is just a tool that the model has that will automatically be called once it hits around 190,000 tokens, so near
the context window. And this will summarize or compress everything in the context window
to a really dense but accurate summary that is then passed to a new instance of Claude with the summary, and it continues the process. We find that this essentially allows you to run Claude Code infinitely; you almost never run out of context. Occasionally it will miss details from the previous session, but the vast majority of the time this will keep all the important details, and the model will sort of remember what happened in the last session.
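Here is a rough sketch of the compaction idea, assuming a summarize-and-restart approach driven by the token usage the API reports; the threshold and prompts are illustrative, and this is not Claude Code's actual implementation.

```python
# Sketch of compaction: when the conversation nears the context limit, ask the
# model for a dense summary and restart with just that summary. The threshold
# and prompts are illustrative, not Claude Code's implementation.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"   # check current model names
COMPACT_AT_TOKENS = 190_000          # near the 200k context window


def maybe_compact(messages: list, last_usage) -> list:
    """Compact the transcript if the last response's usage is near the limit."""
    used = last_usage.input_tokens + last_usage.output_tokens
    if used < COMPACT_AT_TOKENS:
        return messages
    summary = client.messages.create(
        model=MODEL,
        max_tokens=4000,
        messages=messages + [{
            "role": "user",
            "content": ("Summarize everything so far into a dense, accurate "
                        "summary: the task, key findings, decisions made, and "
                        "what remains to be done."),
        }],
    )
    summary_text = "".join(b.text for b in summary.content if b.type == "text")
    # Start a fresh conversation seeded with the summary.
    return [{"role": "user",
             "content": f"Continue this task. Progress so far:\n{summary_text}"}]

# In the agent loop, after each response:
#   messages = maybe_compact(messages, response.usage)
```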
Similarly, you can write to an external file. The model can have access to an external file, and the Claude 4 models are especially good at writing memory to a file; they can use this file to essentially extend their context window. Another point is that you can use sub-agents. Um, we won't talk about this
a lot here, but essentially if you have agents that are always hitting their context windows, you
may delegate some of what the agent is doing to another agent. For example, you can have one agent be the lead agent and then sub-agents do the actual searching process. Then the sub-agents can compress the results for the lead agent in a really dense form that doesn't use as many tokens, and the lead agent can give the final report to the user. We actually use this process in our Research system, and this allows you to compress what's going on in the search and then only use the lead agent's context window for actually writing the report. So this kind of multi-agent system can be effective for limiting context window usage.
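A minimal sketch of the lead-agent/sub-agent split, assuming a `run_agent` helper like the loop sketched earlier; the prompts are illustrative.

```python
# Sketch: a lead agent delegates searches to sub-agents, each of which returns
# a compressed summary, so only dense results enter the lead agent's context.
# run_agent() is the loop sketched earlier; the prompts here are illustrative.
def research(question: str) -> str:
    subtasks = run_agent(
        task=("Break this research question into 2-4 independent search "
              f"subtasks, one per line:\n{question}"),
        system_prompt="You are the lead researcher planning the work.",
    ).splitlines()

    findings = [
        run_agent(
            task=("Research this and reply with a dense summary of findings "
                  f"and sources only:\n{subtask}"),
            system_prompt="You are a research sub-agent. Be concise.",
        )
        for subtask in subtasks if subtask.strip()
    ]

    return run_agent(
        task=(f"Write the final report for: {question}\n\n"
              "Sub-agent findings:\n" + "\n\n".join(findings)),
        system_prompt="You are the lead researcher writing the final report.",
    )
```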
Finally, you can let Claude be Claude. Essentially what this means is that Claude is great at being
an agent already. You don't have to do a ton of work at the very beginning. So, I would recommend
just trying out your system with sort of a bare bones prompt and barebones tools and seeing where
it goes wrong and then working from there. Don't sort of assume that Claude can't do it ahead of
time, because Claude will often surprise you with how good it is. Um, I talked already about tool
design, but essentially the key point here is you want to make sure that your tools are good. Um,
What is a good tool? It will have a simple, accurate tool name that reflects what it does. You'll have tested it and made sure that it works well. It'll have a well-formed description, so that a human reading this tool could use it: imagine you give a function to another engineer on your team; would they understand this function and be able to use it? You should ask the same question
about the agent computer interfaces or the tools that you are giving your agent. Make sure that
they're usable and clear. Um we also often find that people will give an agent a bunch of tools
that have very similar names or descriptions. So for example, you give it six search tools
and each of the search tools searches a slightly different database. This will confuse the model.
So try to keep your tools fairly distinct, and combine similar tools into just one.
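For example, instead of six near-identical search tools, you might expose one search tool with an explicit source parameter; a hypothetical sketch:

```python
# Hypothetical: one search tool with an explicit "source" enum, instead of six
# near-identical tools the model has to tell apart by name alone.
SEARCH_TOOL = {
    "name": "search",
    "description": (
        "Search a data source and return matching records. Use source='orders' "
        "for customer orders, 'inventory' for stock levels, and 'docs' for "
        "internal documentation."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "source": {"type": "string", "enum": ["orders", "inventory", "docs"]},
            "query": {"type": "string"},
        },
        "required": ["source", "query"],
    },
}
```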
So, one quick example here is just that you can have an agent, for example, use these different
tools to first search the inventory in a database, run a query. Based on the information it finds, it
can reflect on the inventory, think about it for a little bit, then decide to generate an invoice,
generate this invoice, think about what it should do next, and then decide to send an email. And so,
this loop involves the agent getting information from the database, which is its external
environment, using its tools, and then updating based on that information, until it accomplishes
the task. And that's sort of how agents work in general. So, let's walk through a demo real
quick. I'll switch to my computer. Um, so you can see here that this is our console. The console is
a great tool for sort of simulating your prompts and seeing what they would look like in a UI. Um,
and I use this while we were iterating on research to sort of understand what's really going on and
what the agents doing. This is a great way to think like your agents and sort of put yourself
in their shoes. So, you can see we have a big prompt here. Um, it's not sort of super long.
It's around a thousand tokens. It involves the researcher going through a research process. We
tell it exactly what it should plan ahead of time. We tell it how many tool
calls it should typically use. We give it some guidelines about what facts it should think about,
what makes a high quality source, stuff like that. And then we tell it to use parallel tool
calls. So, you know, run multiple web searches in parallel at the same time rather than running them
all sequentially. Then we give it this question. How many bananas can fit in a Rivian R1S? This
is not a question that the model will be able to answer because the Rivian R1S came out very
recently. It's a car. It doesn't know in advance all the specifications and everything. So, it'll
have to search the web. Let's run it and see what happens. You'll see that at the very beginning,
it will think and break down this request. And so, it realizes, okay, web search is going to
be helpful here. I should get cargo capacity. I should search. And you see here
it ran two web searches in parallel at the same time. That allowed it to get these results back
very quickly. And then it's reflecting on the results. So it's realizing, okay, I found the
banana dimensions. I know that the USDA identifies bananas as 7 to 8 inches long. I need to run another web search. Let me convert these to more standard measurements. You can see it's using tool calls interleaved with thinking, which is something new that the Claude 4 models can do. Finally, it's running some calculations about how many bananas could be packed into the cargo space
of the truck. And it's running a few more web searches. You can see here that this is a fairly involved process. It ends up with approximately 48,000 bananas. I've seen the model estimate anything between 30,000 and 50,000. I think the right answer is around 30,000, so this is roughly correct. Um, going back to the slides,
I think that, you know, this sort of approach of testing out your prompt, seeing what tools
the model calls, reading its thinking blocks, and actually seeing how the model's thinking
will often make it really obvious what the issues are and what's going wrong. So you'll
test it out and you'll just see like okay maybe the model's using too many tools here,
maybe it's using the wrong sources or maybe it's just following the wrong guidelines. Um
so this is a really helpful way to sort of think like your agents and make them more concrete.
Um switching back to the slides.
Okay, so evaluations are really important for any system. Um, they're really important
for systematically measuring whether you're making progress in your prompt. Very quickly,
you'll notice that it's difficult to really make progress on a prompt if you don't have an eval
that tells you meaningfully whether your prompt is getting better and whether your system is getting
better. But evals are much more difficult for agents. Agents are long-running. They do a bunch of things. They may not always have a predictable process. Classification is easier to
eval because you can just check did it classify this output correctly but agents are harder. So
a few tips to make this a bit easier. One is that the larger the effect size, the smaller the sample size you need. This is just a principle from science in general: if an effect size is very large, for example if a medication cures people immediately, you don't really need a large sample size of a ton of people to know that the treatment is having an effect. Similarly, when you change a prompt, if it's really obvious that the system is
getting better, you don't need a large eval. I often see teams think that they need to set up
a huge eval of like hundreds of test cases and make it completely automated when they're just
starting out building an agent. This is a failure mode and it's an anti-pattern. You should start out
with a very small eval and just run it and see what happens. You can even start out manually. Um,
but the important thing is to just get started. I often see teams delaying evals because they
think that they're so intimidating or that they need such a sort of intense eval to really get
some signal, but you can get great signal from a small number of test cases. You just want to
keep those test cases consistent and then keep testing them so you know whether the model and
the prompt is getting better. You also want to use realistic tasks. So don't just sort of come
up with arbitrary prompts or descriptions or tasks that don't really have any real
correlation to what your system will be doing. For example, if you're working on coding tasks,
you won't want to give the model just competitive programming problems because this is
not what real world coding is like. You'll want to give it realistic tasks that really reflect what
your agent will be doing. Similarly, in finance, you'll want to sort of take tasks that real people
are trying to solve and just use them to evaluate whether the model can do those. This allows you
to really measure whether the model is getting better at the tasks that you care about. Another
point is that LLM-as-judge is really powerful, especially when you give it a rubric. So agents
will have lots of different kinds of outputs. For example, if you're using them for search,
they might have tons of different kinds of search reports with different kinds of structure. But LLMs are great at handling lots of different kinds of structure and text with different characteristics.
And so one thing that we've done, for example, is given the model just a clear rubric and then
ask it to evaluate the output of the agent. For example, for search tasks, we might give
it a rubric that says, check that the model, you know, um, looked at the right sources,
check that it got the correct answer. In this case, we might say, um, check that the model
guessed that the number of bananas that can fit in a Rivian R1S is between, like, 10,000 and 50,000.
Anything outside that range is not realistic. So, you know, you can use things like that to sort of
benchmark whether the model is getting the right answers, whether it's following the right process.
At the end of the day though, nothing is a perfect replacement for human evals. You need to test the
system manually. You need to see what it's doing. You need to sort of look at the transcripts, look
at what the model is doing, and sort of understand your system if you want to make progress on it.
Here are some examples of evals. One example that I showed and talked about is answer
accuracy. And this is where you just use an LLM as judge to judge whether the answer is accurate.
So for example in this case you might say the agent needs to use a tool to query the number
of employees and then report the answer, and you know the number of employees at your company. So you can just check that with an LLM as judge. The reason you use an LLM as judge here is that it's more robust to variations. For example, if you're just checking for the integer 47 in the output, that is not very robust, and if the model writes 47 out as text, you'll grade it incorrectly. So you want to use an LLM as judge there to be robust to those minor variations.
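A sketch of an LLM-as-judge grader built around a simple rubric; the rubric text and model name are illustrative.

```python
# Sketch of an LLM-as-judge grader: give the judge a rubric and the agent's
# output, ask for a pass/fail verdict. Rubric and model name are illustrative.
import anthropic

client = anthropic.Anthropic()

RUBRIC = """
Grade the agent's answer against this rubric:
1. It cites at least one reputable source for the vehicle's cargo volume.
2. Its final banana estimate is between 10,000 and 50,000.
Reply with exactly PASS or FAIL, then one sentence of justification.
"""


def judge(agent_output: str) -> bool:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # check current model names
        max_tokens=200,
        messages=[{"role": "user",
                   "content": f"{RUBRIC}\n\nAgent answer:\n{agent_output}"}],
    )
    text = "".join(b.text for b in response.content if b.type == "text")
    return text.strip().upper().startswith("PASS")
```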
Another way you can eval agents is tool use accuracy. Agents involve using tools in a
loop. And so if you know in advance what tools the model should use or how it should use them,
you can just evaluate if it used the correct tools in the process. For example, in this case, I might
evaluate that the agent should use web search at least five times to answer this question. And so I could
just check in the transcript programmatically did the tool call for web search appear five times or
not. Similarly, you might check that, in response to the question "book a flight," the agent should use the search flights tool. You can just check that programmatically, and this allows you to make sure that the right tools are being used at the right times.
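Checking tool use can be as simple as counting tool_use blocks in the transcript; a sketch assuming Messages-API-style transcripts, with the assertions from the examples above shown as comments.

```python
# Sketch: count how many times a given tool was called in an agent transcript,
# assuming a list of Messages-API-style message dicts whose assistant content
# is a list of blocks (SDK objects or plain dicts).
def count_tool_calls(messages: list, tool_name: str) -> int:
    count = 0
    for message in messages:
        if message["role"] != "assistant":
            continue
        for block in message["content"]:
            block_type = getattr(block, "type", None) or (
                block.get("type") if isinstance(block, dict) else None)
            block_name = getattr(block, "name", None) or (
                block.get("name") if isinstance(block, dict) else None)
            if block_type == "tool_use" and block_name == tool_name:
                count += 1
    return count

# Hypothetical assertions for the evals described above:
# assert count_tool_calls(transcript, "web_search") >= 5
# assert count_tool_calls(flight_transcript, "search_flights") >= 1
```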
Finally, a really good eval for agents is tau-bench. You can look this up. Tau-bench is an open source benchmark that shows that you can evaluate whether agents reach the correct final state.
So a lot of agents are sort of modifying a database or interacting with a user in a way
where you can say the model should always get to this state at the end of the process. For example,
if your agent is a customer service agent for an airline and the user asks to change their flight, then at the end of the agentic process, in response to that prompt, it should have changed the flight in the database. And so you can just check at the end of the agentic process: was the flight changed? Was this row in the database changed to a different date? That can verify that the
agent is working correctly. This is really robust and you can use it a lot in a lot of different use
cases. For example, you can check that your database is updated correctly. You can check
that certain files were modified, things like that, as a way to evaluate the final state that the agent reaches.
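A sketch of a final-state check against a test database, assuming a SQLite fixture with hypothetical table and column names.

```python
# Sketch: after running the agent end-to-end against a test database, verify
# the final state directly. Table and column names are hypothetical.
import sqlite3


def assert_flight_changed(db_path: str, booking_id: str, new_date: str) -> None:
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT flight_date FROM bookings WHERE booking_id = ?",
            (booking_id,),
        ).fetchone()
    finally:
        conn.close()
    assert row is not None, "booking not found"
    assert row[0] == new_date, f"flight date is {row[0]}, expected {new_date}"
```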
And that's it from us. Um, we're happy to take your questions. [Applause]
Can you talk about building prompts for agents? Are you giving it kind of longer prompts
first and then iterating or you starting kind of chunk by chunk? Uh what's that look like?
And can you show sort of a little bit more on that thought process? That's a great question. Um can
I switch back to my screen actually? I just want to sort of show the demo. Thank you. Um, yeah. So,
you can see this is sort of a final prompt that we've arrived at, but this is not where
we started. I think the answer to your question is that you start with a short simple prompt.
Um, and I might just say "search the web agentically." I'll change this to a different question. Um, "how good are the Claude 4 models?" And then we'll just run that. And so you'll
want to start with something very simple and just see how it works. You'll often find that Claude
can do the task well out of the box. But if you have more needs and you need it to operate really
consistently in production, you'll notice edge cases or small flaws as you test with more use
cases. And so you'll sort of add those into the prompt. So I would say building an agent prompt
what it looks like concretely is start simple, test it out, see what happens, iterate from there,
start collecting test cases where the model fails or succeeds and then over time try to increase the
number of test cases that pass. Um, and the way to do this is by sort of adding instructions,
adding examples to the prompt. But you really only do that when you find out what the edge
cases are. And you can see that it thinks that the models are indeed good. So that's great.
When I do, like, normal prompting and it's not agentic, uh, I'll often give, like, a few-shot
example of like, hey, here's like input, here's output. This works really well for
like classification tasks, stuff like that, right? Uh is there a parallel here in this like agentic
world? Are you finding that that's ever helpful or should I not think about it that way? That is
a great question. Yeah. So, should you include few-shot examples in your prompt? Traditional prompting techniques involve saying the model should use a chain of thought and then giving few-shot examples, like a bunch of examples to imitate. We find
that these techniques are not as effective for state-of-the-art frontier models and for agents.
Um the main reason for this is that if you give the model a bunch of examples of exactly what
process it should follow, that just limits the model too much. These models are smarter than
you can predict and so you don't want to tell them exactly what they need to do. Similarly,
chain of thought has just been trained into the models at this point. The models know to think
in advance. They don't need to be told, like, "use chain of thought." But what we can do here is, one, you can tell the model how to use its thinking. So, as I talked about earlier, rather than telling the model it needs to use a chain of thought, which it already knows, you can just say: use your thinking process to plan out your search or to plan out what you're going to do in terms of coding. Or you can tell it to remember specific things in its thinking process,
and that sort of helps the agent stay on track. As far as examples go, um you'll want to give
the model examples but not too prescriptive. I think we are out of time, but you can come
up to me personally and I'll talk to you all after. Thanks. Thank you. Thanks for coming.