Build Your Own Deep Research Agent with Ivan Leo (Google DeepMind)
By Vanishing Gradients
Summary
Topics Covered
- Meta-Prompting: Asking the Model to Improve Its Own Instructions
- Verify First, Optimize Later: The Inverse of Classic ML
- Progressive Disclosure: Don't Pre-load Everything
- State Hooks as Production Guardrails Beyond Prompts
- Sub-agents as Context Filters for Main Agent Preservation
Full Transcript
And we are live. What is up, Ivan Leo?
Hey, what's up? Good to see you after a long time. I think the last time we met was for the previous AI assistant workshop.
Yeah, exactly. So, welcome everyone.
And Ivan and I actually recently did a workshop on building your own open core. Really, the premise was to build agents that build themselves, and being able to get them to build their own extensions, hooks, and skills in real time, with hot reload so you don't even need to restart them, was so powerful. I'd love to welcome you all to this workshop on building your own deep research agent. We've put in YouTube that the Q&A will happen in Discord, so please do join the Discord in the workshops channel, and I'm actually just going to put a link there to the previous workshop we did. I've put the GitHub repository there as well. Say hi, introduce yourself, and let us know what your interest in such things is. Do you work in AI? Do you work in ML? What's up?
Ivan, congratulations on your new job at DeepMind.
Yeah, thanks so much, man. I'm really excited to start on the team. I started on Monday and it's been absolutely incredible. Everyone is so friendly, and I'm just really hoping to spread the good word about Gemini.
Absolutely. And on top of that, this is pretty funny, but last year I did many workshops with a good friend of mine, Ravin Kumar at DeepMind, who was working a lot on Gemma at the time. We actually put out 10 hours of free workshops on building AI products with Gemma. I'm going to link to that in the Discord as well. What's funny is, after I did these, I was like, "Oh, okay, I've got to diversify who I'm doing workshops with." And so, you among other people. But hey, look, it's chill, and you're all doing such amazing work at DeepMind as well.
Hey everyone, we'll get started in a minute or two, but please do hit like and subscribe, and share this with a friend who might be interested right now. That's the best way to support the channel. We have upcoming events as well; check out the Luma link in the YouTube description. On top of that, I'd like to say a few words to introduce Ivan today. Ivan, we met when you were working with Jason Liu, right?
I think we met a long time ago when I was working with Jason back then. We were doing a lot of work on courses and Instructor.
Yeah, exactly. And you're also working on something super cool called Kura, which we did some workshops on, or lightning lessons, that type of stuff, that I'll also put a link to. Something I really appreciate about your work on Kura is, essentially, with agent traces, how to get signal from them. Kura, in terms of clustering and having observability into agentic traces, was super powerful. It wasn't Jason who introduced us, though. It was swyx from Latent Space. Initially, I was like, "Hey, swyx, I'm going to Singapore, introduce me to some cool people." He was like, "You've got to meet Ivan." You took me out to hawker markets and we had skewers. I told you, I went to Chinatown at night, and we
got good skewers here, but the [laughter] food in Singapore was amazing. Then you were at Manus, which is super relevant to what we're doing today. You were working on the Manus agent. You built Manus Mail as well, which I've used a bunch of, and now you're at DeepMind. Is there anything I've missed out there?
Yeah, no, I think that roughly covers it. A lot of open source. I was at Manus for a bit, building some of the agents, and now DeepMind, trying to spread the good word on Gemini. But yeah, I can't believe it's been so long. I feel like I was working on open source maybe almost two years ago.
It's pretty wild, isn't it? So, I'm just going to share my screen to show a slide or two, just to give a sense of what we're up to today and give people a bit of structure. Let me see.
Okay. So, we're going to have a brief intro to deep research: what we're building and why. Then we're going to jump into building. We're going to build with Gemini: start with a raw API call, then get some tools and a runtime up and running. Then we're going to start adding state hooks and the agent loop; we'll see what all of this stuff is. Then we're going to let it loose with sub-agents, planning, and observability.
Here's a schematic of the type of thing we'll build. This isn't quite it, because Gemini does something far cleverer than what one can actually draw here, but: there's a prompt, and the agent can ask clarifying questions if needed. The canonical example is if you say, "Tell me about restaurants in Chelsea," the agent hopefully will say, "Do you mean London or Manhattan?" A planner will write a research plan, and then the main agent will orchestrate sub-agents. Something that's really cool about what we're building today is that it's not just three sub-agents: it spawns the sub-agents that are needed, manages context around them, returns results, and then spins up more if necessary. Then the main agent synthesizes a final report. So, with respect to this brief architectural diagram, is there anything you'd want to add or clarify, Ivan?
Uh, yeah. I think the only thing I'll add is that instead of just a single run of sub-agents, really what we've done is written it so the model can spawn as many sub-agents as it needs, on demand. And it can do this as much as it needs. So it's kind of a thing you can just tune in the prompt: how many rounds of iteration do you want to do? Maybe you want the model to stop and ask more questions before it spawns more sub-agents. And I think the goal today is just to show from scratch how you can get a lot out of these models, especially with something like Gemini, where with Gemini 3 Flash and Gemini 3.1 Pro Preview you have very good pricing, and a model that essentially has very low latency, and I think in general it's
pretty good for agentic stuff, right?
So I do want to first talk, before we dive in, about why deep research. I want to preface by saying, as you know, I work with, consult for, advise, and teach a lot of builders, and a lot of the time what people really need are LLM workflows, or agents embedded in workflows, but not fully-fledged agents. I'll actually link to Anthropic's blog post on building effective AI agents, which spells out the types of workflows we're talking about there. Then, when we do get to agents, a lot of the time you don't want something as massive as deep research. You want a customer service agent or a travel assistant or whatever it is, and you want the time to resolution super quick: one or two or three turns. You don't actually have to actively manage any context or anything like that, because there are a handful of retrieval tool calls and that type of stuff. So I'm just wondering if you can talk a bit, in particular with respect to your work at Manus, about when deep research is important and when building these types of agents really comes into its own.
Yeah, I think the
most important thing to think about is: what is the metric that matters most for your users? For example, if you're talking about customer service support agents, really what they want is something that can quickly retrieve the relevant data and answer very quickly. And honestly, for something that's just asking, "Hey, when are your opening hours?" or "Hey, are you open on Saturday at 6 p.m.?", that's a single-turn sort of agent. If you think about a lot of the new integrations that have rolled out in, say, your email clients and some of your other applications, a lot of these are two or three steps: for example, generate a quick response for this email based on our previous conversations. And so what you want here, thinking from the end-user perspective, is that you don't always need to throw a really big, heavy model at it. I know a lot of people, for example, like to throw Opus 4.6 at everything, but the simple truth is: if you have a very well-defined task, and you have good evaluations, like what you cover in your course, then once you've verified that a model can actually do the task with the best models, like GPT-5, Opus 4.6, or Gemini 3.1 Pro Preview, you can start experimenting with your evaluation set to go down to cheaper models that can do it a lot faster and cheaper. And ultimately, as you start building a product that needs to scale to thousands of users, hundreds of thousands, and hopefully at some point millions of users, that 20%, 10%, or even 2% difference in cost can add up very quickly. So a lot of these are quick iterations. I think you want to realize that you can tune basically every single part of this pipeline, whether it's the model you're using or the prompt, and, I think most importantly, the tools that you give it. And I think we also
discussed a bit about how these models are now good enough that you can do a lot of meta-prompting. When I was actually building this, if you go to the repository and you look at, say, step eight to step nine, you'll notice that the prompt got a lot bigger and a lot more detailed, and actually a lot of that was me just asking Gemini: "Hey, here are some examples of writing I like." It tried a simple rewrite, and then I would basically say, "Hey, based on the initial information that you understood and the final result that I wanted, how could we make it clearer to you? How could we make our instructions clearer? How could we provide better examples?" And Gemini gave some great examples. So you can start to see that a lot of these techniques that wouldn't have been possible a year or two ago are really valuable now. You can do things like ask models: "How should we describe the tools? What sort of tools do you need? What are the descriptions of tools that you want? What sort of information are you not getting in your system prompt or in the context that we have?" And I think that calls back to this idea of context management.
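A rough sketch of that meta-prompting loop, with the model call injected as a plain text-in/text-out function. The template wording and function names here are illustrative, not the repo's actual code:

```python
# Sketch of a meta-prompting loop: ask the model to critique and
# rewrite its own system prompt, given examples of the desired output.
# `ask` is any text-in/text-out call (e.g. a thin wrapper around a
# Gemini generate-content request); injecting it keeps the loop
# model-agnostic.

META_TEMPLATE = """Here is the system prompt I am using:
---
{prompt}
---
Here are examples of the output style I want:
---
{examples}
---
Rewrite the system prompt so the instructions are clearer and more
likely to produce output in that style. Return only the new prompt."""

def improve_prompt(prompt, examples, ask, rounds=2):
    """Run a few rounds of 'improve your own instructions'."""
    for _ in range(rounds):
        prompt = ask(META_TEMPLATE.format(prompt=prompt, examples=examples))
    return prompt
```

In practice you would eyeball (or eval) each rewritten prompt before adopting it, rather than trusting the last round blindly.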
It's okay to be a bit less careful sometimes with your context; it can be a little bit dirty, where you just throw the raw context in without cleaning it with a very long pipeline. But it's very important to think about the type of context you're providing. I think we talked about this as context versus capabilities: if you ask the model, "Hey, what's the date today?" and it doesn't have either the date in the prompt or a bash tool to run on your local computer, it can never really answer that question, right?
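As a tiny illustration of giving the model that capability, here is a sketch of a date tool and its declaration. The schema shape is the generic JSON-schema style that function-calling APIs such as Gemini's accept; the exact names are illustrative:

```python
# Hypothetical sketch: exposing "today's date" as a capability (a tool
# the model can call) rather than baking it into the prompt as context.
import datetime

def get_current_date() -> str:
    """Tool handler: returns today's date so the model doesn't guess."""
    return datetime.date.today().isoformat()

# Declaration the model sees, in generic JSON-schema style:
GET_DATE_TOOL = {
    "name": "get_current_date",
    "description": "Returns today's date in ISO format (YYYY-MM-DD).",
    "parameters": {"type": "object", "properties": {}},
}
```

Without this tool (the capability) or the date pasted into the prompt (the context), the model has no way to answer.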
That's a capability: a capability is a tool you can execute to get the date, while context would be when the date is in the prompt. So that's a lot of the stuff you should think about when building agents: what is the use case you care about? What is the metric you want to push? Start thinking a lot about evaluations from day one, and then realize that for a lot of these cases there are a lot of different knobs that you can turn, and at the end of the day having a clear enough north star is really important.
Totally. There was so much wonderful stuff in there; one thing I do want to index on is that you mentioned, when building something, to try it first with the best model, just to see if it's something you can do at all, and then you can go to cheaper, lower-latency models, whatever it may be. It's interesting that that's kind of the inverse of what we do classically in machine learning, where, say you've got binary classification, maybe you start with a baseline of a majority-class classifier. So you start with kind of the silliest, dumbest idea, and that's your baseline. We've kind of flipped it, because we've got these powerful pre-trained models that we can use.
Yeah, I think that's a funny thing about working with agents nowadays: a lot of people try to apply those same concepts. Really, what you want to investigate when it comes to building agents is whether or not this task is possible for a language model to do at all. And oftentimes you can do a lot less prompting when you use a really powerful model like Gemini 3.1 Pro Preview. So a lot of it is just verifying that the task can be done, and then afterwards optimizing it for a production use case. A lot of people start by thinking about the cost: "Oh my god, Gemini 3.1 Pro Preview is so much more expensive than Flash," or "I want to use Opus and it's x number of dollars." But when you're actually building a prototype, or you're trying to show that your product works, you should get it right first, and then you should optimize after that, you know?
Totally. Also, we'll jump in in a second, but if the models don't seem capable today, that doesn't mean they won't be tomorrow. So dream big. I'm actually going to link to a podcast I did, and a blog post, with Nicholas Moy, who was head of AI research at Windsurf and is now at DeepMind working on Antigravity, among other things, in a different part of DeepMind to yourself. He built the first multi-hop agent at Windsurf. And what they were trying to do just wasn't possible; they couldn't do it, but they knew what they wanted, so they were doing something else in the meantime. Then a particular release of Claude came out, and they saw one day that it worked, and then they saw it shoot off; they saw all of their users going absolutely mad. So, you know, dream big and let your models and agents fly.
Yeah, for sure. A lot of times they say, you know, spend today for the models of tomorrow. Back in the day, and this is exposing how long I've been in the LLM space, when you looked at GPT-3.5 versus GPT-4, a lot of people were saying, "Oh, GPT-4 is really expensive at this cost," and they kept trying to make things work on GPT-3.5. But if you spend money for the models of tomorrow, and you just use the best models to make sure it's possible, to make sure this specific very hard case is being catered for, oftentimes you can find very surprising results. And then when the model prices drop again, it's beautiful.
Exactly.
Yeah.
So let's jump in, man. Do you want to share your screen? I've linked to the GitHub repository. Please do introduce yourself in Discord; say what's up if you'd like to. Links are in the chat and in the YouTube description. On top of that, if you're going to code, feel free to code along if you want, but I would encourage you to pay more attention to what we're building and to the conversation. Don't get stuck on trying to get an API key working and fall behind; you can always run the code after the fact. So, yeah, here we go.
Yeah, I think we can start pulling it up.
Can you just zoom in a little bit so everyone can see?
I've zoomed in a little bit here. Let me just pull this down so everyone can take a look at the code. Is this enough, or maybe one more?
Yeah, one more would be great. And if you want to, also just open the README to start. Dude, when I dictate, because I dictate most of my stuff to LLMs now, mostly speech-to-text, because of my accent it says "raid me." R-A-I-D-M-E. So there you go. Funny things with accents.
Yeah, I think it happens. I just pray that the language model gods can understand what I say and they just transcribe everything.
AGI. [laughter]
That's it. All right. So what are we doing, man?
Right. So today we're going to be building a deep research agent. Previously, Hugo talked a bit about the different steps, so the way we're going to try running it today is in these distinct steps. The first one is just "Hello, Gemini," where we go from a raw API call to a tool runtime. A tool runtime here is just a simple class where we can execute tools very easily without hard-coding a lot of it; it'll make a bit more sense down the line. And the point of the first section is really to point out why you need tool calling at all. The next part introduces state hooks and an agent loop, and this corresponds to steps four, five, and six, where we have sub-agents for the first time. So: it'll be "Hello, Gemini," where we talk a bit about going from a raw API call to a tool runtime; then this part over here will be "It's alive," covering state hooks and the agent loop; and then the last part will be letting it loose, with sub-agents, planning, and observability.
Yeah. So this will roughly be the structure of the workshop today. At each step, we're going to do a quick walkthrough of what we've done, and if you have any questions, we'll basically talk about them at the end of each step. That's how I was thinking about it, so that we have sections and then we can do a quick recap at each point.
That sounds great. Feel free to say no, but I'm just wondering: would it be crazy to just show the trace you sent me the other day in Logfire, so we can see where we're going?
So, let me just close a bunch of these things. Right. What we have over here is a single agent. First, I sent a message, so you can see an agent run where you have the tools, you have the request config, and then I said, "Hey, I'd love to learn about Apple AirPods." And the agent responded; if you look at the bottom of the response, you can see what it responded with: "I'll be happy to help you learn about Apple AirPods, but there's a lot of information out there. I would love to narrow down what would be most useful for you." So what happened is that I basically just responded, and if you look at the message, I said, "Yeah, I'd like a comparison between the different models and the evolution of the product. Look for whatever is relevant." You'll notice that my spelling is pretty horrible, and that's because I do a lot of dictation nowadays. [snorts] And
basically what happened was that the agent generated a plan. What does it mean to generate a plan? The model basically said, "Okay, I have understood what the user wants, so let me now generate a bunch of todos that I want to execute." So let's expand this over here. You can see that the model has taken an initial look at my query, which is, "Can you give me a breakdown of the history and the rough evolution of the Apple AirPods?", and it's broken it down into these distinct sections: researching the history and evolution of Apple AirPods, generating the technical specifications and feature sets, and comparing the current models across different key metrics. What it did first is create a bunch of todos. And once it had these todos, it looked at them again and then decided, okay, it's time for us to start searching and understanding.
And the way it does that here is that it dispatches a bunch of sub-agents. So let's expand this a little bit, because it doesn't look so good. What these sub-agents are basically doing, let me just scroll down so you can see what it's done, is that the model created a tool call over here, and it wrote a bunch of queries that it wanted answered: "When did the first-generation Apple AirPods launch?", "How did the standard Apple AirPods evolve through generations two and three?", and "What was the initial cultural impact and status of the Apple AirPods shortly after their release?" So what actually happened is that we dispatched three sub-agents, each meant to answer one of these questions, and you can see that each of them called a search web tool and then generated a response.
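The fan-out just described can be sketched as a function the model calls with its list of queries. Here, `run_step` stands in for one sub-agent model turn that may call the search tool; everything below is a hypothetical sketch, not the repo's exact API:

```python
# Sketch of the sub-agent fan-out: delegate_search spawns one bounded
# sub-agent per query, and the main agent only sees the final answers,
# which keeps the noisy intermediate search results out of its context.
def run_subagent(query, run_step, max_iterations=2):
    """Run one sub-agent for up to max_iterations turns.

    run_step(history) -> (reply, done) is one model turn; it may call
    tools internally and says when it considers the question answered.
    """
    history = [query]
    for _ in range(max_iterations):
        reply, done = run_step(history)
        history.append(reply)
        if done:
            return reply
    # The hack from the transcript: when the limit is hit, append a
    # message telling the model to wrap up with what it already has.
    final, _ = run_step(history + ["Iteration limit reached; answer now "
                                   "with the information you have."])
    return final

def delegate_search(queries, run_step, max_iterations=2):
    """One sub-agent per query; return only their summaries."""
    return [run_subagent(q, run_step, max_iterations) for q in queries]
```

A production version would run the sub-agents concurrently, but the control flow is the same.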
Some of them went through a few different searches, so they would search, given a search tool. For example, the query over here is "How did the standard AirPods evolve?", and then you'd get some sort of response. You can see the whole thing in the trace; you can slowly dig through it and understand it. Then what you'll find, in the next agent.run, when the model actually looked at the response: this was basically the response we had from delegate search, and you can see the results. Let me just scroll this out a little bit. Is it possible to see what I'm looking at on the screen, Hugo, or is it too small?
Okay, I can see it. Yep.
So, for example, the initial question was, "When did the first-generation Apple AirPods launch, and what were the key technical innovations?", and it basically said that the first-generation AirPods were officially announced on September 7, 2016, alongside the iPhone 7. It talked a bit about key technological innovations, and it provided citations, which are all listed at the bottom. Now, why is this really useful? Because of what happens at each step of the sub-agents' search web calls. If we actually look at the code, let me just pull this up over here; let's go to the OpenTelemetry setup.
So, you have the search tool. I believe it's over here, in the Exa file. [clears throat] Probably here.
For those who haven't played with Exa before, it's an API that, among other things, helps you search, essentially. If you've heard of Tavily or those types of things, it's along those lines.
Yeah. So in this case we chose to use Exa, because a lot of people might not want to use Gemini; you might want to use Sonnet, or OpenAI's GPT-5, and so I thought it'd be a bit better to show a more general example rather than tying it to the search that comes out of the box with Gemini.
That's actually a good point. A lot of people don't know that the Gemini API has grounding-with-search functionality as well, so you can do that. But yeah, as you and I discussed, using something like Exa allows you to change your code to another model, whatever it may be. And Exa has a generous free tier as well, so I use it a lot when I teach.
Yeah. So that's why I made the decision here to use Exa. But honestly, out of the box, the Gemini API does come with support for grounding with Google Search, so you're using the same index that you do when you search on Google. And at the same time, you can do things like search with Google Maps, which means you can answer queries like, "I'm currently in this specific neighborhood; what are some of the best coffee shops around?", and you actually get a pretty solid response, grounded in the actual data that serves the same Google Maps website I use all the time. So, yeah, I highly recommend trying those two out.
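What such a search tool looks like, roughly: a thin wrapper that truncates each result and formats everything into a compact text block for the model. The backend is injected here, so this sketch isn't tied to Exa's actual SDK (whose client API differs); the function and field names are assumptions:

```python
# Sketch of a search tool: fetch results from any backend (Exa,
# Tavily, Google grounding, ...), cap each page's text, and format a
# compact block the model can read.
MAX_CHARS = 4000  # per-result cap, mirroring the repo's 4,000 characters

def search_web(query, client_search, num_results=3):
    """client_search(query, n) -> [{"title", "url", "text"}, ...]"""
    results = client_search(query, num_results)
    blocks = []
    for r in results:
        # Truncate aggressively: raw pages would flood the context.
        blocks.append(
            f"## {r['title']}\nURL: {r['url']}\n{r['text'][:MAX_CHARS]}"
        )
    return "\n\n".join(blocks)
```

The truncation is the important design choice: even capped, a few results per call is a lot of tokens, which is exactly why the sub-agents exist to digest it.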
But in this case, we just used Exa. And you can see immediately that at each step where we call the search web tool, we're getting back a lot of data, because for every single result we get back, we're reading 4,000 characters, and we format it into this format over here. Let me see if I can find an example of what the model actually sees. You should be able to see it when you get the search web response; I believe the response will be here, in this run. [clears throat]
Oh, it was scrubbed due to a cookie; sorry, Logfire scrubbed it. But another thing I wanted to point out is that the sub-agents are limited in the number of iterations they can run, and so we add a little message at the end when they actually reach that limit. This ensures that the sub-agents can only run for a fixed amount of time, and it's kind of a cool hack where we can say, "I want you to do at most two hops." We'll cover it in the next portion. But basically, the way to understand it is that if you use something like Exa to do these queries, you can imagine that this is a lot of data for your model to see, and if you have sub-agents that can run for a very long time to answer these natural-language queries, think about the amount of data you might have to go through. There's a lot of irrelevant data relative to the relevant data that you want, and so using something like sub-agents is actually a really common and really useful method for making sense of that. So yeah, that's kind of what's happening inside here. At
the end of everything, once we're done, you can see every single trace; you can just expand it. We basically wrote a report, and the same report that was written by this trace is actually in the repository over here. You can see it here, the AirPods report. It has a title, it has a short executive summary, and it says, "Hey, when Apple unceremoniously killed the headphone jack."
That's hilarious.
Yeah. And the best part about it is that you get all these citations over here; these are referencing the websites at the bottom. You could probably improve it by having the URLs added too; I think that's a prompt thing you can add. But this is essentially what was state-of-the-art in 2025, when OpenAI launched deep research and everything came out, and people were like, "Oh, how do you do it?" Back then you probably had to orchestrate a lot of stuff, and now, with a model like Gemini 3.1 Pro Preview, honestly, if we count the code here that I've written: this is 500 lines of code, this is roughly 200, these are 700 and 800, and the tools are a bit long, maybe 600, so it's like 1,500 lines of code. But this is a pretty robust repository that has sub-agent support; you have todos that can be tracked; you have the ability to set the maximum number of iterations; and you can swap the different tools in and out depending on which stage of the entire plan you're at. You can imagine, for this amount of functionality that we're building in, it's actually not that much code if you think about it.
Quite small, yeah.
And just to remind everyone, what we've just shown you is what we're moving towards; that's what we're going to be building in this workshop. To Ivan's point as well, we hand-roll a lot of these things today, but there are incoming APIs; hopefully we'll have time to talk about the Gemini Interactions API, which is currently in beta, and I'll link to that in the Discord as well. Abstracting away a lot of the things that we want to do, and the ability for Gemini to do deep research itself and provide all the traces to you, is getting increasingly powerful. Can we just go back to the README, to remind everyone what we're doing for those just joining now, and then we can jump in? So: "Hello, Gemini," where we'll start with a minimal call, then add a tool call and a runtime; then we'll start adding state hooks and the agent loop; and then we're going to let it loose with sub-agents, planning, and observability.
Sounds pretty good. Yeah. Hugo was right: the Interactions API is currently in beta, but it is something we're moving slowly towards, because we want to make it easier and more productive for people to build on Gemini. We're trying to take some of the lessons we learned serving Gemini, make it available to everybody, and make it a better developer experience for people who just want to build on Gemini.
Yeah, so let's get started. One of the small changes is that there's actually another section I added called no tools, which I hope makes things a bit clearer, because I realize not everyone has heard of tool calling or knows what a tool is for a model. So let's take a look. The easiest way to get started using Google Gemini models is by going to AI Studio (you'll find it if you just search in Google), where you can get a free API key. We have a pretty generous free tier that gives you a good amount of requests per minute to work with. Traditionally, the method to do a lot of this inference has been the generate_content method, so let's
see what happens when we run it. So let's go to agent.py: we want this model to read the readme. Let's run this code, give it a moment, and it should come back relatively fast. So we told Gemini, hey, please read the README file. And Gemini says, hey, I'm happy to read it, but you haven't provided the file or the content yet; please do one of the following so I can help you. Before we had something like tool calling, what we would essentially do is, say we had this response from Gemini, then we would tell it, hey, I want you to output the file name that you want to read, maybe in an XML or JSON tag. So you might write a read file tag, then file name equals something, then a closing read file tag. And this becomes very clunky down the line, because parsing it is very difficult. What if the model adds a space here, or outputs Chinese characters instead of English ones, or only outputs some Unicode? So one of the ways we got around this was by using what's called tool calling. What is tool calling? Basically, it means the model is going to output a fixed sort of... oh wow, that was really fast. So what happens is that instead of the model outputting the very long response we saw earlier, the one that said, I'll be happy to help you, please provide the readme,
what the model outputs now is basically just this. I'm going to put it at the top here so it's easier to see. This is the output we get now, and you can compare it to the output we got previously, which was: I'll be happy to help you, please provide the file name that you want me to read or the content. So what's the difference between the first
one and the second one? The first is that you know the model will always output a valid JSON string that you can parse with a JSON library. If you're not familiar with JSON, it's basically a way for you to represent objects as keys and values. So for example, path would map to a string. You can also have nested JSON; for example, working directory might be this directory, and path might be readme.md,
right? So this is just a way for you to represent data that you can read reliably. If you've used any sort of online website, say you've logged into Hacker News with a username and password, then the login form takes the values you typed and sends something like username is Ivan Leo and password is password. Just a very simple example: it puts those values into a string that looks like this, and when you parse it you get back an object, so response username would give you Ivan Leo as a string, and response password would give you, in this case, password.
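As a quick aside for anyone following along in code, here is roughly what that parsing looks like in Python; the payload and field values are made up for this example:

```python
import json

# A hypothetical login payload, serialized the way a web form might send it.
payload = '{"username": "Ivan Leo", "password": "password"}'

# json.loads turns the string into a plain Python dict with predictable keys,
# so there is no brittle string matching involved.
response = json.loads(payload)
print(response["username"])  # Ivan Leo
print(response["password"])  # password
```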
And this is really useful, because it means we can now connect our model, whatever it wants to do, to the rest of the world. I'm sure you've seen things like Claude Code, things like Cowork, or some of the newer demos where you have these incredible agents just doing great work. The way they all do this is through tool calling, because that's how you can reliably expect your model to ask you to help it do something, so you can execute some logic and give it back the context it needs. This takes it away from being a simple chatbot, where you say, hey, what's the capital of France, and it says Paris, to: can you tell me what the authentication logic in this library is, and it will actually use commands and tools like a bash tool; it will run commands, run tests, write scripts, and all of this is done using tool calling.
And Ivan, I'm actually just going to show
one thing very quickly for those who haven't done a lot of this before. You've made this clear, but just to really double down on it. What happens, and this is a wonderful figure from Google actually, is that you send a prompt, and the model either gives you text via a response completion endpoint, or it gives you text (because it's still text out) containing a function name and arguments. Now, when that happens, and you're hand-rolling these things, you then need to parse that and execute the tool. Let's say it's a get weather API, for example: the model gives you the function name and arguments, you execute the function, then send the result back to the LLM, and it will then hopefully give you a final answer. So you're making two calls to the LLM there: one where it gives you the tool call, the other where it gives you the completion after seeing the tool result. What we're going to see is that these things become very powerful when the LLM, or agent, loops over tool calls.
On top of that, this is a good way to be introduced to it, because it is a bit funky, in all honesty; it's not always easy to grok until you see it in action. But in the end, there are SDKs from the frontier lab providers that we all use to build agents on top of these APIs as well.
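To make that two-call dance concrete, here is a minimal sketch of the loop with the model stubbed out. The message format, the tool, and fake_model are all illustrative, not the real Gemini API:

```python
def read_file(path: str) -> str:
    # Hypothetical tool handler; a real agent would read from disk here.
    return f"<contents of {path}>"

TOOLS = {"read_file": read_file}

def fake_model(messages):
    # Stand-in for the LLM. First call: ask for a tool. Second call
    # (after it has seen a tool result): produce the final text answer.
    if any(m["role"] == "tool" for m in messages):
        return {"text": "The README describes the workshop."}
    return {"function_call": {"name": "read_file", "args": {"path": "README.md"}}}

def run(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("function_call")
        if call is None:
            return reply["text"]  # second LLM call: the completion
        # First LLM call gave us a function name + args: execute the tool...
        result = TOOLS[call["name"]](**call["args"])
        # ...and send the result back so the model can finish the answer.
        messages.append({"role": "tool", "content": result})

answer = run("Please read the README")
```

Swapping fake_model for a real API call (and looping while function calls keep coming) is essentially what the rest of the workshop builds up.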
For sure. Yeah, I think that's probably a good way to put it, and honestly it was a great diagram. So yeah, that's what happens under the hood for a lot of these things. And with the genai SDK, you can do it by using a types.Tool.
Then you add function declarations inside it. It's a bit convoluted, but basically what you need to do is add name equals read file, a description of what this specific tool does, and then parameters, which is you telling the model, hey, these are the specific values I'd like you to provide. You can see that each of them has a type and a description. Oftentimes people wonder: what is the right tool? How should I name it? How should I write the descriptions? Going back to what we said about meta-prompting, I feel that's actually a really powerful tool for this.
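As a rough sketch, a declaration like the one described above usually boils down to a JSON-schema-shaped structure along these lines; the values here are illustrative, not copied from the workshop code:

```python
# Hypothetical declaration for a read_file tool: a name, a description of
# what the tool does, and parameters with a type and description each.
read_file_declaration = {
    "name": "read_file",
    "description": "Read a file from disk and return its contents",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path of the file to read, e.g. README.md",
            },
        },
        "required": ["path"],
    },
}
```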
So that's basically what happens with a tool call. In Gemini, just scrolling back up a little: in the genai SDK you'll notice we have this thing called a part, and the way to know whether something is a tool call is by checking whether this function_call attribute is filled out. You can see that you have the args over here, which is path readme.md, when we ask it, hey, can you please read the readme.md file, and then you have the name over here that we specified. So that's how to define a tool and how we call a tool. Let's go into the next portion, where we're going to take this and run it in a single loop.
So let's run this for now and see what happens. We told the model, hey, can you help me read the readme.md file? It then gets the information from the readme.md file, and now it can actually say, hey, the readme.md file describes a ten-step workshop for building a deep research agent from scratch using the Gemini API, with a sequence of folders and steps where each step introduces a new concept. So what's the difference between the code we wrote before and the new code? Well, it's a lot longer, but we've actually added this new thing called a handler. And so now we can handle a lot of the logic that runs when a tool is called.
The tool definition hasn't really changed. All that's changed is that if we see function calls, we want to execute the function call using call.args.get. Recall from earlier that if the model decides to call a function, you get this args value, and that's the function call we have. Remember what we said previously: with tool calling you get very predictable objects and strings that you can parse, and then execute some logic against, and that's what we're doing. So with that, we were able to read the actual file, and in the Google genai SDK, the way you give the content back is with the types.Part.from_function_response method: you provide the name of the function call and a response, and that lets you tell the model, hey, for this path, this was the content. Then we generate a follow-up. Going back to the diagram Hugo was showing just now, where you have the initial completion with the function call, then you run another completion and the model generates a response: that's what's happening over here. The model made a tool call asking for more information, and once the tool call
was executed, it came back and said okay. So for the next step, we're going to make it a bit more general, because you can imagine this isn't a very nice way to add new tools. To get around that, we're going to introduce a simple tool runtime. Let me just create a tools.py file, and this will simplify the process of working with an agent's tools a lot.
Okay, so what a tool runtime does is make it a bit easier for us to define tools. A lot of the initial complexity we had previously, defining and using the types.Tool and making sure we use the right definitions from the genai SDK, is going to be handled by this Tool object we created. What I'm using over here is just a simple data class with slots set to true, which in Python ensures that instances of the data class only contain these four values: a name, a description, an args model, and a handler. We then define a function called to_genai_tool, which takes the Pydantic model, converts it to a JSON schema, and passes it into a types.Tool with function declarations; you basically massage it into the right format. And this is going to make it a lot easier for us to define what was originally a very complicated, very lengthy piece of code. You might say, hey, the previous code was a lot easier
because I had everything there in one go, right? I could just easily go in and change the tool. But the reason you want to think a lot about the ergonomics of your code is that at some point you're going to have a coding agent writing some of this code. Maybe you want the agent to be able to write its own code, and making that easy, abstracting away a lot of these small problems, is often quite useful. So what this new tools.py does is make it very easy for us to define new tools. You can see the read file tool has a name of read file, it has a description over here, these are the arguments, and this is the handler. Then in agent.py, instead of hard-coding the logic around how we read the file, we just initialize the agent with a list of tools. When we need to make a call to the model, we provide a list of tools by calling the to_genai_tool function we previously defined, and when we execute the tool call, we validate the arguments passed in, execute the handler, and return the name and response. So a lot of this is very similar to what we had before; what's new over here is
just that the runtime executes the tool call. So let's verify that this is the same thing as what we had previously.
And Ivan, it may be worth, because we're talking about coding agents: I've just linked to the previous workshop we did on building coding agents, agents that build themselves, as well. People have probably heard of OpenClaw, definitely check that out, but also check out pi, and I'll link to a few things about pi, which is a coding agent with four tools, read, write, edit, and bash, and which is incredibly powerful. The reason I mention this is that before, when we were planning to do some workshops, I said to you, should we do a workshop building an agent that isn't a coding agent? And you said, Hugo, isn't every agent a coding agent? And I think there's something in there. So I do wonder if you could
there. So, I I I do wonder if you could just speak to like the general purpose nature of what coding agents are and and the utility of coding agents more generally. Yeah, I think um at this
generally. Yeah, I think um at this point a lot of useful general purpose agents end up being coding agents because what you'll find is that actually a lot of the agents or models
that you're using have memorized so much of the internet that they're actually very useful and because of that they can basically do things like regurgitate URLs or like APIs wholesale. Like if you ask some of the models, for example,
Gemini, hey, can you help me find like the Pokemon API? Instead of going to the web, it actually has it memorized. Um
and so what you find is that for a lot of the general purpose work that you're doing, whether it's um doing some research, doing prospecting or let's say even writing a document or writing code, a lot of models are actually trained in
a way that they can actually understand and they have enough context that they can do it, right? And code allows them to create the visualizations, run the numbers, and try to figure out whether
or not this is actually um like to achieve the goal that they have. And so
I think a lot of these general purpose agents have to be coding agents, but they often will do a bit more. Yeah,
exactly. And I'll just add on top: I've linked to a blog post I wrote that you helped me with; you gave wonderful feedback on it and inspired a lot of it. It's called How to Build a General Purpose AI Agent in 131 Lines of Python, and it builds a basic coding agent and a search agent. A demo I do shows how a coding agent lets you, for example, clean your desktop, sort files, or take your whole music library and add metadata at scale, these types of things. These are things that technical people would sometimes write scripts for. But the reason I want to make this clear is that if your agent can write code, it becomes a general-purpose computer-use agent. It isn't just about building software; it's about using computers.
And I think when you internalize what that means for what you can do with coding agents... I still haven't quite figured it out. That's why OpenClaw is so important, that's why Cowork, and now computer use with Anthropic, and Perplexity doing what they're doing; everyone's fast-following OpenClaw, which Cowork kind of gave the promise of, I think, and of course Manus before that was kind of the OG with this type of stuff. That doesn't mean that when you're building a search agent you want to build a coding agent, though, so I want to be very clear about that. But that's a bit of context around how to think about coding agents as general-purpose computer-use agents.
Yeah, I think a lot of the time when building agents, the coding capabilities are the easiest to provide, because they just come
out of the box for a lot of these models. They can write really good bash commands, they can write good scripts, and if you see problems with your model, like if it struggles to provision a specific database or get to certain steps, then maybe you introduce a new tool to circumvent some of that. At the end of the day, a lot of it is about being very thoughtful about your design, looking at where the agent messes up, and going back to what we talked about: using techniques like meta-prompting, looking at your traces, making sure you have OpenTelemetry set up. I think those are quite important to call out. Yeah.
Awesome. Let's keep going.
Yeah, let's keep it going.
So, in the next step we're going to start looking at context. Previously we had an agent that was going to run, and we showed how we could go from a simple prompt, where the agent responded in free text, which made it very difficult to productionize, to tool calling, and how we could simplify a lot of the code around getting the tools into the right definitions so they can be used with the Google genai SDK.
In the next portion we're going to build a simple agent, but we're going to do it in a few different ways. The first thing we're doing is introducing the concept of state. A really common thing is that you might have max iterations; you don't want an agent to keep going forever, because at some point it gets really expensive. But you might also want the agent to track to-dos. The way you can do that is with a state variable. So what we've done over here is add this thing called run state, and what run state lets you track is just a collection of information, some data you pass around while your agent is running. What could be some other bits of information you care about here? Well, as you'll see later, it could be the phase you're in: in the deep research agent that we built, we had a plan phase and an execute phase. It could be an iteration count, and it could also be a user configuration. Think about building something like Claude Code: you want to make sure you respect user permissions. If the user says, hey, do not go into my downloads folder, do not go into my documents, or I want you to always ask before you write or run any scripts, then storing that in some sort of state variable you can carry around is often quite easy. Here I've just chosen to make it a single data class that you can pass around. So to-dos over here is a list of strings with a default factory of list, and iteration count always starts from zero. In terms of the tools, we need to introduce a new one, a modify todo tool, which allows the model to add or remove todos from the current run state. And when we look at our agent later, for the first time you'll notice it's now able to track and show its todos.
Notice that for us to add a new tool, all we need to do is add this definition: add a new Pydantic model that describes the arguments it takes, and then define a handler that takes in those arguments. So if our action is add, we'll add a todo, and then we tell the model, your to-dos are updated to this specific result; I'm just using a simple bit of XML over here. Same thing for remove. We throw an error at each point if some invariant is not satisfied, and then we essentially just return a result, which is the latest set of to-dos.
So one of the common things you might be thinking is: if I have todos, what if I throw them in the system prompt? So I have my system prompt and all the messages below it. The problem when you do something like this is that you're going to break your cache every single time. And I just want to call out that when you're building an agent, you should be thinking about a few things from the start, and caching is something that's incredibly important.
Sorry, I've been talking for quite a bit.
So Ivan, we've got a great question from Venode, who's actually in the course at the moment. Venode asks: "If the main frontier model providers, Anthropic, Gemini of course, OpenAI, are quote-unquote fast-forwarding their agent workflows, what is the benefit of creating our own? Isn't it better for us simply to enjoy what they have created? They are the experts after all." He also says, "Thank you so much for a wonderful session." I would love your thoughts on
that. I do want to frontload it by saying we had Sebastian Raschka, as you know, come and chat with the course the other day, and we asked him the same question about LLM internals. One reason we want to know about LLM internals is that understanding what's happening under the hood impacts how you build as well. But there's also the joy of it: there are so many people who want to know about the internals of cars, because they drive cars, who don't necessarily build cars. That's one answer. I also think understanding how to hand-roll agents at the moment, and how to put observability into our agentic systems, is important, even with the provider SDKs. I do agree that in the future most of us will use interactions APIs and SDKs and all of this type of thing, but building it ourselves lets us understand it, and then adopt the frameworks and SDKs once we know what we're building.
Yeah, I think at the end of the day a lot of this comes down to the fact that the models are going to get more capable, and model providers will make it easier, because that's in everyone's interest. A lot of the current core logic, or the flimsiness of these models, will be solved. When we look at GPT-3.5, its in-context learning was not as strong as even Gemini 3 Flash today, and back then GPT-3.5 had a context window of about 8K, versus Gemini's one million token context window. So what will end up happening is that model providers will innovate on their APIs, like we're doing to make it easier for people to use our models. But part of the value of building on these models is that you can then implement what matters most for your users. If you use something easier to work with, like the Interactions API that we're slowly shifting towards, you can focus a lot less on the core agent logic, maybe even the core agent loops, or however it changes, and instead focus on the other things that matter for your business: finding users, getting good, valuable feedback, understanding what to build. So the models will get better, hopefully, and the model APIs will get better, and that just means you have more time to focus on what matters to you, which is building an application that you're really passionate about, I'm hoping.
And the other thing is, yeah, that does answer it very well. I think also adopting whatever frameworks and SDKs abstract over the things you don't want to think about or look at is super important. On top of that, just to be frank, I've built agents, consulted for people building agents, and taught people who built agents with the Claude SDK, for example, and then updates have broken what they were doing. And if you want fallbacks to different providers, that's not necessarily a solved problem with these SDKs. So it's still very early days: adopting provider SDKs is what we want to do, but I do think we should be mindful about it. It isn't as easy as saying, they know what they're doing, so we should just blindly follow.
Yeah, I think it depends on how much engineering capacity you have, and what you want to use a library for and what you don't. At the end of the day, you could also implement your entire agent framework in C++. [laughter] That's always an option, but then you miss out on a lot of the good tooling that's been built out in JavaScript, TypeScript, or the Python ecosystem to deal with a lot of these problems. If, say, you're a one-man shop, maybe the solution is to use something like Pydantic AI, where you have strongly typed SDKs, a good way to switch between different models, and the Pydantic gateway. It kind of depends on your current priorities and the bandwidth you have at the end of the day. Yeah.
Awesome. Thank you for all that context and clarity, and particularly, with all the stuff you were building at Manus, these are things you thought through very deeply.
Yeah, at Manus we were unfortunately forced to roll our own framework, because there weren't really any strongly established frameworks at that point, so we were using our own bespoke framework. But it really depends on what you're working on, who's working on it, and what kind of capacity you have. Yeah.
All right, I guess I can continue. Uh, is my stream okay?
Okay, cool. Awesome. I was just worried for a moment; I didn't hear a response to some audio.
That's right.
I was like, oh, it didn't break. But no, I was focusing on the Discord actually.
Oh okay, cool. I think let's try to finish: so now we're at step four, then we can look at hooks and creating an agent, and once we finish that portion we can take some questions and think a bit about sub-agents, etc. Okay, so let's dive in, I guess.
So right now, we've actually changed a chunk of our code, and the reason is that we want to abstract more and more logic into the agent runtime. We're going to have four distinct portions. The first one is showing you how you can now have this concept of state, and we talked previously about how state is a really useful abstraction for storing things like: does the user have specific permissions? Does the user have some data you need to determine what the executed tool will do? In this case we have the maximum number of iterations and also the maximum number of to-dos.
We have a pretty simple piece of code over here, with some defensive logic: if this tool doesn't exist, we just tell the model, hey, this is an unknown tool. If specific arguments are missing, say due to model provider issues, we say, hey, you didn't include arguments. Then we try to validate the arguments using Pydantic, because we have the arguments we expect listed as a Pydantic base model, so that's relatively easy; if the arguments are invalid, we return invalid arguments provided. And once all of that passes, we call tool.handler, where we've defined a handler. Going back to tools, what you see over here is... let me just modify some of these. Okay, cool. So remember, for every single tool we have, we have a handler: in this case a read file, and over here a modify todo. We want to tell the model the result of the tool it asked us to execute, including the case where the tool doesn't exist, or where the arguments are wrong for some reason; oftentimes we can just tell the model, hey, you've called the wrong tool. So having this sort of defensive coding around your agent harness is quite important, because you can often run into these issues. Good things to call out here are things like
Sentry and things like uh some sort of hotel telemetry. I generally like open
hotel telemetry. I generally like open source standards. I think they're easier
source standards. I think they're easier to build around. Um, but it's entirely up to what your stack is currently using. Yeah.
using. Yeah.
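Here's a dependency-free sketch of that defensive dispatch (the workshop validates arguments with Pydantic; this version uses a plain required-keys check, and the tool names are hypothetical). The key property is that every failure path returns a readable dict to the model instead of raising back into the loop:

```python
def execute_tool_call(tools: dict, name: str, args: dict) -> dict:
    """Defensive dispatch: never raise into the agent loop; always
    return a dict the model can read and recover from."""
    tool = tools.get(name)
    if tool is None:
        return {"result": f"Unknown tool: {name}"}
    if args is None:
        return {"result": f"No arguments provided for tool: {name}"}
    # The workshop does this step with a Pydantic BaseModel instead.
    missing = [k for k in tool["required"] if k not in args]
    if missing:
        return {"result": f"Invalid arguments provided, missing: {missing}"}
    return {"result": tool["handler"](**args)}

TOOLS = {
    "read_file": {
        "required": ["path"],
        "handler": lambda path: f"<contents of {path}>",
    }
}
```

A bad call like `execute_tool_call(TOOLS, "nope", {})` then just tells the model "Unknown tool: nope", and the loop keeps going.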
So, we pass in a reference to the run state, and we add to-dos in the event that we get more to-dos. And if we're trying to remove them, then we validate to make sure that all the to-dos we're removing actually exist. Right.
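Those two handlers might look something like this (an illustrative sketch — the state here is a plain dict rather than whatever the workshop's actual code uses):

```python
def add_todos(state: dict, todos: list[str]) -> dict:
    # Mutate the shared run state, report the new list back to the model.
    state["todos"].extend(todos)
    return {"result": f"todos updated to {state['todos']}"}

def remove_todos(state: dict, todos: list[str]) -> dict:
    # Validate that every to-do we're asked to remove actually exists.
    missing = [t for t in todos if t not in state["todos"]]
    if missing:
        return {"result": f"cannot remove unknown todos: {missing}"}
    state["todos"] = [t for t in state["todos"] if t not in todos]
    return {"result": f"todos updated to {state['todos']}"}

state = {"todos": []}
add_todos(state, ["read README.md"])
```

Note the return value doubles as the message the model sees, so the model always learns the current to-do list after every mutation.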
So how does this change the way our agent works? Well, in the runtime we can just specify these tools, and it's pretty straightforward to do so. Then we just tell the model: hey, first add a to-do to read the README file, then read the README file, and we'll keep iterating as long as there are function calls. So let's run h.py and see how it works. In the first iteration — wow, this is pretty fast. Maybe let's ask it to do a bit more: read the README file and then write a haiku at the end.
Make sure to check off all of your to-dos once you end the haiku.
Cool. Let's run it again. So you see, okay, in the first iteration it runs for a bit, and then it just spits out a haiku — the haiku at the end was "agent builds itself / from a call to complex plans / Gemini guides search."
It's the same sort of loop we were talking about just now, where the model can output two different things. One is a completion — a natural-language response, like what you see over here when you have the text field. The other is a function call — in this case it removed the to-do because it had done it, and the result was that the to-dos updated to this new, empty set of to-dos.
And up here you can see that it read the README file. So let's walk through this bit by bit. At the first step, when it got the task — hey, add a to-do to read README.md and then read the README file — the first thing it did was add a to-do called "read the README.md file", and this was what the model saw: to-dos updated to "read README". On the second iteration, with this to-do in mind, the model called the read file function on README.md, and the result was that we just threw the whole chunk in — the entire content of the README. If you look over here, once it read the README, it realized: okay, I'm done with this to-do, let me take it off and continue on my way. So it removed the to-do, the to-dos are now empty, and then it output its response. And this is us getting to the first agent that we're building: it's now running in a loop, and as long as there are function calls, we keep letting it run. This is a really useful way to make our agent not just a chatbot, but an agent that can go out and interact with the world.
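That "loop while there are function calls" structure can be sketched in a few lines. This is a runnable toy, with a stub standing in for the real Gemini call (names and message shapes here are illustrative assumptions, not the SDK's actual types):

```python
def fake_model(messages):
    # Stand-in for a real LLM call: emits one function call, then text.
    tool_turns = sum(1 for m in messages if m["role"] == "tool")
    if tool_turns == 0:
        return {"function_call": {"name": "read_file", "args": {"path": "README.md"}}}
    return {"text": "Done: the README describes the project."}

def run_loop(model, messages):
    # Keep iterating for as long as the model asks for tools;
    # a plain-text output ends the turn.
    while True:
        out = model(messages)
        if "function_call" not in out:
            return out["text"]
        fc = out["function_call"]
        # In the real loop this is where execute_tool_call(...) runs.
        result = {"result": f"<contents of {fc['args']['path']}>"}
        messages.append({"role": "tool", "content": result})

msgs = [{"role": "user", "content": "Read the README"}]
answer = run_loop(fake_model, msgs)
```

The agent/chatbot distinction lives entirely in that `while True`: a chatbot returns the first output; this loop feeds tool results back until the model stops asking.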
So where do we go from here?
One of the things you probably want to do when you're building any sort of agent is to execute some logic at different parts of the whole life cycle, right? And we'll do so by using hooks. So let me just paste this in.
So what changes for a hook? — Ivan, do you want to remind us what a hook is? Why do we even want to think about hooks? Because it's a software principle that I think people in ML and data and AI don't necessarily know so much about.
Sure. For sure.
Yeah. So the concept of a hook here is just that, a lot of times, when we think about the logic we want to build into our agent — when a user sends a message, we want to save it in a database; or when the user sends a message, we want to run some sort of content filter in our app; or we want to make sure we save the agent's response in our database — hooks are basically that.
Also, we did a nice example in our "build agents that build themselves" / build-your-own-open-core workshop, like triggering a message to Telegram, right? Yeah — when building an agent like that, that's a great example, because that's kind of what a hook is. I'm going to link to this; I'm looking at our blog post, actually. We wrote: when the agent responds or calls a tool, we often want to trigger other things — send a Telegram message, log to an observability platform like Logfire, render pretty output in the terminal, and so on. So yeah.
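A hook system like the one being described can be as small as an observer registry: register callbacks per event name, fire them at lifecycle points. This is a generic sketch (the event names are made up for illustration), not the workshop's actual implementation:

```python
class Hooks:
    """Minimal observer pattern: register callbacks per event name,
    then fire them at lifecycle points in the agent loop."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, fn):
        # Register a function to be called when `event` fires.
        self._handlers.setdefault(event, []).append(fn)

    def emit(self, event, **payload):
        for fn in self._handlers.get(event, []):
            fn(**payload)

log = []
hooks = Hooks()
hooks.on("tool_call", lambda name, **kw: log.append(f"calling {name}"))
hooks.on("agent_response", lambda text, **kw: log.append(f"responded: {text}"))

# Inside the agent loop you would then emit at each lifecycle point:
hooks.emit("tool_call", name="read_file")
hooks.emit("agent_response", text="done")
```

Telegram notifications, Logfire logging, and terminal rendering all become handlers registered via `on`, so none of them clutter the loop itself.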
Yeah, that's basically what a hook does, and this is going to simplify what we're working on a bit, because all we need to do is define these hooks and we're able to do quite a fair bit. So we're going to define an `on` method that allows you to register functions to be called when different things are being executed. And you'll see later, when we want to do very nice rendering with rich, that hooks allow us to do this too, in a very clean and simple way. We do define some extra logic for preparing the request, because that's the one place where you want only a single function to run, but everything else is pretty much the same. So why do we have a prepare request hook over here? The reason is that if you look at state.py, we actually have two things: the first is the iteration count, and the second is the to-dos. Previously we looked a lot at the to-dos, showing how you can add and remove them across iterations. What we want to do here is tell the model: hey, if you've reached the maximum number of iterations, then we just add a little reminder at the end saying, you've reached the maximum number of iterations — summarize what you've done and return your response. And you'll notice the tools are actually empty when that happens. So this is sort of a system reminder that we throw in.
You see this in a lot of different agent implementations, where you try to nudge the model in different ways with these very ephemeral messages that are just appended at the end, or by modifying some of what you send. It depends on what works best. In my case, I've appended a user message that mimics what would happen if, say, the model had done five iterations, and I just say, "Hey man, you've got to stop." So that's all we do over here: we have a certain fixed number of iterations, and the rest of the logic is the same. And if you look at a lot of the tools — I'm just going to change this to result — we always return a dictionary with the key "result", because that's what the model is going to see. And if you look at the agent.py logic over here, once we've done the execution, we just pass this dictionary back. I've chosen "result" as a very common dictionary key, but with the SDK you have the flexibility to name it whatever you want, and I think that gives it a pretty nice property that I actually like.
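The prepare-request idea described above can be sketched as a single function that runs before every model call. This is a hypothetical shape (field names and the tool list are illustrative): past the iteration cap, it appends the ephemeral user message and drops the tools so the model can only answer in text.

```python
MAX_ITERATIONS = 5

def prepare_request(state, messages):
    """Hypothetical prepare-request hook: past the iteration cap,
    append an ephemeral reminder and strip the tools so the model
    has no choice but to wrap up."""
    request = {"messages": list(messages), "tools": ["read_file", "modify_todo"]}
    if state["iteration_count"] >= MAX_ITERATIONS:
        request["messages"].append({
            "role": "user",
            "content": "You've reached the maximum number of iterations. "
                       "Summarize what you've done and return your response.",
        })
        request["tools"] = []  # no tools => a text answer is the only option
    return request

req = prepare_request({"iteration_count": 5}, [{"role": "user", "content": "hi"}])
```

Because the reminder is added to the outgoing request rather than the stored history, it stays ephemeral: the next turn's request is built fresh from the saved messages.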
So Ivan, we have a great question in the chat. They said they may have missed it, but: do we have a system prompt? How are we controlling the agent's behavior, such as "write a to-do before doing something, check off the to-do after"?
Yeah, in this case I've actually not used much of a system prompt at the start. System prompts are really useful and you should definitely use them, but I just haven't added one at this point in time. I can give an example of a really good — or rather, a much longer — system prompt. If you look at, for example, tools.py in step nine, you'll see it's much longer, because we have the system prompt over here: I give it a name, I say you're a deep research planner; when the user provides a query, make sure you clarify the scope, asking at most two times; then some general tips — you should be polite, positive, and helpful — and which tools to use at each step. The second thing we did over here is say: hey, for every single topic, how do you want to handle the task? How do you want to draft the notes, source diversity, writing style, citations, rabbit holes, etc. This is where you give the model a lot of information to look at when it's writing its responses and its notes. Maybe I'll just add — oh, please go on, and then I'll add one thing.
Oh, no. I was just saying that I think you should try to keep your system prompt a bit brief before you start expanding it. A lot of times, you'll find that as you build applications, especially with a large team, everyone's going to add in random stuff after a while. So you want to think very carefully about what you add to the system prompt.
Yeah. And you kind of want to gatekeep it — you don't want people to have random YOLO access to just write to the system prompt. I do sometimes encourage people to see what they can do without a system prompt and then add things as needed. And I'll actually show you two things.
Let's just see this — I'll add this to the stage. Yeah, this is the post I wrote on building a general-purpose Python agent.
And there's no system prompt here at all. One of the reasons I did that is to demonstrate that you can get away — I wouldn't do this in production, but you can get away — without writing a system prompt if your tools also have good descriptions, and writing and iterating on good tool descriptions is incredibly important. I'll also link to this: this is the system prompt for the pi coding agent, which is what OpenClaw is built around. You can see how short it is — you are an expert coding assistant, you help with such-and-such, available tools, guidelines, documentation, and that's it. And if you compare that to Claude's system prompt — I mean, it's so long; this is the diff — this is wild, right? Of course, because Claude is an enterprise product, they have a lot of things they need in there. But this is to your point, Ivan, about how good the models are getting: they're getting so good that maybe in the end we write long system prompts, but starting off minimally can get you a huge amount of the way. And that's all I wanted to demonstrate there.
Yeah, no, I totally agree. A lot of times, if the model can do it out of the box, why write it in the system prompt? And it's a good call-out for this idea of progressive disclosure, which is a complicated word for saying that you should let the model see information only if it needs it, and you should provide the model with the ability to discover it. For example, if the model has a few different tools or capabilities you want it to learn — how do you edit a Google Doc, how do you access Google Drive — instead of pasting everything into the prompt when the user is only going to ask one question at a time ("what's in my Google Drive?", "can you create a new Google Doc for me?", "can you modify a Google Sheet so the title is now Build Your Own Deep Research Agent with Ivan Leo?"), you don't need all that information in the system prompt. You can have a skill, right? So thinking about context, and managing it so that you don't aggressively add things to your system prompt, is going to be better for you in the long run.
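Progressive disclosure can be reduced to a pair of tools: one cheap listing that always sits in context, and one loader the model calls only when a task needs the details. A hypothetical sketch (the skill names and texts are made up):

```python
# Full instructions live here, OUT of the system prompt.
SKILLS = {
    "google_docs": "To edit a Google Doc: authenticate, fetch the doc ID, ...",
    "google_drive": "To list Drive files: query the files endpoint, ...",
}

def list_skills() -> dict:
    # Cheap to keep in context: just the names, not the instructions.
    return {"result": sorted(SKILLS)}

def load_skill(name: str) -> dict:
    # Full instructions are disclosed only when the model asks.
    if name not in SKILLS:
        return {"result": f"Unknown skill: {name}"}
    return {"result": SKILLS[name]}
```

The system prompt then only needs one line ("call load_skill when a task needs an integration"), instead of every integration guide pasted up front.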
That's my two cents. Yeah. So this is the concept of hooks, like what Hugo and I talked about just now: with hooks, a lot of the rendering logic can be abstracted out. What you see over here is that our rendering logic is now expressed entirely in one place, where at each distinct step — depending on what tool is called and what sort of logic is being executed — you can just have this simplified logic over here.
So now that we have this, let's see how we can abstract a lot of the logic into like a simple agent. Right? So I'm just going to add these three things.
Let's go up a little.
There we go.
So this is the last part of the second portion, where we're talking about state, hooks, and the agent loop. Previously we had a while-true loop, which was a bit complicated to manage. We're going to take that and bring it inside the agent. So we're going to define a function called run until idle, and what run until idle does — what changed from the previous version to this one — is that we just keep iterating for as long as we need to: for as long as the model is making function calls.
Now, previously we talked about how to-dos are a great way to ground the model in some sort of logic. And the way we can actually force the model to finish all the to-dos is to define a to-dos-incomplete method. What I've chosen to do here is have it return either a string reminder or None. You can imagine, for example, a model with a token budget, where you append reminders at the end — "you have 4,000 left... 2,000... 1,000" — and these small reminders, depending on what application you're building, might be quite useful. In this case, what I've chosen to do is: if the model tries to end its turn without ticking off all of its to-dos, I just force it to go for another round. And you can start to see how you can implement a lot of deterministic checks in your agent harness without relying on a prompt. A prompt is a really useful thing, and especially as models get better you can honestly have fewer of these deterministic checks, but while we're still waiting for these models to improve over time, it's not a bad idea to think a lot about these small deterministic checks. Right, let's take a look at this — clear it and run it again. So now we have an agent with to-dos. Let's say: can you read the README file and then tell me what it says? Right.
So now the model is actually going to add the to-dos and read the README file. Let's give it a bit of time. It's read the file over here. You can see that now it's done: it ticked off all its to-dos, and it went for another round because it hadn't ticked them all off the first time. If we look at what happened up here, it still had one left — read README.md — and it was forced to go for another round.
So this is where a system prompt might be very useful, because you can tell it things like: hey, I want you to tick off all of your to-dos. That adds another layer on top of the deterministic checks that make sure the to-dos are actually removed, and only once you've done that do you allow the model to finish its turn. I think this is quite important when you're building out your agent, because you want to think a lot about what checks you have. When you're writing a lot of code, you need things like unit tests; in production you want things like try/catches. I think of this the same way: you want a mix of different guardrails — whether it's on the output your model is producing, or the conditions under which you stop the model; for example, if you detect it's going into a loop, maybe you want to stop it there. Stuff like this can be implemented deterministically, and this is an example where a system prompt might have been useful alongside it, because that's why it went twice: it had to call the remove to-do method and then go through the whole loop again. So that's the end of our second part, where we talked about state, hooks, and the agent loop. We're going to try to cover as much as we can of letting it loose — sub-agents, planning, observability — but I want to stop here to see whether there are any questions.
We've got a couple of great questions, and it's a good time to take them. We have a cool question from Conrad that I'll get to in a second, but I just want to give a shout-out to Conrad — he's taking our Building AI Applications course at the moment and he's been building frantically. I let him know about Gemini File Search, and we introduced the course to Modal because they're sponsoring it, and he spent an hour mixing Gemini File Search — for those who don't know, Gemini File Search is a really powerful managed RAG retrieval system where you throw in your docs, you can specify your chunking however you want, essentially, and it gives you an API. So he mixed Gemini File Search with Modal and Gradio to get a private knowledge base up and running in an hour, and is now thinking about evals, maybe changing out to Streamlit, making sure it works with maths-heavy resources, that type of stuff. And he has a great question: could you elaborate a bit more on the defensive logic, in terms of what methods, tools, and libraries you'd encourage people to check out, and the usefulness of Pydantic AI? And I'll just say one thing: Pydantic AI actually has so many different parts to it that you don't necessarily have to talk about all of them, right?
Yeah. I think the beauty of using Pydantic AI is that it's one of the few agent frameworks I've actually tried — I normally roll my own thing, so that might be a bit of a bias. But I will say, Pydantic AI is written by the Pydantic team, and I generally trust the taste of that team a lot. And I like it: the few times I've tried it, it's worked without any sort of glitches, and it works natively with Logfire. In
terms of defensive stuff, I kind of think of it in two ways. One is that you want to look at historical data. Going back to your example — I have Gemini File Search, plus Modal and Gradio — you might ask, for some user queries, what is the cosine similarity between the query and the results being returned? How similar are they in the embedding space? Then you might find that, if I want more relevant queries, maybe I'll have my model do a second pass, or have the model check these historical queries to see whether I'm using the right embedding model, for example. These are small checks you can add at the end, once the query has been made and the agent has responded. Another thing you can do, as the agent is running, is determinism checks. These are often useful when you have OpenTelemetry running and you know where the model tends to get trapped. For example, if you're using a model that tends to get into a loop once it sees a specific set of words — or if you detect, from the chunks that are streaming out, that the last five or ten chunks are roughly the same — then you should stop the response there and evict that message from memory. That's a defensive thing you can do by seeing what you have in production. A lot of times you're limited by what sort of data you capture and what you're looking out for, so it's very important, when you're building your own agents, to think: how can I find problems that I might face in production? Is there some sort of logging I have out of the box for this? You can use whatever you want — I've generally been using Logfire, which has a nice package out of the box — but I think you should use whatever you're comfortable with. The most important thing is that you have to be logging things and checking them — not just in post-production, but on a regular basis: looking at your data, and, say, if you see that a lot of the conversations you're having are more sensitive or deal with a lot of private data, then perhaps you want a small language model as a judge that makes sure that if a conversation gets tricky, you sound the alarm. These are quite difficult questions to deal with, but the most important thing is that you're able to log, and that you're not afraid to deal with them.
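The streamed-chunk loop check Ivan describes — stop when the last few chunks are roughly identical — is easy to make concrete. A hypothetical sketch (the window size and normalization are illustrative choices):

```python
from collections import deque

def make_loop_detector(window: int = 5):
    """Production guardrail sketch: if the last `window` streamed chunks
    are (after normalization) identical, assume the model is stuck in a
    loop and signal the caller to stop the response."""
    recent = deque(maxlen=window)

    def check(chunk: str) -> bool:
        # Normalize so trivial whitespace/case differences still count as repeats.
        recent.append(chunk.strip().lower())
        return len(recent) == window and len(set(recent)) == 1

    return check

looping = make_loop_detector(window=3)
hits = [looping(c) for c in ["a", "b", "the same", "The same", "the same "]]
```

In a real streaming handler you'd call `looping(chunk)` per chunk and, on True, cancel the stream and evict that message from the history, exactly as described above.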
Yeah.
Ivan, I'd really kick myself if I didn't ask you — and I appreciate you can't give away any secret sauce — but is there anything you can tell us specifically about how you approached these things at Manus when building out agents there?
I think at Manus we had a really good team. But I will say, one of the most important things for me that made a difference when building agents was this idea of meta-prompting. Really, what you want is to look at where the model is messing up, and then you can ask the model questions given the conversation history you've saved: oh, why did you make this decision? You can go back, you can edit, you can replay certain states. And I think that's pretty important — the logging forms the infrastructure for you to experiment fast. Whether it's at Manus or anywhere else, being able to experiment quickly and identify issues in production quickly is very important, and that's something you should be thinking about once you get a working prototype.
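That meta-prompting workflow — replay a saved transcript and ask the model to critique its own decision — can be sketched as a small helper. Everything here is a hypothetical shape (the prompt wording and the stubbed model call are illustrative, not Manus's actual tooling):

```python
def metaprompt(call_model, transcript, failure_note):
    """Replay a saved conversation and ask the model why it went wrong,
    so the critique can feed the next prompt revision."""
    question = (
        f"Here is a conversation where you went wrong: {failure_note}. "
        "Why did you make this decision, and how should the instructions "
        "change so you don't repeat it?"
    )
    return call_model(transcript + [{"role": "user", "content": question}])

# Stubbed model call so the sketch runs without an API key.
critique = metaprompt(
    lambda msgs: f"Reviewing {len(msgs)} messages: the instructions were ambiguous.",
    [{"role": "user", "content": "summarize"}, {"role": "assistant", "content": "..."}],
    "it skipped the to-do list",
)
```

The value is in the plumbing: because every run is logged, any failure can be turned into a replayable critique prompt within seconds.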
Yeah.
Awesome. We've got another great question — there's so much cool action in the Discord. Seth Tam, welcome back — Seth comes to a lot of our livestreams and workshops; always great to have you here. He asks: what about using a model like Qwen 3.5 35B if you want to do local stuff — less likely to have tooling fall apart, since it's not updated randomly like hosted providers do? I agree and disagree. I think tool calling with hosted providers, even when it falls apart, is better than most local open-weight tool calling currently. I actually think 2026 will be a year of tool calling getting really good with open-weight models — we're going to see huge developments there. And I do think, if you have a good reason to use open-weight models, do — but they're currently something like six months, give or take, behind frontier-lab closed models, and that can be a lot of performance. But Ivan, do you play around much with open-weight models? He's asking about a 35B — and of course you can; the beefier the system you have, the more you can run as well.
I think there's definitely a time and place for local models. For example, if you work on stuff that's a bit more private, you might have requirements for on-prem deployments, or, in your case, you want something local. The main problem with using a local model is that you don't benefit from the frequent updates of a lot of these hosted providers, especially as we upgrade and train our models and find fixes. The second thing is that you don't want to own inference, which is quite an expensive thing to own. For example, I think Qwen 35B works great, but if you want to extend it and you have a lot of context, it tends to slow down quite a lot, even on a very beefy local machine. Whereas if you use something like Gemini, the KV caching is optimized, and we have very good latencies when it comes to responses — you saw just now, when we ran the Gemini 3 Flash preview, the model was so speedy that the responses felt like they were streaming right in. A lot of these things, especially as you're building your initial prototypes, are quite important. You can decide what matters to you, but personally, when I'm building prototypes and trying different things, I don't really want to own inference, and building on top of providers like Gemini tends to be a good call. Inference is an incredibly important and very tricky thing to own. Yeah, totally. All right, let's move on, and we can have more questions after the next section.
Sure. I think I'm just going to skip seven and go straight to beautifying the outputs, so we can see our first sub-agents in action. Let's clear this, and let's go to the agent and then app.py. So what we're going to do now is say, for example: can you look up for me the current GDP of California — what is the current revenue of California and how has it changed since 2023? So let's let it run for a bit. You can see it starts with the to-dos, but now we actually have sub-agents running. As the sub-agents spin up, what the main agent is doing is creating these search queries, and the sub-agents are running what we talked about before: they have access to web search via Exa, and based on whatever they find, they write a final report, which gets passed back to the main agent, who then takes this information and decides — do I check off my to-dos, do I continue searching? In this case, it's chosen to go for another round, right? And this is our first taste of what a sub-agent can do. I'm going to let it run for a bit, and then we'll look at the code. So what exactly is happening over here?
Well, what we've done is that we basically created a new tool called delegate search. And this is the tool
delegate search. And this is the tool that's being used to create these sub aents over here. So with these sub aents running you can see that as of March 2026 California's revenue has seen a
significant recovery and growth and we have you know all of this information provided out of the box for you right um and this is really useful because we can
delegate a lot of this right so let's see the delegate search tool um so what we have now are two agents one is an agent that has read file modify to-do
and delegate search and we have a system instruction ction now which says that you know when do you use ask for web research or current information use delegate search and the way we're doing
that is that in app.py py um when we actually you know activate a sub agent per say um all we're doing is essentially creating a instance of this
agent up here where we specify that the run config should have a max of two we provided agent context where we actually have exa and then you are basically
going to have sub agent and then you you pass in a reference over here to a live panel enrich so a lot of what you saw just now where you know you were printing out the query then you saw search web the query search web query.
What that is, is that under the hood we're using a library called rich, and we can actually update these panels in the terminal. And this becomes really useful because we're starting to see the first few steps where we have a sub-agent that's essentially running pretty well. And so I think the code is over here, where you say that if, let's say, it calls search web, then we're just going to update this specific panel. So you saw how you had a table with these three individual rows, and you can do a lot of this by using hooks, like what we saw just now. It's the same thing: you have an agent that calls tools, that tool spins up its own agents, and then all these agents are, at any point, running their own inference queries when things change in their overall state, their context. And what you do is you basically just grab that information and pull it back.
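The live-updating table pattern can be sketched with rich's `Live` and `Table` classes; the `render` helper and the row data here are invented for illustration:

```python
import time

from rich.live import Live
from rich.table import Table


def render(rows):
    # Rebuild the table on each update; rich re-renders it in place.
    table = Table(title="Sub-agent searches")
    table.add_column("Query")
    table.add_column("Status")
    for query, status in rows:
        table.add_row(query, status)
    return table


rows = [("california gdp 2023", "running"), ("wildfire recovery", "running")]
with Live(render(rows), refresh_per_second=4) as live:
    # In the workshop, a tool-call hook would fire this update when a
    # sub-agent's search_web call finishes.
    time.sleep(0.3)
    rows[0] = (rows[0][0], "done")
    live.update(render(rows))
```

Rebuilding a fresh `Table` per update is the idiomatic rich pattern, since tables don't support mutating existing rows in place.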
Yeah. And then you update this live table. This is sort of like the cool thing we were looking at just now: all these queries are basically running in sub-agents with their own context and their own prompts, and all they return to the main agent is essentially just, hey, here's the final report that I have. So this is our first taste of what a sub-agent is.

So how can we go from this to, let's say, a deep research query? And maybe you can just say a bit more about why we even want to use things like sub-agents, especially with respect to context, I suppose.

Yeah, for sure. I think what you find a lot as you build more and more
complex applications, especially ones that are looking across a whole bunch of these data sets, or other very complex things, is that there's going to be more context than you can deal with. So as a result, it's going to be quite difficult to manage it all. And so sub-agents are a way for you to ensure that, instead of your model having to deal with all of these complex bits and parts, you can basically allow your
agent to go, okay, I'm going to use a smaller language model, a cheaper one. In this case, we're using 3.1 Flash to read a lot of this information and then 3.1 Pro to do the orchestration. And you can start seeing that 3.1 Pro's context will only contain the clean information of the written reports at the end, with the right citations. And that's what you saw in the overall trace. And so that's kind of why you want to use sub-agents.
A common term that we're seeing is recursive language models, where language models are trained with, or provided with, tools so they can call other language models or run Python scripts, and they only see the final results. Because if I have a large data set, maybe what I want my model to do is run pandas, see snippets of it, and run filters on the overall thing. These are all things that sub-agents are very useful for, because they can preserve the main agent's context. That's kind of why you want to use sub-agents here. Yeah.
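The "run pandas, return only snippets" idea reduces to a simple pattern. Here it's sketched with plain lists so it runs without pandas; the data and the `run_filter` helper are invented:

```python
# The full dataset lives here, outside the main agent's context.
rows = [
    {"state": "CA", "gdp": 3.9},
    {"state": "TX", "gdp": 2.4},
    {"state": "NY", "gdp": 2.1},
]


def run_filter(min_gdp: float, limit: int = 2) -> str:
    # In the talk this would be pandas running inside a sub-agent; the
    # pattern is the same: heavy data stays here, only a snippet returns.
    hits = [r for r in rows if r["gdp"] > min_gdp][:limit]
    return "; ".join(f"{r['state']}={r['gdp']}" for r in hits)


snippet = run_filter(2.3)
print(snippet)  # the orchestrator only ever sees this short string
```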
Awesome. And just on that note, I have linked to a podcast I did with Brian Bishoff, who's a colleague and friend of both myself and Ivan. And it's a conversation in which we really talk about how sub-agents can be a useful term, but they're also really just tools, right?
For sure. For sure. Yeah. So I think that's basically it. And then if you look at, for example, the next portion, where we're generating a plan, what changes from this step to the other is that we introduce this idea of phases. So if you look at state over here, we now have this thing called a mode. And what changes is that we have a get tools and a prepare request over here. So what you have is just a bunch of tools, planning tools and executing tools, and we swap them based on what is active and what's not, depending on the mode. All we do is that at the end, once we finish a tool call (I think the handler, if I remember correctly, is inside the generate plan handler), we just change the mode of the current run state to execute. And so it goes from a plan state to a run state, and then it'll go from asking you questions like, oh, if you want to find out about California's GDP, are you interested in official figures, or do you want me to look into economist reports, etc. So this is, I think, a pretty cool and simple example of how you can swap the tools on demand, depending on what needs to be done.
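The phase swap can be sketched like this; `RunState`, `get_tools`, and the tool lists approximate what app.py does rather than reproduce it:

```python
from dataclasses import dataclass

# Two tool sets, one per phase (tool names are illustrative).
PLANNING_TOOLS = ["ask_clarifying_question", "generate_plan"]
EXECUTING_TOOLS = ["read_file", "modify_todo", "delegate_search"]


@dataclass
class RunState:
    mode: str = "plan"


def get_tools(state: RunState) -> list[str]:
    # A prepare-request hook would call this before every model turn, so
    # the active tool set always tracks the current mode.
    return PLANNING_TOOLS if state.mode == "plan" else EXECUTING_TOOLS


state = RunState()
print(get_tools(state))  # planning tools first
state.mode = "execute"   # the generate-plan handler flips the mode
print(get_tools(state))  # now the execution tools are live
```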
Yeah, I think we've run out of time, and I actually have to bounce, but all the code is available online. I'll be releasing a blog series after this that has a walkthrough of all the code snippets and some of the ideas we've talked about. I think Hugo will probably do the same. And so, if you have any questions, drop them in Discord. Hugo and I will try our best to answer them. Yeah.
Amazing. Thank you so much, Ivan. I was actually just about to link to your website and blog. You do have a sign-up form for your newsletter, and I encourage everyone to jump in and get updates from Ivan, particularly as, once again, congratulations on your new job at DeepMind, which you started earlier this week. Can't wait to see what you do there with the team. Congratulations on the move to America as well. On top of that, thank you everyone for joining, but Ivan, as always, thank you for coming and sharing your wisdom and expertise with the community, and even more so having done all this prep knowing that you've just started a new job and moved to America from Singapore.
So big props and gratitude to you as well. Please do follow Ivan on all the platforms, Ivan Leo. Do subscribe to Vanishing Gradients on Substack and on Luma. Like and subscribe on YouTube. These are the ways. Share with friends. The best way you can support this type of independent AI, ML, and data science education is by sharing it. The more people engage, the more we can do this and support it. Any final things you'd like people to check out?
Uh, yeah. The blog article will come out probably over the weekend. As usual, follow the Gemini and Google DeepMind teams on Twitter. We're releasing a lot of updates, and we're trying our best to make Gemini great. And try the interactions API, let us know what you think, and we hope you like it. Yeah.
Awesome. All right. Thanks everyone once again, and most of all, thank you Ivan for doing yet another fun workshop.
It's always fun to work with you, man.
Yeah. Thanks, man. Thanks for having me.