
How to Build Reliable AI Agents (without the hype)

By Dave Ebbelaar

Summary

Key takeaways

  • Ignore 99% of AI hype: You can ignore 99% of everything you see online and focus on core foundational building blocks to build reliable AI agents, as most frameworks and tools are abstractions over LLM providers. [00:48], [04:19]
  • Agents are deterministic software: The most effective AI agents aren't agentic at all; they're mostly deterministic software with strategic LLM calls placed exactly where they add value. [05:09], [05:14]
  • Minimize LLM calls in production: In production environments for clients, tool calls are almost never relied on; use as few LLM API calls as possible, and only when deterministic code can't solve the problem. [06:48], [06:57]
  • Structured output over tool calls: Prefer structured output that classifies intent into categories with reasoning, then use if/else routers in code rather than tool calls, for easier debugging in complex systems. [20:04], [20:48]
  • Seven foundational building blocks: intelligence layer, memory, tools, validation, control, recovery, and feedback; use them to break problems down and build reliable agents. [08:08], [26:53]
  • Human in the loop for complex tasks: For important or complex processes, like sensitive emails, insert human approval steps with a full stop before execution to ensure reliability. [23:35], [24:07]

Topics Covered

  • Ignore 99% of AI Hype
  • Agents Are Deterministic Software
  • Minimize LLM Calls in Automations
  • Structured Output Beats Tool Calls
  • Human Feedback Ensures Reliability

Full Transcript

If you're a developer, then right now it feels almost impossible to keep up with everything that's going on in the AI space. Everyone's talking about AI agents. Your LinkedIn and X feeds are full of it. Everyone makes it seem super easy. Yet you are still trying to figure out whether you should use LangChain or LlamaIndex, and trying to debug and figure out all of these AI agent systems that you're building and tinkering with. All of the tutorials you'll find are either messy or contradictory. And every week there is a new Fireship video dropping something new where you're like, "Oh, do we now also need to know this?" So, all in all, it's a complete mess. My goal with this video is to help calm your AI anxiety and give you clarity on what's really going on in the AI space right now, and why you can pretty much ignore 99% of everything you see online and just focus on the core foundational building blocks you can use to build reliable and effective agents.

In this video, I'm going to walk you through the seven foundational building blocks that you need to understand when you want to build AI agents, regardless of what tool you are using. I'm going to give these code examples in the Python programming language, but honestly it doesn't matter what tool you use, whether that's n8n, TypeScript, Java, or any other programming language. If you boil it down to these seven foundational building blocks, you can implement them in anything, because they're so simple. So I will execute these simple code blocks, show you the output, and walk you through everything step by step with diagrams as well, so that even if you've never written a single line of Python, you can still follow this video. And I can guarantee that after watching this video, you'll have a completely different perspective on what it takes to build effective AI agents, and you'll be able to look at almost any problem, break it down, and know the patterns and building blocks you need in order to solve it and automate it. And then you might be thinking: okay, Dave, but why should I listen to you? Aren't you just part of the 99% of noise you just mentioned?

Well, I'll leave that up to you. But I have a somewhat unique perspective on the AI market right now, because I have over 10 years of experience in AI. I have a background in machine learning, and I run my own AI development company where, over the past years, we've done over 20 full system deployments. Next to that, I have plenty of communities with thousands of members combined, which gives me a really good pulse on everything that's going on, what's working and what's not. And I also regularly interview industry leaders like Jason Liu or Dan Martell, for example. I briefly wanted to bring that up just so you know that I'm not just some random guy building n8n workflows from his bedroom. And there's nothing wrong with that; it's just a different category. Clients work with us and hire us to build production-ready systems that they can rely on, and that's the knowledge I want to bring to you in this video.

The big problem that I see right now within the AI space, and why you feel so confused as a developer, all comes from this simple image over here. There is simply a lot of money flowing into the market. And every time throughout history when there's an opportunity like that, what happens? People jump on it, because people want to try to capitalize on it. So even if you're only remotely interested in AI, this is what most of your social media feeds will look like, and they all make it seem super easy. There are all these tools that you can use to build full agent armies. Yet you are still wondering: where do I start, and how do I make this all work in a really production-ready environment? On top of that, you have all of the frameworks and libraries that follow a similar trend: developer tools, GitHub repositories, all kinds of tools that make it seem super easy to build these AI agents. And then, of course, we have the news, everything that's going on, and the plenty of other tools built on top of that that you can also use. This all results in you feeling really overwhelmed, having no idea what's going on or what to focus on.

And now there's a clear distinction between the top developers and teams that are actually shipping AI systems that make it to production, versus the developers that are still trying to debug the latest agent frameworks. Most developers follow all the hype you see on social media: the frameworks, the media attention, and the plethora of AI tools that are out there. The smart developers realize that everything you see over here is simply an abstraction over the current industry leaders, the LLM model providers. Once you realize that as a developer building AI systems, and you start to work directly with these model providers' APIs, you realize that you can actually ignore 99% of the stuff you see online, and also realize that fundamentally, pretty much nothing has changed since function calling was introduced.

Yes, models get better, but the way we work with these LLMs is still the same. And every time I bring this up, people are like, "What? How is this possible?" But our codebases from two years ago still run. They still work. We only have to change the model endpoints through the APIs, because we've engineered them in such a way as to not be reliant on frameworks that are essentially built on quicksand. All this context that I'm providing right now is super important, just like with LLMs, because otherwise the rest of this video and the seven core building blocks won't make a lot of sense.

The first, most important thing to understand is that if you look at the top teams building AI systems, they use custom building blocks, not frameworks. That is because the most effective AI agents aren't actually that agentic at all. They're mostly deterministic software with strategic LLM calls placed exactly where they add value. The problem with most agent frameworks and tutorials out there is that most of them push for giving your LLM a bunch of tools and letting it figure out how to solve the problem. But in reality, you don't want your LLM making every decision. You want it handling the one thing it's really good at, reasoning with context, while your code or application handles everything else.

And the solution is actually quite straightforward: it's just software engineering. Instead of making an LLM API call with 15 tools, you want to tactfully break down what you're actually building into fundamental components, solve each problem with proper software engineering best practices, and only include an LLM step when it's impossible to solve with deterministic code. Making an LLM API call is currently the most expensive and dangerous operation in software engineering. It's super powerful, but you want to avoid it at all costs and only use it when absolutely necessary. And this is especially true for background automation systems.

This is a super important concept to understand. There is a huge difference between building personal assistants like ChatGPT or Cursor, where users are in the loop, versus building fully automated systems that process information or handle workflows without human intervention. And let's face it: most of you aren't building the next ChatGPT or Cursor. You're building backend automations to make your work or your company more efficient.

So when you are building personal-assistant-like applications, using tools and multiple LLM calls can be more effective. But when you're building background automation systems, you really want to reduce them. For example, in our production environments for our clients, we almost never rely on tool calls. You want to build your applications in such a way that you need as few LLM API calls as possible. Only when you can't solve the problem anymore with deterministic code, that's when you make a call.

And when you get to that point, it's all about context engineering. Because in order to get a good answer back from an LLM, you need the right context, at the right time, sent to the right model. So you need to pre-process all the available information, prompts, and user inputs so the LLM can easily and reliably solve the problem. This is the most fundamental skill in working with LLMs. And the final thing you need to understand is that most AI agents are simply workflows, or DAGs if you want to be precise, or just graphs if you include loops. Most steps in these workflows should be regular code, not LLM calls. What I'm trying to do in this video is really help you understand AI agents from a foundational level, from first principles.

Now that we've set the stage, we get to the foundational building blocks that you need. There are really only seven or so that you use in order to take a problem, break it down into smaller problems, and then try to solve each of those sub-problems with the building blocks that I will introduce to you right now.

Building block number one is what I call the intelligence layer. Super obvious, right? This is the only truly AI component in there, and this is where the magic happens: this is where you make the actual API call to the large language model. Without this, you just have regular software. The tricky part isn't the LLM call itself; that's super straightforward. It's everything else that you need to do around it. The pattern here is: you have a user input, you send it to the LLM, and the LLM sends a response back to you. We can very easily do this in the Python programming language using, for example, the OpenAI Python SDK, where we connect with the client, select which model we want to use, plug in a prompt, and then wait for the response. Simply running this can be done in any programming language. It can be done directly with the API; it can be done with n8n. But this is the first foundational building block: you need a way to communicate with these models and get information back from them.
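To make this concrete, here is a minimal sketch of the intelligence layer. It uses only Python's standard library so the shape of the request is fully visible; the endpoint and payload fields follow OpenAI's Chat Completions REST API, the model name is a placeholder, and `build_payload`/`ask_llm` are illustrative names, not the video's actual code.

```python
# Sketch of the intelligence layer: one LLM API call, nothing more.
# Requires OPENAI_API_KEY in the environment to actually run.
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(prompt: str, model: str = "gpt-4o") -> dict:
    # The core of the building block: a model choice plus a message list.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_llm(prompt: str) -> str:
    # The actual network call -- the only truly "AI" step in the system.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask_llm("Explain what an LLM is in one sentence.")` would then perform one round trip to the model; everything else in this video is the software you wrap around calls like this one.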

Building block number two is the memory building block, and this ensures context persistence across your interactions with these models, because LLMs don't remember anything from previous messages. They are stateless, and without memory, each interaction starts from scratch. So you need to manually pass in the conversation history each time. This is just storing and passing conversation state, something we've been doing in web apps forever. To build on top of the intelligence layer we just saw, next to providing a user input prompt, we also get the previous context and structure it into a conversation-like sequence of messages. Then we can get the response, and within that process we also have to handle updating our conversation history. In the second file over here, called memory.py, we see an example where we ask the AI to tell us a joke, then ask a follow-up question, "What was my previous question?", but we don't handle the conversation history correctly. And because LLMs are stateless, if we run this, it simply won't know. Then, in this function, we have a proper example of how to handle memory, where we pass in the conversation history as an alternating sequence between user and assistant. We are now programming this dynamically within our code; in a more realistic example, you would store and retrieve this from a database. To demo this, we can simply run the first joke. Why do programmers prefer dark mode? Because light attracts bugs. Then we ask the follow-up question: what was my previous question? And it says, "I'm unable to recall previous interactions." And lastly, we do it properly, where we pass down the previous answer, and we get: "Your previous question was asking for a joke about programming." So now it understands the context of the conversation history.
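The memory pattern can be sketched without any network calls. Here `llm` is any function from a message list to a reply string, so a stub stands in for the real API call; all names are illustrative, assuming the usual role/content message format.

```python
# Sketch of the memory building block: the caller owns the conversation
# state and replays it on every call, because the LLM itself is stateless.
from typing import Callable, Dict, List

Message = Dict[str, str]

class Conversation:
    def __init__(self) -> None:
        self.history: List[Message] = []

    def send(self, user_input: str, llm: Callable[[List[Message]], str]) -> str:
        # Append the user turn, replay the full history, store the reply.
        self.history.append({"role": "user", "content": user_input})
        reply = llm(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

def echo_llm(messages: List[Message]) -> str:
    # Stand-in for a real model call; it can "remember" only because
    # the full history is passed in on each call.
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    if len(user_turns) > 1:
        return f"Your previous question was: {user_turns[-2]}"
    return "Why do programmers prefer dark mode? Light attracts bugs."

chat = Conversation()
chat.send("Tell me a programming joke.", echo_llm)
print(chat.send("What was my previous question?", echo_llm))
# -> Your previous question was: Tell me a programming joke.
```

Swapping `echo_llm` for a real API call gives you the proper memory handling from memory.py; dropping the `history` replay reproduces the broken "it simply won't know" behavior.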

Building block number three is what we call tools, for external system integration capabilities. Most of the time, you need your LLM to actually do stuff and not just chat, because pure text generation is limited. You want to call APIs, update databases, or read files. Tools let your LLM say, "I need to call this function with these parameters," and your code handles the actual execution. If we look at the diagram over here, we augment the intelligence layer, the LLM API call, potentially also with memory, and we now also provide the LLM with tools. For every API call with tools, the LLM decides: should I use one or more of the tools I have available, yes or no? If no, it gives a direct response, a text answer back. If yes, it selects the tool. Then your actual code is responsible for catching that and executing the tool, then passing the result one more time to the LLM for it to format the final response, yet again as a text answer for you. Tool calling is also directly available from all of the major model providers, so there's no need for any external frameworks or libraries. We just specify the function that we want to call and transform that into a tool schema that we then make available to the LLM. Our code performs a simple check to see if the LLM actually decided to call the tool, then passes the parameters into the actual function, runs it, and passes the result back to the LLM one more time. And if we run all of this, we can now see that we have an LLM that, using the get_weather function, is able to get the weather information for any given city or place you can think of. So we augment the LLM beyond just the text generation capabilities the model was trained on. Through tools, we give the model a way to integrate and connect with external systems. Now, if you're entirely new to tool calling and you've never done it, this can actually be a little tricky to understand. You need to see a couple of examples. For that, I highly recommend checking out the official documentation from OpenAI on function calling. Or, if you want to go really deep with this, I have a full beginner course here on YouTube, Building AI Agents in Pure Python, which is actually a really good follow-up for this video and which I will link in the description as well.
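Here is a minimal sketch of that round trip. The hard-coded `model_decision` dict stands in for what the provider's API would actually return (a simplified version of the common tool-call shape, with arguments as a JSON string); the tool, city, and all names are illustrative.

```python
# Sketch of the tool-calling round trip: the LLM only *names* a function
# and its arguments; our code executes it.
import json

def get_weather(city: str) -> str:
    # Illustrative tool -- a real one would call a weather API.
    return f"18°C and cloudy in {city}"

TOOLS = {"get_weather": get_weather}

# Stand-in for the tool call a model provider would return (simplified).
model_decision = {"name": "get_weather", "arguments": json.dumps({"city": "Amsterdam"})}

def execute_tool_call(decision: dict) -> str:
    func = TOOLS[decision["name"]]               # look up the requested tool
    kwargs = json.loads(decision["arguments"])   # parse the LLM-provided args
    return func(**kwargs)                        # our code runs it, not the LLM

result = execute_tool_call(model_decision)
print(result)  # -> 18°C and cloudy in Amsterdam
# In a real system, `result` goes back to the LLM one more time so it can
# phrase the final answer for the user.
```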

All right. Now that we understand the basic operations you can do with a large language model, we get to building block number four, which is arguably the most important one, and that is validation. This can help us with quality assurance and structured data enforcement. If you want to build effective applications around large language models, you need a way to make sure the LLM returns JSON that matches your expected schema, because LLMs are probabilistic and can produce inconsistent outputs. If you ask it one question, and another user asks the same question in a slightly different way, they can get a completely different answer. One might be correct; the other might be wrong. So you validate the JSON output against a predefined structure, and if the validation fails, you can send it back to the LLM to fix it. This concept is known as structured output, and we need structured output, which is super crucial, so we can engineer systems around it. Rather than just asking a question to a large language model and getting text back, we want a predefined JSON schema where we are 100% sure that what we're getting back contains the actual fields we can use later down the line within our application.

So the diagram looks like this: we ask an LLM to provide us with structured output, which is simply just JSON. We validate it against a schema using a library like Pydantic or Python data classes. We check whether it's valid. If it's valid, we have the structured data. If it's not valid, we take the error response and send it back to the LLM in order to correct it.

Let's say you want to build some agentic task management tool that can transform natural language into tasks with due dates and priorities. Instead of just building an agent, letting it interact with the user, and then just providing a text output back of what it thinks the goal of the user is, we can define a specific data structure. In this example, I'm doing this with the Pydantic library in Python. And getting structured output from large language models is, yet again, supported by all of the major model providers. Here you can see that when we're making the LLM API call to OpenAI, we have the text_format parameter, which we set equal to the defined task result object that we want to get back from the LLM. So now, with our system prompt instructing our model to extract task information from the user input, and our prompt "I need to complete the project presentation by Friday, it's high priority," we can run this and get an actual validated, structured output data object back, where we can clearly see we have a task, a completed flag, and a priority field that we can now also access programmatically. So we can go to our result object, access the task on it, and actually get the information that's on there. With techniques like this, we can validate both the incoming data, what we send to the LLM, and the outgoing data, what the LLM is sending back to us. You now already understand that context engineering is one of the most important skills when it comes to building reliable LLM applications, and using libraries like Pydantic is really at the core of that.

And then, real quick: if you're a developer and you've ever thought about starting as a freelancer, to maybe make a little more money on the side, work on fun projects, or just to learn, but you don't really know where to start or how to find that first client, you might want to check out the first link in the description. It's a video of me going over how my company can help you with this. We've been running this program for over three years, and we have hundreds of case studies of developers who have already successfully made the leap and are now working on exciting projects, either next to their full-time job or even full-time. And if you feel like you're not technically ready yet for freelancing or don't feel comfortable, there's a second link in there as well that will teach you everything you need to know about how to get ready for freelancing as an AI engineer.
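Before moving on, the validate-and-retry loop from the validation diagram can be sketched with only the standard library. The video uses Pydantic, which performs these checks for you; the `TaskResult` schema and its field names here are illustrative, not the video's exact model.

```python
# Sketch of the validation building block: parse the LLM's JSON, enforce
# the schema, and surface an error that could be sent back for correction.
import json
from dataclasses import dataclass

@dataclass
class TaskResult:
    task: str
    completed: bool
    priority: str

ALLOWED_PRIORITIES = {"low", "medium", "high"}

def parse_task(raw: str) -> TaskResult:
    data = json.loads(raw)        # step 1: the output must be valid JSON
    result = TaskResult(**data)   # step 2: exactly the expected fields
    if result.priority not in ALLOWED_PRIORITIES:  # step 3: value checks
        raise ValueError(f"priority must be one of {ALLOWED_PRIORITIES}")
    return result

good = parse_task('{"task": "Finish the project presentation", "completed": false, "priority": "high"}')
print(good.priority)  # -> high

try:
    parse_task('{"task": "x", "completed": false, "priority": "urgent"}')
except ValueError as err:
    # In production you'd send `err` back to the LLM and ask it to correct itself.
    print("validation failed:", err)
```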

And that brings us to building block number five, and that is control: deterministic decision-making and process flow. You don't want your LLM making every decision. Some things should be handled by regular code. You can use if/else statements, switch cases, and routing logic to direct flow based on conditions. This is just normal business logic and routing that you would write in any application. If we look at the diagram over here, we have incoming data. We can, for example, use an LLM to classify the intent using structured output, where we tell the LLM: look, here's the incoming message; it can either be a question, a request, a complaint, or the category "other". We can now program our application using simple if statements, where we do a quick check: if the category equals question, we call the specific function that handles just that part of the application. If it's a request, we do something else. If it's a complaint, we have a different way of handling it. We now make our workflow, our process, modular. We take a big problem and break it down into smaller sub-problems and categories that we can better solve individually.

Here's what that looks like in a simple code example. We again use Pydantic to define a data model, where we specify the intent and set it equal to a literal, which is pretty much a category: it can either be a question, a request, or a complaint. If it's set to something else, it will throw an error. We also have a confidence score and reasoning. We then have a simple function that we can use to filter for the type of intent we have, and based on that, we call a specific function within our application. This is nothing more than using simple if/else statements to create the router we see over here in the diagram. So if I now run this, I will run through these three examples: "What is machine learning?", "Please schedule a meeting for tomorrow," and "I'm unhappy with my service quality." You'll find that the LLM now goes through this and determines the intent. For the first one, it determined that this is a question; the second one is a request; and the third one is a complaint. Based on that, it handled each one differently, via the functions we sent it towards. And now you can see that you can also start to chain multiple LLM calls together, where if it's a question, we simply have another function that makes another call to OpenAI to handle it. For a request, we might do something else. Here we just do a simple print statement, but this can be anything. This is how you make modular workflows where you implement the logic based on certain conditions.

And remember, this is all possible because we are, first of all, using structured output, so we know that we get this data model back. Then, in our code, we look at the response from the LLM, take the intent, and create a simple if/else statement to do a simple check: if the intent is equal to question, we do this; if the intent is equal to request, we do this; and so on.

At this point, I can also get back to something I mentioned earlier in this video: being very careful with tool calls, and the fact that in our production environments we rarely use tool calls at all. So what do we do? Well, we almost always prefer to use structured output and let an LLM decide on a specific category, and then, based on that category, create simple routers within our code, as you just saw, using if/else statements to decide what function, or tool if you will, to use. In simple cases, the result will be exactly the same whether you use a tool call or the approach I just mentioned. But when your systems get more complex and you need to debug things, it can get very tricky to figure out why an LLM did not decide to use a tool call. Whereas if you use a classification step with categories and a reasoning for why it decided on that category, you have a full log for figuring out: okay, look, we have a bug within this step of our workflow, for this particular data point, where the LLM actually thought it was this category, and it gave a reasoning for why it thought so. And that's also exactly what we were doing over here: beyond just printing the response and the classification, we also list the reasoning. You can see that over here if I make this a little bigger. The input: "What is machine learning?" The reasoning: "The input asks for information or explanation about the concept," etc. This gives you an entire log of the decision-making process of the LLM, which is super helpful for debugging.

debugging. All right. And that brings us to building block number six, and that is recovery. So things will go wrong in

is recovery. So things will go wrong in production. and APIs will be down. LLMs

production. and APIs will be down. LLMs

will return nonsense. Rate limits will hit you. And you need try catch blocks,

hit you. And you need try catch blocks, retire logic with back off and fallbacks responses when stuff breaks. This is

building reliable applications 101. This

is just standard error handling that you would implement in any production system. So it could look something like

system. So it could look something like this. You have a request coming in. You

this. You have a request coming in. You

check whether it's a success, yes or no, based on either an error that is happening or some kind of data that is present or not. If it's a success, you can simply return the result. All good.

But if it's not a success, there is an error, you can for example retry that first of all check is that even possible. So you can retry with a back

possible. So you can retry with a back off. Or if that's not possible at all,

off. Or if that's not possible at all, you have some kind of fallback scenario where you for example let the user know sorry I cannot help with this question because it couldn't find the right information in the knowledge base for

example. And here quickly a simple

example. And here quickly a simple example within the Python programming language you can use the try accept blocks where if something goes wrong within this first part of the code it

raises some type of error it will fall back to the exception. Now you can expand on this with a finally clause where you can try something in another case do this and then if that doesn't

work you can finally do something else.

Here you can build some kind of a recovery mechanism where we can for example check if a certain key here is available. So we try to access a field

available. So we try to access a field within a dictionary that is not present in this case. So it's not available.

We're using the fallback information and that results in a general output or a standard reply in this case. It's a very simple illustration. Uh this is

simple illustration. Uh this is infinitely complex when it comes to doing this properly within your own applications because every try accept block will be completely unique to what

problem you're trying to solve and what errors may or may not pop up. And then
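The retry-with-backoff-then-fallback flow from the diagram can be sketched as below. `flaky_llm_call` is a deliberately failing stub that simulates an API erroring twice before succeeding; the delay values and all names are illustrative.

```python
# Sketch of the recovery building block: retry with exponential backoff,
# then a fallback reply if every attempt fails.
import time

attempts = {"count": 0}

def flaky_llm_call(prompt: str) -> str:
    # Simulates an API that fails twice (rate limit / outage) then recovers.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated rate limit / outage")
    return f"answer to: {prompt}"

def call_with_recovery(prompt: str, max_retries: int = 3, base_delay: float = 0.01) -> str:
    for attempt in range(max_retries):
        try:
            return flaky_llm_call(prompt)
        except TimeoutError:
            time.sleep(base_delay * (2 ** attempt))  # back off: 1x, 2x, 4x...
    # Fallback: never let a raw failure leak through to the user.
    return "Sorry, I can't help with this question right now."

print(call_with_recovery("What is our refund policy?"))
# -> answer to: What is our refund policy?  (succeeds on the third try)
```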

And then the final building block is what we call feedback: human oversight and approval in workflows. Some processes are just too tricky right now to be fully handled by your AI agents. Sometimes you just want a human in the loop to check an LLM's work before it goes live or gets sent to someone. So when a task or decision is too important or complex for full automation, like sending very sensitive emails to customers or making purchases, adding approval steps where humans can review and approve or reject before execution is crucial. This is a basic approval workflow, like you would build for any app, but with the human really in the loop, where the process takes a full stop and waits at that point. Let's say an LLM generated a response or created some piece of content, and before sending it out into the world, you want human review in there: a human, for example, getting a popup within Slack with a yes/no button to review and approve it. If it's all good, perfect; we'll send it, we'll execute it. If it's a no, we can potentially provide feedback on what needs to be adjusted, send that back to the LLM, and repeat the process one more time.

And this is where we get back to the point I brought up about the importance of humans in the loop, and the difference between building what I call AI assistants that directly work with humans in the loop, like ChatGPT or Cursor, where the user asks something, the LLM does something, the user immediately sees the result and can adjust, and they work in this dance going back and forth; versus fully autonomous systems that work in the background. Let's say a customer care ticketing system, fully autonomous: a ticket comes in, the AI should solve it, generate a response, and send it back. There is a big distinction between the two. And in almost all very effective and great AI products, if it gets to a point where things get too tricky, instead of just optimizing your prompt further and further and further, you might just need a human in the loop to be on the safe side of things, before shipping something into production that works 80% of the time but is a complete shit show the other 20%.

And here's how you can actually implement that within your application or codebase, here in the seventh file, feedback.py: you create a strategic moment where you have a full stop and do not let the agent continue until it gets approval. There are various ways you can do this; this is a very simple example. You can integrate some kind of front-end application, or something like Slack; you'll probably need to set up some webhooks to ping back and forth. That's super technical and beyond the scope of this video, but the principle is the same: you want to create a full stop. For example, if you run this, and I'm running it in the terminal over here to showcase it, let's say we're generating a piece of content. So here is the content generated right now. The agent, before it continues, waits for approval. I'm doing this in the terminal right now, but again, in a real application you want some system that users can easily interact with and get a notification from. So let's say I type yes; then the final answer is approved. When I run it one more time and let it generate another piece of content, I can now ignore that, or essentially give a no, and just not approve this workflow. Ideally, you would then also have some feedback loop around that. But again, the principle is the same: a full stop before sending it off.

All right. So, those are your seven building blocks that you need to understand in order to build reliable AI agents. And now what you do is take a big problem, break it down into smaller problems, and for every smaller problem, try to solve it using the building blocks available, only using an LLM API call (the intelligence layer) when you absolutely cannot get around it. And if you want to learn how to orchestrate entire workflows using these building blocks, how to actually combine them and piece them together, you want to check out the workflow orchestration material I have over here. This is the GitHub repository for the course I already mentioned, which I have on YouTube; it's the same resource, same video, and I will link it below. It's a super good follow-up from this video, where we bring everything together. So, make sure to like this video, subscribe to the channel, and then go check out that video.
