AI Agents, Clearly Explained
By Jeff Su
Summary
## Key takeaways
- **LLMs are passive, not proactive.** Large Language Models like ChatGPT are passive tools that wait for a human prompt before generating an output. They lack access to proprietary or real-time personal information, limiting their knowledge base. [01:52]
- **AI workflows follow human-programmed paths.** AI workflows execute a series of predefined steps set by a human. If a query falls outside this programmed path, the workflow will fail, even if it seems like a logical next step. [03:04]
- **AI Agents reason and act autonomously.** The key difference between AI workflows and AI agents is that agents can reason about the best approach to achieve a goal and autonomously take action using tools, iterating until the goal is met. [06:16], [07:22]
- **ReAct framework powers AI agents.** The ReAct framework, which stands for Reason and Act, is the most common configuration for AI agents, enabling them to plan their actions and execute them using various tools. [06:51]
- **AI agents automate complex tasks.** An AI agent can automate tasks previously done by humans, such as reviewing footage to identify specific subjects like a 'skier,' and then indexing and returning relevant clips. [08:36]
Topics Covered
- Why human decision-making limits AI workflows.
- AI Agents: The shift from passive tools to autonomous decision-makers.
- Reason, Act, Iterate: The core of AI agent intelligence.
- The Three Levels of AI: LLMs, Workflows, and Autonomous Agents.
Full Transcript
AI. AI. AI. AI. AI.
AI. You know, more agentic. Agentic
capabilities. An AI agent. Agents.
Agentic workflows. Agents. Agents.
Agent. Agent. Agent. Agent. Agentic.
All right. Most explanations of AI
agents are either too technical or too
basic. This video is meant for people
like myself. You have zero technical
background, but you use AI tools
regularly and you want to learn just
enough about AI agents to see how it
affects you. In this video, we'll follow
a simple one, two, three learning path
by building on concepts you already
understand, like ChatGPT, and then moving
on to AI workflows and then finally AI
agents. All the while using examples you
will actually encounter in real life.
And believe me when I tell you those
intimidating terms you see everywhere
like RAG or ReAct, they're a lot
simpler than you think. Let's get
started. Kicking things off at level
one, large language models. Popular AI
chatbots like ChatGPT, Google Gemini, and
Claude are applications built on top of
large language models, LLMs, and they're
fantastic at generating and editing
text. Here's a simple visualization.
You, the human, provide an input and
the LLM produces an output based on its
training data. For example, if I were to
ask ChatGPT to draft an email
requesting a coffee chat, my prompt is
the input and the resulting email that's
way more polite than I would ever be in
real life is the output. So far so good
right? Simple stuff. But what if I asked
ChatGPT when my next coffee chat is?
Even without seeing the response, both
you and I know ChatGPT is gonna fail
because it doesn't know that
information. It doesn't have access to
my calendar. This highlights two key
traits of large language models. First,
despite being trained on vast amounts of
data, they have limited knowledge of
proprietary information like our
personal information or internal company
data. Second, LLMs are passive. They
wait for our prompt and then respond.
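The level-one pattern described here can be sketched as a tiny stub. Note that `llm_respond` and `TRAINING_DATA` are made-up stand-ins for a real chatbot API, just to illustrate the two traits:

```python
# Minimal sketch of level one: the LLM is passive and only knows its
# training data. `llm_respond` is a hypothetical stand-in, not a real API.

TRAINING_DATA = {
    "draft a coffee chat email": "Hi! Would you be open to a quick coffee chat next week?",
}

def llm_respond(prompt: str) -> str:
    # The model waits for a prompt, then answers purely from what it was
    # trained on -- it has no access to my calendar or company data.
    return TRAINING_DATA.get(prompt, "Sorry, I don't have that information.")

print(llm_respond("draft a coffee chat email"))     # generic text task: works
print(llm_respond("when is my next coffee chat?"))  # proprietary info: fails
```

The model never initiates anything; nothing happens until a prompt arrives, and anything outside the training data comes back empty-handed.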
Right? Keep these two traits in mind
moving forward. Moving to level two, AI
workflows. Let's build on our example.
What if I, a human, told the LLM, "Every
time I ask about a personal event,
perform a search query and fetch data
from my Google calendar before providing
a response." With this logic
implemented, the next time I ask, "When
is my coffee chat with Elon Husky?" I'll
get the correct answer because the LLM
will now first go into my Google
calendar to find that information. But
here's where it gets tricky. What if my
next follow-up question is, "What will
the weather be like that day?" The LLM
will now fail at answering the query
because the path we told the LLM to
follow is to always search my Google
calendar, which does not have
information about the weather. This is a
fundamental trait of AI workflows. They
can only follow predefined paths set by
humans. And if you want to get
technical, this path is also called the
control logic. Pushing my example
further, what if I added more steps into
the workflow by allowing the LLM to
access the weather via an API and then
just for fun use a text to audio model
to speak the answer. The weather
forecast for seeing Elon Husky is sunny
with a chance of being a good boy.
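That human-programmed control logic can be sketched in a few lines. The function names here are illustrative stubs, not real make.com or Google APIs:

```python
# Toy sketch of an AI workflow: a human hard-codes the control logic.
# fetch_calendar is an illustrative stub, not a real Google Calendar call.

def fetch_calendar(query: str) -> str:
    return "Coffee chat with Elon Husky on Friday at 10 a.m."

def workflow(question: str) -> str:
    # Predefined path set by a human: personal-event questions always go
    # to the calendar. Anything outside that path simply fails.
    if "coffee chat" in question.lower():
        return fetch_calendar(question)
    return "Workflow failed: no predefined step handles this question."

print(workflow("When is my coffee chat with Elon Husky?"))
print(workflow("What will the weather be like that day?"))  # off the path: fails
```

Adding a weather API or a text-to-audio step just means more `if` branches written by a human; the workflow still can't handle anything the human didn't anticipate.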
Here's the thing. No matter how many
steps we add, this is still just an AI
workflow. Even if there were hundreds or
thousands of steps, if a human is the
decision maker, there is no AI agent
involvement. Pro tip: retrieval
augmented generation, or RAG, is a fancy
term that's thrown around a lot. In
simple terms, RAG is a process that
helps AI models look things up before
they answer, like accessing my calendar
or the weather service. Essentially, RAG
is just a type of AI workflow. By the
way, I have a free AI toolkit that cuts
through the noise and helps you master
essential AI tools and workflows. I'll
leave a link to that down below. Here's
a real world example. Following Helena
Louu's amazing tutorial, I created a
simple AI workflow using make.com. Here
you can see that first I'm using Google
Sheets to do something. Specifically
I'm compiling links to news articles in
a Google sheet. And this is that Google
sheet. Second, I'm using Perplexity to
summarize those news articles. Then
using Claude and using a prompt that I
wrote, I'm asking Claude to draft a
LinkedIn and Instagram post. Finally, I
can schedule this to run automatically
every day at 8 a.m. As you can see, this
is an AI workflow because it follows a
predefined path set by me. Step one, you
do this. Step two, you do this. Step
three, you do this. And finally
remember to run daily at 8 am. One last
thing, if I test this workflow and I
don't like the final output of the
LinkedIn post, for example, as you can
see right here, uh, it's not funny
enough and I'm naturally hilarious
right? I'd have to manually go back and
rewrite the prompt for Claude. Okay? And
this trial and error iteration is
currently being done by me, a human. So
keep that in mind moving forward. All
right, level three, AI agents.
Continuing the make.com example, let's
break down what I've been doing so far
as the human decision maker. With the
goal of creating social media posts
based off of news articles, I need to do
two things. First, reason or think about
the best approach. I need to first
compile the news articles, then
summarize them, then write the final
posts. Second, take action using tools.
I need to find and link to those news
articles in Google Sheets, use
Perplexity for real-time summarization,
and then Claude for copywriting. So,
and this is the most important sentence
in this entire video. The one massive
change that has to happen in order for
this AI workflow to become an AI agent
is for me, the human decision maker, to
be replaced by an LLM. In other words,
the AI agent must reason. What's the
most efficient way to compile these news
articles? Should I copy and paste each
article into a word document? No, it's
probably easier to compile links to
those articles and then use another tool
to fetch the data. Yes, that makes more
sense. The AI agent must act, aka do
things via tools. Should I use Microsoft
Word to compile links? No. Inserting
links directly into rows is way more
efficient. What about Excel? Mm, the
user has already connected their Google
account with make.com, so Google Sheets
is a better option. Pro tip: because of
this, the most common configuration for
AI agents is the ReAct framework. All AI
agents must reason and act. So:
ReAct. Sounds simple once we break it
down, right? A third key trait of AI
agents is their ability to iterate.
Remember when I had to manually rewrite
the prompt to make the LinkedIn post
funnier? I, the human, probably need to
repeat this iterative process a few
times to get something I'm happy with
right? An AI agent will be able to do
the same thing autonomously. In our
example, the AI agent would autonomously
add in another LLM to critique its own
output. Okay, I've drafted V1 of a
LinkedIn post. How do I make sure it's
good? Oh, I know. I'll add another step
where an LLM will critique the post based
on LinkedIn best practices. And let's
repeat this until the best practices
criteria are all met. And after a few
cycles of that, we have the final
output. That was a hypothetical example.
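The reason-act-iterate loop just described can be sketched as code. Everything here is a made-up stand-in: `draft_post` and `critique` simulate the two LLM calls a real agent would make, and the "funnier" tag is a toy stand-in for actual best-practice criteria:

```python
# Sketch of the level-three loop: the agent acts, observes its own
# interim result, and iterates until its critique criteria are met,
# with no human in the loop.

def draft_post(attempt: int) -> str:
    # Stand-in for an LLM drafting a LinkedIn post; each iteration
    # gets a bit "funnier" for the sake of the demo.
    return "LinkedIn post draft" + " (funnier)" * attempt

def critique(post: str) -> bool:
    # Stand-in for a second LLM checking best-practice criteria.
    return "(funnier) (funnier)" in post

def agent(goal: str, max_iterations: int = 5) -> str:
    post = ""
    for attempt in range(max_iterations):
        post = draft_post(attempt)  # act: produce an interim result
        if critique(post):          # observe: self-critique the result
            break                   # decide: criteria met, stop iterating
    return post                     # final output that achieves the goal

print(agent("write a funny LinkedIn post"))
```

The structural difference from the level-two workflow is that the loop's exit condition is decided by the critique step, not by a human rewriting prompts between runs.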
So let's move on to a real world AI
agent example. Andrew is a preeminent
figure in AI and he created this demo
website that illustrates how an AI agent
works. I'll link the full video down
below, but when I search for a keyword
like "skier" and hit enter, the AI vision
agent in the background is first
reasoning what a skier looks like. A
person on skis going really fast in
snow, for example, right?
I'm not sure. And then it's acting by
looking at clips in video footage
trying to identify what it thinks a
skier is, indexing that clip, and then
returning that clip to us. Although this
might not feel impressive, remember that
an AI agent did all that instead of a
human reviewing the footage beforehand,
manually identifying the skier, and
adding tags like skier, mountain, ski,
snow. The programming is obviously a lot
more technical and complicated than what
we see in the front end, but that's the
point of this demo, right? The average
user like myself wants a simple app that
just works without me having to
understand what's going on in the back
end. Speaking of examples, I'm also
building my very own basic AI agent
using n8n. So, let me know in the
comments what type of AI agent you'd
like me to make a tutorial on next. To
wrap up, here's a simplified
visualization of the three levels we
covered today. Level one, we provide an
input and the LLM responds with an
output. Easy. Level two, for AI
workflows, we provide an input and tell
the LLM to follow a predefined path that
may involve retrieving information
from external tools. The key trait here
is that the human programs a path for
the LLM to follow. Level three, the AI
agent receives a goal and the LLM
performs reasoning to determine how best
to achieve the goal, takes action using
tools to produce an interim result,
observes that interim result, decides
whether iterations are required, and
produces a final output that
achieves the initial goal. The key trait
here is that the LLM is a decision maker
in the workflow. If you found this
helpful, you might want to learn how to
build a prompts database in Notion. See
you on the next video. In the
meantime, have a great one.