AI Agents, Clearly Explained
By Jeff Su
Summary
## Key takeaways
- **LLMs are passive, not proactive.** Large Language Models like ChatGPT are passive tools that wait for a human prompt before generating an output. They lack access to proprietary or real-time personal information, limiting their knowledge base. [01:52]
- **AI workflows follow human-programmed paths.** AI workflows execute a series of predefined steps set by a human. If a query falls outside this programmed path, the workflow will fail, even if it seems like a logical next step. [03:04]
- **AI Agents reason and act autonomously.** The key difference between AI workflows and AI agents is that agents can reason about the best approach to achieve a goal and autonomously take action using tools, iterating until the goal is met. [06:16], [07:22]
- **ReAct framework powers AI agents.** The ReAct framework, which stands for Reason and Act, is the most common configuration for AI agents, enabling them to plan their actions and execute them using various tools. [06:51]
- **AI agents automate complex tasks.** An AI agent can automate tasks previously done by humans, such as reviewing footage to identify specific subjects like a 'skier,' and then indexing and returning relevant clips. [08:36]
Topics Covered
- Why human decision-making limits AI workflows.
- AI Agents: The shift from passive tools to autonomous decision-makers.
- Reason, Act, Iterate: The core of AI agent intelligence.
- The Three Levels of AI: LLMs, Workflows, and Autonomous Agents.
Full Transcript
AI. AI. AI. AI. AI.
AI. You know, more agentic. Agentic
capabilities. An AI agent. Agents.
Agentic workflows. Agents. Agents.
Agent. Agent. Agent. Agent. Agentic.
All right. Most explanations of AI
agents are either too technical or too
basic. This video is meant for people
like myself. You have zero technical
background, but you use AI tools
regularly and you want to learn just
enough about AI agents to see how it
affects you. In this video, we'll follow
a simple one, two, three learning path
by building on concepts you already
understand, like ChatGPT, and then moving
on to AI workflows and then finally AI
agents. All the while using examples you
will actually encounter in real life.
And believe me when I tell you those
intimidating terms you see everywhere
like RAG or ReAct, they're a lot
simpler than you think. Let's get
started. Kicking things off at level
one, large language models. Popular AI
chatbots like ChatGPT, Google Gemini, and
Claude are applications built on top of
large language models, LLMs, and they're
fantastic at generating and editing
text. Here's a simple visualization.
You, the human, provide an input and
the LLM produces an output based on its
training data. For example, if I were to
ask ChatGPT to draft an email
requesting a coffee chat, my prompt is
the input and the resulting email that's
way more polite than I would ever be in
real life is the output. So far so good
right? Simple stuff. But what if I asked
ChatGPT when my next coffee chat is?
Even without seeing the response, both
you and I know ChatGPT is gonna fail
because it doesn't know that
information. It doesn't have access to
my calendar. This highlights two key
traits of large language models. First,
despite being trained on vast amounts of
data, they have limited knowledge of
proprietary information like our
personal information or internal company
data. Second, LLMs are passive. They
wait for our prompt and then respond.
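The level-one pattern described here can be sketched as a tiny stub. Note that `llm_respond` and `TRAINING_DATA` are made-up stand-ins for a real chatbot API, just to illustrate the two traits:

```python
# Minimal sketch of level one: the LLM is passive and only knows its
# training data. `llm_respond` is a hypothetical stand-in, not a real API.

TRAINING_DATA = {
    "draft a coffee chat email": "Hi! Would you be open to a quick coffee chat next week?",
}

def llm_respond(prompt: str) -> str:
    # The model waits for a prompt, then answers purely from what it was
    # trained on -- it has no access to my calendar or company data.
    return TRAINING_DATA.get(prompt, "Sorry, I don't have that information.")

print(llm_respond("draft a coffee chat email"))     # generic text task: works
print(llm_respond("when is my next coffee chat?"))  # proprietary info: fails
```

The model never initiates anything; nothing happens until a prompt arrives, and anything outside the training data comes back empty-handed.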
Right? Keep these two traits in mind
moving forward. Moving to level two, AI
workflows. Let's build on our example.
What if I, a human, told the LLM, "Every
time I ask about a personal event,
perform a search query and fetch data
from my Google calendar before providing
a response." With this logic
implemented, the next time I ask, "When
is my coffee chat with Elon Husky?" I'll
get the correct answer because the LLM
will now first go into my Google
calendar to find that information. But
here's where it gets tricky. What if my
next follow-up question is, "What will
the weather be like that day?" The LLM
will now fail at answering the query
because the path we told the LLM to
follow is to always search my Google
calendar, which does not have
information about the weather. This is a
fundamental trait of AI workflows. They
can only follow predefined paths set by
humans. And if you want to get
technical, this path is also called the
control logic. Pushing my example
further, what if I added more steps into
the workflow by allowing the LLM to
access the weather via an API and then
just for fun use a text to audio model
to speak the answer. The weather
forecast for seeing Elon Husky is sunny
with a chance of being a good boy.
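That human-programmed control logic can be sketched in a few lines. The function names here are illustrative stubs, not real make.com or Google APIs:

```python
# Toy sketch of an AI workflow: a human hard-codes the control logic.
# fetch_calendar is an illustrative stub, not a real Google Calendar call.

def fetch_calendar(query: str) -> str:
    return "Coffee chat with Elon Husky on Friday at 10 a.m."

def workflow(question: str) -> str:
    # Predefined path set by a human: personal-event questions always go
    # to the calendar. Anything outside that path simply fails.
    if "coffee chat" in question.lower():
        return fetch_calendar(question)
    return "Workflow failed: no predefined step handles this question."

print(workflow("When is my coffee chat with Elon Husky?"))
print(workflow("What will the weather be like that day?"))  # off the path: fails
```

Adding a weather API or a text-to-audio step just means more `if` branches written by a human; the workflow still can't handle anything the human didn't anticipate.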
Here's the thing. No matter how many
steps we add, this is still just an AI
workflow. Even if there were hundreds or
thousands of steps, if a human is the
decision maker, there is no AI agent
involvement. Pro tip: retrieval
augmented generation, or RAG, is a fancy
term that's thrown around a lot. In
simple terms, RAG is a process that
helps AI models look things up before
they answer, like accessing my calendar
or the weather service. Essentially, RAG
is just a type of AI workflow. By the
way, I have a free AI toolkit that cuts
through the noise and helps you master
essential AI tools and workflows. I'll
leave a link to that down below. Here's
a real world example. Following Helena
Louu's amazing tutorial, I created a
simple AI workflow using make.com. Here
you can see that first I'm using Google
Sheets to do something. Specifically
I'm compiling links to news articles in
a Google sheet. And this is that Google
sheet. Second, I'm using Perplexity to
summarize those news articles. Then
using Claude and using a prompt that I
wrote, I'm asking Claude to draft a
LinkedIn and Instagram post. Finally, I
can schedule this to run automatically
every day at 8 a.m. As you can see, this
is an AI workflow because it follows a
predefined path set by me. Step one, you
do this. Step two, you do this. Step
three, you do this. And finally
remember to run daily at 8 am. One last
thing, if I test this workflow and I
don't like the final output of the
LinkedIn post, for example, as you can
see right here, uh, it's not funny
enough and I'm naturally hilarious
right? I'd have to manually go back and
rewrite the prompt for Claude. Okay? And
this trial and error iteration is
currently being done by me, a human. So
keep that in mind moving forward. All
right, level three, AI agents.
Continuing the make.com example, let's
break down what I've been doing so far
as the human decision maker. With the
goal of creating social media posts
based off of news articles, I need to do
two things. First, reason or think about
the best approach. I need to first
compile the news articles, then
summarize them, then write the final
posts. Second, take action using tools.
I need to find and link to those news
articles in Google Sheets, use
Perplexity for real-time summarization,
and then Claude for copywriting. So,
and this is the most important sentence
in this entire video. The one massive
change that has to happen in order for
this AI workflow to become an AI agent
is for me, the human decision maker, to
be replaced by an LLM. In other words,
the AI agent must reason. What's the
most efficient way to compile these news
articles? Should I copy and paste each
article into a word document? No, it's
probably easier to compile links to
those articles and then use another tool
to fetch the data. Yes, that makes more
sense. The AI agent must act, aka do
things via tools. Should I use Microsoft
Word to compile links? No. Inserting
links directly into rows is way more
efficient. What about Excel? Mm, the
user has already connected their Google
account with make.com, so Google Sheets
is a better option. Pro tip: because of
this, the most common configuration for
AI agents is the ReAct framework. All AI
agents must reason and act. So:
ReAct. Sounds simple once we break it
down, right? A third key trait of AI
agents is their ability to iterate.
Remember when I had to manually rewrite
the prompt to make the LinkedIn post
funnier? I, the human, probably need to
repeat this iterative process a few
times to get something I'm happy with
right? An AI agent will be able to do
the same thing autonomously. In our
example, the AI agent would autonomously
add in another LLM to critique its own
output. Okay, I've drafted V1 of a
LinkedIn post. How do I make sure it's
good? Oh, I know. I'll add another step
where an LLM will critique the post based
on LinkedIn best practices. And let's
repeat this until the best practices
criteria are all met. And after a few
cycles of that, we have the final
output. That was a hypothetical example.
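The reason-act-iterate loop just described can be sketched as code. Everything here is a made-up stand-in: `draft_post` and `critique` simulate the two LLM calls a real agent would make, and the "funnier" tag is a toy stand-in for actual best-practice criteria:

```python
# Sketch of the level-three loop: the agent acts, observes its own
# interim result, and iterates until its critique criteria are met,
# with no human in the loop.

def draft_post(attempt: int) -> str:
    # Stand-in for an LLM drafting a LinkedIn post; each iteration
    # gets a bit "funnier" for the sake of the demo.
    return "LinkedIn post draft" + " (funnier)" * attempt

def critique(post: str) -> bool:
    # Stand-in for a second LLM checking best-practice criteria.
    return "(funnier) (funnier)" in post

def agent(goal: str, max_iterations: int = 5) -> str:
    post = ""
    for attempt in range(max_iterations):
        post = draft_post(attempt)  # act: produce an interim result
        if critique(post):          # observe: self-critique the result
            break                   # decide: criteria met, stop iterating
    return post                     # final output that achieves the goal

print(agent("write a funny LinkedIn post"))
```

The structural difference from the level-two workflow is that the loop's exit condition is decided by the critique step, not by a human rewriting prompts between runs.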
So let's move on to a real world AI
agent example. Andrew is a preeminent
figure in AI and he created this demo
website that illustrates how an AI agent
works. I'll link the full video down
below, but when I search for a keyword
like "skier" and hit enter, the AI vision
agent in the background is first
reasoning what a skier looks like. A
person on skis going really fast in
snow, for example, right?
I'm not sure. And then it's acting by
looking at clips in video footage
trying to identify what it thinks a
skier is, indexing that clip, and then
returning that clip to us. Although this
might not feel impressive, remember that
an AI agent did all that instead of a
human reviewing the footage beforehand,
manually identifying the skier, and
adding tags like skier, mountain, ski,
snow. The programming is obviously a lot
more technical and complicated than what
we see in the front end, but that's the
point of this demo, right? The average
user like myself wants a simple app that
just works without me having to
understand what's going on in the back
end. Speaking of examples, I'm also
building my very own basic AI agent
using n8n. So, let me know in the
comments what type of AI agent you'd
like me to make a tutorial on next. To
wrap up, here's a simplified
visualization of the three levels we
covered today. Level one, we provide an
input and the LLM responds with an
output. Easy. Level two, for AI
workflows, we provide an input and tell
the LLM to follow a predefined path that
may involve retrieving information
from external tools. The key trait here
is that the human programs a path for
the LLM to follow. Level three, the AI
agent receives a goal and the LLM
performs reasoning to determine how best
to achieve the goal, takes action using
tools to produce an interim result,
observes that interim result, decides
whether iterations are required, and
produces a final output that
achieves the initial goal. The key trait
here is that the LLM is a decision maker
in the workflow. If you found this
helpful, you might want to learn how to
build a prompts database in Notion. See
you on the next video. In the
meantime, have a great one.