Explaining What Prompt, Agent, and MCP Are in 10 Minutes
By Programmer Wang (程序员老王)
## Summary
## Key takeaways

- **User vs. System Prompts: Defining AI's Role**: User Prompts are direct requests, while System Prompts define the AI's persona, background, and tone, making interactions more natural by separating the AI's role from the user's explicit query. [00:48]
- **AI Agents: Beyond Chatbots with AutoGPT**: AI Agents, exemplified by AutoGPT, enable AI to perform tasks by integrating user-defined functions. The agent acts as an intermediary, parsing AI responses to call these functions and iteratively completing tasks. [03:00]
- **Function Calling: Standardizing AI Tool Interaction**: Function Calling standardizes how AI interacts with tools by using structured JSON definitions for tool names, descriptions, and parameters, leading to more reliable and efficient AI responses than relying solely on System Prompts. [05:03]
- **MCP: Centralizing AI Tools as Services**: MCP (Model Context Protocol) standardizes communication between AI Agents and Tool services, allowing tools to be hosted as central services accessible by multiple agents, avoiding code duplication and simplifying management. [07:39]
- **AI Collaboration: Gears in an Automated System**: User Prompts, System Prompts, AI Agents, Agent Tools, Function Calling, MCP, and AI models are not replacements for one another but interconnected components, functioning like gears to form a complete system for AI automated collaboration. [10:11]
## Topics Covered
- AI Agents: More Than Just Chatbots
- Function Calling: Standardizing AI Tool Interaction
- AI's Coexisting Communication Standards
- MCP: The AI's Universal Translator for Tools
- Embrace AI's Evolution: Consciously Meet the Future
## Full Transcript
Lately, various AI terms have been flooding my screen.
Agent, MCP, Function Calling...
What exactly are they?
Some say an Agent is an intelligent entity.
But what is an intelligent entity?
Some say MCP is the USB protocol of the AI era.
So, can it connect to a USB drive?
What do they all really mean?
Today, in this video,
I'll use the simplest language possible
to connect these key concepts: Agent, MCP, Prompt, and Function Calling.
Let's tie them all together.
Let's get started now.
In 2023,
when OpenAI just released GPT,
AI seemed like just a chatbox.
We send a message to the AI model through the chatbox,
and the AI model generates a reply.
The message we send is called the User Prompt.
Usually, it's the question we ask or what we want to say.
But in real life,
when we chat with different people,
even with the exact same words,
they will give different answers based on their own experiences.
For example, if I say my stomach hurts,
my mom might ask if I need to go to the hospital.
My dad might tell me to go to the bathroom.
My girlfriend might just blurt out,
"Get lost, I'm hurting too!"
But AI doesn't have such a persona.
So it can only give a generic,
safe, and standard answer,
which seems very boring.
So, we wanted to give AI a persona too.
The most direct method
is to combine the persona information with the user's message
and package it as one User Prompt to send over.
For example: "Act as my gentle girlfriend.
I say my stomach hurts."
Then the AI might reply,
"Get lost, I'm hurting too!"
That sounds about right.
But the problem is, the phrase "Act as my gentle girlfriend"
isn't what we actually want to say.
It feels a bit out of character.
So, people decided to separate the persona information
into another prompt.
This is the System Prompt.
The System Prompt mainly describes the AI's role, personality,
background information, tone, etc.
Basically, anything that isn't directly said by the user
can be put into the System Prompt.
Every time the user sends a User Prompt,
the system automatically sends the System Prompt along with it to the AI model.
This makes the whole conversation feel more natural.
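The split described above can be sketched in a few lines. This is a minimal illustration using the common OpenAI-style message format; the model name is a placeholder, and the exact field names vary by provider.

```python
def build_request(system_prompt: str, user_prompt: str) -> dict:
    """Package the persona (System Prompt) and the user's actual words
    (User Prompt) as separate messages, sent together on every turn."""
    return {
        "model": "some-chat-model",  # placeholder model name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

req = build_request(
    "You are my gentle girlfriend. Reply in a playful, caring tone.",
    "My stomach hurts.",
)
print(req["messages"][0]["role"])  # -> system
```

The user only ever types the second message; the first one rides along automatically, which is exactly why the conversation stays in character.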
In web-based chatbots,
the System Prompt is often preset by the system,
and users can't easily change it.
But generally,
the website will provide some settings.
For example, ChatGPT has a feature called "Customize ChatGPT".
Users can write their preferences in there,
and these preferences automatically become part of the System Prompt.
However, even with the perfect persona,
at the end of the day, AI is still a chatbot.
You ask a question,
at most it gives you an answer
or tells you how to do something.
But you're still the one doing the actual work.
So, can we let the AI complete tasks itself?
The first attempt was an open-source project
called AutoGPT.
It's a small program that runs locally.
If you want AutoGPT to help manage files on your computer,
you first need to write some file management functions,
like `list_files` to list directories,
`read_file` to read files, and so on.
Then you register these functions, along with their descriptions and usage instructions,
into AutoGPT.
AutoGPT generates a System Prompt based on this information,
telling the AI model what tools the user has provided,
what they do,
and if the AI wants to use them,
what format it should return.
Finally, it sends this System Prompt along with the user's request,
like "Help me find the Genshin Impact installation directory,"
to the AI model.
If the AI model is smart enough,
it will, in the requested format,
return a message to call a specific function.
After AutoGPT parses this,
it can call the corresponding function.
Then it sends the result back to the AI.
The AI then decides, based on the function call result,
what the next step should be.
This process repeats
until the task is completed.
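The loop above can be sketched as a toy program. The AI model here is faked with a canned script so the example runs on its own; the tool name `list_files` comes from the transcript, while the paths and replies are made up for illustration.

```python
import json

# Registered tools: in a real agent these would do actual file I/O.
TOOLS = {
    "list_files": lambda path: ["Genshin Impact", "Downloads"],  # stubbed result
}

def fake_model(history: list) -> str:
    """Stand-in for the AI model: first asks to call a tool, then answers."""
    if not any(m["role"] == "tool" for m in history):
        return json.dumps({"tool": "list_files", "args": {"path": "C:/"}})
    return "Found it: C:/Genshin Impact"

def agent_loop(user_request: str) -> str:
    history = [{"role": "user", "content": user_request}]
    while True:
        reply = fake_model(history)
        try:
            call = json.loads(reply)       # did the model ask for a tool?
        except json.JSONDecodeError:
            return reply                   # plain text -> task finished
        result = TOOLS[call["tool"]](**call["args"])  # run the tool
        history.append({"role": "tool", "content": str(result)})  # feed it back

print(agent_loop("Help me find the Genshin Impact installation directory"))
```

The agent itself never "thinks"; it only parses, dispatches, and relays, looping until the model stops asking for tools.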
Programs like AutoGPT, which relay messages
between the model, the tools, and the end user,
came to be called AI Agents.
And these functions or services provided for the AI to call
are called Agent Tools.
However, this architecture has a small problem.
Although we clearly stated in the System Prompt
what format the AI should return,
an AI model is, ultimately, a probabilistic model.
It might still return content in the wrong format.
To handle these non-compliant situations,
many AI Agents will automatically retry
if they detect the AI returned the wrong format.
If it fails once, we try a second time.
Many well-known Agents on the market today,
like Cline, still use this method.
But this repeated retrying always feels a bit unreliable.
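That retry loop is simple to picture in code. This is an illustrative sketch, assuming the expected format is a JSON tool call; real agents also cap retries and often re-prompt with an error message.

```python
import json

def ask_with_retry(ask_model, prompt: str, max_tries: int = 3) -> dict:
    """Ask the model; if the reply isn't valid JSON, just ask again."""
    for _ in range(max_tries):
        reply = ask_model(prompt)
        try:
            return json.loads(reply)   # expected: a JSON tool-call object
        except json.JSONDecodeError:
            continue                   # wrong format -> retry (more tokens!)
    raise ValueError("model never returned valid JSON")

# Simulate a model that gets the format wrong once, then right:
replies = iter(["Sure! I can call list_files for you.", '{"tool": "list_files"}'])
print(ask_with_retry(lambda p: next(replies), "find my files"))
# -> {'tool': 'list_files'}
```

Every failed attempt burns a full round trip and its tokens, which is exactly the cost Function Calling was designed to cut.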
So, the major model providers stepped in.
The companies behind ChatGPT, Claude, Gemini, and others
have successively launched a new feature called Function Calling.
The core idea of this feature is: unified format, standardized description.
Going back to the Genshin Impact example,
we used the System Prompt
to tell the AI what tools are available and the return format.
But these descriptions were written casually in natural language,
as long as the AI could understand.
Function Calling standardizes these descriptions.
For example, each Tool is defined using a JSON object:
the tool name in the 'name' field, the function description in the 'description' field,
the required parameters in 'parameters', and so on.
Then, these JSON objects are also separated from the System Prompt
and placed into a field of their own.
Finally, Function Calling
also dictates the format the AI should return when using a tool.
So the format definition in the System Prompt can also be removed.
This way,
all tool descriptions are in the same place,
all tool descriptions follow the same format,
and AI responses when using tools also follow the same format.
This allows people to train AI models more specifically
to understand this calling scenario.
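A standardized definition like the one just described looks roughly like this. The shape follows the OpenAI-style `tools` parameter; other providers use slightly different field names, so treat this as one representative example, with `read_file` reused from the earlier file-management scenario.

```python
# One Tool, described as structured JSON instead of free-form prose.
tool_definition = {
    "type": "function",
    "function": {
        "name": "read_file",                      # tool name
        "description": "Read a text file and return its contents.",
        "parameters": {                           # JSON Schema for the arguments
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path to read"},
            },
            "required": ["path"],
        },
    },
}

# The definitions travel in their own field, outside the System Prompt:
request = {
    "model": "some-chat-model",   # placeholder model name
    "messages": [{"role": "user", "content": "Read config.txt for me"}],
    "tools": [tool_definition],
}
```

Because the model was trained on exactly this shape, both its understanding of the tools and the format of its tool-call replies become far more predictable.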
Even in this case,
if the AI still generates an incorrect response,
because the response format is fixed,
the AI server itself can detect it
and perform a retry.
The user wouldn't even notice.
This way,
it not only reduces development difficulty on the user side
but also saves token costs from client-side retries.
Because of these benefits,
more and more AI Agents now
are shifting from System Prompts to Function Calling.
But Function Calling has its own problem:
there's no unified standard.
Each major company's API definition is different,
and many open-source models don't yet support Function Calling.
So, writing a truly cross-model compatible AI Agent
is actually quite troublesome.
Therefore, both methods, System Prompt and Function Calling,
currently coexist in the market.
Everything we've discussed so far
is about the communication between the AI Agent and the AI model.
Next, let's look at the other side:
How does the AI Agent communicate with Agent Tools?
The simplest way is
to write the AI Agent and the Agent Tool in the same program.
Direct function call, done.
This is how most Agents work currently.
But later, people gradually realized
that some Tool functions are quite common.
For example, a web-browsing tool
might be needed by multiple Agents.
I can't just copy the same code
into every Agent, right?
It's too cumbersome and inelegant.
So, people came up with an idea:
turn Tools into services, host them centrally,
and let all Agents call them.
This is MCP, the Model Context Protocol.
MCP is a communication protocol
specifically designed to standardize how Agents and Tool services interact.
The service running the Tool is called an MCP Server.
The Agent calling it is called an MCP Client.
MCP defines how the MCP Server communicates with the MCP Client,
and what interfaces the MCP Server must provide.
For example, interfaces to query which Tools are in the MCP Server,
the Tool's function, description, required parameters, format, etc.
Besides regular Tools in the form of function calls,
an MCP Server can also directly provide data,
offering services similar to file read/write, called Resources,
or provide prompt templates for the Agent, called Prompts.
The MCP Server can run on the same machine as the Agent,
communicating via standard input/output,
or it can be deployed on a network,
communicating via HTTP.
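The exchange between MCP Client and MCP Server can be sketched as follows. MCP is JSON-RPC 2.0 under the hood, and the method names `tools/list` and `tools/call` follow the published spec; the server below, though, is a toy stand-in, with the `web_browse` tool and its canned answer invented for this example.

```python
def handle(request: dict) -> dict:
    """Toy MCP Server: answers tool-discovery and tool-call requests."""
    if request["method"] == "tools/list":
        result = {"tools": [{
            "name": "web_browse",
            "description": "Fetch a web page and return its text.",
            "inputSchema": {"type": "object",
                            "properties": {"url": {"type": "string"}}},
        }]}
    elif request["method"] == "tools/call":
        # A real server would fetch request["params"]["arguments"]["url"].
        result = {"content": [{"type": "text", "text": "Drink more hot water."}]}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

# The MCP Client (the Agent) first discovers the tools, then calls one:
listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
               "params": {"name": "web_browse",
                          "arguments": {"url": "https://example.com"}}})
print(listing["result"]["tools"][0]["name"])  # -> web_browse
```

Notice that nothing here mentions an AI model at all, which is the point made next: MCP sits entirely on the Agent-to-Tool side.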
It's important to note here,
although MCP is a standard customized for AI,
MCP itself actually has no relationship with the AI model.
It doesn't care which model the Agent is using.
MCP is only responsible for helping the Agent manage tools, resources, and prompts.
Finally, let's summarize the entire process.
I hear my girlfriend has a stomach ache.
So I ask the AI Agent,
or rather, the MCP Client,
"What should I do if my girlfriend has a stomach ache?"
The Agent packages the question into a User Prompt.
Then, the Agent, using the MCP protocol,
retrieves information about all available Tools from the MCP Server.
The AI Agent converts this Tool information either into a System Prompt
or into the Function Calling format.
Then, it bundles it with the user's request (User Prompt)
and sends it all to the AI model.
The AI model notices a web-browsing tool called `web_browse`.
So, through a regular reply or the Function Calling format,
it generates a request to call this Tool,
hoping to search the web for an answer.
After the Agent receives this request,
using the MCP protocol,
it calls the `web_browse` tool in the MCP Server.
After `web_browse` accesses the specified website,
it returns the content to the Agent.
The Agent then forwards it to the AI model.
The AI model, based on the web content and its own brainstorming,
generates the final answer:
"Drink more hot water."
Finally, the Agent displays the result to the user.
How my girlfriend praised me for being thoughtful afterwards...
I won't go into details here.
Anyway, this is how System Prompt,
User Prompt, AI Agent,
Agent Tool, Function Calling, MCP, and the AI model are connected and differ.
They don't replace each other.
Instead, like gears,
they together form the complete system of AI automated collaboration.
Many people say AI progress causes anxiety,
that humans will eventually be replaced by AI.
Will the future really turn out like that?
I don't know.
But what often comes to my mind isn't fear,
but a subtle excitement.
Because each of us is so small,
living in the cracks of the era.
Most of the time,
we can only watch helplessly as the torrent rushes by.
In every major technological shift we've experienced in the past,
ordinary people were just pushed forward.
Maybe they changed their way of work or life,
but they always remained outside the change itself.
I don't want to be swept along like this.
If we are destined to collide head-on with the times,
then this time, I want to meet it consciously.
Perhaps starting with understanding a small AI Agent.
This is Programmer Wang.
See you next time.