Model Context Protocol (MCP) explained (with code examples)
By AssemblyAI
Summary
## Key takeaways
- **MCP Standardizes AI Agent Interfaces**: MCP eliminates the need to build a custom bridge for each service, such as local storage, databases, or APIs, by standardizing the interface between AI agents and the tools and resources they need to be useful. [01:16], [01:30]
- **LLMs Fail on Precision Math**: Asked to subtract 201,122 from 316,043 and then take the square root, Claude gives an incorrect answer instead of the correct 339, and returns different wrong answers on repeated attempts due to its stochastic nature. [04:07], [04:14]
- **Tools Make LLMs Reliable**: By providing Claude with subtraction and square-root tools as wrappers for Python functions, it reliably plans its steps, calls the tools, and computes the correct answer of 339. [06:34], [08:02]
- **Custom Tools Cause Maintenance Hell**: Building wrappers for APIs like Google Docs adds maintenance burden, security risks such as credential leaks, and constant updates whenever services change. [11:14], [12:07]
- **MCP Shifts Burden to Providers**: With MCP, service providers like Google run MCP servers that expose tools, letting developers connect AI agents via a unified standard without building custom bridges. [15:11], [15:40]
- **OpenAI Adoption Boosts MCP**: Stars on the MCP servers repository show a steady climb in interest, accelerated by OpenAI's recent adoption; standards are only useful if people actually use them. [01:48], [02:06]
Topics Covered
- LLMs Fail Complex Precision Tasks
- Tools Make LLMs Reliable
- Custom Tools Create Maintenance Hell
- MCP Standardizes Agent Interfaces
- MCP Shifts Burden to Providers
Full Transcript
The model context protocol. What is it?
How does it work? And why does it matter? You've likely been seeing a lot about the model context protocol recently. It's a relatively new protocol invented by Anthropic, and it seeks to standardize the way in which AI agents interact with resources like local storage, databases, or APIs. In this video, I'll teach you everything you need to know about the model context protocol through a series of illustrated examples. First, you'll learn about the stochastic limitations of LLMs and their unreliability in certain scenarios. Then we'll look at how tools were invented to overcome this issue, with an example of having an LLM create and upload a document to my Google Docs. Then we'll look at the problem MCP solves, and we'll reframe this problem in terms of building an MCP server. And finally, we'll end with a discussion of why MCP matters from a bird's-eye view and what it really means for the future of AI agents. All right, let's jump in.
AI agents are currently able to interact with a variety of systems, including local storage, databases, remote tools like AssemblyAI's API, and other remote resources. Currently, this means that for each of these services, we have to build a custom bridge so that the AI agent can interact with each of these resources, and that's represented by the different shapes in this image. This approach means that every time we want to add a capability to our LLM to perform some task reliably or interact with an external resource, we have to build such an adapter. MCP eliminates this issue by standardizing the interface between AI agents and the tools and resources they need to be useful. We want AI agents that can perform a wide range of tasks for us. But it's become clear in recent years that these are not going to be monolithic models. Instead, they'll be a network of interconnected models, resources, and tools. So standardizing the interface between these components means that developers and researchers can focus on building the actual pieces of these networks rather than on the nitty-gritty details of the wiring between them. But beyond the standard itself, MCP is important because people are actually adopting it. Standards are only useful if people are actually using them. And we can see a steady climb in interest towards MCP when we look at stars on the MCP servers repository. OpenAI's recent adoption of MCP has only served to accelerate this growth.
To understand the value of MCP, let's take a step back and see how we got here. LLMs have been in the public eye for only two years, and they've already transformed the way in which we work. For a thorough history of LLMs and how they developed, you can see our relevant piece on that, but I'll recap the important points here which are relevant to a discussion of MCP. LLMs are fundamentally probabilistic systems. They're trained on a huge corpus of text, and through this they learn patterns in natural language. This means that they can generate coherent text that's relevant to the input they receive.
For example, we can use a language model to predict the next word in a sentence. A language model represents a probability distribution over word sequences. So given an initial word sequence like "I went to the", we can predict the most likely next word, append it to the sentence, and recursively keep doing this to generate text. Don't worry if you don't understand the notation, just focus on the concept.
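For the curious, the notation amounts to something like the following. This is the standard autoregressive language-modeling formulation; the video's slides may write it slightly differently:

```latex
% A language model defines a probability distribution over word sequences,
% factored one word at a time:
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})

% Greedy generation picks the most likely next word and appends it, recursively:
\hat{w}_{t+1} = \arg\max_{w} \; P(w \mid w_1, \dots, w_t)
```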
And in reality, LLM inference is actually more complicated than this, but it's beyond the scope of this video, so check out some of our other resources if you want to learn about that. This makes LLMs good for creative writing, drafting emails, really any task in which the goal is to just create relevant, coherent text.
However, people quickly learned that we can get LLMs to do other things simply by phrasing the question in natural language. For example, if we put the word sequence "translate this from Spanish to English: el gato rojo" into a large language model, we can get the LLM to predict the most likely next word, and it turns out it's actually pretty good at giving the right answer. These abilities, the so-called emergent abilities of LLMs, were an unexpected surprise. Very quickly, people began using LLMs for a variety of tasks, benchmarking their performance, and debating the ways in which we can measure this sort of general competence.
However, it quickly became apparent that performance varied drastically across tasks. For tasks that require high precision and reliable data processing, LLMs were not a valid choice. Even for tasks that did not require high precision but required many steps, LLMs were still not a good choice. You can see in this example, I've asked Claude to subtract 201,122 from 316,043 and then calculate the square root of the result. I define the prompt here, pass it into Claude, and then print out the response.
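A rough sketch of that call using the Anthropic Python SDK might look like this. The model name and prompt wording here are my own placeholders, not necessarily what's used in the video:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = (
    "Subtract 201,122 from 316,043 and then calculate "
    "the square root of the result."
)

# Ask Claude directly, with no tools attached.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)

print(response.content[0].text)  # a plain-text (and possibly wrong) answer
```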
The correct answer to this problem is 339 (316,043 − 201,122 = 114,921, and √114,921 = 339). Here's Claude's response. The answer is incorrect. And worse yet, if you repeat the question, you can get a different answer every time. This is completely unsurprising given the stochastic nature of LLMs, but frustrating to researchers and developers who thought that smart enough LLMs could be a panacea for a wide range of problems. LLMs may be good at some targeted and trivial tasks, but their performance breaks down when the problem reaches a sufficient complexity. And this unreliability precludes them from being deployed in serious applications. After all, we use computers for their precision and reliability, and LLMs are neither of these, at least not for problems of a complexity that's important to humans.
At first, researchers tried to brute-force the problem by simply continuing the process that made LLMs useful in the first place: scaling. LLMs are generally performant because they're, well, large. They have billions of parameters and they're trained on massive datasets. It's precisely the scaling process from small language models to large language models through which LLMs gain many useful skills. So it seemed reasonable that scaling the models even bigger would yield even better results. However, it soon became clear that this scaling process could not continue and was not the way forward to more useful and competent general AI. The inherent probabilistic nature of LLMs is a fundamental limitation and not something that can be scaled away, as AI leaders like Yann LeCun had predicted.
So, LLMs can't do everything we need done on their own. But why should they? Over decades, we built up a huge ecosystem of digital technology, standards, libraries, frameworks, and infrastructure, that has allowed computers to already do much of what we need done and has transformed the way we live. Why not allow LLMs to leverage this pre-existing work? The idea is to provide the LLM with access to this functionality, and by composing these tools in different ways, it can solve problems that are too complicated for it to solve on its own. In this way, all the language model needs to do is determine what needs to be done, not how to actually do it. Let's see an example of how we can get Claude to reliably answer the question from before by giving it access to simple arithmetic tools.
These tools are just simple wrappers for the underlying arithmetic operations of the programming language in which the LLM is called, in this case Python. So, first let's create a couple of tools. Here, I've defined a subtraction function and a square-root function. Then, we define a tool mapping that maps from the names of these functions to the actual functions themselves; we'll see later that this makes it much easier to call them. Now we cast this information as a list of tools in a format that the LLM expects, specifying the name of each tool, a description, and the input schema, which describes the input properties to the function. And finally, we include which inputs are required. We repeat the same thing for the square root function.
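Here's a minimal sketch of what those pieces might look like. The function and variable names are my own, and the tool entries follow Anthropic's tool-use schema rather than the video's exact code:

```python
import math

# 1. Plain Python functions that do the actual work.
def subtract(a: float, b: float) -> float:
    """Return a minus b."""
    return a - b

def square_root(a: float) -> float:
    """Return the square root of a."""
    return math.sqrt(a)

# 2. A mapping from tool names to the functions themselves,
#    so we can look up and call the right function later.
TOOL_MAPPING = {
    "subtract": subtract,
    "square_root": square_root,
}

# 3. The same information in the format the Anthropic API expects:
#    a name, a description, and a JSON schema for the inputs.
tools = [
    {
        "name": "subtract",
        "description": "Subtract b from a and return the result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "a": {"type": "number", "description": "Number to subtract from."},
                "b": {"type": "number", "description": "Number to subtract."},
            },
            "required": ["a", "b"],
        },
    },
    {
        "name": "square_root",
        "description": "Compute the square root of a.",
        "input_schema": {
            "type": "object",
            "properties": {
                "a": {"type": "number", "description": "Number to take the square root of."},
            },
            "required": ["a"],
        },
    },
]
```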
function. Now, we ask Claude the same question as above, this time passing in the tools mapping to the request, which tells Claude that it can use these tools when it's generating its response. Now
when we print out the response, Claude doesn't just return text. It returns
text and a request to use a tool. We can
see in the text that Claude now plans out how to solve this problem knowing it first needs to subtract. It then
delegates this task to our program providing us the name of the tool to call and the inputs to pass into the tool. So here we isolate the tool
tool. So here we isolate the tool request and then we formulate the reply.
So in the reply we include that the role is user and in the content we say that we're including a tool result and we specify the tool use ID by accessing the ID attribute of the given tool request.
This allows Claude to connect the tool request and result, which is important when there are several tool calls. And
finally, we include the actual result of calling this tool with these inputs. So
we first use the tools name to pull out the relevant function from the tool mapping. And then we pass the inputs in
mapping. And then we pass the inputs in as keyword arguments to the function using this starst star syntax. And
finally, we convert the result to a string. So, we can now add Claude's
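Continuing the sketch above (it reuses the `client`, `prompt`, `tools`, and `TOOL_MAPPING` placeholders), the request, tool execution, and reply formulation could look roughly like this:

```python
# Ask Claude, this time advertising the tools it may call.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": prompt}],
)

# Isolate the tool-use block from Claude's response.
tool_request = next(
    block for block in response.content if block.type == "tool_use"
)

# Run the requested function ourselves, using the tool mapping
# and unpacking Claude's inputs as keyword arguments.
result = TOOL_MAPPING[tool_request.name](**tool_request.input)

# Formulate the reply: a user message containing a tool_result
# whose tool_use_id ties it back to Claude's request.
tool_reply = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": tool_request.id,
            "content": str(result),
        }
    ],
}
```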
So, we can now add Claude's previous message to the chat history and then add the appropriately formatted tool reply. Besides these two additions, we call Claude again in the exact same way; this request is the exact same as above except for these two lines. Again, we print the response, and we see that Claude knows it needs to use another tool, this time the square root tool. So we repeat the above tool calling and reply formulation process once more, and finally Claude returns the correct answer to the question. Claude is now able to find the correct answer reliably with the tools provided to it. If we ask the same question, we'll get the same response. Note that Claude, and indeed any LLM, can still make mistakes, but in practice that rarely happens with this sort of tool use.
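The same pattern can be rolled into a small loop that keeps executing tool requests until Claude answers with text only. This is my own generalization of the steps described above, not the video's code:

```python
def run_with_tools(client, prompt, tools, tool_mapping, model="claude-3-5-sonnet-latest"):
    """Keep calling Claude, executing any requested tools, until it stops asking."""
    messages = [{"role": "user", "content": prompt}]
    while True:
        response = client.messages.create(
            model=model, max_tokens=1024, tools=tools, messages=messages
        )
        tool_requests = [b for b in response.content if b.type == "tool_use"]
        if not tool_requests:
            # No more tool calls: return Claude's final text answer.
            return "".join(b.text for b in response.content if b.type == "text")

        # Add Claude's message to the history, then answer each tool request.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": req.id,
                "content": str(tool_mapping[req.name](**req.input)),
            }
            for req in tool_requests
        ]
        messages.append({"role": "user", "content": results})
```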
So we begin to see the power of AI agents, but we want agents that can do more than just math. We want an agent that can draft a document for us, share it with colleagues, and then email them asking for their review. How do we get there from an agent that can just perform arithmetic reliably? The answer is APIs. We rely on a huge digital infrastructure every day for things like communication, shopping, and banking. For AI agents to be truly useful in the general sense, we need to give them access to this digital infrastructure. For example, Google has servers, and on those servers runs Google Docs. Google exposes this service to you, allowing you to log into the web UI to create documents, share them, and so on. It's a whole suite of functionality that they maintain and manage, but it's not the only way to interact with Google Docs. Google has chosen to expose its Docs service through an API, not just a web UI. This API allows developers to interact with Google Docs programmatically, which means that they can build programs around workflows involving Google Docs. What if we give an LLM access to these APIs? Then the LLM can understand the problem and use tools, both for local processing and for interacting with remote external services, to accomplish the goal. This is the promise of AI agents: autonomous or semi-autonomous agents that can flexibly compose tools together in a semantically driven way.
Let's take a look at how this works. I'll ask Claude to write a spooky story and then upload it to Google Drive for us. As before, we import the packages we need and get our environment set up, and then we define our prompt, in this case: generate a spooky story that is one paragraph long and then upload it to Google Docs. Again, we create our tool mapping. In this case, there's only one tool, the create document function that I've imported from this doc tools file. And then again, we create the tools list in the way that Claude expects, including the name of this create document function, a description of it, and the properties that it accepts. As before, we submit our request to Claude and pass in the tools that it can use.
Claude then responds with a text block and a tool request for the create document function. As in the example in the previous section, we execute the tool with Claude's inputs and then return the results to Claude to continue the conversation. So here we pull out the create document tool request and formulate the reply, again pulling out the ID of the request, using the tool mapping to map the requested tool's name to a function, and passing in the inputs that were provided by Claude with the keyword-argument syntax. We make another request, this time adding on Claude's response as well as the tool result, then again print the result, and finally the LLM lets us know that it was successful in its task. If I go check my Google Drive, I can see the document indeed has been created.
The only fundamental difference between this example and the one in the previous section is that we've changed our tool set. The heavy lifting is done by the create document function that we imported from the doc tools file. It uses Google's SDK to interact with the Google Docs API in order to upload the file. Google Workspace APIs use OAuth 2 for authentication, so you'll need to set up a project in the Google Cloud Console if you want to do this yourself. The details are beyond the scope of this video, but let me know in the comments if you want to see a video about that.
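For reference, a create-document helper along these lines might look like the following. This is my own sketch using the google-api-python-client library; it assumes you've already obtained OAuth 2 credentials, and it isn't necessarily how the video's doc_tools file is written:

```python
from googleapiclient.discovery import build

def create_document(title: str, content: str, credentials) -> str:
    """Create a Google Doc with the given title and body text; return its ID."""
    docs = build("docs", "v1", credentials=credentials)

    # Create an empty document with the requested title.
    doc = docs.documents().create(body={"title": title}).execute()
    doc_id = doc["documentId"]

    # Insert the story text at the start of the document body.
    docs.documents().batchUpdate(
        documentId=doc_id,
        body={"requests": [{"insertText": {"location": {"index": 1}, "text": content}}]},
    ).execute()
    return doc_id
```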
The ability for AI agents to leverage tools in this way is amazing, but there are a few problems here. First, it's a lot of work. Every time we want to use a service in this way, we have to create a tool. In this case, the tool is essentially a wrapper for an underlying API, but this won't always be the case. Additionally, even if the tool is a simple wrapper, it adds another point of maintenance and scope, a greater overall code surface area, and more places for software entropy to creep in. Second, there are a lot of security concerns here. After all, an AI agent may have access to your bank accounts, personal information, and health records. Everyone building their own way for AI agents to access this information will inevitably lead to sensitive information being sent to the wrong place, credentials being leaked, or something of the sort. And third, we have to make sure we're constantly up to date with service providers. Every time they make a change on their end, we have to make a change to the corresponding tool in our application. This is a big problem considering how many tools an AI agent will need to have access to. What if there were a way around these problems?
This is the goal of MCP. There are a lot of protocols in technology, like the HTTP we mentioned earlier. Its secure successor, HTTPS, is the foundation for communication on the internet today. It establishes clear rules for how servers are to interact with, for example, browsers. Without this critical protocol, the web as we know it couldn't function, because browsers and servers couldn't interact without a common language. The model context protocol is the HTTPS of AI agents, and similarly, it adopts a client-server architecture. A service exposes tools and other capabilities via an MCP server, which AI agents can interact with via MCP clients. MCP servers don't just expose APIs like we saw in the last section; they expose actual tools. This way, we don't need to build the translation layer, which was the doc_tools file in the last section. Now, an important note here is that to use MCP servers, you often have to run them locally. Right now, a local client makes requests to a local server, which then makes requests to external services. Cloudflare has a good write-up on this and the current state of moving this functionality into the cloud, if you're interested.
So now let's see how to build an MCP server that accomplishes the same goal as in the last section. The MCP server itself is actually quite simple; here it is in its entirety. We import the relevant packages and then create an MCP server with the FastMCP constructor. We then define a helper method to format responses, and then we define our tool. This tool is just an asynchronous wrapper for the create document function, and the mcp.tool decorator is the key piece here that registers this function as a tool on the MCP server. To run the server, we just use the mcp.run function, and here we set the transport to standard input/output. The input types required for the function are automatically inferred from the function signature, so we don't need to do any of this work ourselves, like building the tool list we saw in previous examples. It's that easy to build the MCP server. Note that in practice, we won't be the ones actually building it, but more on that later.
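A minimal server along these lines, using the official MCP Python SDK's FastMCP class, could look roughly like this. The server name, tool name, and the imported create_document helper are placeholders for whatever your doc_tools module provides:

```python
from mcp.server.fastmcp import FastMCP

from doc_tools import create_document  # hypothetical helper from the earlier example

# Create the MCP server and give it a name.
mcp = FastMCP("google-docs")

@mcp.tool()
async def create_doc(title: str, content: str) -> str:
    """Create a Google Doc with the given title and content."""
    # The decorator registers this function as a tool; its signature
    # is used to infer the input schema automatically.
    doc_id = create_document(title, content)
    return f"Created document {doc_id}"

if __name__ == "__main__":
    # Communicate with clients over standard input/output.
    mcp.run(transport="stdio")
```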
Now we need to connect the server with a client. You can connect applications like the Claude desktop app to MCP servers via configuration files. This makes it so that, on startup, the application spins up a corresponding MCP client for each of the MCP servers in that file. But let's see how you do this in Python. Now, the great thing here is there's not much work to do. Since MCP is a standard, the way that clients and servers interact is standardized. That means we can just pull the client example directly from the MCP docs and use that. In the same way that you don't need to do a lot of work to get a resource from a web server, you just use a GET request, you don't have to do a lot of work here to connect a client to a server. So, let's go grab this code and run it.
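The client boilerplate from the MCP docs boils down to something like this trimmed sketch; error handling and the LLM round-trip are omitted, and the tool name and arguments are assumptions matching the server sketch above:

```python
import asyncio
import sys

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main(server_script: str):
    # Spawn the MCP server as a subprocess and talk to it over stdio.
    server_params = StdioServerParameters(command="python", args=[server_script])
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List the tools the server exposes.
            tools = await session.list_tools()
            print("Connected. Available tools:", [t.name for t in tools.tools])

            # A real client would pass these tools to the LLM and relay its
            # tool requests; here we just call one directly as a smoke test.
            result = await session.call_tool(
                "create_doc", {"title": "Spooky story", "content": "..."}
            )
            print(result)

if __name__ == "__main__":
    asyncio.run(main(sys.argv[1]))
```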
When we run the client, we pass in the path to the MCP server file. When it starts up, we can see that it connected to the server, and it lists out the tools that are available on that server. We then get a prompt where we can input our queries. We give it a command similar to the last section, but this time let's ask for a specific title for the piece. The agent tells us that it successfully completed this task, and if I go to my Google Docs, I can see that this short story was indeed uploaded. So why does MCP matter?
The aspect that's gotten a lot of attention, and indeed the one we talked about most in this video, is that it standardizes the way that AI agents are going to interact with external services. Of course, this is important because it allows us to build systems that are interoperable and composable, as we've seen. But there's actually another aspect to this that's also important. In particular, MCP shifts the onus of making AI agents interact with services onto the service providers. When we built the MCP server above in our example, it may have seemed kind of pointless; I mean, we just added another file and basically passed through the doc tools file. But it would be Google here that runs this MCP server. All we have to do is run it locally or, in the future, connect to it remotely.
Let's consider an analogy to see why this matters. We've spoken about how many service providers offer an API to interact with their services. Development teams that want to use these APIs will come up with their own ways to do this in their applications. Here we see an example of how two teams might each create a function to call AssemblyAI's API to create a transcript: in the first case, we have this transcribe file function, and in the second case, we have this post transcript function. Now, this certainly works, but let's see what happens when we at AssemblyAI need to update our API, for example by changing an endpoint. This change cascades into a multiplicative maintenance burden. Every application that makes requests to this endpoint needs to be updated or it will stop functioning properly. To make matters worse, if a development team doesn't adhere to good software engineering principles, for example by not keeping their code DRY, the problem will be even worse: now the team's code has to be updated in many locations. Of course, we hope development teams design software in such a way that this is avoided, but if they don't, it could be a huge change across the entire codebase. To circumvent this issue, some service providers offer SDKs, which provide a port of the functionality into the environment in which the developer is working. That way, the developer can just use the SDK and not worry about the underlying API. In other words, the SDK provides a layer of abstraction that keeps the application's codebase decoupled from the API's implementation details.
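To make the contrast concrete, here's roughly what the two approaches look like, a hand-rolled call to the transcript endpoint versus the AssemblyAI Python SDK. The audio URL is a placeholder and the hand-rolled version is simplified (it only submits the job, without polling for the result):

```python
import os
import requests
import assemblyai as aai

AUDIO_URL = "https://example.com/audio.mp3"  # placeholder audio file

# Hand-rolled wrapper: the application is coupled to the endpoint's details.
def post_transcript(audio_url: str) -> dict:
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers={"authorization": os.environ["ASSEMBLYAI_API_KEY"]},
        json={"audio_url": audio_url},
    )
    return response.json()

# SDK: the same workflow behind an abstraction layer that handles
# submission and polling, decoupled from the endpoint's details.
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
transcript = aai.Transcriber().transcribe(AUDIO_URL)
print(transcript.text)
```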
Decoupling, both in this way and in others, is a common practice in software engineering, and it allows developers to focus on building their applications rather than worrying about the details of how to interact with a service. This is exactly what MCP does for AI agents. For providers that offer an MCP server, developers can hook AI agents into the corresponding service using a unified language. This allows AI agents to immediately gain access to functionality in a robust way that's resilient to change. Additionally, it means different development teams don't each have to spend time developing custom tools that all accomplish effectively the same task. Rather than having to build the bridge to a service and then do something with the results, i.e. implement their business logic, they can just focus on the second half. This separation of concerns is especially important for AI agents, which derive so much of their power from flexibly composing tools in a semantically driven way. In other words, the power of AI agents comes disproportionately from interacting with external services, which is exactly what MCP abstracts away.
We'll have more MCP and AI agent content in the coming weeks, so make sure to subscribe if you want to see more. In the meantime, feel free to check out this video on building an AI agent for real-time speech-to-text using LiveKit. Begin speaking and you'll see your speech transcribed in real time. After you complete a sentence, it will be punctuated and formatted, and then a new line will be started for the next sentence in the chat box on the playground.