Model Context Protocol (MCP) explained (with code examples)
By AssemblyAI
Summary
## Key takeaways
- **MCP Standardizes AI Agent Interfaces**: MCP eliminates the need to build a custom bridge for each service, such as local storage, databases, or APIs, by standardizing the interface between AI agents and the tools and resources they need to be useful. [01:16], [01:30]
- **LLMs Fail on Precision Math**: Asked to subtract 201,122 from 316,043 and then take the square root, Claude gives an incorrect answer instead of the correct 339, and returns different wrong answers on repeated attempts due to its stochastic nature. [04:07], [04:14]
- **Tools Make LLMs Reliable**: By providing Claude with subtraction and square-root tools as wrappers for Python functions, it reliably plans its steps, calls the tools, and computes the correct answer of 339. [06:34], [08:02]
- **Custom Tools Cause Maintenance Hell**: Building wrappers for APIs like Google Docs adds maintenance burden, security risks such as credential leaks, and constant updates whenever services change. [11:14], [12:07]
- **MCP Shifts Burden to Providers**: With MCP, service providers like Google run MCP servers that expose tools, letting developers connect AI agents via a unified standard without building custom bridges. [15:11], [15:40]
- **OpenAI Adoption Boosts MCP**: Stars on the MCP servers repository show a steady climb in interest, accelerated by OpenAI's recent adoption; standards are only useful if people actually use them. [01:48], [02:06]
Topics Covered
- LLMs Fail Complex Precision Tasks
- Tools Make LLMs Reliable
- Custom Tools Create Maintenance Hell
- MCP Standardizes Agent Interfaces
- MCP Shifts Burden to Providers
Full Transcript
The model context protocol. What is it?
How does it work? And why does it matter? You've likely been seeing a lot about the model context protocol recently. It's a relatively new protocol invented by Anthropic, and it seeks to standardize the way in which AI agents interact with resources like local storage, databases, or APIs. In this video, I'll teach you everything you need to know about the model context protocol through a series of illustrated examples. First, you'll learn about the stochastic limitations of LLMs and their unreliability in certain scenarios. Then we'll look at how tools were invented to overcome this issue, with an example of having an LLM create and upload a document to my Google Docs. Then we'll look at the problem MCP solves, and we'll reframe this problem in terms of building an MCP server. And finally, we'll end with a discussion of why MCP matters from a bird's-eye view and what it really means for the future of AI agents. All right, let's jump in.
AI agents are currently able to interact with a variety of systems, including local storage, databases, remote tools like AssemblyAI's API, and other remote resources. Currently, this means that for each of these services, we have to build a custom bridge so that the AI agent can interact with each of these resources, and that's represented by the different shapes in this image. This approach means that every time we want to add a capability to our LLM to perform some task reliably or interact with an external resource, we have to build such an adapter. MCP eliminates this issue by standardizing the interface between AI agents and the tools and resources they need to be useful. We want AI agents that can perform a wide range of tasks for us. But it's become clear in recent years that these are not going to be monolithic models. Instead, they'll be a network of interconnected models, resources, and tools. So standardizing the interface between these components means that developers and researchers can focus on building the actual pieces of these networks rather than on the nitty-gritty details of the wiring between them. But beyond the standard itself, MCP is important because people are actually adopting it. Standards are only useful if people are actually using them. And we can see a steady climb in interest towards MCP when we look at stars on the MCP servers repository. OpenAI's recent adoption of MCP has only served to accelerate this growth.
To understand the value of MCP, let's take a step back and see how we got here. LLMs have been in the public eye for only two years, and they've already transformed the way in which we work. For a thorough history of LLMs and how they developed, you can see our relevant piece on that, but I'll recap the important points here which are relevant to a discussion of MCP. LLMs are fundamentally probabilistic systems. They're trained on a huge corpus of text, and through this they learn patterns in natural language. This means that they can generate coherent text that's relevant to the input they receive.
For example, we can use a language model to predict the next word in a sentence. A language model represents a probability distribution over word sequences. So given an initial word sequence like "I went to the", we can predict the most likely next word, append it to the sentence, and recursively keep doing this to generate text. Don't worry if you don't understand the notation, just focus on the concept.
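For the curious, the notation amounts to something like the following. This is the standard autoregressive language-modeling formulation; the video's slides may write it slightly differently:

```latex
% A language model defines a probability distribution over word sequences,
% factored one word at a time:
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})

% Greedy generation picks the most likely next word and appends it, recursively:
\hat{w}_{t+1} = \arg\max_{w} \; P(w \mid w_1, \dots, w_t)
```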
And in reality, LLM inference is actually more complicated than this, but it's beyond the scope of this video, so check out some of our other resources if you want to learn about that. This makes LLMs good for creative writing, drafting emails, really any task in which the goal is to just create relevant, coherent text.
However, people quickly learned that we can get LLMs to do other things simply by phrasing the question in natural language. For example, if we put the word sequence "translate this from Spanish to English: el gato rojo" into a large language model, we can get the LLM to predict the most likely next word, and it turns out it's actually pretty good at giving the right answer. These abilities, the so-called emergent abilities of LLMs, were an unexpected surprise. Very quickly, people began using LLMs for a variety of tasks, benchmarking their performance, and debating the ways in which we can measure this sort of general competence.
However, it quickly became apparent that performance varied drastically across tasks. For tasks that require high precision and reliable data processing, LLMs were not a valid choice. Even for tasks that did not require high precision but required many steps, LLMs were still not a good choice. You can see in this example, I've asked Claude to subtract 201,122 from 316,043 and then calculate the square root of the result. I define the prompt here, pass it into Claude, and then print out the response.
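A rough sketch of that call using the Anthropic Python SDK might look like this. The model name and prompt wording here are my own placeholders, not necessarily what's used in the video:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = (
    "Subtract 201,122 from 316,043 and then calculate "
    "the square root of the result."
)

# Ask Claude directly, with no tools attached.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)

print(response.content[0].text)  # a plain-text (and possibly wrong) answer
```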
The correct answer to this problem is 339 (316,043 − 201,122 = 114,921, and √114,921 = 339). Here's Claude's response. The answer is incorrect. And worse yet, if you repeat the question, you can get a different answer every time. This is completely unsurprising given the stochastic nature of LLMs, but frustrating to researchers and developers who thought that smart enough LLMs could be a panacea for a wide range of problems. LLMs may be good at some targeted and trivial tasks, but their performance breaks down when the problem reaches a sufficient complexity. And this unreliability precludes them from being deployed in serious applications. After all, we use computers for their precision and reliability, and LLMs are neither of these, at least not for problems of a complexity that's important to humans.
At first, researchers tried to brute-force the problem by simply continuing the process that made LLMs useful in the first place: scaling. LLMs are generally performant because they're, well, large. They have billions of parameters and they're trained on massive datasets. It's precisely the scaling process from small language models to large language models through which LLMs gain many useful skills. So it seemed reasonable that scaling the models even bigger would yield even better results. However, it soon became clear that this scaling process could not continue and was not the way forward to more useful and competent general AI. The inherent probabilistic nature of LLMs is a fundamental limitation and not something that can be scaled away, as AI leaders like Yann LeCun had predicted.
So, LLMs can't do everything we need done on their own. But why should they? Over decades, we built up a huge ecosystem of digital technology, standards, libraries, frameworks, and infrastructure, that has allowed computers to already do much of what we need done and has transformed the way we live. Why not allow LLMs to leverage this pre-existing work? The idea is to provide the LLM with access to this functionality, and by composing these tools in different ways, it can solve problems that are too complicated for it to solve on its own. In this way, all the language model needs to do is determine what needs to be done, not how to actually do it. Let's see an example of how we can get Claude to reliably answer the question from before by giving it access to simple arithmetic tools.
These tools are just simple wrappers for the underlying arithmetic operations of the programming language in which the LLM is called, in this case Python. So, first let's create a couple of tools. Here, I've defined a subtraction function and a square-root function. Then, we define a tool mapping that maps from the names of these functions to the actual functions themselves; we'll see later that this makes it much easier to call them. Now we cast this information as a list of tools in a format that the LLM expects, specifying the name of each tool, a description, and the input schema, which describes the input properties to the function. And finally, we include which inputs are required. We repeat the same thing for the square root function.
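Here's a minimal sketch of what those pieces might look like. The function and variable names are my own, and the tool entries follow Anthropic's tool-use schema rather than the video's exact code:

```python
import math

# 1. Plain Python functions that do the actual work.
def subtract(a: float, b: float) -> float:
    """Return a minus b."""
    return a - b

def square_root(a: float) -> float:
    """Return the square root of a."""
    return math.sqrt(a)

# 2. A mapping from tool names to the functions themselves,
#    so we can look up and call the right function later.
TOOL_MAPPING = {
    "subtract": subtract,
    "square_root": square_root,
}

# 3. The same information in the format the Anthropic API expects:
#    a name, a description, and a JSON schema for the inputs.
tools = [
    {
        "name": "subtract",
        "description": "Subtract b from a and return the result.",
        "input_schema": {
            "type": "object",
            "properties": {
                "a": {"type": "number", "description": "Number to subtract from."},
                "b": {"type": "number", "description": "Number to subtract."},
            },
            "required": ["a", "b"],
        },
    },
    {
        "name": "square_root",
        "description": "Compute the square root of a.",
        "input_schema": {
            "type": "object",
            "properties": {
                "a": {"type": "number", "description": "Number to take the square root of."},
            },
            "required": ["a"],
        },
    },
]
```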
function. Now, we ask Claude the same question as above, this time passing in the tools mapping to the request, which tells Claude that it can use these tools when it's generating its response. Now
when we print out the response, Claude doesn't just return text. It returns
text and a request to use a tool. We can
see in the text that Claude now plans out how to solve this problem knowing it first needs to subtract. It then
delegates this task to our program providing us the name of the tool to call and the inputs to pass into the tool. So here we isolate the tool
tool. So here we isolate the tool request and then we formulate the reply.
So in the reply we include that the role is user and in the content we say that we're including a tool result and we specify the tool use ID by accessing the ID attribute of the given tool request.
This allows Claude to connect the tool request and result, which is important when there are several tool calls. And
finally, we include the actual result of calling this tool with these inputs. So
we first use the tools name to pull out the relevant function from the tool mapping. And then we pass the inputs in
mapping. And then we pass the inputs in as keyword arguments to the function using this starst star syntax. And
finally, we convert the result to a string. So, we can now add Claude's
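Continuing the sketch above (it reuses the `client`, `prompt`, `tools`, and `TOOL_MAPPING` placeholders), the request, tool execution, and reply formulation could look roughly like this:

```python
# Ask Claude, this time advertising the tools it may call.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": prompt}],
)

# Isolate the tool-use block from Claude's response.
tool_request = next(
    block for block in response.content if block.type == "tool_use"
)

# Run the requested function ourselves, using the tool mapping
# and unpacking Claude's inputs as keyword arguments.
result = TOOL_MAPPING[tool_request.name](**tool_request.input)

# Formulate the reply: a user message containing a tool_result
# whose tool_use_id ties it back to Claude's request.
tool_reply = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": tool_request.id,
            "content": str(result),
        }
    ],
}
```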
So, we can now add Claude's previous message to the chat history and then add the appropriately formatted tool reply. Besides these two additions, we call Claude again in the exact same way; this request is the exact same as above except for these two lines. Again, we print the response, and we see that Claude knows it needs to use another tool, this time the square root tool. So we repeat the above tool calling and reply formulation process once more, and finally Claude returns the correct answer to the question. Claude is now able to find the correct answer reliably with the tools provided to it. If we ask the same question, we'll get the same response. Note that Claude, and indeed any LLM, can still make mistakes, but in practice that rarely happens with this sort of tool use.
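The same pattern can be rolled into a small loop that keeps executing tool requests until Claude answers with text only. This is my own generalization of the steps described above, not the video's code:

```python
def run_with_tools(client, prompt, tools, tool_mapping, model="claude-3-5-sonnet-latest"):
    """Keep calling Claude, executing any requested tools, until it stops asking."""
    messages = [{"role": "user", "content": prompt}]
    while True:
        response = client.messages.create(
            model=model, max_tokens=1024, tools=tools, messages=messages
        )
        tool_requests = [b for b in response.content if b.type == "tool_use"]
        if not tool_requests:
            # No more tool calls: return Claude's final text answer.
            return "".join(b.text for b in response.content if b.type == "text")

        # Add Claude's message to the history, then answer each tool request.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": req.id,
                "content": str(tool_mapping[req.name](**req.input)),
            }
            for req in tool_requests
        ]
        messages.append({"role": "user", "content": results})
```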
So we begin to see the power of AI agents, but we want agents that can do more than just math. We want an agent that can draft a document for us, share it with colleagues, and then email them asking for their review. How do we get there from an agent that can just perform arithmetic reliably? The answer is APIs. We rely on a huge digital infrastructure every day for things like communication, shopping, and banking. For AI agents to be truly useful in the general sense, we need to give them access to this digital infrastructure. For example, Google has servers, and on those servers runs Google Docs. Google exposes this service to you, allowing you to log into the web UI to create documents, share them, and so on. It's a whole suite of functionality that they maintain and manage, but it's not the only way to interact with Google Docs. Google has chosen to expose its Docs service through an API, not just a web UI. This API allows developers to interact with Google Docs programmatically, which means that they can build programs around workflows involving Google Docs. What if we give an LLM access to these APIs? Then the LLM can understand the problem and use tools, both for local processing and for interacting with remote external services, to accomplish the goal. This is the promise of AI agents: autonomous or semi-autonomous agents that can flexibly compose tools together in a semantically driven way.
Let's take a look at how this works. I'll ask Claude to write a spooky story and then upload it to Google Drive for us. As before, we import the packages we need and get our environment set up, and then we define our prompt, in this case: generate a spooky story that is one paragraph long and then upload it to Google Docs. Again, we create our tool mapping. In this case, there's only one tool, the create document function that I've imported from this doc tools file. And then again, we create the tools list in the way that Claude expects, including the name of this create document function, a description of it, and the properties that it accepts. As before, we submit our request to Claude and pass in the tools that it can use.
Claude then responds with a text block and a tool request for the create document function. As in the example in the previous section, we execute the tool with Claude's inputs and then return the results to Claude to continue the conversation. So here we pull out the create document tool request and formulate the reply, again pulling out the ID of the request, using the tool mapping to map the requested tool's name to a function, and passing in the inputs that were provided by Claude with the keyword-argument syntax. We make another request, this time adding on Claude's response as well as the tool result, then again print the result, and finally the LLM lets us know that it was successful in its task. If I go check my Google Drive, I can see the document indeed has been created.
The only fundamental difference between this example and the one in the previous section is that we've changed our tool set. The heavy lifting is done by the create document function that we imported from the doc tools file. It uses Google's SDK to interact with the Google Docs API in order to upload the file. Google Workspace APIs use OAuth 2 for authentication, so you'll need to set up a project in the Google Cloud Console if you want to do this yourself. The details are beyond the scope of this video, but let me know in the comments if you want to see a video about that.
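For reference, a create-document helper along these lines might look like the following. This is my own sketch using the google-api-python-client library; it assumes you've already obtained OAuth 2 credentials, and it isn't necessarily how the video's doc_tools file is written:

```python
from googleapiclient.discovery import build

def create_document(title: str, content: str, credentials) -> str:
    """Create a Google Doc with the given title and body text; return its ID."""
    docs = build("docs", "v1", credentials=credentials)

    # Create an empty document with the requested title.
    doc = docs.documents().create(body={"title": title}).execute()
    doc_id = doc["documentId"]

    # Insert the story text at the start of the document body.
    docs.documents().batchUpdate(
        documentId=doc_id,
        body={"requests": [{"insertText": {"location": {"index": 1}, "text": content}}]},
    ).execute()
    return doc_id
```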
The ability for AI agents to leverage tools in this way is amazing, but there are a few problems here. First, it's a lot of work. Every time we want to use a service in this way, we have to create a tool. In this case, the tool is essentially a wrapper for an underlying API, but this won't always be the case. Additionally, even if the tool is a simple wrapper, it adds another point of maintenance and scope, a greater overall code surface area, and more places for software entropy to creep in. Second, there are a lot of security concerns here. After all, an AI agent may have access to your bank accounts, personal information, and health records. Everyone building their own way for AI agents to access this information will inevitably lead to sensitive information being sent to the wrong place, credentials being leaked, or something of the sort. And third, we have to make sure we're constantly up to date with service providers. Every time they make a change on their end, we have to make a change to the corresponding tool in our application. This is a big problem considering how many tools an AI agent will need to have access to. What if there were a way around these problems?
This is the goal of MCP. There are a lot of protocols in technology, like the HTTP we mentioned earlier. Its secure successor, HTTPS, is the foundation for communication on the internet today. It establishes clear rules for how servers are to interact with, for example, browsers. Without this critical protocol, the web as we know it couldn't function, because browsers and servers couldn't interact without a common language. The model context protocol is the HTTPS of AI agents, and similarly, it adopts a client-server architecture. A service exposes tools and other capabilities via an MCP server, which AI agents can interact with via MCP clients. MCP servers don't just expose APIs like we saw in the last section; they expose actual tools. This way, we don't need to build the translation layer, which was the doc_tools file in the last section. Now, an important note here is that to use MCP servers, you often have to run them locally. Right now, a local client makes requests to a local server, which then makes requests to external services. Cloudflare has a good write-up on this and the current state of moving this functionality into the cloud, if you're interested.
So now let's see how to build an MCP server that accomplishes the same goal as in the last section. The MCP server itself is actually quite simple; here it is in its entirety. We import the relevant packages and then create an MCP server with the FastMCP constructor. We then define a helper method to format responses, and then we define our tool. This tool is just an asynchronous wrapper for the create document function, and the mcp.tool decorator is the key piece here that registers this function as a tool on the MCP server. To run the server, we just use the mcp.run function, and here we set the transport to standard input/output. The input types required for the function are automatically inferred from the function signature, so we don't need to do any of this work ourselves, like building the tool list we saw in previous examples. It's that easy to build the MCP server. Note that in practice, we won't be the ones actually building it, but more on that later.
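A minimal server along these lines, using the official MCP Python SDK's FastMCP class, could look roughly like this. The server name, tool name, and the imported create_document helper are placeholders for whatever your doc_tools module provides:

```python
from mcp.server.fastmcp import FastMCP

from doc_tools import create_document  # hypothetical helper from the earlier example

# Create the MCP server and give it a name.
mcp = FastMCP("google-docs")

@mcp.tool()
async def create_doc(title: str, content: str) -> str:
    """Create a Google Doc with the given title and content."""
    # The decorator registers this function as a tool; its signature
    # is used to infer the input schema automatically.
    doc_id = create_document(title, content)
    return f"Created document {doc_id}"

if __name__ == "__main__":
    # Communicate with clients over standard input/output.
    mcp.run(transport="stdio")
```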
Now we need to connect the server with a client. You can connect applications like the Claude desktop app to MCP servers via configuration files. This makes it so that, on startup, the application spins up a corresponding MCP client for each of the MCP servers in that file. But let's see how you do this in Python. Now, the great thing here is there's not much work to do. Since MCP is a standard, the way that clients and servers interact is standardized. That means we can just pull the client example directly from the MCP docs and use that. In the same way that you don't need to do a lot of work to get a resource from a web server, you just use a GET request, you don't have to do a lot of work here to connect a client to a server. So, let's go grab this code and run it.
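The client boilerplate from the MCP docs boils down to something like this trimmed sketch; error handling and the LLM round-trip are omitted, and the tool name and arguments are assumptions matching the server sketch above:

```python
import asyncio
import sys

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main(server_script: str):
    # Spawn the MCP server as a subprocess and talk to it over stdio.
    server_params = StdioServerParameters(command="python", args=[server_script])
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List the tools the server exposes.
            tools = await session.list_tools()
            print("Connected. Available tools:", [t.name for t in tools.tools])

            # A real client would pass these tools to the LLM and relay its
            # tool requests; here we just call one directly as a smoke test.
            result = await session.call_tool(
                "create_doc", {"title": "Spooky story", "content": "..."}
            )
            print(result)

if __name__ == "__main__":
    asyncio.run(main(sys.argv[1]))
```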
When we run the client, we pass in the path to the MCP server file. When it starts up, we can see that it connected to the server, and it lists out the tools that are available on that server. We then get a prompt where we can input our queries. We give it a command similar to the last section, but this time let's ask for a specific title for the piece. The agent tells us that it successfully completed this task, and if I go to my Google Docs, I can see that this short story was indeed uploaded. So why does MCP matter?
The aspect that's gotten a lot of attention, and indeed the one we talked about most in this video, is that it standardizes the way that AI agents are going to interact with external services. Of course, this is important because it allows us to build systems that are interoperable and composable, as we've seen. But there's actually another aspect to this that's also important. In particular, MCP shifts the onus of making AI agents interact with services onto the service providers. When we built the MCP server above in our example, it may have seemed kind of pointless; I mean, we just added another file and basically passed through the doc tools file. But it would be Google here that runs this MCP server. All we have to do is run it locally or, in the future, connect to it remotely.
Let's consider an analogy to see why this matters. We've spoken about how many service providers offer an API to interact with their services. Development teams that want to use these APIs will come up with their own ways to do this in their applications. Here we see an example of how two teams might each create a function to call AssemblyAI's API to create a transcript: in the first case, we have this transcribe file function, and in the second case, we have this post transcript function. Now, this certainly works, but let's see what happens when we at AssemblyAI need to update our API, for example by changing an endpoint. This change cascades into a multiplicative maintenance burden. Every application that makes requests to this endpoint needs to be updated or it will stop functioning properly. To make matters worse, if a development team doesn't adhere to good software engineering principles, for example by not keeping their code DRY, the problem will be even worse: now the team's code has to be updated in many locations. Of course, we hope development teams design software in such a way that this is avoided, but if they don't, it could be a huge change across the entire codebase. To circumvent this issue, some service providers offer SDKs, which provide a port of the functionality into the environment in which the developer is working. That way, the developer can just use the SDK and not worry about the underlying API. In other words, the SDK provides a layer of abstraction that keeps the application's codebase decoupled from the API's implementation details.
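To make the contrast concrete, here's roughly what the two approaches look like, a hand-rolled call to the transcript endpoint versus the AssemblyAI Python SDK. The audio URL is a placeholder and the hand-rolled version is simplified (it only submits the job, without polling for the result):

```python
import os
import requests
import assemblyai as aai

AUDIO_URL = "https://example.com/audio.mp3"  # placeholder audio file

# Hand-rolled wrapper: the application is coupled to the endpoint's details.
def post_transcript(audio_url: str) -> dict:
    response = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers={"authorization": os.environ["ASSEMBLYAI_API_KEY"]},
        json={"audio_url": audio_url},
    )
    return response.json()

# SDK: the same workflow behind an abstraction layer that handles
# submission and polling, decoupled from the endpoint's details.
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
transcript = aai.Transcriber().transcribe(AUDIO_URL)
print(transcript.text)
```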
Decoupling, both in this way and in others, is a common practice in software engineering, and it allows developers to focus on building their applications rather than worrying about the details of how to interact with a service. This is exactly what MCP does for AI agents. For providers that offer an MCP server, developers can hook AI agents into the corresponding service using a unified language. This allows AI agents to immediately gain access to functionality in a robust way that's resilient to change. Additionally, it means different development teams don't each have to spend time developing custom tools that all accomplish effectively the same task. Rather than having to build the bridge to a service and then do something with the results, i.e. implement their business logic, they can just focus on the second half. This separation of concerns is especially important for AI agents, which derive so much of their power from flexibly composing tools in a semantically driven way. In other words, the power of AI agents comes disproportionately from interacting with external services, which is exactly what MCP abstracts away.
We'll have more MCP and AI agent content in the coming weeks, so make sure to subscribe if you want to see more. In the meantime, feel free to check out this video on building an AI agent for real-time speech-to-text using LiveKit. Begin speaking and you'll see your speech transcribed in real time. After you complete a sentence, it will be punctuated and formatted, and then a new line will be started for the next sentence in the chat box on the playground.