
Whitepaper Companion Podcast - Agent Tools & Interoperability with MCP

By Kaggle

Summary

## Key takeaways

- **Foundation models need tools to act.** Powerful foundation models are like brilliant pattern-matching machines stuck in their training data, lacking the ability to perceive or act in the real world. Tools provide the necessary senses and actuators, transforming them into active agents. [00:14], [00:41]
- **MCP standardizes tool integration.** The Model Context Protocol (MCP) is an open standard designed to streamline the integration of tools with AI models, moving away from a fragmented mess of custom code towards a unified, plug-and-play system. [01:34], [01:43]
- **Three types of AI agent tools.** Tools for AI agents fall into three categories: function tools defined by developers, built-in tools provided by model services, and agent tools, which invoke other, separate agents hierarchically. [02:50], [03:57]
- **Documentation is paramount for tool success.** Clear, descriptive documentation for tools, including names, descriptions, and parameters, is essential for the model to understand how to use them. Focus on describing the task, not the implementation details. [05:38], [06:08]
- **Design tools for concise output.** Returning raw, large data from tools can cause context window bloat, increase costs, and degrade the LLM's reasoning. Instead, return concise summaries, confirmations, or external references (URIs) to the data. [07:31], [07:53]
- **MCP enables modularity but needs external security.** MCP promotes modularity and accelerates development by standardizing communication, but it lacks built-in enterprise security. Enterprises must wrap MCP in external security layers, like API gateways, for authentication and authorization. [13:54], [18:11]

Topics Covered

  • Foundation models need tools to act.
  • MCP standardizes tool integration for agents.
  • Agent tools enable hierarchical delegation.
  • Clear documentation is paramount for agent tools.
  • Avoid context bloat: return summaries, not raw data.

Full Transcript

Okay, so today we're diving into

uh what feels like the core question for

anyone working with AI right now. How do

you get these incredibly smart language

models to actually do something useful

in the real world?

>> Right? Because a powerful foundation

model,

>> I mean, it's brilliant, but it's

fundamentally stuck inside its training

data, you know?

>> Yeah. It's like this amazing pattern

matching machine, but completely

isolated from anything current or

anything it can act on.

>> Exactly. It's like having memorized

every book in the library but having no

hands or eyes. You can write code, sure,

or generate amazing text, but it can't

like call an API on its own or check a

real-time stock price or even send an

email.

>> And that ability to perceive and act,

that's what makes Agentic AI potentially

revolutionary, especially, I think, when

we talk about using it inside businesses

in the enterprise.

>> That's the key transformation. And the

map for how we get there, how models go

from just thinking to actually doing is

what we're digging into today.

>> Yes, exactly. Our source material is the

day two white paper from that 5-day AI

agents intensive course by Google and

Kaggle. It really lays out the

architecture for making models active

agents.

>> And this is where the idea of tools

becomes absolutely central. Tools are

basically the agent's senses and

actuators. Its eyes and hands, like you

said.

>> Okay. But you know, historically getting

these tools hooked up was a massive

headache. If you had, say, n different

models and m different tools or APIs.

>> Oh, I can see where this is going.

>> Yeah. You ended up with this combinatorial

n × m problem. Just this fragmented mess

of custom code, bespoke connectors for

every single pair. It just didn't scale.

>> Sounds like a nightmare. Honestly,

>> it really was. And the solution that the

industry seems to be coalescing around

now is the model context protocol or

MCP. It came about in 2024.

>> MCP?

>> Yeah, MCP. It's designed as an open

standard to sort of streamline all this

integration. The goal is a unified

almost plug-and-play way to connect tools

which fundamentally decouples the agent,

the brain, the reasoner from the

specific tool doing the work, the actor.

>> Okay, that sounds promising. Let's

unpack that starting right at the

beginning.

>> Yeah.

>> When we say tool in this AI agent

context, what exactly are we talking

about?

So fundamentally a tool is just a

function or maybe a program that an

LLM-based application uses to do something

that the model itself can't do natively,

>> right?

>> And crucially, these tasks fall into two

main buckets. Either letting the model

know something new like fetching that

real-time weather forecast using an API

>> or letting the model do something like

actually sending a message, updating a

database, making something happen in the

outside world.

>> Got it. Know something or do something.

And the developer community based on the

white paper seems to break these down

into what? Three main types.

>> That's right. Three main categories.

First up, you got function tools. Okay.

>> These are the ones developers explicitly

define themselves. They're external

functions, maybe Python code. And you

often use things like detailed

docstrings, especially in frameworks like

Google's ADK. These docstrings are

critical. They define the contract, the

inputs and outputs between the model and

the function.

>> Right? Like you might define a function

say set_light_values for a smart home,

and the docstring specifies you need

brightness and color as inputs.

>> Exactly that the model reads that

documentation to figure out precisely

how to call your function correctly.

It's all about that contract.
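
To make that contract concrete, here's a minimal sketch of how a function tool like the smart-home example might look in Python. The exact signature is illustrative rather than taken verbatim from the paper; the point is that the name and docstring are what the model actually reads.

```python
def set_light_values(brightness: int, color: str) -> dict:
    """Sets the brightness and color of a smart light.

    Args:
        brightness: Light level from 0 to 100; 0 turns the light off.
        color: One of "daylight", "cool", or "warm".

    Returns:
        A dict confirming the values that were applied.
    """
    # A real tool would call the smart-home API here; this stub just echoes back.
    return {"brightness": brightness, "color": color}
```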

>> Makes sense. What's the second type?

>> Second are built-in tools. Now these are

different because they're provided

implicitly by the model service

provider. So as a developer you don't

really see the tool definition. It just

works behind the scenes.

>> Okay, like, um, Gemini's ability

to use Google search for grounding, right?

Or execute a bit of code itself?

>> Perfect examples. Grounding with search,

code execution, even fetching content

from a URL you provide. Those are often

built-in capabilities of the platform.

>> And the third type, this one sounds

really interesting: agent tools.

>> Yeah, this is a powerful concept. An agent

tool means you're actually invoking

another, separate agent as if it were a

tool.

>> okay

>> but the key distinction here is that

you're not completely handing off the

conversation or the main task. Your

primary agent stays in charge. It calls

the sub agent, gets a result back, and

then uses that result as like an input

or a resource for its own reasoning.

Often managed with something like an

AgentTool class in an SDK.
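
Here's a rough, framework-agnostic sketch of that pattern; the class and agent names are invented for illustration rather than taken from any specific SDK. The primary agent wraps a sub-agent as a tool, calls it, and keeps control of the overall task.

```python
class AgentTool:
    """Wraps another agent so the primary agent can invoke it like a tool."""

    def __init__(self, agent):
        self.agent = agent

    def run(self, request: str) -> str:
        # Delegate a bounded sub-task; the caller keeps the conversation.
        return self.agent.handle(request)


class DataAnalysisAgent:
    """Stand-in for a specialist agent with its own reasoning loop."""

    def handle(self, request: str) -> str:
        return f"Summary of analysis for: {request}"


analysis_tool = AgentTool(DataAnalysisAgent())
summary = analysis_tool.run("Q3 sales data")
# The primary agent now uses `summary` as an input to its own next step,
# for example passing it to a separate send-email function tool.
```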

>> So, it's more like delegation than a

full transfer.

>> Exactly. It's hierarchical. Think of it

like a manager asking a specialist team

for a report rather than just forwarding

the whole project to them.

>> Okay, I can see that. So if my main

agent needs to, I don't know, analyze some

data, then email a summary, it could call

a data analysis agent as a tool, get the

summary, and then use a separate send

email function tool itself?

>> Precisely. You compose capabilities. Now,

zooming out a bit, the white paper also

talks about a broader taxonomy, grouping

tools by what they do.

>> Right, like information retrieval.

>> Yep, getting data. Action execution, which

is about making changes in the real

world, like sending that email or booking

something.

>> System API integration, obviously.

>> Connecting to other software. And,

importantly, human-in-the-loop tools, for

when the agent needs to stop and ask for

permission or clarification from a

person.

>> That structure is helpful, but, you know,

having all these tool types is one thing;

making them work reliably, so the agent

doesn't constantly fail, that seems like

the real challenge. This brings us to

best practices, which feels critical for

anyone actually building this stuff.

>> Absolutely critical. If you're

developing tools for agents, these

aren't just suggestions. They're pretty

much essential for success.

>> Okay, lay them on us. What's rule

number one?

>> Rule number one, non-negotiable.

Documentation is paramount. Seriously,

the only way the model knows what your

tool does, what it needs, and what it

returns is through the documentation you

provide. The name, the description, the

parameters. This info is literally fed

into the model's context.

>> So clarity is key.

>> Utterly key. Use clear, descriptive

names. The example in the paper is

great: create_critical_bug_with_priority

is infinitely better for the

model than something vague like

update_jira. Be specific.
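
As a quick illustration of that contrast (both declarations are hypothetical), compare a vague tool name with a descriptive one:

```python
# Vague: the model can't tell what this changes or when it should be called.
def update_jira(payload: dict) -> dict:
    ...

# Descriptive: the name and docstring spell out the task and its inputs.
def create_critical_bug_with_priority(title: str, description: str, priority: str) -> dict:
    """Creates a critical bug ticket with the given title, description, and priority."""
    ...
```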

>> The name and description basically

become the instruction manual for the

LLM.

>> Exactly. Which leads to the next point.

Describe the action, not the

implementation detail. You need to tell

the model what task it should

accomplish, "Create a bug to describe

this issue," not how to call your

specific function, "Use the create_bug tool."

>> Ah, okay. Why is that distinction

so important?

>> If you tell it how, you risk confusing

it. The model might just repeat your

instruction back or get stuck thinking

about the tool name instead of the goal.

You want the LLM to do the reasoning, "I

need to create a bug," and the tool should

just be the simple mechanism that

executes that decision. Let the LLM

reason; let the tool act.

>> Right. Reinforcing that separation of

concerns. The LLM is the brain; the tool is the

hand.

>> Precisely. And related to that is

publish tasks, not just raw API calls.

It's tempting, especially with complex

enterprise APIs, to just create a thin

wrapper. That's usually a mistake

because those APIs can be huge, maybe

dozens of parameters.

>> A good tool should encapsulate a single

clear high-level task the agent needs to

perform, not expose the raw complexity

of some legacy system. Think "book a

meeting room," not "call the complex

calendar API endpoint with these 15

optional flags." Keep it task-focused and

abstract away the underlying complexity.
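
A small sketch of that idea, with invented names: the agent sees one task-shaped function, while the messy endpoint details stay hidden inside it.

```python
from datetime import datetime

def book_meeting_room(room: str, start: datetime, duration_minutes: int) -> str:
    """Books the named meeting room for the given start time and duration."""
    # Internally this might call a complex calendar API with a dozen optional
    # flags, but the agent only ever sees this single, task-focused interface.
    return f"Booked {room} at {start.isoformat()} for {duration_minutes} minutes."

print(book_meeting_room("Aurora", datetime(2025, 6, 2, 14, 0), 30))
```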

>> That makes sense. And this focus

probably impacts the output too, right?

What about the data the tool sends back?

>> Huge point. Design for concise output.

This is vital. If your tool pulls down,

say a massive spreadsheet or a huge log

file and tries to stuff it all back into

the LLM's context window.

>> Bad news.

>> Very bad news. You get context window

bloat. It eats up tokens, which cost

money. It increases latency. And

crucially, it actually degrades the

LLM's ability to reason because it's

wading through tons of irrelevant data.

>> So, what's the alternative?

>> Don't return the raw data. Return a

concise summary or maybe just a

confirmation or better yet, return a

reference, like a URI pointing to where

the full data is stored externally.

Systems like Google ADK's artifact

service are built for exactly this. Let

the agent know where the data is. Don't

dump it on the agent.
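
Here's an illustrative sketch of that pattern; the log-fetching and artifact-store helpers are stand-ins for whatever storage your stack provides. The tool returns a short summary plus a URI instead of the raw data.

```python
def download_logs(service: str) -> str:
    # Stand-in for a real log-fetching call that could return megabytes of text.
    return "ERROR timeout\nINFO ok\nERROR timeout\n" * 1000

def store_artifact(name: str, data: str) -> str:
    # Stand-in for an external artifact store (object storage, an artifact service, etc.).
    return f"artifacts://{name}"

def fetch_error_logs(service: str) -> dict:
    """Fetches recent error logs, stores them externally, and returns a concise summary."""
    raw_logs = download_logs(service)
    uri = store_artifact(f"{service}-errors.log", raw_logs)
    return {
        "summary": f"{raw_logs.count('ERROR')} errors found for {service} in the last hour.",
        "artifact_uri": uri,  # reference to the full data, kept out of the context window
    }
```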

>> Okay, keep the context clean. Makes

perfect sense. One last practice.

Things will inevitably go wrong. How

should tools handle errors?

>> Errors happen. Yeah. First, use schema

validation on your inputs and outputs

rigorously. But when an error does occur

during execution, the error message

itself needs to be descriptive and

ideally instructive.

>> Not just error 500.

>> Definitely not. A good error message

tells the LLM what went wrong and maybe

even how to recover. Something like "API

rate limit exceeded. Please wait 15

seconds before calling this tool again."

That gives the agent context and a clear

path forward. It turns a failure into

something potentially actionable.
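
A quick sketch of the difference, using an invented tool: the unhelpful version leaves the LLM stuck, while the instructive one gives it a recovery path.

```python
def lookup_order_status(order_id: str) -> dict:
    """Looks up an order, returning either the status or an instructive error."""
    response_status = 429  # stand-in for the status code of a real HTTP call

    if response_status == 429:
        # Unhelpful: returning {"error": "Error 500"} tells the model nothing.
        # Instructive: say what happened and how to recover.
        return {
            "error": ("API rate limit exceeded. Please wait 15 seconds "
                      "before calling this tool again.")
        }
    return {"order_id": order_id, "status": "shipped"}
```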

>> Those are really practical guidelines.

Anyone defining agent functions can

probably use those right away. Okay,

let's shift gears now to the model

context protocol itself, MCP. This is

the standardization layer aiming to make

all this tool use scalable and

interoperable.

You mentioned it's inspired by the

Language Server Protocol, LSP. How does

that client server interaction work?

>> Right. It borrows that core

architectural idea. You basically have

three main pieces working together.

First, there's the MCP host.

>> Okay, that sounds like the main

application, maybe the thing the end

user interacts with.

>> Exactly. The host manages the overall

user experience, orchestrates the agents

thinking process, decides when tools are

needed. But critically, it's also the

enforcer: it's responsible for applying

any safety guardrails or policies. It's

the traffic cop.

>> Got it. Then there's the MCP client.

>> The client is sort of embedded within

the host. Think of it as the dedicated

communication module. Its job is to

maintain the connection to the server,

manage the session life cycle, and

actually send the commands to execute

tools based on what the host needs.

>> And finally, the MCP server.

>> That's the actual program providing the

tools or capabilities. The server's job

is to advertise what tools it offers,

listen for commands from the client,

execute those commands, and then send

the results back.

>> So this separation is key because...

>> It allows developers building the agent

logic in the host to focus purely on

reasoning while other teams maybe third

parties can focus on building

specialized secure reliable tools in the

server. It promotes modularity

>> and how do these components talk to each

other? What's the language?

>> The communication layer uses JSON-RPC

2.0. It's a well-established, text-based

standard. It keeps things relatively simple

and interoperable.
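
For a feel of what that looks like on the wire, here's a hedged example of a JSON-RPC 2.0 exchange for invoking a tool. The tools/call method and result shape follow the MCP spec as commonly described; the tool name and arguments are invented.

```python
import json

# Client -> server: ask the MCP server to execute a tool (JSON-RPC 2.0 request).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_stock_price", "arguments": {"symbol": "GOOG"}},
}

# Server -> client: the matching JSON-RPC 2.0 response carrying the tool result.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": '{"symbol": "GOOG", "price": 175.20}'}]},
}

print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))
```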

>> Okay. Standard message format. What

about getting those messages back and

forth? The transport layer.

>> MCP defines two main ways. For local

development or when the server can run

as a child process, you often use stdio,

standard input/output. It's super fast,

direct communication, very efficient if

everything's on the same machine,

>> right? But for most real world

distributed systems,

>> then you typically use streamable HTTP.

This is really designed for remote

connections. It supports server-sent

events, SSE, which means the server can

stream results back which is great for

tools that might take a while to run. It

allows for more flexible, often stateless,

server deployments across a network.

>> Now the white paper mentions MCP defines

several primitives like tools,

resources, prompts, but it really

emphasizes that tools are the main event,

right? Like almost universal adoption

compared to the others.

>> Absolutely. Tools are the core value

proposition of MCP today where most of

the implementation effort has focused.

The other primitives exist in the spec,

but tool definition and execution is the

killer feature driving adoption.

>> So let's zero in on that tool definition

within MCP. How does the protocol

enforce rigor there?

>> It uses a standardized JSON schema. A

tool definition must have fields like

name and description. We talked about

how critical those are. And importantly,

it requires an input schema defining

what the tool expects, and optionally an

output schema defining what it will

return.

>> The paper had that get_stock_price

example, I remember.

>> Yeah, classic example. It clearly shows

you need to define the expected input

like the stock symbol, maybe an optional

date, and the structure of the output,

the price and the date it was fetched, using

these schemas. This ensures the host

knows exactly how to call the tool and

what kind of result to anticipate. No

guesswork.
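
A hedged reconstruction of what that definition might look like; the field names follow the MCP tool schema (name, description, inputSchema, and an optional output schema), while the exact properties are paraphrased from the discussion rather than copied from the paper.

```python
get_stock_price_tool = {
    "name": "get_stock_price",
    "description": "Returns the latest closing price for a given stock ticker symbol.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "symbol": {"type": "string", "description": "Ticker symbol, e.g. GOOG."},
            "date": {"type": "string", "description": "Optional ISO date; defaults to today."},
        },
        "required": ["symbol"],
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "price": {"type": "number"},
            "as_of": {"type": "string", "description": "Date the price was fetched."},
        },
    },
}
```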

>> Okay, so the schema provides the

contract. When the tool runs

successfully, how does the result come

back?

>> Results generally come back in one of

two forms. They can be structured, which

means a JSON object that strictly

conforms to that output schema you

defined. This is preferred because it's

easy for the host and the LLM to parse

and reason about reliably.

>> Then the other form,

>> unstructured. This is for things that

don't fit neatly into JSON like raw

text, maybe an audio file, an image, or

importantly, those references, the URIs

pointing to external resources we

discussed earlier, to avoid context

bloat.

>> Got it. And we touched on errors before

in best practices. But how does MCP

formally signal when a tool execution

fails?

>> Yeah, there are two levels. You have

standard JSON-RPC protocol errors like

if the host tried to call a method that

the server doesn't actually offer or

sent badly formed parameters. That's a

protocol level failure.

>> Okay,

>> but then you have errors that happen

during the tool's execution.

Maybe the external API it relies on is

down or it couldn't find the requested

data. In this case, the server sends

back a result object, but it sets a

specific flag, isError, to true.

>> Ah, so the call technically

completed, but the outcome was an error.

>> Exactly. And setting that isError flag

is the signal to the host and the LLM

that something went wrong within the

tool's logic. And crucially, the result

object can still contain that

descriptive error message we talked

about, guiding the LLM on how to

potentially recover.
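
Sketching that as a tool result (the payload shape mirrors the isError convention described here; the message text is invented):

```python
tool_error_result = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "isError": True,  # the call completed at the protocol level, but the tool's logic failed
        "content": [{
            "type": "text",
            "text": ("Upstream pricing API is unavailable. Retry in 60 seconds, "
                     "or call get_cached_price for a recent value."),
        }],
    },
}
```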

>> That seems like a robust way to handle

it. Okay, stepping back to the bigger

picture, what are the main strategic

wins that this kind of standardization

through MCP brings to the whole AI agent

ecosystem?

>> Well, the biggest one is

probably accelerating development and

fostering a reusable ecosystem. By

having a standard way for tools and

agents to talk, you dramatically reduce

the friction and time it takes to

integrate new capabilities,

>> lowering the barrier to entry.

>> Definitely. And there's already talk

about things like public MCP registries

where people can publish standardized

declarations for their servers. Imagine

a future where you can easily browse and

plug and play certified tools from

different vendors. That's a huge network

effect waiting to happen.

>> And for the agents themselves, does it

make them smarter or more capable?

>> It enables dynamic capabilities because

tools can be advertised and discovered

via the protocol. Agents could

potentially find and start using new

tools at runtime without needing to be

explicitly reprogrammed. That enhances

their autonomy and adaptability.

>> Interesting. And it also gives you

significant architectural flexibility by

decoupling the agents core reasoning

logic from the specific tool

implementations. You can build more

modular systems. People are talking

about creating an agentic AI mesh,

networks of specialized agents and tools

communicating via MCP. It supports much

cleaner designs.

>> Okay, those are compelling advantages.

But let's talk about the flip side,

the challenges. You mentioned context

window bloat earlier when we discussed

best practices. Doesn't standardizing

everything with detailed schemas through

MCP risk making that problem worse if an

agent has access to hundreds, maybe

thousands, of tools?

>> That is the major non-security scaling

challenge right now. You're absolutely

right. If an agent hypothetically has

access to a thousand tools, loading all

thousand detailed definitions, names,

descriptions, schemas into the LLM's

prompt context every time,

>> forget about it.

>> Yeah, it's completely infeasible. You

hit context limits, costs skyrocket, and,

worse, the LLM gets overwhelmed and

struggles to even figure out which of

the thousand tools is the right one for

the current task. Its reasoning quality

plummets.

>> So standardization creates this

potential fire hose of tools, but the

LLM can only sip from it. How do we

bridge that gap?

>> The most promising mitigation strategy

being explored and mentioned in the

paper is essentially applying a RAG,

retrieval-augmented generation, approach,

but for tools.

>> Okay. Tool retrieval.

>> Exactly. Instead of preloading all

possible tool definitions, the idea is

when the agent needs a tool, it first

performs a quick semantic search over an

indexed database of all available tools.

This search identifies maybe just the

top three or five most relevant tools

for the immediate task.

>> Ah, so you only load the definitions for

those few highly relevant candidates

into the context.

>> Precisely. You dramatically reduce the

context bloat by turning tool discovery

into an efficient targeted search

problem first. Only the relevant

information gets loaded for the final

reasoning step. It's seen as the most

viable path forward for scaling to large

numbers of tools.
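
A minimal sketch of that tool-retrieval idea. A production system would use embeddings and a vector index; here a toy lexical similarity from the standard library stands in, just to show the shape: score every tool description against the task, then load only the top few definitions into the prompt.

```python
from difflib import SequenceMatcher

TOOL_INDEX = {
    "get_stock_price": "Returns the latest price for a stock ticker symbol.",
    "book_meeting_room": "Books a meeting room for a given time and duration.",
    "create_critical_bug_with_priority": "Creates a bug ticket with a priority level.",
    # ...imagine hundreds more entries here...
}

def retrieve_tools(task: str, top_k: int = 3) -> list:
    """Ranks tool descriptions against the task and returns the top-k tool names."""
    scored = [
        (SequenceMatcher(None, task.lower(), desc.lower()).ratio(), name)
        for name, desc in TOOL_INDEX.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

# Only these few tool definitions get loaded into the LLM's context for this turn.
print(retrieve_tools("What did GOOG close at today?", top_k=2))
```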

>> That makes a lot of sense. Filter before

you load. Now, let's briefly touch on

security. The white paper acknowledges

that MCP itself was designed more for

decentralized innovation and

interoperability and it doesn't have

heavy built-in enterprise security

features like strong authentication or

authorization. This seems like a

significant gap for enterprise use.

>> It's a gap in the core protocol, yes. And it

opens up several risks, but the one the

paper really flags as critical in this

model is the classic confused deputy

problem.

>> The confused deputy. Can you unpack that

quickly?

>> Sure. It's a well-known vulnerability.

Imagine you have an MCP server that has

high privileges. Maybe it can access

sensitive databases or code

repositories. Now imagine a

low-privilege user crafts a clever

prompt that tricks the AI model, the

host, into asking the MCP server to

perform a sensitive action. The MCP

server sees the request coming from the

trusted AI model. So it executes it. It

acts as a confused deputy performing an

action based on the AI's request without

verifying that the original user

actually had the permissions for that

specific action. The user basically

launders their malicious request

through the trusted AI.

>> Right, prompt injection leading to

privilege escalation via the tool

server.

>> Exactly. That's a major concern because

the protocol itself doesn't handle that

end-user authorization context.

>> So given that MCP itself doesn't solve

this, how are enterprises supposed to

adopt it safely? What's the practical

solution?

>> The clear consensus and what the paper

points towards is that you absolutely

must wrap the raw MCP protocol in layers

of external centralized governance and

security. Meaning you don't expose MCP

servers directly. You put something like

an enterprise-grade API gateway, think

Apigee or similar platforms, in front of

them. This gateway handles the critical

tasks: robust authentication of the

user, fine-grained authorization checks

for the specific action being requested,

rate limiting, logging, filtering

potentially malicious inputs before they

even reach the MCP server or the LLM.

>> So the security isn't in MCP, it's

around MCP.

>> Precisely. You leverage existing mature

enterprise security infrastructure to

provide the necessary secure framework.

Yeah,

>> the protocol enables the connection, but

the enterprise security layers ensure

it's used safely.
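
As a rough sketch of that "security around MCP" idea (the permission model and names are entirely invented): the gateway authenticates the end user and checks their permission for the specific tool action before anything is forwarded to the MCP server.

```python
ALLOWED_ACTIONS = {
    "analyst@example.com": {"get_stock_price"},
    "admin@example.com": {"get_stock_price", "delete_repository"},
}

def gateway_forward(user: str, tool_name: str, arguments: dict) -> dict:
    """Authorizes the end user for this specific tool call before it reaches MCP."""
    if tool_name not in ALLOWED_ACTIONS.get(user, set()):
        # Blocks the confused-deputy path: the AI asked, but this user isn't allowed to.
        return {"isError": True, "content": [{"type": "text",
                "text": f"User {user} is not authorized to call {tool_name}."}]}
    # Only now would the gateway forward the JSON-RPC request to the MCP server.
    return {"forwarded": True, "tool": tool_name, "arguments": arguments}

print(gateway_forward("analyst@example.com", "delete_repository", {"repo": "core"}))
```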

>> Okay, that clarifies the approach. So,

bringing this all together, it feels

like this white paper gives us a really

solid framework for thinking about how

to build agents that can actually

interact reliably with the world.

>> I think so too. The core takeaway is

simple really. Foundation models are

powerful brains, but they need tools to

act. MCP is emerging as the standard

language to connect those brains to the

hands and eyes.

>> But, and it's a big but, getting the most

value means being disciplined about

those tool design best practices. We

covered clear names and descriptions,

focusing on tasks, not APIs, keeping

outputs lean, handling errors

instructively, and critically layering

those robust external security and

governance frameworks around the

protocol for any serious enterprise

deployment. And a lot of this is stuff

you, the listener, can put into practice

right now. Even if you're not using MCP

directly yet, thinking carefully about

how you define your agent's functions or

tools, that documentation, the

granularity, the output design, that

will make your agents more effective and

reliable immediately.

>> Absolutely.

>> Yeah,

>> those principles are universally

applicable.

>> So, as we wrap up, here's something to

chew on. Building directly on that

confused deputy risk we just discussed,

as these autonomous agents get more

deeply woven into our critical systems,

how do we actually design the interfaces

and the guardrails to ensure that an

agent is always acting on the user's

authorized intent, not just blindly

following the user's spoken command? And

how do things like audit trails need to

evolve to capture that crucial

difference between what was asked for

and what was actually allowed?

>> That's a deep question.

>> it's about control, authorization, and

true accountability in an agentic world.

>> Definitely something to think about.

>> Food for thought as you hopefully start

experimenting with these concepts.

That's all we have time for in this deep

dive. Thanks for joining us.

>> Thanks for having me.
