
MCP Is Burning Your Tokens Before You Ask a Single Question

By DevOps & AI Toolkit

Topics Covered

  • MCP Is a Tax on Your Agent's Intelligence
  • LLMs Already Know Popular CLIs—No Discovery Needed
  • Skill Files Beat MCP on Context Footprint
  • MCP, CLI, and HTTP Are All Just Plumbing
  • Use CLI for Known Tools, MCP for Custom Ones

Full Transcript

Every MCP tool you connect burns tokens before you even ask a question. Tool names, descriptions, parameter schemas: all of it gets stuffed into your context window on every single turn. Connect a few servers and you've lost thousands of tokens to tools you might never call. That is the reality of the MCP protocol. It is a tax on your agent's intelligence. Now, don't get me wrong, MCP solves a real problem. It gives you standardized discovery and zero client installation. But there's an alternative that uses a fraction of the context, costs nothing extra to set up, and that your agent might already know how to use. It's not perfect either, but I think it's a better choice for most real-world scenarios.

In this video, we're going to put MCP to the test, run some real operations through it, and then explore that alternative side by side. We'll look at where each one wins, where each one falls short, and by the end, you will have a clear picture of which approach fits your situation. Now, we're talking about agents connecting to remote servers, calling APIs, managing infrastructure. But here's something most people skip over. When your agent calls a remote API on your behalf, how does it authenticate? How do you make sure it only accesses what it should?

Hardcoded API keys in your agent config? That's a disaster waiting to happen. And that's where the sponsor of this video comes in: Auth0.

Auth0 now has production-ready security for AI agent workflows. Their Token Vault lets agents connect to third-party services like GitHub or Slack on behalf of users without ever seeing raw credentials. No more hard-coding API keys into your agent setup. Fine-grained authorization for RAG ensures your LLMs respect permissions, so company A's data never leaks to user B during retrieval. And if an agent is about to do something critical, like a high-impact operation, client-initiated backchannel authentication triggers a push notification for human approval before it goes through. Now, here's what caught my attention: Auth0 offers enterprise SSO and SCIM on their free tier. That's the kind of thing competitors charge a premium for. Production-ready auth from day one, without an enterprise tax. Head to auth0.com to check it out. Big thanks to Auth0 for sponsoring this video. And now let's get back to putting MCP to the test.

Let's start by connecting our agent to a remote server through the MCP protocol. We will launch Claude Code with a dedicated MCP configuration that points to a DevOps AI Toolkit server running in our Kubernetes cluster.
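For reference, a remote MCP server connection in Claude Code is just a small JSON config. A minimal sketch, assuming the HTTP transport; the server name and URL here are illustrative, not the exact demo setup:

    {
      "mcpServers": {
        "dot-ai": {
          "type": "http",
          "url": "https://dot-ai.example.com/mcp"
        }
      }
    }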

Now, once inside Claude Code, we can check what MCP servers are connected and what tools they expose. Look at that: the output shows eight high-level tools. There is recommend for getting AI-powered deployment recommendations, query for asking natural-language questions about your cluster, remediate for fixing issues, operate for day-two operations, and so on. There are only eight tools. Eight. And that's important. Remember that number. We will get to why it matters later when we talk about some of the downsides of the MCP protocol.

Now, before we go further, let's frame what we are actually exploring here. Agents do two types of work. There are local operations like reading source code, editing files, running tests, and so on. The agent handles those with built-in capabilities or tools already on your machine. Then there are remote operations like querying cluster state, managing infrastructure, accessing organizational knowledge bases, and similar. For those, the agent needs to talk to a server. So we're not debating whether remote servers with these capabilities are useful. We are debating how agents should connect to them. The MCP protocol is one option. There are others. And to be clear, we're talking about MCP connecting to remote servers over HTTP, not the local stdio transport where an MCP server runs as a process on your machine. That's a different scenario with different trade-offs.

So let's see what this looks like in practice. We will ask the agent to show us the components running in the cluster, their relationships, and their health status, all through the MCP protocol. And then we wait, and we wait. We fast-forward, actually, and here we go. What happened here is straightforward. We typed a request into our agent. The agent sent it to the LLM. The LLM decided that the query tool should be called and told the agent to do it. The agent executed the call to the remote server through the MCP protocol. The server processed the request and returned structured data back through the agent to the LLM, which formatted it into what we see on the screen right now. And it worked. The agent talked to a remote server and got useful information back.
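Under the hood, that tool invocation is a JSON-RPC message defined by the MCP spec. Roughly, and with illustrative arguments, it looks like this:

    {
      "jsonrpc": "2.0",
      "id": 1,
      "method": "tools/call",
      "params": {
        "name": "query",
        "arguments": {
          "question": "Show the components in the cluster, their relationships, and their health"
        }
      }
    }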

information back. How that server assembled information is not important in this context. It could have been coming from a database from a Kubernetes cluster. It could have invoked a remote

cluster. It could have invoked a remote agent or anything else. What matters is the communication channel between our agent and that server. Now that channel,

the MCP protocol gives us something genuinely valuable. It's a standardized

genuinely valuable. It's a standardized way to connect. Implement the protocol once and your server works across cloud code, copilot, cursor, gemini, and so on

and so forth or any other MCP compatible agent, which is basically every agent.

The agent connects, discovers what tools are available, gets typed schemas, and calls them. There is no CLI to install

calls them. There is no CLI to install on anyone's machine. No binary to distribute and keep updated. And that's

the real value, especially if you're building a server that needs to work across many different agent platforms. But there's a cost. Every MCP tool definition, its name, description, and

parameter schema gets injected into the LM's context window on every conversation turn. So remember that
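To make that cost concrete, here is roughly what a single tool definition looks like when the server answers tools/list. The shape follows the MCP spec; the fields shown for the query tool are a simplified guess, not the server's actual schema:

    {
      "name": "query",
      "description": "Ask natural-language questions about the state of your Kubernetes cluster",
      "inputSchema": {
        "type": "object",
        "properties": {
          "question": {
            "type": "string",
            "description": "The question to answer against live cluster state"
          }
        },
        "required": ["question"]
      }
    }

Multiply that by every tool on every connected server, on every turn, and the tax adds up.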

So remember those eight tools we saw earlier? That's a well-designed server with a small number of high-level capabilities. Now imagine connecting to a server that exposes 40 or 50 tools because someone did a one-to-one mapping of their API endpoints. That's tens of thousands of tokens burned before you even ask a question, leaving less room for the actual context of your project. Now, there are mitigations emerging, like lazy loading, tool search, and dynamic tool sets, but they are extra machinery bolted on to address a limitation of the protocol itself. So MCP gives us standardized discovery and zero client installation, but it comes with a context tax. What if there is a simpler way? What if the agent could just use a CLI instead?

We established that agents need a way to talk to remote servers, and MCP is one option. But there's another channel that's been sitting right in front of us this whole time, and that's CLIs. Think about what a CLI like kubectl, gh, or aws actually is. It's a binary on your machine that sends requests to a remote server and returns text output. kubectl get pods talks to the Kubernetes API. gh issue list talks to GitHub's API. aws s3 ls talks to AWS. They're all doing the same thing the MCP protocol does, just through a different mechanism. We're not talking about CLIs that do purely local work, like jq or sed. We're talking about CLIs as a communication channel between your agent and the remote server.
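Concretely, each of those commands is a remote API call dressed up as terminal output (assuming the CLIs are installed and authenticated):

    kubectl get pods      # talks to the Kubernetes API server
    gh issue list         # talks to GitHub's API
    aws s3 ls             # talks to the AWS S3 API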

Now, here's the thing. LLMs are trained on enormous amounts of terminal interaction. They've seen gh, kubectl, aws, docker, and hundreds or thousands of other CLIs in action across millions of documents. They don't need a protocol to tell them that those tools exist or how to use them. They already know. The knowledge is baked into the model. The LLM doesn't discover the tool at runtime. It already knows it.

So, let's see what it looks like. We'll ask the agent to show us open issues in a GitHub repository. No MCP server, no tool definitions, no configuration, just a plain request. And then it works for a while, and there we go. That's the same pattern we saw with MCP. We typed the request, the agent sent it to the LLM, the LLM decided what to do, and the agent executed it. The difference is that the LLM didn't call an MCP tool. It ran a gh issue list command. The agent talked to GitHub's remote API, just through a CLI instead of through the MCP protocol. The end result is identical. The difference is how it got there.
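For the record, the command it ran is ordinary gh usage, something along these lines, with a placeholder repository:

    gh issue list --repo OWNER/REPO --state open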

The reason this works so smoothly is that LLMs are trained on massive amounts of GitHub-related content. They know GitHub's APIs. They know the gh CLI, which is the standard way to interact with GitHub from the terminal. And they know how to construct the right commands. The agent assumed I have gh installed on my machine, and it was right. If it turned out that I didn't, it would have tried an alternative, maybe a direct API call with curl. For a service as well known as GitHub, it would be pointless, and I repeat, pointless, to connect through MCP. The CLI is all the agent needs, and it already knows how to use it. To justify MCP or any other additional layer, you would need to offer capabilities that the CLI doesn't already provide. But here's where it gets interesting.

LLMs know the GitHub CLI because GitHub is everywhere in their training data. The same goes for kubectl, AWS, Terraform, and other widely used tools. But what about your company's internal platform CLI? Or a smaller open source project that hasn't made it into the training data yet? Or a custom API your team built last quarter? The LLM has no idea those exist, let alone how to use them. So, how do we bridge that gap? We teach it. We give the agent just enough context to know that the CLI exists and what it does, and it figures out the rest on its own. We will see how that teaching works in a moment. First, let's see the result.

We will ask the same question we asked through MCP earlier, but this time tell the agent to use the CLI. That CLI, at least at the time of this recording, is completely unknown to LLMs. It is not in any training data. And the only reason we are being explicit about saying, hey, use the CLI here, just like we said, use MCP earlier, is because this demo has both configured side by side. In a real scenario, you would have one or the other, and the LLM would figure out what to use from the context. We will explore how that works shortly.

So, let's see it in action. And then we wait, and wait, and we fast-forward, and there we go. First of all, it got confused, because it doesn't know about this CLI. I tried to teach it; we'll see how later. Teaching is complicated, and we'll discuss that later too. For now, let's say that it worked fine, and we can see that the agent used the DevOps AI Toolkit CLI to query the cluster and got data back.

The presentation might differ from what we saw earlier through MCP, but that's irrelevant. In both cases, the agent talked to a remote server, received data, and the LLM formatted that data for us. The communication channel was different. The outcome was the same.

Now, the agent did not magically know about the CLI. We had to teach it, or it had to discover it, and it got confused a bit along the way. That's the downside. Nevertheless, most coding agents support some form of custom instructions, whether they're called skills or rules or something else. Unlike MCP, where the tool comes with a full description and a complete parameter schema, a CLI-based instruction just needs to tell the LLM that the tool exists, what it's for, and potentially how to use it. From there on, the LLM discovers the specifics, like available commands and arguments, on its own by telling the agent to run things like --help when it needs them.

So let's look at what that looks like in practice. Look at that. That is the price to be paid for using CLIs that LLMs are not already trained on. Each of those skill files is a small set of instructions that tells the LLM that the tool exists and when to use it. The LLM can figure out the rest.

Now, let's look at what's actually inside one of those. There we go. That's it. A name, a description, and a hint to run --help for discovery. That's all the LLM needs to start using the CLI. Notice it even says, "Prefer CLI commands via Bash over MCP tools when both are available." That's how you would tell the LLM to favor the CLI over MCP in a real setup where you don't want both.

That's a generic skill that covers the entire CLI. But we can also create more specific ones that tell the LLM exactly when to reach for a particular command. Look at that. That one is narrower. It tells the LLM that when someone asks about cluster resources or their status, this is the command to use. The generic skill says, hey, this CLI exists. The specific skill says, use this particular command for this particular job. Together they give the LLM enough context to make good decisions about when and how to use the tool.

Now compare what's in those skill files to what MCP loads into context. Both MCP tool definitions and skill descriptions end up in the LLM's context window. Both include a name and a description of what the tool does. But MCP also includes full parameter schemas for every tool: every argument, its type, whether it's required, allowed values, nested structures. For a complex tool, that's easily hundreds or even thousands of tokens. A skill file like the ones we just saw is a few lines of text. The context footprint is dramatically smaller.

dramatically smaller. Now, there's a real trade-off there. With MCP, the LM has all the parameter details up front and can construct the right call immediately. with a CLI, the LM might

immediately. with a CLI, the LM might need to run d-help, then hey, this command with the help, and then some other command with a help, and then try this, and try that, and it finally will

discover available commands and their arguments before making the actual call.

There are fewer tokens in context, but potentially more round trips at execution time. So, MCP pays the cost

execution time. So, MCP pays the cost once upfront. CLI's paid on demand.

once upfront. CLI's paid on demand.
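That on-demand discovery might look like a short chain of exploratory calls before the real one. A hypothetical session with an unfamiliar CLI (the binary name and commands are illustrative):

    dot-ai --help                              # discover top-level commands
    dot-ai query --help                        # discover that command's arguments
    dot-ai query "which pods are failing?"     # finally, the actual call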

There's another cost to consider, though. Someone has to create and maintain those skill files. Someone has to install the CLI on every developer's machine and keep it updated. With MCP, the server handles everything centrally. You connect, you get the tools, and that's it. With CLIs, you're distributing both the binary and the instructions. That's a higher maintenance burden, especially across large teams. But it doesn't have to be manual. That command over there, the skills generate command, fetches tool definitions from the server and uses them to generate skill files for whichever agent you're using, whether that's Claude Code or Cursor or Windsurf.
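As a sketch, the invocation might look something like this; the binary name and flag are assumptions, so check the project's docs for the exact syntax:

    dot-ai skills generate --agent claude-code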

Rerunning it regenerates everything. So when the server adds new capabilities, one command updates all the skills. You can even install a hook that regenerates them automatically on session startup. On top of that, you can add custom skills to a git repo and manage them across repos, teams, and laptops from a single place. So that, more or less, takes care of the maintenance problem. The skill files aren't something you write once and babysit forever. They can be generated, version controlled, and distributed alongside the CLI itself. And here's a point that often gets lost in the MCP versus CLI debate: those are not fundamentally different things.

They're different interfaces to the same server. The DevOps AI Toolkit, the one I'm using here, for example, maintains a single API in its codebase. The MCP protocol support, the HTTP endpoints, and the CLI commands are all autogenerated from that one API as part of the release process. When a new tool is added to the server, a corresponding HTTP endpoint appears, a new CLI command appears, and the MCP tool definition appears as well. They're all derived from the same source. The real work is building the server and its capabilities. The transport layer, whether it's MCP or a CLI or HTTP, is plumbing, and for the most part it can be autogenerated.

So, where does all this leave us? Let's step back and look at the full picture.

Before we get to the verdict, let's remember what we're actually comparing. This whole discussion is about how agents talk to remote servers: querying cluster state, managing infrastructure, accessing knowledge bases. For local operations like reading code, editing files, or running tests, agents already have built-in capabilities and skills. You don't need MCP, you don't need CLIs. The question is only about the remote communication channel. And it's not even a two-way race. There's a third option we haven't talked about: a web UI that talks to the same server through a plain HTTP REST API. That's how humans interact with those services directly, no agents involved at all. So we really have three interfaces to the same back end: MCP for agent-to-server communication with standardized discovery, CLI for agent-to-server communication through the terminal, and HTTP for web UIs and direct API access. All three are just plumbing.

Now with that in mind, let's lay it out.

MCP gives you standardized discovery across agents. Build one server and it works with Claude Code, Cursor, Copilot, or Gemini, anything that speaks the protocol, which is basically everything. There is no CLI to install, no binary to distribute, no version to keep in sync across machines. And when the MCP server runs remotely, as it should, and serves many people in an organization, maintenance is centralized. Update the server once, and everyone gets the change. That's a real operational advantage, especially for large teams. If the MCP server is well-designed, with a small number of high-level tools and light schemas, the context cost is minimal. Remember those eight tools we saw earlier? They barely dent your context window. In that scenario, MCP is arguably the better choice. Centralized maintenance, no skill files to manage, no client-side upgrades; it just works. Now, the problem is when MCP servers are not well-designed. When someone maps every API endpoint to its own tool and ships a server with 40, 50, 90, or hundreds of tools, each with full parameter schemas, that's where MCP breaks down. Tens of thousands of tokens burned before you ask a single question. Yes, there are mitigations like lazy loading and tool search, but those are workarounds bolted on to compensate for poor server design.

Now, CLIs flip that around. For well-known tools like kubectl or gh or aws, the LLM already knows how to use them. Zero context cost, zero configuration. The knowledge is in the training data. For lesser-known or custom CLIs, you need skill files to teach the LLM that they exist, but those files are tiny compared to MCP definitions. The context footprint is a fraction of what even a well-designed MCP server costs. The downside? CLIs require installation on every machine. Someone has to distribute the binary and keep it updated. For tools the LLM hasn't seen before, you need skill files, which is an extra maintenance step. That maintenance is distributed rather than centralized, even if it can be automated, like with the commands I showed earlier.

So, here's how I think about it. If the LLM already knows the tool, use the CLI. There's nothing to debate. If the tool is custom or internal, and the MCP interface is lean with a handful of well-designed tools, MCP is a solid choice. You get standardized discovery, centralized maintenance, and the context cost stays manageable. But if you're dealing with bloated MCP servers, or you want the absolute smallest context footprint, a CLI with skill files is probably the way to go. Now, these are not really even competing religions. They're different transports to the same server. As we saw, the same API can expose MCP, HTTP, and CLI interfaces simultaneously. Pick the one that fits your constraints.

And to close it up, if you want to see this in action, check out the DevOps AI Toolkit, the project I'm working on. Use it, fork it, star it, study it. Please send feedback, create issues and feature requests. It supports MCP, CLI, and a web UI, so you can try all three approaches yourself and see which one works best for your setup. Thank you for watching. See you in the next one. Cheers.
