I Stopped Hitting Claude Code Usage Limits (Here's How)

By Brad | AI & Automation

Summary

Topics Covered

Context Bloat Makes Messages 31x More Expensive
MCP Servers Cost Tokens Just By Existing
Replace Verbose Skills With Concise Instructions
Plan Mode Prevents the Most Expensive Mistake
It's a Context Hygiene Problem, Not a Limits Problem

Full Transcript

I was constantly hitting my Claude Code limits, and now I can use it all day and haven't hit a usage limit in weeks.

Here's what I changed. I dug into my setup and figured out that I was wasting a huge amount of tokens on invisible context bloat I didn't even know was there. Since then I've been testing the best ways to keep my usage down.

And right now I'm going to walk you through what I changed in my setup so I never hit my usage limits, plus give you a Claude skill that audits your whole Claude Code setup and tells you exactly what's costing you and how you can fix it. You can install it and run it whenever you're looking to keep things clean in your repo. And I'm giving it away for free at the link in the description. So, let's get into it.

Just a quick primer in case you haven't seen this explained already: every time that you message Claude Code, it rereads the entire conversation, every single message. In fact, message 30 actually costs 31 times more than your first message when you're in a Claude Code session. On top of that, all of your CLAUDE.md files, your MCP servers, and your skills are preloaded into your context window on every single session.

More context usually means better results, but it also means more cost and more usage, and sometimes it even results in worse output, because as context bloats, the LLM only pays attention to the end of your context and starts to miss the beginning of your context window. So bloated context means you're paying more and getting less. So the key strategy that I put in place was reducing that starting context by removing a bunch of invisible sources of context that were bloating my Claude Code environment.
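To make the compounding concrete, here is a minimal sketch of the cost model described above. It assumes every exchange is the same size (500 tokens is a made-up figure) and that the whole history is resent on each turn; the function names and the `baseline` parameter are illustrative, not anything Claude Code exposes. The exact multiple for message 30 depends on what you count as one message, so treat the output as a rough shape rather than a billing figure.

```python
def message_input_tokens(n, tokens_per_exchange=500, baseline=0):
    """Tokens re-read when message number n (1-indexed) is sent.

    Toy model: the startup context (baseline) plus n equally sized
    chunks, because the whole history is resent on every turn and the
    new message is counted as one more chunk.
    """
    return baseline + n * tokens_per_exchange


def session_input_tokens(messages, tokens_per_exchange=500, baseline=0):
    """Total tokens re-read across a session of the given length."""
    return sum(
        message_input_tokens(n, tokens_per_exchange, baseline)
        for n in range(1, messages + 1)
    )


if __name__ == "__main__":
    first = message_input_tokens(1)
    thirtieth = message_input_tokens(30)
    print(f"message 30 re-reads {thirtieth / first:.0f}x the tokens of message 1")
    print(f"30 messages, no baseline: {session_input_tokens(30):,} tokens")
    print(f"30 messages, 50,000-token baseline: "
          f"{session_input_tokens(30, baseline=50_000):,} tokens")
```

Per-message cost grows linearly, so the session total grows roughly with the square of the message count, and any fixed starting context is paid again on every turn.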

So here's how you would actually see this. If you run /context inside of a fresh Claude Code session, you'll see exactly how many tokens you're already paying for without sending a single message.

That'll be your system tools, your MCP servers, your skills, all of that bloating your context window. For me, when I ran this command before making any changes, that was over 50,000 tokens that I was paying for, and every message from there compounds on top of it.

So, let me walk you through some of the biggest sources of that context bloat and how I solved each one of them.

The number one issue for me was MCP servers. Every connected MCP server loads all of its tool definitions into your context on every single message. Not just when you call that tool: every message. One server can have about 18,000 tokens' worth of tool definitions. If you have a few of these servers compounding, that can be over 70,000 tokens of dead weight on every single turn.

The fix here is actually dead simple. If you run /mcp at the start of every session, you'll see all of the MCP tools that you have connected at the moment. What I do is go through and disconnect all the ones that I'm not actively planning on using in that session, and that saves a bunch of context. The other thing that I'm doing to take this a step further is replacing my MCPs with CLIs, or command line interfaces, if they're available.
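As a rough illustration of the MCP overhead described above, the sketch below multiplies per-server tool-definition tokens by the number of turns. The server names and the 30-turn session are hypothetical; the 18,000-token figure is the per-server number quoted above.

```python
def mcp_overhead(tool_definition_tokens, turns):
    """Tokens spent resending MCP tool definitions over a whole session."""
    per_turn = sum(tool_definition_tokens.values())
    return per_turn * turns


# Hypothetical server names, each at the ~18,000-token figure quoted above.
connected = {"browser": 18_000, "database": 18_000,
             "design": 18_000, "tickets": 18_000}
needed = {"browser": 18_000}  # the only server this session actually uses
turns = 30

before = mcp_overhead(connected, turns)
after = mcp_overhead(needed, turns)

if __name__ == "__main__":
    print(f"per turn with everything connected: {sum(connected.values()):,}")
    print(f"over {turns} turns: {before:,} tokens")
    print(f"after disconnecting unused servers: {after:,} tokens")
    print(f"saved: {before - after:,} tokens")
```

Four servers at that size is 72,000 tokens per turn, which lines up with the "over 70,000" figure, and the saving scales with how long the session runs.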

The reason I'm doing this is because a CLI only costs tokens when Claude actually calls that command, while an MCP server costs tokens just by existing in your session. By replacing it with a CLI, you can expect a 40% savings in token usage. I've done this with the Playwright MCP and Ampify recently, and it's been really making a difference for me.

Your CLAUDE.md gets loaded into context at the start of every session, and those tokens are part of every API call to the Claude API from that point on. So there are three things that you should do to your CLAUDE.md file to optimize it and reduce your usage.

The first is to check for contradictions. Say one section says "be concise," but then another section says "always explain your reasoning in detail." Claude can't satisfy both of these requests. If it picks one, you'll get inconsistent output, and you won't know why. It also bloats your file.

Second is cutting instructions that aren't earning their place. There are five questions to ask about every single rule. Is this something Claude already does by default without being told? Does it contradict another rule? Does it repeat something already covered? Was it a band-aid for a specific bad output? Is it so vague that Claude would interpret it differently every time, like "be more natural" or "use a good tone"? If a rule fails any of these, cut it.

The audit skill that I'm linking in the description below uses these five filters and gives you a list of what to remove and a reason for each. It's better to be ruthless about this and cut more, then add back what you need later.
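The five questions above can be sketched as a small checklist. This is only an illustration of the filter logic, not how the audit skill is implemented: the `RuleReview` class and its field names are made up here, and the yes/no answers still have to come from a person (or Claude) reading each rule.

```python
from dataclasses import dataclass


@dataclass
class RuleReview:
    """One CLAUDE.md rule plus the reviewer's answers to the five questions."""
    rule: str
    is_default_behavior: bool = False    # Claude already does this untold
    contradicts_other_rule: bool = False
    repeats_other_rule: bool = False
    is_one_off_bandaid: bool = False     # patch for one specific bad output
    is_too_vague: bool = False           # e.g. "use a good tone"

    def failed_filters(self):
        """Names of the filters this rule fails (empty list means keep it)."""
        checks = {
            "default behavior": self.is_default_behavior,
            "contradiction": self.contradicts_other_rule,
            "repetition": self.repeats_other_rule,
            "band-aid": self.is_one_off_bandaid,
            "too vague": self.is_too_vague,
        }
        return [name for name, failed in checks.items() if failed]


def rules_to_cut(reviews):
    """Return (rule, reasons) pairs for every rule failing at least one filter."""
    return [(r.rule, r.failed_filters()) for r in reviews if r.failed_filters()]


if __name__ == "__main__":
    reviews = [
        RuleReview("Use a good tone", is_too_vague=True),
        RuleReview("Run the test suite before committing"),
    ]
    for rule, reasons in rules_to_cut(reviews):
        print(f"cut: {rule!r} ({', '.join(reasons)})")
```

A rule that fails any single filter is flagged, which matches the "if a rule fails any of these, cut it" advice.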

Third, and this is the biggest one for me, was progressive disclosure in my CLAUDE.md. Your core CLAUDE.md should only contain rules that apply to every single session in that repository: universal stuff like style preferences and project structure. Everything else goes into your reference files. For API conventions, you put them in an api-standards.md file and add one line to your CLAUDE.md: "For API conventions, read api-standards.md." And this goes for development patterns, testing guidelines, and development rules, all of the same thing: one-line pointers to more detailed resources. Claude only reads those files when it actually needs them.
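Here is one way the split could look, as a minimal sketch. The `split_claude_md` helper, the section contents, and the `api-standards.md` spelling of the file name are all assumptions for illustration; nothing here writes files, it only shows what ends up in the core file versus the reference files.

```python
def split_claude_md(sections):
    """Split rules into a lean core file and on-demand reference files.

    sections: list of (title, body, reference_file) tuples, where
    reference_file is None for universal rules that stay in CLAUDE.md.
    Returns (core_text, reference_files).
    """
    core_parts = []
    pointers = []
    reference_files = {}
    for title, body, reference_file in sections:
        if reference_file is None:
            core_parts.append(f"## {title}\n{body}")
        else:
            reference_files[reference_file] = body
            # One-line pointer instead of the full text.
            pointers.append(f"For {title}, read {reference_file}.")
    return "\n\n".join(core_parts + pointers), reference_files


if __name__ == "__main__":
    # Hypothetical rules; only the first applies to every session.
    sections = [
        ("Style preferences", "Prefer small functions.", None),
        ("API conventions", "Many paragraphs of API rules...", "api-standards.md"),
    ]
    core, refs = split_claude_md(sections)
    print(core)
    print(sorted(refs))
```

The detailed API text only lives in the reference file, so it costs nothing in sessions that never touch the API.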

So instead of a 5,000-token CLAUDE.md file sitting at the root of your repo with detailed instructions for how to deploy APIs when you're working on the front end, you've got maybe 500 lines of the core file with one-line references to reference docs, which Claude pulls in on demand.

Next is skills. The same idea applies to skills.

Every skill you have installed gets its metadata loaded into context so Claude can decide whether to trigger it or not.

If you're loading up a setup with tons of skills, you're bloating your context for very little return. And if those skills are 400, 500, or 800 lines of verbose instructions, which a lot of them are if you're downloading them from free skill marketplaces online, then those are the ones that are burning up even more of your context. More instructions don't necessarily mean better output when it comes to skills.

Past a certain point, Claude starts ignoring rules because there's too much competing for its attention. The best skills are concise and short. The audit skill that I have linked in the description below runs the same five filters on your skills. It tells you which ones are overly verbose and which ones aren't providing enough value to justify their usage.

Number three is settings.

There are a few things in your settings.json that are still tuned for the old 200,000-token context window that Claude used to have, and most people never really set these up at all. So, I'm going to go through a few quick ones that I added to help me with my context window.

Number one was auto-compact. Claude auto-compacts somewhere around 83% of your context window by default, but the output quality starts degrading well before that. If you set the auto-compact percentage override to about 75, compaction triggers earlier, before the quality degrades.

Next is Bash output. When Claude runs a shell command, it can only read a limited number of characters. The default is between 30,000 and 50,000.

Anything beyond that gets silently truncated. Then Claude reruns the entire command and asks for more information.

That's a lot of wasted tokens on retry.

So, what you need to do is set the Bash max output length environment variable to 150,000 characters. Another one is file permissions. By default, Claude Code can read everything in your project: node_modules, the dist folder, the lock files, and build artifacts. The fix is to add deny rules to your settings.json permissions.
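Pulling the settings mentioned so far into one place, here is a hedged sketch that merges them into a settings.json-style dictionary. The environment variable names (`CLAUDE_AUTOCOMPACT_PCT_OVERRIDE`, `BASH_MAX_OUTPUT_LENGTH`) and the `Read(...)` deny-rule syntax are my best reading of what is being described, and the paths are hypothetical, so check all of them against the current Claude Code settings documentation before copying anything.

```python
import json

# Assumed names and values: verify against the Claude Code settings docs.
ENV_OVERRIDES = {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75",  # compact earlier than the default
    "BASH_MAX_OUTPUT_LENGTH": "150000",       # characters of shell output kept
}

# Hypothetical paths for a typical JavaScript project.
DENY_RULES = [
    "Read(./node_modules/**)",
    "Read(./dist/**)",
    "Read(./build/**)",
    "Read(./package-lock.json)",
]


def with_context_settings(settings):
    """Return a copy of a settings dict with the overrides merged in.

    Existing env values and deny rules are kept; duplicates are not added.
    """
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **ENV_OVERRIDES}
    permissions = dict(settings.get("permissions", {}))
    deny = list(permissions.get("deny", []))
    deny += [rule for rule in DENY_RULES if rule not in deny]
    permissions["deny"] = deny
    merged["permissions"] = permissions
    return merged


if __name__ == "__main__":
    print(json.dumps(with_context_settings({}), indent=2))
```

Running it prints a JSON fragment you could compare against your own settings.json rather than overwrite it with.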

It works like a .gitignore, but for Claude. Tell it which directories it can't read, and it stops wasting context on stuff it doesn't need to know. The audit also checks for these. If it's missing auto-compact, missing the Bash output override, or missing any of these deny rules, it will flag them and tell you exactly what to add.

So, before we get into the quick wins, let me show you how to install the audit skill real quick. It takes about 10 seconds.

So, here's how you can get set up really quickly with the context audit skill inside of Claude Code. You'll just need to go to the Google Drive link in the description below and download the Claude skill. Once you've got the skill in your skills folder, just type /context-audit and it'll go through and run all of the analysis that you need to get your Claude context window into tip-top shape.

So, I've just run it here and it's given me a score of one out of 100. It's recommended a bunch of different things that we can do to fix up my context. Once I've done that, all I have to do is ask Claude to go and implement those fixes and it's done. Claude will go ahead and handle all of it. It will split up my CLAUDE.md file, remove any unnecessary skills, and add all the settings that I'm missing. That's how you install it.

Just a quick thing.

Right now, everyone is sharing Claude skills. They're cloning repos, installing code they haven't reviewed. And the number one question I keep hearing is, "How do I know that this is safe?" I'm building a marketplace for verified Claude Code skills: security reviewed and vetted by real engineers and domain experts, so you know exactly what you're getting before you install it.

If that sounds useful, the waitlist link is in the description below.

All right, so that's all the setup stuff.

Everything up to this point is about pruning out the invisible waste. Now, let me go through the habits that make a big difference day-to-day. You might have heard a few of these already, but they're genuinely useful for keeping your usage under control.

Remember to start fresh sessions between unrelated tasks. Remember how token cost compounds with every message? If you just finished a 20-message session on content research and now you're switching to script writing, every scripting message is still paying the tax of all of that research content. The /clear command fixes that with a fresh session and fresh context. This one habit probably saves more tokens than anything else on this list.
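A quick back-of-the-envelope comparison shows why clearing between unrelated tasks matters. The sketch assumes equally sized exchanges of 500 tokens (a made-up number), no starting context, and full history resent every turn; `session_tokens` is an illustrative helper, not a real API.

```python
def session_tokens(messages, tokens_per_exchange=500):
    """Total tokens re-read in one session where history is resent each turn."""
    return sum(n * tokens_per_exchange for n in range(1, messages + 1))


# 20 research messages followed by 20 scripting messages.
one_long_session = session_tokens(40)
two_fresh_sessions = session_tokens(20) + session_tokens(20)

if __name__ == "__main__":
    print(f"one 40-message session:   {one_long_session:,} tokens")
    print(f"two 20-message sessions:  {two_fresh_sessions:,} tokens")
```

Under these assumptions the single long session re-reads 410,000 tokens against 210,000 for the two fresh ones, because the second task never pays for the first task's history.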

Use plan mode before anything non-trivial. The single most expensive mistake in Claude Code is letting it go down the wrong path. It writes 200 lines of code, then you realize it's misunderstood the task, and now you have to scrap it and start over.

This happens to me all the time. All of those tokens are now gone. Plan mode lets Claude ask you clarifying questions first, map out the approach, and get alignment before it starts to write a single line. If you want to take this a step further, have a look at some of the development frameworks like BMAD or PRD that go deep into planning loops.

When Claude gets something wrong, don't send a follow-up correction. Every follow-up message gets added to the conversation history permanently. So now you've got the bad response, your correction, and the new response all sitting in your context, compounding on every future message. Instead, go back and rewrite your original message. The bad exchange gets replaced entirely. You save the tokens from the bad response and the correction, and you don't pollute the rest of your session.

The next thing is using the right model for the job.

Sonnet handles most coding work. Haiku is great for subagents, formatting, and simple lookups. And if you're looking to make deep architectural planning choices when Sonnet isn't cutting it, that's when you use Opus.

The most important takeaway here is that it's not a limits problem. It's a context hygiene problem, and your setup drifts over time. That's why I built the audit as a skill and not a one-time checklist. Install it once and run it whenever you want. The context audit skill is free and it's at the link in the description below. Drop it in your skills folder and run it. It checks everything we talked about, gives you a score, and tells you exactly what to cut and why.

Drop a comment with any questions. I read every single one.

And if you're rolling out Claude Code across a team or business and want help getting that done right from the start, there's a link for strategy sessions in the description below as well. Thanks for watching.
