Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit.
By AI News & Strategy Daily | Nate B Jones
Summary
Topics Covered
- Models Are Not Expensive, Your Habits Are
- The 20x Token Savings from Markdown Conversion
- Frontier AI Can Be Absurdly Cheap
- Don't Bring a Ferrari to the Grocery Store
- Agents Must Index References Before Retrieval
Full Transcript
The next generation of models is likely to drop in the next one to two months. I'm talking about Claude Mythos. I'm talking about whatever ChatGPT drops next. I'm talking about the next Gemini model. They will be more expensive, a lot more expensive, because they're all trained on much more expensive chips, the GB300 series from Nvidia, and it's just going to get more expensive from there. The intelligence we're going to get, the ambient compute all around us that is essentially free intelligence, is going to be the dumber models. That's just how it is. If you want to use cutting-edge models, you have got to stop burning tokens and blaming the model. And that is the theme for this video.

If you're in a position where you're wondering how much token usage you have, how expensive your AI is, whether you're using too many tokens, or how you can even measure and improve that, this is for you. And it is going to be one of the most valuable skills on the planet, by the way, because you do not want to be in a position where you are spending $250,000 a year, a real number that Jensen Huang gave in a real interview for what he expects an actual individual engineer to spend in a year on tokens. You don't want to be the person spending 250 grand on tokens you don't have to be spending. You want to be smart. And I am going to give you a specific example.
This is a real-life example; a real person I know gave me permission to use it. I recently saw a production AI pipeline that ingests multiple long-form conversations per user, runs an analysis across dozens of dimensions, and generates a fully personalized output, all on the most expensive models that money can buy. Not because the person wants to use expensive models, but because he tested it, and what he found was that the better models produce the results he needs for this business. The cost per user: less than a quarter. Less than 25 cents per user for that. Most of us are spending more than we need to on AI, and this is a video about that. You can be really smart, use really good cutting-edge AI, be intelligent with your token usage, and not spend a ton of money. If you want to know what that's like, keep watching, because we're going to get into specific strategies, and I'm going to show you what I built so we can actually make this easier for everybody, so it's not just a guessing game anymore.
The takeaway is that frontier AI can be absurdly cheap when you know what you're doing. Essentially, the models are not expensive; it's your habits that cost a lot. And with Claude usage limits dominating everything in the last week, I think it's worth having that conversation. So, let's get to it.

I've made the case that we can use our models better. What are the specific habits we can change? I want to name specific habits that I have seen in conversations with others, looking over shoulders, reading GitHub repos, listening to conversations online. These are specific examples of patterns I see over and over again.

The first one is the rookies, the folks who are new to the cutting edge. You know what you bleed out on in tokens? Document ingestion. This one drives me crazy because it's so, so easy to fix.
A brand new Claude Desktop user might drag three PDFs into a conversation, maybe 1,500 words each, which is just 4,500 words of text. It's not that long. And they say, "Summarize these," and Claude processes the raw PDFs with all the formatting overhead that goes with them: the headers, the footers, the embedded fonts, the layout metadata. The entire binary structure gets encoded as tokens. And so 4,500 words of content can become 100,000-plus tokens if you're not careful. All you have to do to avoid that is think in terms of markdown. If you just ask Claude, or frankly go to any of a number of free services on the internet, and say "please convert this to markdown," it will just do it. It takes 10 seconds.
And then you have a very clean set of content that's between four and six thousand tokens, which saves you something like 20x on the memory. And this waste compounds, right? Because once those 100,000 tokens are in your conversation history, they bounce back and forth on every turn. This is how you fill up your token window and wonder how other people get so much done.

Please, please, please, if you're new to AI or if you've never thought about it, think about the file formats you're throwing in, because so many of these file formats are designed to be human-readable. They're not designed to be AI-readable. Think about the token efficiency of your file formats. And if you're wondering how to convert to markdown, I built something for you: all you have to do is ingest a file, hit transform, and it converts it into markdown. That's it. We support a number of file types, and we're adding more from the community all the time. It's part of the open brain ecosystem; it's just a plugin you can install, and it will convert files to markdown. But that's not the only way. You can tell Claude to do it directly, or use any of a number of free web services. Markdown conversion should not be gated; it's super easy to do.

Tokenization preserves everything in the original file. If you want to reason about the style of the PDF, fine, keep it. But 99% of the time, all you care about is the text. You just want it in markdown. Please, please, please think about your file formats.
The next big mistake people make, and this one tends to come a little after people convert to markdown and start to understand how these initial documents work: please do not sprawl your conversations. If you are doing 20, 30, 40 turns in a conversation, no AI was reinforcement-learned, trained, or designed to handle that kind of sprawl. All you're doing is compressing the ratio of the conversation where the original instructions happened. And yes, the models are getting better and better at anchoring on and remembering those original instructions even when they go through compression. But why make them suffer? Why make yourself suffer by filling up the context window with cruft? Why waste tokens? Why not just ask for what you want upfront? And if you're going to have an evolving exchange, clearly mark it at the top: "Our goal here is to evolve and reach a conclusion together." Then you have a light conversation that goes 20 or 30 turns, and you say, "Thank you, I've got a conclusion. Please summarize this." And then you go and do real work.

I see so many people trying to mix modes together, but AI is increasingly designed for single-turn, heavy-lift work, and in that context you need to do the thinking in advance and bring it to the table. If you need to think with AI, that should be a separate chat, a separate conversation.
It might even be a separate model. It might be three separate models, and you bring all of that in. I do that all the time. I'm like, okay, I want to look through what communities are thinking about AI on X; I'll go to Grok for that. Or I'll look at what earnings reports are saying about the state of AI and capital investment; I'll pipe that through ChatGPT's thinking mode and get a bunch of reports out of it. Or I'll go through Perplexity research and get a bunch of reports that way. Now I want to look at what some major blog posts have to say about a particular AI topic; I'll go to Claude Opus 4.6, do a targeted web search, go back through, and make sure we understand what we're looking at. None of that is intended to produce a single answer, right? These are all evolving conversations. Once I get what I want out of each of those individual threads, I can pull them together and say, "Okay, now I have a piece of work to do. Now I have something I actually need done, and I have all the context needed."

So you should have two modes here: a mode where you are trying to gather information, and a mode where you are trying to focus and get work done. Do not mix the two. That is how you burn tokens. That is how you confuse the AI. Your objective, when you want the AI to do real work, should be to be so clear that the AI needs to do nothing else; it just goes, gets the work done, and comes back. It should be that clear.
If you are an intermediate user and you're thinking, "I know this stuff, Nate," let me give you a tip you may not know. If you are adding lots of plugins to your ChatGPT or Claude instances, you are paying a tax every time you start a conversation, because in the background those plugins get loaded in and start to fill the context window. I know someone who shared with me that they are over 50,000 tokens into a context window before they type the first word, because they load that many plugins and connectors. You don't need that much.

You know what that's like? It's like walking into a fully functional tool workshop and, instead of leaving the tools on the walls, the first thing you do is take them all down and lay them out on the workbench, and then you say, "Okay, now we're going to do, I don't know, we're going to make a bench." Do you need all 200 tools in the workshop to make the bench? No. You probably need the right five. Think about that the next time you approach tooling. Because so many of us hear about a new plugin, a new connector, someone hypes it up, we add it, and we don't realize it's a silent tax for the rest of time.
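Here is a back-of-the-envelope sketch of that silent tax. The per-conversation token count, conversation volume, and price are illustrative assumptions, and this is a lower bound: it counts each connector's overhead once per conversation, not the resend on every turn.

```python
# Sketch: cumulative cost of a plugin/connector that silently adds
# tokens to every conversation. Numbers are illustrative, not measured.

def yearly_plugin_tax(tokens_per_conversation: int,
                      conversations_per_day: int,
                      price_per_million_input: float = 5.0) -> float:
    """Dollar cost per year of always-loaded plugin/connector context."""
    tokens_per_year = tokens_per_conversation * conversations_per_day * 365
    return tokens_per_year / 1_000_000 * price_per_million_input

# A 2,000-token connector loaded into 20 conversations a day:
cost = yearly_plugin_tax(2_000, 20)
print(f"${cost:.2f} per year for one always-on connector")
```

Multiply that by a dozen forgotten connectors, and by the history resends in every multi-turn conversation, and the tax stops being trivial.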
Every time we have a conversation, it adds a thousand tokens, two thousand tokens, whatever it is, and it adds them every single time. Do you want to pay that to the model? Maybe you should think more strategically about which plugins and connectors are really adding value for you, because they can be tremendously valuable. But make sure you know which ones you really want, because if you don't, you'll end up with dozens of plugins you don't really need, ones that are supposed to add value but just dump a bunch of cruft into your context window, confuse the model, keep it from doing good work, and maybe confuse it about which tools it's supposed to use.
Now, I'm saving the most expensive and most advanced users for last, because this is where the leverage lies. If you are an advanced user, someone who says, "Send me the GitHub repo, I can do this myself. Let me install OpenClaw on my Mac Mini. I'm okay managing the gateway. I can be secure," this is for you. You have the most leverage of anybody out there in terms of how many tokens you use. And typically speaking, your mistakes are the most expensive ones, because if you screw up, you're screwing up at a level of hundreds of thousands or millions of tokens, maybe more. The reason is simple: you are doing bigger projects with AI. And when you do big projects with AI, your ability to leverage it effectively becomes one of the most critical levers for managing ROI and cost on a project. It is a job skill at that level. If you're technical enough to work from a GitHub repo, managing tokens efficiently is part of your job skill set. And you cannot pass it off to somebody else. That is not going to be somebody else's full-time job at an org.
All of us are going to have to learn to manage our tokens well. If you are the person responsible for the system prompt on an agent and you haven't pruned it in the last couple of weeks, what are you doing? If you haven't sat there, gone line by line, and said, "You know what, a hundred of these lines I don't need anymore, because they've been here since 3.5 and I don't need them now"? If you're sitting there thinking, "I don't know why we're loading this entire repo into the context window; we just do it every time, and it seemed to work two generations ago, but we never tested it"? That's just irresponsible. You need to be in a position where you are actually allowing the gains in model intelligence to lean out your context window. If you look at the larger trend we see in AI today, it is that we needed to frontload and be really specific about a lot of context for dumber models in 2025. Now that it's 2026, as the models get more intelligent, we can lean out the context window up front, because we can trust the model to retrieve better. So take that seriously. That is something practical you can do to get ready for Claude Mythos. Don't sleep on it. Again, if you're technical, these are million-token decisions we're talking about, especially if you're running an agent over and over. It adds up.
Let me give you a specific example, based on the original beginner example with PDFs, to show you the tangible difference in cost. And this should cascade all the way across; if you don't believe me, this is real. Say you feed raw PDFs into context: 100,000 tokens versus the 5,000 we talked about. Say the conversation sprawls to 30 turns; I've seen these, this is very realistic. And say you use Opus 4.6 for everything, including formatting, including proofreading, over a five-hour session of back-and-forth. You might be spending roughly 800,000 to a million input tokens, with maybe 150,000 to 200,000 output tokens including thinking. At $5 in and $25 out per million, you're spending eight to ten dollars' worth of compute. You might say, "I can tolerate that," or "I've got the unlimited plan," or "I don't care." But I want you to look at the difference, because anytime you start to get serious with AI, you need to see the difference. We talk about not being wasteful with artificial intelligence; this is being wasteful. You want to save water, you want to save energy? Don't waste your tokens.

Now the clean session, same work: convert documents to markdown first, start fresh conversations every 10 to 15 turns, use Opus for reasoning, Sonnet for execution, and Haiku for polish, and scope the context to what's needed. Over the same period of time, you get the same result for 100,000 to 150,000 input tokens, a lot less, and maybe 50,000 to 80,000 output tokens. Blend that across the models, and instead of costing $8 to $10 in compute, you spend a buck for the same output. In other words, an 8 to 10x reduction in cost. Now scale it: that sloppy user is burning 40 to 50 bucks in compute a week, and the clean user is burning five to seven. Across a 10-person team on an API, that's $2,000 a month versus $250 a month for the exact same result.
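The arithmetic above can be sketched directly. The token counts are the transcript's illustrative numbers, and the blended price for the clean session is an assumed Opus/Sonnet/Haiku mix, not a measured rate.

```python
# Sketch of the cost comparison: sloppy all-Opus session vs. a clean
# session with scoped context and a cheaper model blend.

def session_cost(input_tokens: int, output_tokens: int,
                 in_price: float = 5.0, out_price: float = 25.0) -> float:
    """Dollar cost given per-million-token input and output prices."""
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Sloppy: raw PDFs, 30-turn sprawl, Opus for everything ($5/$25 per M).
sloppy = session_cost(900_000, 175_000)

# Clean: markdown docs, fresh sessions, scoped context, and an assumed
# blended Opus/Sonnet/Haiku rate of roughly $2 in / $10 out per million.
clean = session_cost(125_000, 65_000, in_price=2.0, out_price=10.0)

print(f"sloppy: ${sloppy:.2f}, clean: ${clean:.2f}, "
      f"ratio: {sloppy / clean:.1f}x")
```

With these assumptions the sloppy session lands near $9 and the clean one near $1, which is where the 8 to 10x figure comes from.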
For subscription users, it's the difference between hitting your limit daily and forgetting that limits exist, because you're just that productive.

Now, if you think this isn't serious, I want you to think about the cost structure for Mythos for a minute. Mythos is rumored to be by far Anthropic's most expensive model. I think it's very likely that by April or May we will have a new class of pricing well above the $5/$25 range for tokens, maybe 10x that. Imagine a world where you're paying 10x what Opus costs now. Opus is $5 in, $25 out; what if it's $50 in, $250 out? Now things start to get serious. Now that 8 or 10x reduction on an individual's daily work becomes something you can actually measure and think about as a business, and you can imagine how big that gets when you scale it across a dev team. The mistakes you're making today were tolerable because models were priced cheaply. The cutting-edge intelligence you want is going to come out more expensive. And I don't know the exact price, right? I'm not saying it's 50 and 250; I'm giving you a thought exercise. It might be 10 and 50 instead. The point is the same: the model you want is going to cost more. And as models cost more, your mistakes scale. Your mistakes scale with the price of intelligence.

And make no mistake, the models will keep getting better. Every quarter, every release, the trajectory is unambiguous. People who tell you the models are plateauing are lying to you. The models are getting better, fast. I do see people occasionally insisting that the models aren't getting better. It's not true by any measure out there. And the people I see insisting on it, I think they're insisting partly because they don't want to face the world as it will exist when AI is this good and continuing to accelerate this fast. It's scary, right? But we should face it, and we can all work through it together.
All right. I have built a stupid button. That is my contribution to this discourse. I am building a stupid button so you can check whether you are using your context incorrectly. I want to save you money, hundreds of dollars. Please do not be stupid with your tokens. If you care about it, don't waste the water, don't waste the electricity. If you just care about the bottom line, also don't waste your bucks, right? We should probably care about all of those things.

If you want to know what's in Nate's stupid button, it's really simple. There are six questions I'm helping you answer. Number one: do you feed Claude raw PDFs and images when all you need is text? Is there something you are doing that is grossly inefficient as far as tokens go? By the way, screenshots are terribly inefficient; it would be much, much better to just copy and paste text. Convert to markdown, always. Claude can do it really, really fast for you. Why not?

Question two: when was the last time you started a fresh conversation? Are you one of those people who keeps a conversation going forever? I swear the number of people who keep their conversations going forever is highly correlated with the number of people who start experiencing symptoms of LLM psychosis. Why? Because models drift over time. They were never intended for conversations that long.
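The cost side of long conversations can be sketched too: because the full history is resent on every turn, billed input tokens grow quadratically with turn count. The per-turn size here is an assumption for illustration.

```python
# Sketch: why conversation sprawl is expensive. Each turn resends the
# entire history, so billed input grows with the square of turn count.

def cumulative_input_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """Total input tokens billed across a conversation, assuming each
    turn adds ~tokens_per_turn and the full history is resent."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

print(cumulative_input_tokens(10))  # 27500 billed input tokens
print(cumulative_input_tokens(40))  # 410000: 4x the turns, ~15x the tokens
```

Cutting a 40-turn sprawl into four fresh 10-turn sessions would bill roughly 110,000 input tokens instead of 410,000 under the same assumptions.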
If you're having a long-running conversation, you're just in strange territory. When was the last time you started a fresh conversation, and why is that? Again, every time you take a turn in a conversation, you read it as sending one line back. But Claude or ChatGPT or Gemini reads it as sending the entire conversation back. And if you're wondering whether this is just a Claude thing because Nate's talking about Claude a lot: no, it's ChatGPT, it's Gemini, it's Llama, it's Qwen, it's any LLM you're using. This is how LLMs work. Don't waste it.
Question three: are you using the most expensive model for everything? Are you using Opus? Are you using 5.4 on pro mode? Whatever your choice is, are you picking the most expensive model and blindly using it regardless, when the cheaper model may work better? This is especially important if you have production workloads, but it's true for all of us. If you're doing a simple formatting task, don't lean on Opus for it. Don't lean on 5.4 for it. Use the models for what they're designed for. Don't bring a Ferrari to the grocery store.

Question four: do you know what's loading into context before you even type? You can actually find out. You can run /context in Claude Code and look at the number of things that are loading. If you don't know what that means, you can go to your ChatGPT or Claude and see how many connectors you have available and how many you've loaded. You could be loading tens of thousands of tokens that you're not really aware of and not really using. If you enabled Google Drive months ago and you never, ever use Google Drive, you just thought it was cool on the day it launched, why keep it? Just drop it. There are so many examples like that, where we see something cool, we add it, and we forget it's there. It's like a barnacle on a ship. It's going to slow you down. It's going to burn tokens. You don't need it. Audit. Audit your plugins. It matters.
Next question, for API builders: are you caching stable context so you don't re-pay for it? Prompt caching can give you a 90% discount on repeated content. Cache hits on Opus cost 50 cents per million versus $5 per million standard. It makes a difference.
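For API users, marking stable context as cacheable is a small change. Here is a minimal sketch of an Anthropic-style Messages request with `cache_control` set on the system prompt; the model id and prompt text are placeholders, and the request is only constructed here, not sent.

```python
# Sketch: marking stable context as cacheable, Anthropic-style prompt
# caching. The payload is built but not sent; model id and prompt text
# are placeholders.

SYSTEM_PROMPT = "You are a code-review assistant. ..."  # stable, reused every call

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-opus-4",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Context up to this marker is eligible for caching;
                # cache hits are billed at a steep discount.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Review this diff for token-wasting patterns.")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

The same idea applies to tool definitions and reference documents: put the stable blocks first, mark the end of the stable prefix, and let the per-request content vary after it.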
Do not sit there and ignore prompt caching. Take it seriously. If your system prompt, your tool definitions, and your reference documents aren't cached, what are you doing? This is not advanced stuff in 2026. You should just be doing it.
And the last question, and the stupid-button test for this is a real button (by the way, I really did build a stupid button): how are you handling web search? Are you letting Claude do web research the expensive way? People don't realize this, but if you call Perplexity for a search, it tends to be much more token-cheap than searching with Claude natively. Now, Claude is addressing this; there are lots of ways to do Claude search. You can use Claude to navigate through a browser. You can search directly in the terminal, and it will spin up a background service. Or you can call something like an MCP connector for Perplexity. All different options you can use. And this is broadly true, not just for Claude; it's true for ChatGPT, for Gemini, and so on, because MCP is magic. But if you are doing search, the larger point is that you should be doing it as cheaply as possible. If you just want quick, token-efficient results, it may be worth taking the time to spin up an MCP server, a dedicated service that just returns the search results. What I have found experimentally with Perplexity and Claude is that Perplexity tends to burn something like 10,000 to 50,000 fewer tokens per search, which is not a small number if you're doing complex search, and it tends to be five times faster, with structured citations. So this is not meant to be a Perplexity plug; it's a token-management plug. Try it for yourself. But I've got to say, I like faster, I like citations, and I like fewer tokens. Over a research-heavy session, a plugin like that can save you a lot on the token side. And that's a larger callout: if you have ways to look at your token usage and diagnose it, you're going to be smarter about it.
And that's the whole point of the stupid button: let's not fly blind here. Let's look at our actual token usage, make some good choices, and optimize it. Now, what's in this stupid button? Number one, there is a prompt. If you've never done this, if you're asking, "What is an MCP server?", we've got a prompt for you: a prompt you can run against your recent conversations that identifies the specific dumb things you, specifically, are doing. It will see which documents you're feeding raw. It will see your conversation sprawl. It will look at model misuse. It will look at redundant context loading. It looks at your actual patterns and tells you what to fix first. So that's the easy version; anyone can use it, on any plan, no setup required.
Number two, a skill. This is an invocable skill that audits your Claude Code or desktop environment, or any other environment; it could be ChatGPT, etc., since skills are also translatable. It measures your per-session token overhead. It will flag system prompt load. It will check your plugin and skill loading. It will give you a before and after when you make changes. Think of it this way: you kind of need a gas tank for your tokens, and gee, wouldn't it be nice to have one? So it's like the gas tank skill.

Number three, we built some guardrails. The guardrails sit directly on your knowledge store. So if you're an open brain person, which is something we've been building as a community, they sit right on your open brain, and you stop burning tokens on input, which is a nice touch: automatic markdown conversion for documents hitting the store, index-first retrieval instead of dump-and-search, and context scoping that enforces a sort of minimum viable context for the query. This is where token management stops being just a personal discipline and becomes infrastructure that maintains itself. I'm really excited to see how the community continues to build on this, because open brain is open source and we'll keep evolving and improving it. But I wanted to make sure we had rails that ensure responsible token usage for the open brain community.
So look, I'm going to close by talking briefly about agents and context, because agents burn hundreds of millions of tokens in some cases, and we don't want to leave them out. How do we think about context management for agents? I'm going to give you five commandments; call them the keep-it-simple-stupid commandments for agents.

Number one: index your references. If an agent is getting raw documents instead of relevant chunks, you've already failed. The entire point of retrieval is to scope what the model sees to what it needs. Dumping a full document set into the window on every agent call is wildly irresponsible. You can't do that just to give the agent context. Don't make the agent do work it doesn't need to do.
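A toy sketch of what index-first scoping means in practice: score pre-chunked reference text against the query and pass only the top matches into the agent's context. The keyword-overlap scoring is a stand-in; a real system would use BM25 or embeddings, but the shape is the same.

```python
# Sketch: index-first retrieval. Score pre-chunked references against
# the query and forward only the top-k chunks, not whole documents.
# Naive keyword-overlap scoring; real systems use BM25 or embeddings.

def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

chunks = [
    "Deployment uses a blue-green strategy with two target groups.",
    "The billing module retries failed charges three times.",
    "Token budgets are enforced per agent call in the gateway.",
]
print(top_chunks("how are token budgets enforced per call", chunks, k=1))
```

The agent then reasons over one relevant chunk instead of the whole document set, which is the entire point of retrieval.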
Number two: prepare your context for consumption. Pre-process it, pre-summarize it, pre-chunk it. A reference document should arrive in an agent's context ready to be used, not ready to be read or processed. If the model's first several thousand tokens of reasoning are spent dealing with your sloppy pre-processing, you're not being a responsible agent builder.

Number three, and we've mentioned this before, but I'm calling it out for agents because it's so important for agent workflows: please, please, please cache your stable context. System prompts, tool definitions, persona instructions, reference material, anything that is stable should be cached, at a 90% discount on cache hits. This is the lowest-effort, highest-impact optimization you have on the table. If you're making thousands of agent calls a day and you're not caching, you're pouring money down the drain.

Number four: scope every agent's context to the minimum it needs. A planning agent does not need your full codebase; don't give it the full codebase. An editing agent doesn't need your project roadmap; don't give it the project roadmap. You get the idea, right? Passing everything to every agent is architectural laziness, and it has real costs, both in tokens burned and, frankly, in degraded agent performance. Models perform worse when they're drowning in irrelevant context. And if you're thinking, "I'm not sure what the agent will need; aren't the smarter agents supposed to find it?", the answer is yes. But they will only do that efficiently if you give them a searchable, pre-processed repo so they can go and get only the relevant slice of context. So take the time to do it right.
Number five: measure what you burn. If you don't know your per-call token cost, you're optimizing without any information. Please instrument your agent calls. Track your input tokens. Track your output tokens. Track your overall model mix and your cost ratio. You cannot improve what you do not measure. Most teams building agentic systems are thinking a lot about whether they are semantically correct, not whether they're functionally correct. There's a big difference.
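A minimal sketch of commandment five: a meter that accumulates per-call usage. The `usage` dicts below are hand-written stand-ins for the usage counts a real SDK response reports; the shape of the accounting is the point.

```python
# Sketch: minimal per-call token instrumentation for an agent loop.
# The usage dicts are stand-ins for what a real API response reports.

from dataclasses import dataclass

@dataclass
class TokenMeter:
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, usage: dict) -> None:
        """Accumulate one call's reported usage."""
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.calls += 1

    def cost(self, in_price: float = 5.0, out_price: float = 25.0) -> float:
        """Dollar cost at per-million-token prices."""
        return (self.input_tokens / 1e6) * in_price \
             + (self.output_tokens / 1e6) * out_price

meter = TokenMeter()
for usage in [{"input_tokens": 12_000, "output_tokens": 800},
              {"input_tokens": 9_500, "output_tokens": 1_200}]:
    meter.record(usage)

print(meter.calls, meter.input_tokens, meter.output_tokens)
print(f"${meter.cost():.4f}")
```

Log one meter per agent, per model, and the "model mix" and "cost ratio" questions above become a query instead of a guess.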
And they're thinking a lot about optimizing their system prompt; they're not thinking much about their model cost, because most of the time the model cost is not what makes the project live or die. And I get it: in this age, in 2025 and early 2026, with the costs we have today and the urgency from executives to build, the $12-per-run cost or whatever it's going to be is not going to make or break the ship. But plan for a world where the models are more expensive. Plan for a world where you have to scale up. Plan for a world where you have to be responsible and instrument.
Now, stepping back, there's a cultural problem we need to acknowledge behind all of this. At some point in the last few months, burning tokens became a badge of honor. And I get it; there is a degree to which you need to be burning tokens in order to do meaningful work in the age of AI. None of this is to say that I expect token consumption to go down. It won't. You need to be ready to burn those tokens. This is not an ask that you not do that; it's an ask that you do it efficiently. So when Jensen sits on stage and says $250,000 in token costs per developer, and everyone is shocked or rolls their eyes or whatever the reaction is, my reaction is: I hope it's 250 grand in smart token costs. For Jensen it's not about the individual dollar amount, because he's got cash in the bank; it's whether the tokens were used well, whether they're smart tokens. So begin to think to yourself: yes, I need to be maxing out my Claude. There are people who go into withdrawal when they don't get to use their Claude. I know people like that, people who say, "Ah, I went to a movie and couldn't use my Claude for a few hours. I feel like I missed out on my token limit." Touch some grass. It's going to be okay. But use your tokens well. Be efficient with your token usage. Know what you're spending them on. Don't spend them on silly stuff. Don't spend them on PDFs you should have converted. Spend them on meaningful work. And that is a human problem. We need to be bold and audacious. These models are really good at stuff. So let's get more bold, more audacious, and think bigger about what we can aim them at. Because if we can be more efficient, we can do a whole lot more cool and creative stuff with those tokens. That's why I built the internet a stupid button.