
Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit.

By AI News & Strategy Daily | Nate B Jones


Topics Covered

  • Models Are Not Expensive, Your Habits Are
  • The 20x Token Savings from Markdown Conversion
  • Frontier AI Can Be Absurdly Cheap
  • Don't Bring a Ferrari to the Grocery Store
  • Agents Must Index References Before Retrieval

Full Transcript

The next generation of models is likely to drop in the next one to two months. I'm talking about Claude Mythos. I'm talking about whatever ChatGPT drops next. I'm talking about the next Gemini model. They will be more expensive, a lot more expensive, because they're all trained on much more expensive chips, the GB300 series from Nvidia, and it's just going to get more expensive from there. The intelligence we're going to get, the ambient compute all around us that is essentially free intelligence, is going to be the dumber models. That's just how it is. If you want to use cutting-edge models, you have got to stop burning tokens and blaming the model. And that is the theme for this video.

If you're in a position where you're wondering how much token usage you have, or how expensive your AI is, or whether you're using too many tokens for your AI, or how you can even measure that and get better at it, that is what this is. And that is going to be one of the most valuable skills on the planet, by the way, because you do not want to be in a position where you are putting in $250,000 a year. That's a real number that Jensen Huang gave in a real interview for what he expects an actual individual engineer to spend in a year on tokens. You don't want to be the person spending 250 grand on tokens you don't have to be spending. You want to be smart. And I am going to give you a specific example.

This is a real-life example; a real person I know gave me permission to use it. I recently saw a production AI pipeline that ingests multiple long-form conversations per user, runs an analysis across dozens of dimensions, and generates a fully personalized output, all on the most expensive models that money can buy. Not because the person wants to use expensive models, but because he tested it, and what he found was that the better models produce the results he needs for this business. The cost per user: less than a quarter, less than 25 cents per user.

Most of us are spending more than we need to on AI, and this is a video about that. You can be really smart, use really good cutting-edge AI, and still be intelligent with your token usage and not spend a ton of money. If you want to know what that's like, keep watching, because we're going to get into specific strategies, and I'm going to show you what I built so we can actually make this easier for everybody, so it's not just a guessing game anymore. The takeaway is that frontier AI can be absurdly cheap when you know what you're doing. Essentially, the models are not expensive; it's your habits that cost a lot. And with Claude usage limits dominating everything in the last week, I think it's worth having that conversation. So, let's get to it.

I've made the case that we can use our models better. What are the specific habits we can change? I want to name specific habits that I have seen in conversations with others, looking over shoulders, reading GitHub repos, listening to conversations online. These are specific examples of patterns I see over and over again.

And the first one is the rookies, the folks who are new to the cutting edge. You know what you bleed out on in tokens? You bleed out on document ingestion. This one drives me crazy because it's so, so easy to fix. A brand new Claude Desktop user might drag three PDFs into a conversation, maybe 1,500 words each, which is just 4,500 words of text. It's not that long. And they say, "Summarize these," and Claude processes the raw PDFs with all the formatting overhead that goes with that: the headers, the footers, the embedded fonts, the layout metadata. The entire binary structure gets encoded as tokens. And so the 4,500 words of content can become 100,000-plus tokens if you're not careful.

All you have to do to avoid that is think in terms of markdown. If you just ask Claude, or frankly go to any of a number of free services on the internet, and say, "Please convert to markdown," it will just do it. It will take 10 seconds and convert to markdown. And then you have a very clean set of content that's between four and six thousand tokens. That's saving you 20x on the memory. And this waste just compounds, right? Because once those 100,000 tokens are in your conversation history, they bounce back and forth and bounce back and forth. This is how you fill up your token window, and you wonder how other people get so much done. Please, please, please: if you're new to AI, or if you've never thought about it, think about the file formats you're throwing in, because so many of these file formats are designed to be human readable. They're not designed to be AI readable. Think about the token efficiency of these file formats.

And if you're wondering, well, how do I convert to markdown? I built something for you, because all you have to do is ingest a file, hit transform, and it converts it into markdown. That's it. And we support a number of file types; we're adding more from the community all the time. It's part of the Open Brain ecosystem. It's just a plugin you can put in, and it will convert to markdown. But that's not the only way. You can tell Claude to do it directly. You can also do it directly on the internet with any of a number of free web services. Markdown conversion should not be gated.
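As a rough sketch of the savings being described here: the 4-characters-per-token heuristic and the `pypdf`-based extraction below are my own illustrative assumptions (the video doesn't prescribe a specific tool). The point is simply that extracting the words first is what keeps the PDF's binary structure from being tokenized.

```python
# Illustrative only: estimate token cost with the common rough heuristic
# of ~4 characters per token (an approximation, not a real tokenizer),
# and extract plain text from a PDF so only the words reach the model.

def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Crude token estimate for English text."""
    return max(1, len(text) // chars_per_token)

def pdf_to_plain_text(path: str) -> str:
    """Keep the words; drop the fonts, layout, and binary overhead.

    Assumes the third-party pypdf package is installed (pip install pypdf).
    """
    from pypdf import PdfReader
    reader = PdfReader(path)
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)

# The video's example: ~4,500 words (~27,000 characters) of clean text is
# a few thousand tokens, versus the ~100,000 tokens the same three PDFs
# can cost when ingested raw.
clean_text = "lorem ipsum " * 2_250        # stand-in for ~4,500 words
clean_tokens = estimate_tokens(clean_text)  # a few thousand tokens
raw_pdf_tokens = 100_000                    # the video's raw-ingestion figure
print(raw_pdf_tokens // clean_tokens)       # order-of-magnitude savings
```

The exact ratio depends on the documents, but the gap between raw ingestion and clean text is consistently an order of magnitude.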

It's just super easy to do. Tokenization is designed to preserve everything in the original text. If you want to reason about the style of the PDF, fine, keep it. But 99% of the time, all you care about is the text. You just want it in markdown. Please, please, please think about your file formats.

The next big mistake that people make, and this one comes a little after people convert to markdown and start to understand how these initial documents work: please do not sprawl your conversations. If you are doing 20, 30, 40 turns in a conversation, no AI was reinforcement learned, trained, or designed to handle that kind of sprawl. All you're doing is compressing the ratio of the conversation where the original instructions happened. And yes, the models are getting better and better at anchoring on and remembering those original instructions even when they go through compression. But why make them suffer? Why make yourself suffer by filling up the context window with cruft? Why waste tokens? Why not just ask for what you want upfront?

And if you're going to have an evolving exchange, an evolving conversation, clearly mark at the top that the goal here is to evolve and reach a conclusion together. Then you have a light conversation that goes 20 or 30 turns, and you say, "Thank you, I've got a conclusion. Please summarize this." And then you go and do real work. I see so many people trying to mix modes together, but AI is more and more designed for single-turn, heavy-lift work, and in that context you need to do the thinking in advance and bring it to the table. If you need to think with AI, that should be a separate chat, a separate conversation. It might even be a separate model. It might be three separate models, and you're bringing all of that in.

I do that all the time. I'm like, okay, I want to look through what communities are thinking about AI on X; I'm going to go to Grok for that. Or I'm going to look at what earnings reports are saying about the state of AI and capital investment; I'm going to pipe that through ChatGPT thinking mode and get a bunch of reports out of that. Or I'm going to go through Perplexity research and get a bunch of reports out of that. Now I'm going to have a look at what some major blog posts have to say about a particular AI topic; I'll just go to Claude Opus 4.6. We'll do a targeted web search, we'll go back through, we'll make sure we understand what we're looking at. None of that is intended to be a single answer, right? These are all evolving conversations. Once I get what I want out of each of these individual threads, I can pull them together and say, "Okay, now I have a piece of work to do. Now I have something I actually need done, and I have all the context needed."

So you should have two modes here: a mode where you are trying to gather information, and a mode where you are trying to focus and get work done. Do not mix the two together. That is how you burn tokens. That is how you confuse the AI. Your objective when you want the AI to do real work should be to be so clear that the AI needs to do nothing else; it just goes, gets the work done, and comes back. It should be that clear.

If you are an intermediate user and you're like, "I know this stuff, Nate," well, let me give you another tip you may not know. For the people who are adding lots of plugins to their ChatGPT or Claude instances: you are paying a tax every time you start a conversation, because in the background those are loaded in and they start to fill the context window. I know someone who shared with me that they are over 50,000 tokens into a context window before they type the first word, because they actually load that many plugins and connectors. You don't need that much.

You know what that's like? That is like walking into a fully functional tool workshop, and the first thing you do, instead of leaving the tools on the walls, is take every tool down, lay them all out on the workbench, and say, "Okay, now we're going to do, I don't know, we're going to make a bench." Do you need all 200 tools in the workshop to make the bench? No. You probably need the right five. Think about that the next time you approach tooling. Because so many of us hear about a new plugin, we hear about a new connector, someone hypes it up, we say we need to add it, and we don't realize it's a silent tax for the rest of time. Every time we have a conversation, it adds that little bit: a thousand tokens, 2,000 tokens, whatever it is, and it adds it every time. Do you want to pay that for the model? Maybe you should think more strategically about which plugins and connectors are really adding value for you. Because they can; they can be tremendously valuable. But make sure you know which ones you really want, because if you don't, you're going to be looking at dozens of plugins you don't really need, plugins that are supposed to add value but just add a bunch of cruft, a bunch of junk, into your context window, confuse the model, keep it from doing good work, and maybe confuse it as to which tools it's supposed to use.

Now, I'm saving the most expensive and most advanced users for last, because this is where the leverage lies. If you are an advanced user, if you are someone who's like, "Send me to the GitHub repo, I can just do this myself. Let me install OpenClaw on my Mac Mini. I'm okay managing the gateway. I can be secure." This is for you. You have the most leverage of anybody out there in terms of how many tokens you use. And typically speaking, your mistakes are the most expensive ones, because if you screw up, you're screwing up at the level of hundreds of thousands or millions of tokens, maybe more. The reason why is simple: you are doing bigger projects with AI. And when you do big projects with AI, your ability to leverage AI effectively becomes one of the most critical things for managing ROI and cost on a particular project. It is a job skill at that level. If you're technical enough to go to a GitHub repo, you have a job skill in managing tokens efficiently. And you cannot pass that off to somebody else. That is not going to be somebody else's full-time job at an org. All of us are going to have to learn to manage our tokens well.

If you are the person responsible for the system prompt on an agent and you haven't pruned it in the last couple of weeks, what are you doing? If you haven't sat there, gone line by line, and said, you know what, a hundred of these lines I don't need anymore because they've been here since 3.5 and I don't need them now; if you're sitting there thinking, I don't know why we're loading this entire repo into the context window, we just do it all the time, and it seemed to work two generations ago, but we never tested it: that's just irresponsible. You need to be in a position where you are actually allowing the gains in model intelligence to lean out your context window. If you want to look at the larger trend we see in AI today, it is that we needed to frontload and be really specific about a lot of context for dumber models in 2025. And now that it's 2026, as the models get more intelligent, we can lean out the context window initially, because we can trust the model to retrieve better. So take that seriously. That is something practical you can do to get ready for Claude Mythos. Don't sleep on it. Again, if you're technical, these are million-token decisions we're talking about, especially if you're running an agent over and over again. It adds up.

Let me give you a specific example, based on the original beginner example with PDFs, to show you the tangible difference in cost. This is something that should cascade all the way across; if you don't believe me, this is real. Let's say you feed raw PDFs into context: 100,000 tokens instead of the 5K we talked about. Let's say it's a conversation sprawl that takes 30 turns; I've seen these, this is very realistic. And let's say you use Opus 4.6 for everything, including formatting, including proofreading, over a five-hour session where you're talking back and forth. You might be spending roughly 800,000 to a million input tokens, with maybe 150,000 to 200,000 output tokens including thinking. At $5 in and $25 out per million, you're spending eight to ten dollars' worth of compute. You might say, you know what, I can tolerate that, or I've got the unlimited plan, or I don't care. But I want you to look at the difference, because anytime you start to get serious with AI, you need to see the difference. We talk about not being wasteful with artificial intelligence; this is being wasteful. You want to save water, you want to save energy? Don't waste your tokens.

Clean session, same work: convert documents to markdown first, start fresh conversations every 10 to 15 turns, use Opus for reasoning, Sonnet for execution, and Haiku for polish, and scope the context to what's needed. Over the same period of time, you get the same result for 100 to 150,000 input tokens, a lot less, and maybe 50 to 80,000 output tokens. You blend that across the models, and instead of costing $8 to $10 in compute, you spend a buck, and you get the same output. In other words, an 8 to 10x reduction in cost. Now scale it, right? That sloppy user is burning 40 to 50 bucks in compute a week, and the clean user is burning five to seven bucks. Across a 10-person team on an API, that's 2,000 bucks a month versus 250 bucks a month for the exact same result.
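The arithmetic behind that comparison can be checked directly. The session figures are the rough ranges quoted above; the blended clean-session rate of about $2/M in and $10/M out for an Opus/Sonnet/Haiku mix is my own assumption, purely for illustration:

```python
# Back-of-the-envelope cost check for the sloppy vs. clean sessions above.
# Opus-style pricing: $5 per million input tokens, $25 per million output.

def session_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float = 5.0, out_per_m: float = 25.0) -> float:
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# Sloppy: raw PDFs in context, 30-turn sprawl, Opus for everything.
sloppy = session_cost(900_000, 175_000)

# Clean: markdown first, fresh sessions, scoped context, blended model mix
# (assumed ~$2/M in, ~$10/M out across Opus, Sonnet, and Haiku).
clean = session_cost(125_000, 65_000, in_per_m=2.0, out_per_m=10.0)

print(f"sloppy ${sloppy:.2f} vs clean ${clean:.2f}: {sloppy / clean:.0f}x cheaper")
```

With these midpoint figures, the sloppy session lands near $9 and the clean one near $0.90, which is where the 8 to 10x claim comes from.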

For subscription users, it's the difference between hitting your limit daily and forgetting that limits exist because you're just so productive. Now, if you think this isn't serious, I want you to think about the cost structure for Mythos for a minute. Mythos is rumored to be by far Anthropic's most expensive model. I think very strongly that by April or May we are going to have a new class of pricing, well above the $5/$25 range for tokens, maybe 10x that. Imagine a world where you are paying 10x what Opus costs now: it's $5 in, $25 out for Opus, so what if it's $50 in, $250 out? Well, now things start to get serious. Now that 8 or 10x reduction on a day of individual work becomes something you can actually measure and think about as a business, and you can imagine how big that gets when you start to work across a dev team. The mistakes you're making today were tolerable because models were priced cheaply; they won't be when the cutting-edge intelligence you want comes out more expensive. And I don't know the exact price, right? I'm not saying it's 50 and 250; I'm giving you a thought exercise. It might be 10 and 50 instead. It's still the same point. The point is that the model you want is going to cost more. And as models cost more, your mistakes scale. Your mistakes scale with the price of intelligence.

And make no mistake, the models will keep getting better. Every quarter, every release, the trajectory is unambiguous. People who tell you the models are plateauing are lying. They are lying to you. The models are getting much better, much faster. And I do see occasionally that people insist the models aren't getting better. It's not true by any measure out there. And the people I see insisting on it, I think they're insisting on it partly because they don't want to face the world as it will exist when AI is this good and continuing to accelerate this fast. It's scary, right? But we should face it, and we can all work through it together.

All right. I have built a stupid button. That is my contribution to this discourse. I am building a stupid button so you can check and see if you are using your context incorrectly. I want to save you money. I want to save you hundreds of dollars. Please do not be stupid with your tokens. If you care about it, don't waste the water, don't waste the electricity. If you just care about the bottom line, also don't waste your bucks, right? We should probably care about all of those things.

If you want to know what's in Nate's stupid button, it's really simple. There are six questions I'm helping you answer. Number one: do you feed Claude raw PDFs and images when all you need is text? Is there something you are doing that is grossly inefficient as far as tokens go? By the way, screenshots: terribly inefficient. It would be much, much better just to copy and paste the text. Convert to markdown always. Claude can do it really, really fast for you. Why not?

Question two: when was the last time you started a fresh conversation? Are you one of those people who keeps a conversation going forever? I swear the number of people who keep their conversations going forever is highly correlated with the number of people who start experiencing symptoms of LLM psychosis. Why? Because models drift over time. They were never intended for that long a conversation. If you're having a long-running conversation, you're just in strange territory. When was the last time you started a fresh conversation? And why does this matter? Again, every time you take a turn in a conversation, you read it as sending one line back. But Claude or ChatGPT or Gemini reads it as sending the entire conversation back. And if you're wondering whether this is just a Claude thing, since Nate's talking about Claude a lot: no, it's true for ChatGPT, for Gemini, for Llama, for any LLM you're using. It's true for Qwen. This is how LLMs work. Don't waste it.

Question three: are you using the most expensive model for everything? Are you using Opus? Are you using 5.4 on pro mode? Whatever your choice is, are you picking the most expensive model and just blindly using it regardless, when a cheaper model may work better? This is especially important if you have production workloads, but it's also true for all of us. If you're doing a simple formatting task, don't depend on Opus for it. Don't depend on 5.4 for it. Use the models for what they're designed for. Don't bring a Ferrari to the grocery store.

Question four: do you know what's loading into context before you even type? You can actually find this out. You can run /context in Claude Code, by the way, and look at the number of things that are loading. If you're not in Claude Code, or if you don't know what that means, you can go to your ChatGPT or your Claude and see how many connectors you have available and how many you've loaded up. You could be loading tens of thousands of tokens that you're not really aware of and not really using. If you enabled Google Drive months ago and you never, ever use Google Drive, you just thought it was cool on the day it launched: why? Just drop it. There are so many examples like that where we see something cool, we add it, and we forget it's there. It's like a barnacle on a ship: it's going to slow you down, it's going to burn tokens. You don't need to have it. Audit. Audit your plugins. It matters.

Next question: API builders, are you caching stable context so you don't pay full price for it on every call? Prompt caching can give you a 90% discount on repeated content. Cache hits on Opus cost 50 cents per million versus $5 per million standard. It makes a difference.
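With the Anthropic Messages API, marking stable context as cacheable is one extra field. The sketch below builds the request body as a plain dict so the shape is visible without a network call; the model name and system prompt are placeholders I made up, not a tested production setup:

```python
# Sketch of a Messages API request with prompt caching: the long, stable
# system prompt carries a cache_control marker, so repeat calls read it
# from the cache at a steep discount instead of paying full input price.

STABLE_SYSTEM_PROMPT = "You are a support agent for Acme. " * 200  # imagine ~1-2K tokens

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-opus-4",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STABLE_SYSTEM_PROMPT,
                # Everything up to and including this block is cacheable:
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only this part changes call to call, so only it is billed fresh:
        "messages": [{"role": "user", "content": user_message}],
    }

request = build_request("Where is my order?")
```

The same pattern applies to tool definitions and reference documents: anything stable goes before the cache marker, and only the per-call message pays full input price.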

Do not sit there and ignore prompt caching. Take it seriously. If your system prompt, your tool definitions, and your reference documents aren't cached, what are you doing? This is not advanced stuff in 2026. You should just be doing it.

And the last question (the stupid-button test for this is a real button; by the way, I really did build a stupid button): how are you handling web search? Are you letting Claude do web research the expensive way? People don't realize this, but if you call Perplexity for a search, it tends to be much cheaper in tokens than searching with Claude natively. Now, Claude is addressing this; there are lots of ways to do Claude search. You can use Claude to navigate through a browser. You can also search directly in the terminal, and it will spin up a background service, and you can call something like an MCP connector for Perplexity. All different options you can use. And this is broadly true, not just for Claude: it's true for ChatGPT, it's true for Gemini, and so on, because MCP is magic. But if you are trying to do search, the larger point is that you should be doing search as cheaply as possible. If you just want quick results that are token efficient, it may be worth taking the time to spin up an MCP and have a dedicated service that just returns the search results. That's what I have found experimentally with Perplexity and Claude: Perplexity tends to burn something like 10 to 50,000 fewer tokens per search, which is not a small number if you're doing complex search, and it tends to be five times faster, and it has structured citations. So this is not meant to be a Perplexity plug; it's a token management plug. Try it for yourself. But I've got to say, I like faster, I like citations, and I like fewer tokens. Over a research-heavy session, a plug-in like that can save you a lot on the token side. And that's the larger callout: if you have ways to look at your token usage and diagnose it, you're going to be smarter about it. That's the whole point of the stupid button: let's not fly blind here. Let's look at our actual token usage, make some good choices, and optimize it.

Now, what's in this stupid button? Number one, there is a prompt. If you've never done this, if you're like, "What is an MCP server?", we've got a prompt for you: a prompt you can run against your recent conversations that identifies the specific dumb things you, specifically, are doing. It will see which documents you're feeding raw. It will see your conversation sprawl. It will look at model misuse. It will look at redundant context loading. It looks at your actual patterns, and it will tell you what to fix first. So that's the easy version. Anyone can use it; any plan, no setup required.

Number two, a skill. This is an invocable skill that audits your Claude Code or desktop environment, or any other environment; it could be ChatGPT, etc., since skills are translatable. It measures your per-session token overhead. It will flag system prompt load. It will check your plug-in and skill loading. It will give you a before and after when you make changes. Think of it this way: you kind of need a gas gauge for your tokens, and gee, wouldn't it be nice to have one? So it's the gas tank skill.

Number three, we built some guardrails. The guardrails sit directly on your knowledge store. So if you're an Open Brain person, which is something we've been building as a community, it will sit right on your Open Brain, and you will stop burning tokens on input, which is a nice touch. Automatic markdown conversion for documents hitting the store. Index-first retrieval instead of just dump-and-search. Context scoping that enables a sort of minimum viable context for the query. This is where token management stops being just a personal discipline and becomes infrastructure that maintains itself. I'm really excited to see how the community continues to build on this, because Open Brain is open source and we'll keep evolving and improving it. But I wanted to make sure we had rails that ensure responsible token usage for the Open Brain community.

So look, I'm going to close by talking briefly about agents and context, because agents burn hundreds of millions of tokens in some cases, and we don't want to leave them out. How do we think about context management for agents? I'm going to give you five commandments. I call them the keep-it-simple-stupid commandments for agents.

Number one: index your references. If an agent is getting raw documents instead of relevant chunks, you've already failed. The entire point of retrieval is to scope what the model sees to what it needs. Dumping a full document set into the window on every agent call is wildly irresponsible; you can't do that just to give the agent context. Don't make the agent do work it doesn't need to do.
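A toy version of index-first retrieval might look like the following. A real system would use embeddings and a vector store; keyword overlap stands in here, and the documents and query are invented for illustration:

```python
# "Index your references": chunk documents once, then hand the agent only
# the chunks relevant to the query, never the full document set per call.

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks (a stand-in for real chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(index: list[str], query: str, k: int = 2) -> list[str]:
    """Score chunks by keyword overlap with the query; real systems use embeddings."""
    q = set(query.lower().split())
    return sorted(index, key=lambda c: len(q & set(c.lower().split())), reverse=True)[:k]

docs = [
    "Refund policy: refunds are issued within 30 days of purchase with a receipt.",
    "Shipping policy: orders ship within 2 business days from our warehouse.",
]
index = [c for doc in docs for c in chunk(doc)]

# The agent sees only the relevant slice, not both documents:
context = retrieve(index, "how do refunds work", k=1)
```

The shape is what matters: chunk once at ingestion time, retrieve a scoped slice at call time, and the agent never pays for documents it doesn't need.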

Number two: prepare your context for consumption. Pre-process it, pre-summarize it, pre-chunk it. A reference document should arrive in an agent's context ready to be used, not ready to be read or processed. If the model's first several thousand tokens of reasoning are spent dealing with the crappy pre-processing you did, you're not being a responsible agent builder.

Number three: this is something we've mentioned before, and I'm calling it out in the context of agents because it's so important for agent workflows. Please, please, please cache your stable context. System prompts, tool definitions, persona instructions, reference material: anything that is stable should be cached, at a 90% discount on cache hits. This is the lowest-effort, highest-impact optimization you have on the table. If you're making thousands of agent calls a day and you're not caching, you're just pouring money down the drain.

Number four: scope every agent's context to the minimum it needs. A planning agent does not need your full codebase; don't give it the full codebase. An editing agent doesn't need your project roadmap; don't give it the project roadmap. You get the idea, right? Passing everything to every agent is architectural laziness, and it has real costs, both in tokens burned and, frankly, in degraded agent performance. Models perform worse when they're drowning in irrelevant context. And by the way, if you're thinking, I'm not sure what the agent will need; aren't the smarter agents supposed to find it? The answer is yes. But they will only do that efficiently if you give them a searchable, pre-processed repo so they can go and get only the relevant slice of context. So take the time to do it right.

Number five: measure what you burn. If you don't know your per-call token cost, you're just optimizing without any information. Please instrument your agent calls. Track your input tokens. Track your output tokens.
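A minimal instrumentation sketch, with made-up per-million prices and model names purely for illustration:

```python
# "Measure what you burn": log model, input, and output tokens per agent
# call, then roll up total cost and the model mix. Prices are assumptions.

from collections import defaultdict

PRICE_PER_M = {"opus": (5.0, 25.0), "haiku": (0.8, 4.0)}  # (input, output) $/M, illustrative

class TokenMeter:
    def __init__(self) -> None:
        self.calls: list[tuple[str, int, int]] = []

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        self.calls.append((model, input_tokens, output_tokens))

    def cost(self) -> float:
        total = 0.0
        for model, tin, tout in self.calls:
            pin, pout = PRICE_PER_M[model]
            total += tin / 1e6 * pin + tout / 1e6 * pout
        return total

    def model_mix(self) -> dict:
        mix = defaultdict(int)
        for model, tin, tout in self.calls:
            mix[model] += tin + tout
        return dict(mix)

meter = TokenMeter()
meter.record("opus", 40_000, 5_000)   # planning / heavy reasoning call
meter.record("haiku", 10_000, 2_000)  # formatting / polish call
print(f"${meter.cost():.3f}", meter.model_mix())
```

Even a logger this crude answers the questions that matter: what each call costs, and whether expensive models are doing work a cheap one could handle.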

Track your overall model mix and your cost ratio. You cannot improve what you do not measure. Most teams building agentic systems are thinking a lot about whether they are semantically correct, not whether they're functionally correct, and there's a big difference. They're thinking a lot about optimizing their system prompt; they're not thinking much about their model cost, because most of the time the model cost is not what makes the project live or die. And I get that in this age, in 2025 and early 2026, with the costs we have today and the urgency from executives to build, the $12-per-run cost or whatever it's going to be is not going to make or break the ship. But plan for a world where the models are more expensive. Plan for a world where you have to scale up. Plan for a world where you have to be responsible and instrument.

Now, stepping back, there's a cultural problem we need to acknowledge behind all of this. At some point in the last few months, burning tokens has become a badge of honor. And I get it: there is a degree to which you need to be burning tokens in order to do meaningful work in the age of AI. None of this is to say that I expect token consumption to go down. It won't. You need to be ready to burn those tokens. This is not an ask that you not do that; this is an ask that you do it efficiently. And so when Jensen sits there on stage and says $250,000 in token costs per developer, and everyone is shocked or rolls their eyes or whatever the reaction is, my reaction is: I hope it's 250 grand in smart token costs. For Jensen it's not the individual dollar amount, because he's got cash in the bank; it's whether the tokens were used well. It's whether they're smart tokens.

So begin to think to yourself: yes, I need to be maxing out my Claude. There are people who go into withdrawal when they don't get to use their Claude. I know people like that, who are like, "Ah, I went to a movie and I couldn't use my Claude for a few hours; I feel like I missed out on my token limit." Touch some grass. It's going to be okay. But use your tokens well. Be efficient with your token usage. Know what you're spending them on. Don't spend them on silly stuff. Don't spend them on PDFs you should have converted. Actually spend them on meaningful work. And that is a human problem. We need to be bold and audacious. These models are really good at stuff. So let's get more bold, more audacious, and think bigger about what we can aim them at. Because if we can be more efficient, we can do a whole lot more cool and creative stuff with those tokens. That's why I built the internet a stupid button.
