Nvidia GTC26 Financial Analyst Q&A
By NoRush Invest
Summary
Topics Covered
- The Third Inflection Point Is Agentic AI
- Token Budgets Turn Computers Into Manufacturing Equipment
- Every Company Needs an OpenClaw Strategy
- NVIDIA Has Strong Visibility of $1 Trillion Demand
- Inference Will Consume 99% of Future Compute
Full Transcript
All right, good morning everybody. I hope you enjoyed the presentation yesterday. It went a little bit longer, but I think it was an absolutely great summary for us. But we're going to take this time to focus on your needs and some of the additional questions you have. We're going to start with a couple of slides, maybe the first slide or so, and then we'll open it up for questions. And I'm going to turn this over to Jensen with that.
Yeah, as I was saying yesterday, there were three inflection points in recent AI. The first one was generative AI. The second was reasoning. And we're at the third inflection point now, and each one builds on the others. There are a lot of technical reasons why each one of them built on the others. But here we are at the third inflection point, which is agentic systems: agentic systems that are able to operate autonomously. That's why they call them agentic, because they have agency. You can give them goals, and instead of just answering questions, they can now perform tasks. And tasks could be anything. Of course, one of the most popular applications of agentic systems is writing software, and engineers in your company, I'm sure, and engineers in my company, for sure, are using agentic systems all day long. What used to be the thing for engineers is, when you come to work, they give you a laptop. Now, when you come to work, they give you a laptop and tokens. And a token budget is now a real thing. Every engineer is going to have a token budget. And you know, the idea that you would hire a $300,000 engineer and they spend no tokens doing their job? You've got to ask the question: what are they doing? And so it is very, very clear now that every engineer will have a lot of tokens that they will have to consume, and those tokens are going to be produced. Now, I just said something a second ago. If you just connect the dots: we used to, when an engineer comes to work, a software programmer, somebody comes to work, give them a laptop. That's the tool to get. Today, we give them a laptop and tokens. Those tokens have to be manufactured.
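To make the token-budget point concrete, here is a back-of-envelope sketch in Python. Only the $300,000 engineer figure comes from the talk; the token price and daily usage are illustrative assumptions, not disclosed figures.

```python
# Hypothetical sketch: an engineer's salary vs. a plausible agent token budget.
# Only the $300,000 figure is from the talk; the token price and daily
# usage are illustrative assumptions.

salary_per_year = 300_000            # fully loaded engineer cost ($)
work_days = 250

price_per_million_tokens = 1.00      # assumed blended $/1M tokens
tokens_per_day = 50_000_000          # assumed heavy agent usage

token_cost_per_day = tokens_per_day / 1e6 * price_per_million_tokens
token_cost_per_year = token_cost_per_day * work_days

print(f"Salary per day:     ${salary_per_year / work_days:,.0f}")    # $1,200
print(f"Token cost per day: ${token_cost_per_day:,.0f}")             # $50
print(f"Token budget is {token_cost_per_year / salary_per_year:.1%} of salary")  # ~4.2%
```

Even at heavy agent usage, the token budget is a small fraction of the salary, which is the point being made: the engineer who spends no tokens is the anomaly.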
And so a computer used to be just a tool. A computer of the future is manufacturing equipment. And so these computers, as you see, are no different than ASML manufacturing equipment in the future. They're producing something that is sold. It's no different than a dynamo a long time ago that produced electricity. These are manufacturing systems, and the energy efficiency of it, the production efficiency of it, matters enormously, because it drives your revenues. Okay. And so the third inflection point is here. As you know, OpenClaw, like many of these things when they first drop, these open source projects, when they first drop, they seem like toys. Take a step back and just analyze what OpenClaw is on first principles. And I explained that yesterday. OpenClaw, on first principles, is really a computer: the operating system of an AI computer, a personal AI computer. And it has all of the properties of a computing system, all the properties of an operating system of this new computer. You know, it manages resources. It does scheduling. It does I/O. And, you know, it does networking. All of the properties of a fundamental computer, it has. Okay. And so you can see the red line. Don't look at the y-axis; the red line's growth is the extraordinary thing. And so every company in the world will now need to have an answer to: what is your OpenClaw strategy? Every single software company, every single company, needs to have an OpenClaw strategy, just as we all had our Linux strategies, just as we all had to have an internet strategy, a mobile strategy, a cloud strategy. Now the question is: what's your OpenClaw strategy? Okay.
And so this is a very big deal. Next, I wanted to answer the questions about what I said here a little bit more. First of all, a year ago, a year ago, I said that we had strong visibility of our Blackwell and Rubin shipments of $500 billion through 2026. I was standing in 2025, right? And so GTC 2025 was around... what was it? Was it March, April?
It was October.
October.
Okay. October. I was standing there. You sure it was October? GTC DC, or GTC?
I said it twice though. The first time I said it was GTC here, right?
I think you've been saying it twice.
Yeah.
I don't think all the way back.
I see. Yeah. Okay. Anyways, in 2025, one of those months, I said that we have strong visibility of Blackwell plus Rubin demand, purchase orders and demand, okay, very firm demand, of $500 billion. And there were a lot of questions from many of you: so where are we now? You wanted an update on where we are now, and so I thought I'd give you guys an update on where we're standing right now. And what month are we in, just for the record? March. And so here we are in March. The end of 2027, as you know, is many more months away. I just want to first let you guys know that. However, because we're building infrastructure and factories, and the lead times for everyone are long, they want to make sure they give us firm demand, or give us, you know, purchase orders and firm demand, as early as they can, to secure their supply. Okay. And so we have strong confidence and visibility, visibility and strong confidence, of $1 trillion plus. You know, it's not a floating-point number, you guys. Okay. It is also not 94 digits of accuracy. Okay. And we're not counting cents. You can keep your cents. However, we have strong visibility of a trillion dollars plus of Blackwell plus Rubin.
And the reason why it's only Blackwell plus Rubin, and not all of the other things that we sell, is because I'm referencing what I said last year, when I was only talking about Blackwell and Rubin. Does that make sense? So, last year we didn't have Groq. Last year we weren't selling standalone CPUs. Last year we didn't have many of the things that we have to sell now, and so it wouldn't have made sense for me to include those today, and not because we didn't have those things yesterday. Does that make sense? Somebody nod, then I can continue. Okay. And so, therefore, a couple of things. It's only Blackwell and Rubin. It's not Feynman. It's not Rubin, you know, Rubin Ultra. It's not any of those things. It's not Vera standalone. It's not Groq. So: Blackwell plus Rubin. We have high confidence, strong visibility, demand forecast, purchase orders, of a trillion dollars plus.
We close business that we ship, oftentimes. And we expect to close and ship more business between now and the end of 2027. We expect to close, book, and ship more business on top of this between now and 2027. And the reason for that is because we expect to be coming to work between now and the end of 2027. Now, unlike other businesses, because we build complete systems of this quality, we can actually win, book, and ship new business in the same quarter. Of course, you can't do that if you have to build an ASIC. You know, obviously, if you don't see it now, you're not shipping it by the end of 2027. But that's not true for us. We build inventory. We have a pipeline of supply, and we have to take advantage of it when, out of the blue, they're desperate for more compute. Does that make sense? And so when they're desperate for more compute, and all of a sudden at the last minute they say, you know, goodness gracious, I could use more, I would like to be able to say, and we are always in a position to say: we'd be more than happy to help you.
We're also working on new customers, new markets, new regions that we haven't put in here yet, because we still have about 21 months to go. Okay? And so I want you guys to understand what that $1 trillion is. It's by definition going to keep growing, by definition, because of what I compared it against. It will keep growing, and it'll be larger than that. A couple of things that I wanted to say also: last year was a really good year, because 2025 was our year of inference, and I think we helped everybody understand that the price of the computer and the cost of the token are only marginally related. The price of the computer and the cost of the token. Remember, people are buying these computers to produce tokens. The effectiveness of the production of those tokens matters greatly. They're not reselling the computer. If you bought a computer and it's expensive, and you resold it and that's it, then it's expensive. But you bought a computer that's expensive because the technology is incredible, and it produces tokens at such incredible rates that you have simultaneously purchased the most expensive computer and produced the lowest cost tokens. Does that make sense?
This is what we do every day. This is our job. It is the reason why we deliver the value that we deliver. The value discrepancy that we deliver here, the two numbers that I just described, is how we're able to secure our gross margins. We have to deliver, and we consistently deliver, so much more value, which is tokens per second, which is tokens per second per watt. We deliver so much more value every single generation that customers would prefer to buy our next generation product at a higher price than our current generation product at a lower price. They prefer to convert instantaneously. The moment that Vera Rubin comes, it is smarter to install Vera Rubins than to continue to buy Grace Blackwells. Are you guys following me? Somebody nod. Okay? Because the value is better, even though the price is higher.
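One way to see how "the most expensive computer" can also produce "the lowest cost tokens" is to amortize the system price over the tokens it produces. A minimal sketch, with entirely made-up numbers; only the logic, not the figures, reflects the talk.

```python
# Minimal sketch: cost per million tokens from system price and throughput.
# All numbers are illustrative assumptions, not NVIDIA figures.

def cost_per_million_tokens(system_price, tokens_per_sec, power_kw,
                            electricity_per_kwh=0.10, life_years=4):
    seconds = life_years * 365 * 24 * 3600
    capex = system_price / (tokens_per_sec * seconds) * 1e6       # $ per 1M tokens
    energy = power_kw * electricity_per_kwh / 3600 / tokens_per_sec * 1e6
    return capex + energy

# A cheaper, slower system vs. a pricier, much faster one.
cheap  = cost_per_million_tokens(system_price=2_000_000, tokens_per_sec=100_000, power_kw=100)
pricey = cost_per_million_tokens(system_price=4_000_000, tokens_per_sec=500_000, power_kw=150)

print(f"cheaper system: ${cheap:.2f} per 1M tokens")    # ~$0.19
print(f"pricier system: ${pricey:.2f} per 1M tokens")   # ~$0.07
# Twice the price but 5x the throughput yields a lower cost per token.
```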
So I'm comparing these two systems, because these are the two de facto systems in the world. And until you can beat these two systems, there's no point buying something else. And these two systems are incredibly hard to beat, because Moore's Law doesn't give you 35x. So Moore's Law alone won't do it. Building a faster chip won't do it. You're going to have to build lots of faster chips. And so our 2025 was a year of inference, and I think we demonstrated our inference leadership, from training to post-training to now inference.

And then some of the other things that we did last year that were really great: we expanded the reach. We expanded the number of AIs that now support our platform. Last year, 2025, we added Anthropic to our platform, which is net new. We added Meta Superintelligence Labs, which is net new. We're still working with Meta on all of the other stuff. MSL is a net new entity, and they have net new computing requirements. And we can all acknowledge that last year, open-source software, open-source models, really took off, to the point where API inference service providers now see that open models represent approximately the second most popular AI model. Meaning the first one, of course, is OpenAI; in total number of tokens generated in aggregate, open models represent number two. As you know, Nvidia is the best platform for open models in the world. We are the standard for open models everywhere. And so: number one, OpenAI; number two, all the open models; number three, Anthropic; number four, xAI.
Just take your list, keep working down it. I think Nvidia's coverage of models last year increased substantially, which explains our accelerating growth at a very large number. We are already a very large company, as you know, and we're now accelerating; our rate of growth is actually accelerating. And so, anyway, that's how I think about it. Oh, one last point.
We love our hyperscaler partners, and we work very, very closely with them. But it's important to understand that our relationship with the hyperscalers is that we're not just selling to them. We attract customers for them. Having CUDA in their cloud brings all of the CUDA developers, all the AI natives, all the large companies that we work with. Whenever we accelerate those large companies, or small companies, we bring them, and we have them hosted in the world's CSPs. We are one of the best sales forces of the world's CSPs. It is the reason why, if you go down to the show floor, they have all of the largest booths. AWS has the largest booth here. Google Cloud has the largest booth here. Azure has the largest booth here. Oracle: giant booth here. CoreWeave: big booth here. Does that make sense? Because we bring customers to them. Why are they here? To sell to my developers. And all of our developers only know how to program one thing. They only know how to program CUDA. And they only use CUDA-X libraries. And when we win, and when we help those developers integrate NVIDIA, they land on one of our CSP partners. We are one of the CSPs' best sales forces.
All right. However, we're also seeing tremendous customer diversity outside of the CSPs: regional clouds, industrial, enterprise, on-prem. Dell and Lenovo and HP are all growing so fast, and all the ODMs are growing so fast. A lot of that business goes towards the right-hand side of that chart, the 40%. Most people see our business in the left 60%. But that right 40%: without Nvidia's full stack, without the fact that we can build you the entire AI factory, and the fact that all of the world's open platforms run on top of Nvidia, you have no hope of addressing the 40%. So the net of this chart is this: a big part of that 60% is Nvidia developers landing in the cloud, and 100% of the 40% is impossible without full stack, without end-to-end. Was I successful in communicating that? It's important to understand our business. We aggregate that whole thing into what is called accelerated computing, and it's probably a disservice to you. So, next year, well, in the future, we're going to separate it out a little differently, and it's going to look probably like this chart. You'll see something like hyperscalers, or something like that, in 60% of it. And even when you see that, remember: a lot of those customers, we brought to the clouds. And then, on the right-hand side, that 40% is completely impossible if you just build a chip, because they don't buy chips. They buy platforms. Three messages, all in one slide, which probably made your brain blow up, and therefore I did it again. Was that helpful?

You know what I should have done? I should have made three panels, or three slides. It would have been a seven-hour keynote, but it would have been worth it. Okay, that's it. Thank you.
Questions?
We're opening it up for questions now.
Hi, it's Ben Reitzes, Melius Research. Thanks for having us here for this event. It's amazing access that you guys provide. Congrats to you and the team for that. This is great. Jensen, last night, when we took a picture, by the way, you all can still like that picture. I need to beat last year's record.

What picture?

We took a quick picture and I posted it, and I'm trying to beat last year's likes.

Okay. All right. All right. So, was I in some vulnerable position or anything?

Let's put it this way: the camera added 10 pounds to me, but not to you. I don't know how that works.
You look great. So, I promised I'd ask you an inference question, and this is related. You know, this is great. I think a lot of people here get this. I think the main pushback we get is: is the juice worth the squeeze? Will the hyperscalers have upside to their revenues for API and cloud that justifies all this spend? And what is Jensen seeing? Because, you know, I have estimates for the hyperscalers, and I've said there's upside to the revenues, but for now the capex is 20% above their cloud API revenue, and I'm wondering what you're seeing. You've said in the past that there's massive upside to these cash flows from your customers, particularly hyperscalers and those that are serving Anthropic and OpenAI. So when do we adjust those higher? I know this is a tough question for you, because you've got to guide for three or four, five other companies. But if we see that upside, your stock will behave a lot better, because then we'll realize this build can keep going. So when is this, I mean, we're seeing the inflection, but what is the upside to their revenues, and how do we feel better about it?
Yeah. So, I wish those companies were public, and the reason for that is because then you would see what I see. No companies in history have ever grown like this: as a startup company, a non-public company, increasing revenues by a billion or two billion a week. That's what they're experiencing right now. Now remember, I just said a week. The entire IT software industry is, call it, two trillion. That $2 trillion industry, I don't believe, is going to be disrupted. I think it's going to be transformed. I believe that everyone in that $2 trillion IT industry is going to integrate a combination of OpenAI, Anthropic, and open models, and connect it with an open source software called OpenClaw, which we turned into an enterprise-ready version called NeMo Claw, and you have, instantly, an agent. One and a half million people downloaded OpenClaw and built themselves an agent. It's one line of code, and then you tell the agent to finish building itself. Oh, you don't know this thing? Go learn it. And it goes off and learns it, you know. And so in the future, those agents will be integrated into the IT industry. This IT industry is $2 trillion of software licenses today. It's probably going to be, let me just pick a random number, $8 trillion. That also resells an enormous amount of tokens. 100% of the world's IT industry will become resellers of OpenAI and Anthropic. Are you guys following me?
No.
Take your estimates up for OpenAI and Anthropic.
I believe that Anthropic and OpenAI, and of course all of the IT companies, will also modify and customize their own software, their own models, with open models. And that's what Nemotron's for. And that's what NeMo's for. We've created all the tools, and that's why we're working with all of them. They're all going to create agents that integrate these three components. And I believe they're going to grow incredibly. The time is going to come. It's going to come soon. And the reason for that is you can see it in Anthropic's numbers. You can see it in OpenAI's numbers. They are growing. They're growing an entire IT company in a month. And the revenues of these AI companies, their AI, will be used by enterprises directly, but it's also going to be resold through IT companies, integrated into IT companies. Does that make sense?
Yep.
Because just think of it: AI is just software. Their software is going to be offered directly to enterprises, but it's also going to be integrated and become domain-specific and specialized: governed, secured, easily provisioned, connected to their systems of record, so on and so forth. And that agentic system will be rented to customers, but they still have to consume tokens through factories. And so if it comes down through OpenAI, that's terrific. If it comes down through Anthropic, that's terrific. If it comes down through open models, that's terrific. But they all have to have tokens generated. So the net-net is: IT companies of the past licensed software. IT companies of the future will rent tokens, will generate tokens. Are you guys following me? Their business models will change. The companies will become bigger. Their gross margins will change; their gross margin profile will change, because they now have COGS in their business model, but they offer much, much more value. And so this is exciting for them, super exciting for them.
Okay, great. Passing this $8 trillion microphone.
Good morning, CJ Muse, Cantor Fitzgerald. Thank you for hosting this event. Really appreciate it. I wanted to, I guess, maybe follow up on Ben's question and think about the evolution of this 60/40 chart. You know, you talked about NeMo Claw, and then you announced yesterday the Vera Rubin DSX AI factory reference design, essentially providing the blueprint for your non-hyperscale customers to compete with the hyperscalers. So, I'm curious, you know, as you put it all together, and you see a massive spike in token generation, how you're expecting, you know, pretty much this chart to evolve over time, and how we should be thinking about the different players inside there, as to their relative kind of growth vectors.
I think that this chart grows on both sides, and it grows at similar rates, approximately, until the physical AI inflection happens in a few years. And so let's say the physical AI inflection happens: then the industrial side has to be done on-prem, it has to be done at the edge, it has to be done in location, it has to be done in the factory. Then, all of a sudden, that 40% is likely to grow, and I think ultimately that 40% becomes larger. And the reason for that is because the world's industries that are related to physical AI are much, much larger than the industries related to digital AI. You know, something like $70 trillion of the world's industries, you know, $50, $60, $70 trillion, requires physical AI, because the world is happening not in our laptop. The world happens out where the world is. And so there are a lot of, you know, atom-related businesses that simply can't be taken care of without physical AI. And so I believe, and I hope, that that 40% actually becomes 70%. But both of them are going to be incredibly large, because the world is going to produce tokens every single day, continuously. It will not stop.
You know, right now, as we speak, all of our laptops, well, hopefully most of your laptops, are kind of sitting idle. But in the future, computers are going to be running 24/7, creating tokens, because your agents are off doing work. You know, I was reading one of the Reddit posts: somebody's claw consumed 50 million tokens in a day. Now, that sounds like a lot. But that's only $50. And if you had an agent doing productive work, $50, that's not bad. And so, you know, you could have somebody who makes a few thousand dollars a day have a whole bunch of agents spending $50 a day, becoming a lot more productive. This is going to be the norm. I have them at Nvidia right now, as we speak, and I'm hoping the person that I'm paying a couple thousand dollars a day to is spending more than $50 a day on tokens. You know, are you nuts? I want you to be managing an entire fleet of agents doing your work. And so, you know, I'm really hoping that somebody who makes $2,000 a day is spending $1,000 a day on tokens. And what I just said makes sense. And it's going to happen, and it's already happening in software companies all over the world.
Hi guys, Stacy Rasgon from Bernstein. Thanks for taking my question. I have a quick clarification to ask Colette, and then Jensen, I have a question for you. Colette, just to clarify: I know you've talked about Rubin ramping in the second half. Groq sounds like it's launching in Q3. So am I correct in thinking that Rubin should launch with Groq, because I don't think Groq goes standalone? And then Jensen, I want to ask a longer-term question. You know, I really liked the chart you put up the other day. To me, it almost showed sort of the extension of the spectrum of inference, which, I mean, drove value from Groq. You used to talk about how GPUs were fully the way to go. We now see architectures like Groq are needed to sort of take advantage as that spectrum of inference widens and low latency becomes more important. I guess I wanted to get from you: how do you see that spectrum evolving from here? Does your platform now have all the pieces that you need as we go forward over the next several years, and, you know, hopefully longer than that? What are the new types of inference workloads that you see coming, and do you have all the pieces you need to take advantage of that? Is that something else that we still need to be keeping our eyes on as that grows?
Okay. So first, Stacy, thanks for the question regarding Groq and the LPU. We did communicate that that would also be starting in the second half of this year, and we'll see what that looks like once we get closer to the second half of the year. But it is in this current year.

But you could say Groq is shipping in Q3? I think you said that yesterday, correct?

Okay, that's what we're expecting. However, Vera Rubin is going to ship before Groq. It'll ship before, yeah. And the reason for that is because we're already in production; Vera Rubin systems are already going through the lines. And so, at the moment, that's the condition, right? And it's okay, it's just fine. Vera Rubin is extremely hard to beat, even for Groq.
Even adding Groq to Vera Rubin, it is very tough to beat Vera Rubin. And I'm going to explain your question in a second. Okay. It turns out in computing, you have, you know, it's not completely true, but it's close to true, that you have two types of architectures: one that's extremely high throughput, and one that's extremely low latency. And in fact, a CPU is a low latency computer. And notice the size of the cache on board, the SRAM. Groq is an extreme version of that, a hyper-extreme version of that, where the SRAM occupies basically nearly the whole chip, and the scheduling is done completely statically, meaning the compiler figures out where the data and the compute are, and it makes them meet just in time. And the whole Groq system is like one giant synchronous machine. As a result, it is deterministic. It is extremely low latency. It is not easy to program. It is not flexible. It's not general purpose. But it is what it is.
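For context on why the last, autoregressive stage he describes below is bandwidth-bound: each generated token has to stream the active model weights past the compute, so batch-1 decode speed is roughly memory bandwidth divided by bytes touched per token. A rough sketch, with illustrative numbers that are not product specs:

```python
# Rough model of bandwidth-bound autoregressive decode:
# tokens/sec per replica ~= memory_bandwidth / bytes_touched_per_token.
# Numbers are illustrative assumptions, not product specs. This is the
# batch-1, low-latency regime; batching amortizes weight reads, which
# is the high-throughput regime instead.

def decode_tokens_per_sec(active_params_billion, bytes_per_param, bandwidth_tb_s):
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical model with 70B active parameters at 8-bit weights.
hbm  = decode_tokens_per_sec(70, 1, bandwidth_tb_s=8)    # one HBM-class GPU
sram = decode_tokens_per_sec(70, 1, bandwidth_tb_s=80)   # many SRAM chips ganged up

print(f"HBM replica: ~{hbm:,.0f} tokens/sec")    # ~114
print(f"SRAM gang:   ~{sram:,.0f} tokens/sec")   # ~1,143
# Aggregate SRAM bandwidth buys latency; the static schedule makes it usable.
```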
And so what we've done is we've taken Vera Rubin, which occupies, as I described yesterday, about three quarters of that space. Vera Rubin is the right answer. We don't know how to make that better. If we knew how to make it better, we would have made it better. NVLink 72, and then Vera Rubin Ultra NVLink 144, and Feynman NVLink 1152, are going to keep expanding the aperture of that left-hand side, where high throughput matters tremendously. We're going to add Groq, fuse it with Vera Rubin, fuse it with our GPUs, and use Groq to process the very last stage of autoregressive models, which is used for language models. That last stage is extremely bandwidth intensive. And if we gang up a whole bunch of SRAMs, like thousands of Groq chips, okay, it's 8 to one. So that's for that last 25% of the power, and, you know, that last 25% of the use case, because your data center has all kinds of different use cases. It's not just one, right? We're all using ChatGPT. We're all using it in different ways. We all have different tiers of pricing, and so we're in different bands in my graph. We're in different bands in that graph.
Are you guys following me, Stacy? So I showed the zero tier, the free tier, you know: good, better, best, and an extreme version. And so for free, good, and better, Vera Rubin is untouchable. We can't think of anything close. And then for, you know, best and extreme, adding Groq, you could increase your throughput on the best, and you could extend the extreme version even further. Now, the extreme version has introduced a new tier, but because of the throughput curve, your volume is so low that you can't afford to make that demand too high. So you have to set the price quite high. Does that make sense? However, there's a new class of customers who are very, very rich: software engineers. They already cost so much money that if I added to them $100 a day of inference cost, token cost, I'd be more than happy to do it. If I added even $1,000 on crunch time, more than happy to do it. Does that make sense?
And so I'm simply describing what's happening to a market that is, if you will, maturing. In the beginning of the market, nobody knew exactly; the technology wasn't mature, and people didn't know exactly how to use it. 100% of the early inference customers were free tier. And as the technology reached o1 and o3, all of a sudden the paid tier skyrocketed, because people were now able to use it for something useful. Then, all of a sudden, when agents came, now, for example, Claude Code, right, Codex, those tokens are a lot more expensive than free tier, and they're a lot more expensive than $20 a month. And so we just added two more segments. Did you just see that? And so this is no different than the iPhone. In the beginning there was only one version, and now there are a whole lot of versions. No different than the car industry. No different than any industry. As the market expands, the segments expand.
I showed a factory that is able to produce tokens of different segments and different tiers, from very, very smart and incredibly fast, to high-throughput free tier. And that described an AI factory architecture that allows you to address the whole thing, to maximize, ultimately, the total revenues of the factory. And we let you decide how you want to mix and match. My estimate is it's probably about 25% today, for, call it, a handful of companies. You have to be one of those, you know: you need to generate a lot of tokens to make it worthwhile. And then there's a whole bunch of, they call them, inference service providers, ISPs, you know, API service providers. I think they could also benefit from this, okay, because they would like to have, you know, different segmentation of token generation. And so, you know, call it a group of 10 customers, and 25% of what those 10 customers do represents a big part of that pie. We can increase our total revenues with Groq by 2x on 25%. 2x on 25%. Does that make sense? So, say, 25%.
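The "2x on 25%" claim, written out as arithmetic, taking the statement at face value:

```python
# "We can increase our total revenues with Groq by 2x on 25%":
# doubling the revenue of the 25% of the workload Groq addresses
# lifts total factory revenue by 25%.

base = 1.00                      # normalized factory revenue
groq_share, uplift = 0.25, 2.0
new_total = base * (1 - groq_share) + base * groq_share * uplift
print(f"total revenue: {new_total:.2f}x")   # 1.25x, i.e. +25%
```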
And as you continue, like, with new versions of Groq, with new generations, what does that do? Are you pushing that out even farther, or are you lowering the cost and increasing the demand? I'm just trying to get some feeling.
We're always doing one of two things: we're pushing the throughput at every tier up, and we're always pushing the smartness of the AI out. And so, did you see the Pareto chart? I'm always pushing it up. I actually did the transition, showing you guys from Hopper to Blackwell to Vera Rubin. So I'm always pushing it up, and I'm always pushing it out. Whenever I push up, the production volume of your factory goes up at every price point; at ISO price point, the volume goes up. Okay? When I push it out, you can introduce new tiers of AI, new tiers of tokens, and therefore you've got new price points. Today, you know, the price point is, call it, $6 per million tokens. That's kind of where the world is. We'd really like to be, I know they would all love to be, at $50 per million tokens: super large models, super fast. Could you imagine a 10 trillion parameter model running at 500 tokens per second? Our engineers would pay big money for that. And I would let my engineers pay big money for that. And so that world wants to come, and then the next year it will come again, because the models will get bigger, they'll think more, they'll use more tools, and things like that.

You know, it's just like back in the old days. I don't know how many of you knew Nvidia in the beginning, but we had one product, the Riva 128. Riva 128, 299 bucks. That was it. One product, you know, those good old days. And today we have the 5090, the 5080 in two different SKUs, the 5070 in three different SKUs. Are you guys following me? And all of these SKUs exist because the market got larger, and it started to segment, and people wanted different things. The market is doing exactly the same thing with tokens. It's getting larger and larger, and different segments want different things. And so we need to help the customers, we need to help our model makers, manufacture different segments of tokens. I know they look like numbers, but, you know, they're different AIs. Make sense?
Got it. It does. Thank you.
Yeah. So, incredible. So, we're going to increase the throughput, and we're going to increase the pricing, simultaneously. That's the benefit of Vera Rubin, and we did that every single time. We did that with Blackwell. We did that with Vera Rubin. We're going to do that with Vera Rubin with Groq. We'll do that with Vera Rubin Ultra. We're just going to keep pushing that envelope. And ultimately, the simplistic way to see it is that Pareto chart, because a factory has a lot of different workloads and different customers. That Pareto chart, we want to push the Pareto frontier up and out, constantly, up and out, constantly, up and out. And the computer science necessary to do that? Insane. The hardest problem of all.
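A toy version of the Pareto-chart argument: a factory has fixed capacity, each tier has a price and a capacity cost per token, and total revenue depends on the mix. Every tier and number below is invented for illustration; only the $6-per-million figure echoes the talk.

```python
# Toy factory revenue model across pricing tiers. A smarter/faster tier
# earns more per token but consumes more capacity per token, so the
# operator chooses a mix; pushing the Pareto frontier up and out raises
# revenue at every mix. All numbers are illustrative assumptions.

tiers = {
    # name: (price $ per 1M tokens, capacity units consumed per 1M tokens)
    "free":    (0.0,  1.0),
    "good":    (2.0,  2.0),
    "better":  (6.0,  4.0),
    "extreme": (50.0, 20.0),
}
capacity_units = 1_000_000

def revenue(mix):
    """mix: fraction of total capacity devoted to each tier (sums to 1)."""
    total = 0.0
    for name, share in mix.items():
        price, units_per_m = tiers[name]
        millions_of_tokens = capacity_units * share / units_per_m
        total += millions_of_tokens * price
    return total

print(revenue({"free": 0.4, "good": 0.3, "better": 0.2, "extreme": 0.1}))  # 850,000
print(revenue({"free": 0.2, "good": 0.3, "better": 0.3, "extreme": 0.2}))  # 1,250,000
# Shifting capacity toward higher tiers raises revenue, but demand at the
# top is thin, which is why the price there has to be set quite high.
```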
Thank you. Hi, Vivek Arya from Bank of America Securities. Thanks, Jensen. Thanks, Colette, for hosting us, and for a very informative event. I wanted to ask actually two related questions. One is, in this $1 trillion, Jensen, that you showed: you have other products also that you spoke about yesterday, right? The Vera CPU, right, other CPUs, you know, you have Groq, you have a storage solution, right, CPX, I assume. So how much of that is incremental? Is it a small number, is it medium? How much more is that addressable market that is not captured in this trillion, assuming it is incremental to this? And then I wanted to double-click on Groq again, Jensen. I think you mentioned that it will take up 25% of the inference. That's a pretty big statement. Is it cannibalizing something? What is kind of the value capture from Groq over time? And a lot of people ask us, you know, is it cannibalistic of high bandwidth memory demand? I don't think it is, but I would love to hear your view on how to put Groq in the value capture, right, part of the spectrum.

Okay. We're the only company in the world today that can optimize and architect one AI factory across three memories: of course HBM, but we're also the first to use LPDDR5, which is extremely high bandwidth and very low power, and that changes the equation for CPUs, and the third is SRAM. We can now utilize all three memory types to create the perfect architecture. Okay, that's number one. We used to offer just NVL72 Grace Blackwell. That was our rack. We had one rack. We now have five racks, as you know.
And the reason why is because... can you go to the next slide? Thank you. That was the previous one. Yep. So... oh, this one. No. Back. There you go. Is that the one? Yeah. You see that?
This is what NVL72 did. It ran that. Are you guys following me? It ran all these large language models. This is what it was designed to do. And all of our inference stack ran that. But remember what an agentic system is. It runs this. This is what Claude Code now does. This is what Codex now does. It runs all of this. It has memory that goes into the KV cache. And that's on the STX system. This memory has grown so much that it needs to be accelerated. It's just too much. All of our working memory: every time we use it, the more we use it, the harder the problem we solve. This is structured and unstructured data. This is where I started the keynote, with cuDF and cuVS, the stuff that nobody ever talks about, which is incredibly valuable in the future, because this agent is way faster than a human, and it's going to bang on that way harder and faster. Does that make sense? And then tool use: the web browser. And a web browser runs on a CPU. And so you need a CPU to give the agent access to tools. And then it spawns off sub-agents, and who knows what those could be. One of the sub-agents could be cuOpt, which is GPU accelerated. Another sub-agent could be Omniverse, GPU accelerated. And so we need those kinds of GPUs in the data center. So the way to think about it: what is Vera Rubin? Vera Rubin as a system expanded tremendously, because we went from processing that, which is still 90% of the workload, to processing all of this. Are you guys following me? This is AI. This is where ChatGPT started. But this is where it is now.
Can someone nod?

No. I have no idea what you're talking about.

You guys get it? Okay, give me a thumbs up. All right, thank you. Because otherwise I'll do it again. This is why, you know, sometimes our keynotes run long: because I look in the audience and there's some person sitting in front of me that looks lost, and so I just have to do it again. I'll leave nobody behind, you know?

So this is an agent. So what just happened in our data center? That data center doesn't want to be a cobbled-up Frankenstein. It wants to use elegant power delivery and cooling systems. And so we took all of the computers that are here, and we put them into the MGX rack, and we designed the perfect processor for each one of these things, and just racked them up. Does that make sense? And so if you're going to put storage, which is right up there and here, if you're going to put that in the east-west, which is in the same aisle as the compute, you'd better make it so it's not a Frankenstein outfit. You can't have liquid-cooled NVLink 72 racks and then air-cooled racks. You know, you can't have 300 kilowatts here and then 50 kilowatts there. It makes no sense. And so we took the whole thing and we harmonized all of it in one single rack architecture. And so if you want to build a cluster to run that, you just connect them all up. It's incredible. Same power delivery, same cooling system, all 100% liquid cooled, all completely optimized for the workload, all fully accelerated.
And so now, your question. In order to run this agent and be able to offer all the things that we were just talking to Stacy about, you would increase your capex, your compute spend, the GPU compute spend, by 25%. And so you add Groq to 25% of the workload, and you buy eight times as many chips, which is approximately the same price as the NVLink 72 racks. Okay? So that 25% is multiplied by two, and that adds the equivalent of 25%. Okay? And so your compute spend goes up by 25%. That's the first one. And that's not in the one trillion. And so if 100% of that one trillion now adds Groq, then it'll be $1.25 trillion. Okay. And then we also have storage, which is a lot, because, as you know, there's just a lot of storage in the world. It is the second largest compute spend. And then the third will be CPUs for tool use. But I'm not expecting CPUs to be that much, because CPUs just don't add up to much. Okay. And so you could say CPUs are another 5%. Okay. So if you were to say, all-in, the difference between the Grace Blackwell racks, which, as you saw, were however big they were, and the Vera Rubin racks, okay, if it added another, you know, 50% opportunity, I think that's probably not far off. Did I just kind of reason through it for you?
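His reasoning, reconstructed as explicit arithmetic. The 25% Groq uplift and the roughly 5% CPU figure are his; the storage share is not quantified in the talk, so the value below is an assumption chosen to land near his "another 50%" ballpark.

```python
# Reconstructing the uplift math as stated: Groq attaches to ~25% of the
# workload at roughly the price of the racks it augments (+25%), storage
# is called the second largest spend, and CPUs add ~5%. The storage share
# is NOT quantified in the talk; 20% below is an assumption chosen so the
# total lands near the "~50%" remark.

base = 1.00          # Grace Blackwell-style GPU compute spend (normalized)
groq = 0.25          # 25% of workload x ~same rack price -> +25%
storage = 0.20       # assumption: "second largest" but not quantified
cpus = 0.05          # "another 5%"

print(f"Vera Rubin go-to-market vs. GPU-only: +{groq + storage + cpus:.0%}")
# -> +50%, matching the "another 50% opportunity" ballpark
```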
Has everybody got that? Okay. And so that's the fundamental difference between the Grace Blackwell go-to-market and the Vera Rubin go-to-market, because in the Grace Blackwell world we were solving for inference. We wanted to be inference king, you know, who doesn't, right? With Vera Rubin, we're solving for this. That's why I said OpenClaw is completely transformational: finally, we have one piece of software that runs across this whole thing, one open-source software. It is the operating system of this chart. It's incredible. Now, every company in the world can go build this.
A couple of days, which seems pretty good. Can you talk about the uses of that cash to build strategic advantage in your business? You're making investments in ecosystem partners. You've got purchase commitments on components. You're also returning cash to shareholders. How do you balance those priorities?
Well, the priorities have to go: number one, it has to fund our growth. And our supply chain, we work very closely with, and we're in a great place with our supply chain today for a good reason, and it's because we work very long-term with them. We help them plan their business. We award business to them to support their growth. We even prepay, and sometimes we'll even fund their capacity growth with them. But we're preparing for a trillion dollars, you know, over the next, you know, I just have to be very clear: a trillion plus, through December 25th, I think we probably shut it down at 4:00 pm, and so through that time, Pacific Standard Time. There are a lot of caveats in there. Just making sure. But anyways, the plus. And so that's number one. Number two, we invest in our ecosystem, because, as you know, the CUDA developers and the growth of the AI natives at this stage are really important. And then, after that, we're still going to generate quite an amount of free cash flow, and so, well, I'll let Colette answer it. We have a good plan, so go ahead.
Yeah. So, with the strong growth that we have, the one trillion going forward, that gives us, of course, a very good position in terms of free cash flows. He talked about some of them up front, in terms of making sure that our suppliers and everything that we need to build is in order, and that may take some prepaids. The second thing is our investments. We are still working through the commitments that we made over the last year, which we need to complete in the first half of this year. But once we move forward and complete those, we do have an opportunity for stock repurchases, and for focusing on returning capital to our shareholders. It is still a very important part of the work that we are going to do. We had a good year last year, and I think we're going to have another great year in terms of what we can do in returning capital.

Oh, it's up to you.

Okay. Where do we stand right now? It is probably, not taking into account the plus sign, not taking into account the plus sign, we will probably be at 50% stock repurchases and dividends together, as a percentage of our free cash flow. So that's where we're starting out. And as you can see, the plus sign is real, and that does give us an additional opportunity to even do more. The timing of it, again, remember, we're looking through what we have to do here in the first half of the year with some of our existing commitments. But stay tuned.
Hey, it's Tim Arcuri at UBS. Thanks. So, let me preface this by saying that this is not what I think, but this is what I hear from a lot of, you know, folks out there. There's some concern that you're capturing too much of the value of the ecosystem, and that you can't sustain these margins over time. So how do you respond to those concerns? I know you see stuff online about having to invest in the ecosystem, and people sort of spin that in a negative way. So can you just talk about how you can sustain your margins?
First of all, almost everything I told you guys yesterday is a new perspective. It is not illogical that everybody has to understand tokenomics. It is not illogical that the world needs to learn what a computer has become. If we deliver, if we continue to deliver, X-factors of tokens per second per watt every year, if we continue to deliver X-factors of ASP increase for them because we introduce new token segments, customers will be more than delighted to continue to do work with us. And it is also true, and I've said it before, and the math is absolutely clear: every CEO, every CEO of every cloud service provider, I would challenge them all to go and create that chart for themselves, and I'll help them. And you pick your favorite other configuration. You pick your favorite other configuration: third-party chips, build your own chips, and you put it into that model faithfully. And then you can decide: would you like to have higher revenues or lower? Would you like to have higher ASPs or lower? Would you like higher margins or lower? Because that's all it means. Look, TSMC's wafers are the highest priced in the world, but they're the best value in the world. And I gladly pay for it. And ASML systems, aren't they the most expensive in the world? They're worth it. There's no question about it. And so the question is simply: do you want to make more money, or do you want to buy the lowest cost equipment? Do you want to make more money, or do you want to buy the lowest cost equipment? That's the difference. Now, what I just said is a new concept, and I think we can all acknowledge that. I just treated a computer system the way I treat TSMC's chip factory, the way I treat ASML manufacturing equipment. And that's not the way people thought about it in the past.
If I have two CPUs, and one of them is 256 cores, and the other one's 256 cores, tell me which one is the better one. Well, the cheaper one's the better one, because I'm renting it by the core anyways. But that's not the way tokens are created. You don't rent by the core. You monetize by the tokens per second, and so it's a different economics. Does that make sense? You're not renting cores. You're not renting nodes. You're producing tokens, which is the reason why everything changed. It was necessary, you know, to make sure that everybody understands the economics of the new world. So anybody who says that simply does not understand the business. That's all.
They're trying to buy the lowest cost equipment. "My equipment costs 30% less." What does that mean to your factory? What does that mean to your factory? That's really the question, you know. And so I think, anybody who says "my chips are 50% cheaper," put that in the context of the factory, and that person is actually demonstrating to you that they don't understand AI. Somebody goes, "I'm 30% cheaper." You don't understand AI. "I'm 40% cheaper." You don't understand AI. "My chips are chippier." You don't understand AI. You guys all know who I'm talking about. I'm not talking about anybody. I was just saying, it's a theoretical comment.
Hi, Josh Buchalter from TD Cowen. Thank you for spending the morning with us, and I know there are a lot of customers and partners that are after your time, so we appreciate it. I wanted to ask a question. You know, you'd said a few times, I think yesterday, that you expect to be short capacity into 2027. Can you elaborate on where you're seeing those shortages? And, you know, on that note, you've described yourself as the chief revenue destroyer, and Satya has made some comments about not wanting to over-index to one generation, knowing that there's another one coming very soon. Is that behavior unique to Microsoft, and are these constraints sort of protecting...?

By the way, Satya would also tell you who told him that. Exactly. I told Satya: buy what you need this year, because next year there'll be something better.

So I guess my question on that is: are the TSMC constraints, or the capacity constraints, sort of protecting your other customers from doing that, or do you see them holding a similar mindset?
No, I think, you know, I don't want you guys to thinly slice and dice our choice of words. Is the world supply constrained at some level? Yes. Right. Can we all agree saying the opposite would be weird, you know? Is the world constrained on cars? Well, you see cars, and, well, what if I tripled the demand? Yeah. And so everything is somewhat constrained. It just depends on everything. And because we're building at such a large scale, our life is just not simplistic. It's not so simplistic as to say, oh, if I just solve this one problem, that's it, life is good. We are working multiple dimensions across multiple suppliers and making sure that things are in harmony. You don't have too much, you don't have too little. We can meet our demand plus. And the reason why we want to meet our demand plus is because there's always new demand coming for the next 21 months. I've got a whole bunch of new demand that's coming, and so I've got to prepare for that. And so there are all kinds of parameters. It's not simple. And if I told you that we are supply constrained on this one item, then I know what you guys are going to do, you know. So I think that the system is harmonious. Nothing is too much, nothing's too little. We don't have too much power. We don't have too little power. We don't have too many construction workers. We don't have too many plumbers. We don't have too few plumbers. You know, we don't have too many cables. We don't have too many optics. We don't have too few optics. Are you guys following me? It's just kind of right there. And we'll work it every day. But the one trillion plus, we can meet.
Perfect. Aaron Rakers with Wells Fargo. Thanks for doing this as well.

Thank you.

I'm surprised we got to this point without this question being asked, you know, and it's more technical. There's a lot of discussion...

You know what? We're kind of like the Fed now. Did he say "near" or "almost"? And what did he mean by it? Well, we've got to go through all of his previous transcripts. And when did he use that word? And, you know. Here's what I know: demand is accelerating at a very large scale, and we'll be able to support the supply.

Perfect.
Yes, perfect. So, I was going to ask about architecture. I've gotten a lot of questions about yesterday's presentation: where CPO starts, where copper ends. You outlined...

Oh, dear.

You outlined NVL576. There was an NVL1152 on a slide. So I'm curious what your current thought process is around offering both, and how that evolves as we scale to Vera Rubin Ultra and Feynman. Just curious of your thoughts. Thank you.
Okay. Please treat my partners properly. They're all doing great. Okay. I'm not saying anything here that suggests any of their businesses... I'm going to go the other way. All of their businesses are going to grow because of us. We're going to grow copper. We're going to grow optics tremendously. We're going to grow copper. We're going to grow optics tremendously. Now, did I say something that is completely logical? The answer is yes. And let me tell you why.
We should scale up with copper as far as we can, as long as we can, but at a meter plus or minus, that's kind of the limit of copper. And so you've seen us go from NVLink 72 to now Rubin Ultra NVLink 144, where the backplane was designed to be able to support that. That's approximately it, and we're going to keep working on it; if we could extend it from 144 to 288, we'd be more than happy to do so, because you should use copper for as long as you can. Copper is just easier to manufacture.
It's more reliable. We've been manufacturing it for a long time. Humanity's been using it for a long time. So, did I say anything that's illogical to anybody? It all makes sense. You should breathe air for as long as you can until you're out of it. After that, we'll breathe compressed liquid air, but until then, how about air? It's free, we've been using it for a long time, it's safe. All right. So one: we should scale up with copper as long as we can. As you note, we also took Ethernet to a structured-cable backplane, so that's an incremental growth opportunity. Isn't that right? I just said it yesterday. We're going to take the backplane of Ethernet and turn it into these spines, because these structured cables are really easy. Now that we've mastered how to use them and manufacture them, and it's a real artistry, we can create these things: easy to maintain, easy to ship, easy to wire up, and you make no mistakes. It's fantastic. However, simultaneously, we want to scale up beyond 72 to 144, then to 1152, and maybe even further than that someday, and there's a limit to how far copper can go. And so you can see where today it's 100% copper.
Now, the next generation, Ultra, will have two options: copper, or copper plus CPO. Because I have two options, copper plus CPO or copper. Okay, that's one year from now. Two years from now, at 1152, it's all CPO, because there's a limit to how far I can take copper. And so there's a transition.
However, even when NVLink is CPO and Spectrum-X is CPO, we will still have copper for the Ethernet scale-up on our racks. We will still have copper for our storage. Does that make sense? Because we have five different racks. And so the amount of copper we use will continue to be high, because even though scale-up will go to CPO in two to three years, the total consumption of copper connectors is going to continue to grow, because our demand and our total capacity continue to grow with all these different other racks. Was I clear?

Okay, thank you.
I've got to select the words just perfectly.
Jim Schneider, Goldman Sachs. Thanks for taking the question. You previously talked about the spectrum of token cost, and it was very helpful to hear the 25% in the high tier. How do you see the market evolving over time in terms of growth rates of the low or free tier versus the high tier, in a market that's been sort of predicated on big decreases in token costs over time? How do you see that trending? Does it start to slow or potentially flatten out, and why?
Token cost is going to keep coming down. Can we go to the next slide? Token cost is going to keep coming down every single year. This is just Grace Blackwell; then with Rubin, token cost will come down again, and with Rubin Ultra it will come down again. Meanwhile, the smartness per token is going to keep going up as we extend that curve to the right, along the x-axis.
Meanwhile, we're going to increase the throughput. Nobody cares about tokens per second on its own. You always have to divide it by watts, and the reason is that your data center is only so big. If your data center is a gigawatt, you're not going to have two. If it's 200 megawatts, you're not going to have three. Does that make sense? So you always have to normalize it; otherwise you can't compare any architecture to anything. Moore's law was always divided by something. So you have to take tokens per second per watt. Anybody who shows you anything else just doesn't understand AI, or they're trying to deceive you somehow. That's the reason SemiAnalysis did it right. They did it right: everything was divided by watts. And so we're going to keep on increasing throughput.
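To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch of that normalization. Every number in it is invented for illustration; the point is only that once facility power is fixed, tokens per second per watt is what sets total output and therefore revenue.

```python
# Back-of-the-envelope sketch of "always divide by watt".
# All efficiency and price figures are hypothetical.

SECONDS_PER_YEAR = 365 * 24 * 3600

def facility_tokens_per_second(tokens_per_sec_per_watt, facility_watts):
    # Power is the fixed budget: a 1 GW site cannot become 2 GW,
    # so total throughput = efficiency * power envelope.
    return tokens_per_sec_per_watt * facility_watts

def annual_token_revenue(tokens_per_sec_per_watt, facility_watts,
                         usd_per_million_tokens):
    tokens_per_year = (facility_tokens_per_second(
        tokens_per_sec_per_watt, facility_watts) * SECONDS_PER_YEAR)
    return tokens_per_year / 1e6 * usd_per_million_tokens

# Same hypothetical 1 GW data center, two hypothetical architecture
# generations: only tokens/sec/watt differs, and revenue scales with it.
for name, eff in [("gen N", 5.0), ("gen N+1", 12.5)]:  # tokens/s/W, made up
    revenue = annual_token_revenue(eff, facility_watts=1e9,
                                   usd_per_million_tokens=0.10)
    print(f"{name}: ~${revenue:,.0f} per year")
```

Comparing architectures on raw tokens per second, without dividing by watts, would reward a design that simply burns more power; normalizing by the power envelope is what makes the comparison meaningful.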
So whatever the price of a token is, whatever that ASP is, we increase its throughput. Does that make sense? And then here, whatever that segment is, we reduce the cost. So this down here is essentially your product segment; that's the throughput, the volume of production, and that's the cost of it. These are the two curves, and that's why these two curves are so important. Now, I combined those two curves. You can combine those two curves if you like, but it makes your head blow up. This curve is essentially the Pareto curve, and in fact most of the world today is simply right here. This is the Hopper world; you see that Hopper is kind of right here. Blackwell extended it and added a couple of segments. And this is really valuable, and people love it, because the ASP difference between here and here could be 5x, 10x. Makes sense? Larger model, and faster. And so these are really valuable.
Now, how do I see the curve changing, the demand curve changing? Yesterday I used 25% here, 25% here, 25% here, and 25% here. That's all I did. But a supplier's, a manufacturer's distribution across different product segments just kind of depends. Do you guys see what I'm saying? It kind of depends. You know, Ferrari is all high-end, nothing in the free tier, and somebody else is different; it just depends on the brand. And I think it's going to be the same here, guys. If your business is search, you're going to be largely free tier, because nobody pays for search. If you're code generation, agentic code, you're going to be a lot here. If your customer is the enterprise worker, and the average salary of that person is, pick a number, say 50,000 or 70,000, you want your product priced somewhere here. Does that make sense? It depends on your customer, the work you do for them, and the competition. Those three things matter. It's exactly like products. AI tokens are products, a new commodity, and they'll be marketed as such. Different suppliers, different brands, different target markets are going to have different shapes; I just chose an equal distribution yesterday. Make sense?
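As a hypothetical illustration of those different shapes: the tier prices and volume mixes below are invented, but they show how the same token volume maps to very different revenue depending on where a provider's distribution sits.

```python
# Hypothetical tier prices in USD per million tokens (invented figures).
TIER_PRICE = {"free": 0.00, "low": 0.40, "mid": 3.00, "high": 15.00}

def monthly_revenue(mix, million_tokens_served):
    # mix maps tier -> share of token volume; shares should sum to 1.0.
    return sum(share * million_tokens_served * TIER_PRICE[tier]
               for tier, share in mix.items())

mixes = {
    "equal (yesterday's slide)": {"free": 0.25, "low": 0.25, "mid": 0.25, "high": 0.25},
    "search-like":               {"free": 0.95, "low": 0.05, "mid": 0.00, "high": 0.00},
    "agentic-code-like":         {"free": 0.05, "low": 0.15, "mid": 0.40, "high": 0.40},
}

# Same volume served per month for every mix.
for name, mix in mixes.items():
    print(f"{name}: ${monthly_revenue(mix, 1_000_000):,.0f} per month")
```

Same volume, very different revenue: the mix, like the brand, is a product decision.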
Yeah. Just, which segment do you see growing faster in the future?

They're all going to grow really fast. At the moment, I just don't think it matters; they're all going to grow so fast. They're all growing exponentially at the moment, every one of them. We're at the beginning, right? Where the growth rate is divided by a very small number.
Hi, Mark Lipacis, Evercore ISI. Thanks a lot for doing the Q&A. I always love the insights. Jensen, our fieldwork is telling us that AI engineers are getting excited about state space models because they address memory requirements, and in your keynote you showed Nemotron 3 benchmarking as one of the top models, and I believe that's a hybrid mixture-of-experts state space model. And I'm wondering...

Impressive.

I was just trying to... Thank you, Jensen. In the past, new AI workloads have led to the adoption of different AI models.

That was my Darth Vader imitation.

So, impressive, young Jedi.

So, the question is: is agentic AI creating a new demand, a need for a new kind of AI model? Is that what you're doing with Nemotron and the hybrid? What does state space get you for Nemotron 3 that pure mixture of experts didn't? And what are the implications for the competitive environment for Nvidia if there's a transition to a new kind of AI model?
We run all AI models, whether it's full transformer, discrete tokens, continuous, diffusion, state space, hybrid. Our architecture's beauty is that it does it all. For example, Groq can't do diffusion models, but we can do everything. Does that make sense? And I'm picking on Groq not because I'm picking on Groq; it belongs to me now, so I can say these things. But every architecture has its place. The reason Nvidia is so versatile, and the reason it's used so freely everywhere, is that irrespective of what innovation your research scientists come up with tomorrow, I promise you it's going to run great on CUDA. I just promise you that. And the reason is that we have all of the necessary computing elements to do all of it. Nemotron 3 was designed so that you can deal with extremely long context, and in time, you're going to have conversations with your AI, hopefully, for as long as you shall live. And so the question is: how do you deal with context? How do you deal with the relevant conversational memory? If you memorized everything and we talk about something over time, which version of that memory do you pull back? When you have too much memory over time, it can become garbled, and maybe a reset is helpful. These are research areas; long memory is really a research area. But the hybrid architecture, I think, is going to be a very major thing, because it allows you to deal with extremely long context without suffering the quadratic explosion in computation. That's the reason we invented it, and we put it out in open source, and we would love for everybody to use it. It's intended to advance AI, not to compete with anybody. We don't need to; we just want to advance AI.
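A rough sketch of that quadratic-explosion point, not Nemotron's actual implementation: attention cost grows with the square of the context length, while a state-space layer carries a fixed-size state forward, so its cost grows linearly. The layer dimensions below are arbitrary stand-ins.

```python
# Illustrative cost model only; dimensions are arbitrary stand-ins.

def attention_cost(seq_len, d_model):
    # QK^T and the attention-weighted sum each scale ~ seq_len^2 * d_model.
    return 2 * seq_len**2 * d_model

def ssm_cost(seq_len, d_model, d_state):
    # One fixed-size state update per token: seq_len * d_model * d_state.
    return seq_len * d_model * d_state

for n in (4_096, 131_072, 1_048_576):  # 4K, 128K, and 1M-token contexts
    ratio = attention_cost(n, d_model=4096) / ssm_cost(n, d_model=4096,
                                                       d_state=128)
    print(f"context {n:>9,} tokens: attention / state-space cost ~ {ratio:,.0f}x")
```

In this toy model the ratio is 2n/d_state, so it widens linearly with context length, which is why a hybrid that keeps some attention for exact recall and uses state-space layers to carry long history is attractive for conversations meant to run indefinitely.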
Impressive.

Thank you, Jensen.
So, I'm trying to understand how concentrated your downstream, the AI market, is and is going to be. You have this chart showing 60% is hyperscalers, but I'm thinking the other 40%, the majority of that, is tier-2 cloud, and a lot of them are actually reselling or renting their capacity to hyperscalers or to the frontier labs. So if you take hyperscalers plus frontier labs, it might be like 80% of the people actually using the infrastructure being deployed. That's an element of concentration. And then these models, the Anthropic models, the OpenAI models, etc., seem to be a very small handful that are really at the frontier. Do you think that's the right description of the situation today? How do you see it evolving? And maybe, what does that mean in terms of the right to make money in the value chain and the further acceleration of AI?

Okay. So I would slice it into three dimensions.
As you were talking, I simplified it as much as I can into a cube, three dimensions. The first dimension is: what is the end model being run? I said earlier that OpenAI is the largest. The second largest by category is basically all the open models in aggregate, definitely, solidly number two, and then number three would be Anthropic, and so on and so forth. And the tail is actually fairly long. So if you look at the world of model consumption, even just language, that's the way to think about it. And we run all of them. We're in all of them. That's one dimension.
Within that dimension of models, you also have to add physical AI models, which is robotics. All the robots you saw, they're running vision-language-action models, and those models are different from just language models. For example, the control of motors is continuous. It's not discrete tokens; it's not like characters, it's not like words. It's continuous. Physics is continuous. Biology obeys geometry, because chemicals obey geometry. And so there are a lot of different types of models. The point is that you first have to think about the different types of models being run, and that's helpful to how you think about the right to that business. The second dimension depends on the way the companies are structured and their intentions or interests. They are either companies that want to build their own chips, and we have to compete with them; companies that want to host Nvidia customers in their cloud, and obviously CUDA only runs on Nvidia; companies like, for example, the NCPs, where they need us, they can't just buy chips, they really have to buy systems, so they're really infrastructure customers; or companies that want to build on-prem, where my distribution channel goes through Dell and HP and Lenovo, because it has to integrate a whole bunch of other enterprise computing components, and Dell and HP don't build their own chips. Or are they at the edge, and maybe they're radio networks, robotic systems, self-driving cars, satellites, and so on and so forth. Does that make sense? Now you've got to decide where the computing is being done.
So those are the several dimensions, I guess, that you could think about. And when you're done subdividing all of that, you come back to the chart I showed you, the 60/40. Within that, the 40% basically need computing. It doesn't matter what models they run; it could be OpenAI models, could be Anthropic models. The fact that NVIDIA supports confidential computing makes it possible for OpenAI to run on the right side at all. We make it possible for Anthropic to run on the right side at all, because we have confidential computing. That side wants entire platforms. They want confidential computing. They want computers in different parts of the world, not just in the cloud. Even in the cloud, we compete for some part, but we also bring customers to the other part. And so for some part of that CSP chart of 60% we have to compete, and our job is just to deliver it better than anybody else in the world. We're doing very, very well, and we're actually increasing our position day in and day out. And then the other part, we bring customers to them, and they're just grateful.
Make sense? So I took all of that dimensionality and compressed it into basically two slices of a pie. And that compression, I think, tests against: do they design their own chips, does Nvidia compete with them on chips? Okay, there you go, that's interesting. And then you've got to figure out where we are in our position, what our opportunity is, and so on and so forth. You know, I don't think OCI will design their own chips; I don't think it's sensible for them to do it. Obviously CoreWeave is not going to design their own chips. So there: where do we compete, and where do we bring the cloud service providers customers? And of their cloud revenues, a big part, with OCI obviously nearly 100%, is because of Nvidia, right, with OpenAI. We'll take our last question.
Hi, it's Timm Schulze-Melander from Rothschild & Co Redburn. Maybe just a question around how you run the company, Jensen. Looking ahead, this 12-month flywheel is a critical part of your competitive advantage, but when I look at headcount, it actually seems to be growing relatively slowly, and yet the undertaking you're going for is growing much more rapidly than that. How do you manage or prepare for that going forward, and how do you manage the risk that could pose to your business?

Yeah. As you know, I have 60 people on my direct team, and the reason we need 60 people is because
the company's architecture was designed to deliver on the products. The architecture of a company should reflect the products it builds. Every company should not look the same. I look across and I say, oh, look, they have a business unit here, a business unit there, a business unit there, and yet they want to build what we want to build. What you build as a company matters. For example, and not because I've seen it, I've read about it: the way you build a Ferrari and the way you build a Ford is very different. In one case, you move the car. In the other case, you move the people, and the car stays stationary. It depends on the result you want to create; the architecture should reflect it. If you look across my management team, every aspect of the technology necessary to build Vera Rubin's entire factory is right there, 100%. Everybody is represented, all of the expertise sitting at the table, making decisions together. And the second thing is that we had the discipline to develop the entire software stack. You can't build what we build on a yearly basis if you can't bring it up.
Are you guys following me? It's very logical. How do you test it if you can't bring it up? And if you're cobbling together new technology from everybody else, how do you bring it up once a year? It's just not practical. It's not possible. So we align all of our chips to the platforms, all seven chips. They have only one tapeout schedule. I don't cobble together everybody's tapeout schedules and figure out when the system comes. The system comes when the system needs to come, and everybody aligns to it. And the software stack, we completely own every piece: the storage, that's the reason we developed it; the networking, of course; even the factory operating system, which we call Dynamo. We created everything so that we could deliver every single benchmark, test everything to the limit, test for reliability. And the reason Nvidia built Nemotron is so that we could do pre-training, post-training, and now inference. We own all of the software so that we can bring up all of the systems on an annual basis, which basically means you're bringing up all the time. If you don't own everything, you have no shot, a 0% chance.
People are talking about their new GPU, but where's their scale-up fabric coming from? And how is that going to work? And I just gave you two examples. That whole agentic system we were talking about earlier, that's the future computer. And so the company's organization, the company's mission, the company's capabilities are all aligned to delivering the promise that I just delivered to the marketplace. That's why we're able to keep doing it. A PowerPoint slide is not going to deliver that system. And a PowerPoint slide with two bar charts is not going to convince somebody to give you $50 billion. It doesn't make any sense. And to engineer it all into existence inside the data center, by the time you bring it up, we're already two clicks down the road.
So this is the pace that we put the whole industry on, and it is, frankly, extremely hard. We can do it, but that's because of all the things I just described. You also know that every one of our systems is CUDA compatible. So on day one, I've got yesterday's software that runs perfectly on this one. I own all the scale-up switches. I own all the scale-out switches. I own all the software. Do I not? So on day one, I take yesterday's software and I put it on the new system. If it doesn't work, what's the point? Then, once we get everything brought up, because we own the whole software stack, we can take it to the limit. And beyond CUDA compatibility, we have this thing called DOCA, DOCA compatibility. We own all the compilers. We own all the software stack. Really, really important. You can't outsource that to other people, with somebody else building it on your behalf. Then how do you bring up a system? They're not going to bring up your system for you. They're not going to qualify it for you. And so that's it. Can we take one more question? Is that okay? Can you guys tolerate one more question? I'm enjoying it so much.
Let me just... Somebody's going to ask me a question where I have to choose the precise word. Did he say "hair" or "a hair"? That's materially different.
Thank you. Thank you for extending the session and squeezing me in. Jensen, I just want to clarify one thing.

Oh, here it comes. Oh dear, I changed my mind. I changed my mind. Everybody have a good GTC.

Quick clarification: does the one trillion plus include Rubin Ultra or not? And my question is...

No, I've got to stop you right there. No, thank you. No, no, absolutely not. And, yeah, absolutely not.
Okay. My question is: we talked a lot about inferencing at this event, and I was hoping you could spend a couple of minutes on training, in terms of how you see the compute intensity growing. What will drive it, in your view, over the next few years? Is it still larger and larger models, or is there something else on the horizon that you see? And if you take a three-to-five-year view, what's your view on the training versus inferencing mix in terms of compute demand? Thank you.
Training went from pre-training to post-training. Pre-training is basically memorization and generalization. The more you memorize and generalize, the better foundation you have. Once you have that foundation, that's why it's called pre-training, it's kind of like AI kindergarten; okay, more than kindergarten, more like AI high school. And so now you have the pre-training, you have the basic vocabulary and grammar and a lot of hidden reasoning capability, such that when I teach you new skills, you'll even understand them. So now when I tell you to go solve a math problem or write code, or try to write code, you actually understand what I meant. If you don't even understand what I meant, how can you possibly even attempt to do it? And so pre-training does that.
Post-training teaches you all kinds of skills. And reinforcement learning: reinforcement learning with executable grounding, reinforcement learning with verifiable feedback, a whole bunch of techniques for batch-oriented reinforcement learning, tool use; the list goes on and on. Structured APIs, unstructured tool use. There are a whole lot of domains, and that part's computing intensity is, I'm going to guess, probably a million times more than pre-training. I'm probably off by some factor, but it's a lot. And the reason is that there are a lot of skills to go learn, and for all these skills the rollout is really, really long. So the models have to get larger and larger. When you get good at these, you take all of that synthetic data, and some of it you're going to push back into pre-training next time.
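As a toy illustration of why those rollouts are so compute hungry, here is a minimal sketch of reinforcement learning with verifiable feedback. The model, verifier, and reward are stand-ins; real pipelines execute generated code or check solutions programmatically rather than comparing strings.

```python
import random

def sample_rollout(prompt):
    # Stand-in for sampling a long reasoning rollout from the policy model.
    return str(random.randint(0, 10))

def verify(answer, expected):
    # Verifiable feedback: the reward comes from checking the output
    # (running tests, checking a solution), not from human preference.
    return 1.0 if answer == expected else 0.0

def training_step(prompt, expected, rollouts_per_prompt=8):
    rollouts = [sample_rollout(prompt) for _ in range(rollouts_per_prompt)]
    rewards = [verify(r, expected) for r in rollouts]
    # A real trainer would turn these rewards into a policy-gradient update.
    # The compute blowup lives here: many long rollouts per prompt, across
    # millions of prompts and thousands of skills.
    return sum(rewards) / rollouts_per_prompt

print("mean reward:", training_step("What is 3 + 4?", expected="7"))
```

Each skill multiplies prompts by rollouts by rollout length, which is where the "a million times more than pre-training" intuition comes from.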
And so yesterday's pre-training started entirely from internet data. Today's pre-training is mostly internet data. In a couple of generations, pre-training will be mostly synthetic data. Meanwhile, you're adding multimodality to it. Meanwhile, you're adding motion to it, long-rollout physical actions. And the reason is that there's a lot of common sense, cognitively, logic-related, such that if you were able to interact in the physical world, you could deal with a concept a lot more easily even in the abstract world, because you actually have grounded experience in the physical world. So notice the amount of computation I just described: a million, a billion times the amount of computing necessary for future training, and then, after that, continuous learning. Almost everybody's model will lastly be trained, fine-tuned, so that it can also memorize and generalize per person. And so in the future, where inference starts and ends and where training starts and ends will become blurrier and blurrier; when are you learning, and when are you applying your wisdom? In most people's cases, it's continuous now. So I think that gives you the three phases of it. With respect to inference versus training, let me tell you my hope: my hope is that 99% of the world's compute goes towards inference. And the reason is that inference is where we translate the tokens generated into economics.
Nobody pays you for learning. Nobody pays you for training; you pay for training. I want the world to be able to use these tokens for valuable, impactful outcomes: for health care, for manufacturing, for financial services, for engineering, you name it. Isn't that right? And so our hope is that 99%, and if our dreams come true, 100%, of future tokens go towards economic benefits while the AI models are learning.
There's a really good reason why Nvidia went all in on inference last year, and the reason is that we see this future where inference and training and pre-training and learning are all just one big continuum. Go back and read the stories people wrote two years ago: Nvidia is really good at training; inference is easy; any company could do that. Do you guys remember that? Inference is super hard. Look at this chart. It's super hard, and it's going to get way harder. Inference is thinking. It's working. It's doing things. How could that be easy? I thought my life was easy pre-high school; after that, it was super hard. And so I think people just got it completely backwards, and they wanted to make up stories that rationalized their opportunity, which is fine. But you have to reason about it from first principles. And you know, I take a long time answering questions for you guys instead of giving short, highly curated, super well-selected, precisely adjusted verbs and nouns. The reason is that I want you guys to learn how to reason through these things, so when you see it yourself, you go, "Nah, that doesn't make any sense," or, "That makes sense." Because you're analysts; you need to be able to understand these things. Okay. All righty, guys. Thank you very much. Thanks for coming to GTC.