Class #2 | MS&E435: Economics of the AI Supercycle Stanford University Spring '26 Apoorv Agrawal
By MS&E 435: Economics of AI
Summary
Topics Covered
- Inference-time reasoning will drive a billion-x explosion
- NVLink Fusion: Combining Groq and Nvidia yields 2.5x more tokens
- AI revenue proves the product works: Anthropic added $10B in one month
- AI leaders unanimously: We're nearing the end of the exponential
- IQ gets commoditized and EQ becomes super valuable
Full Transcript
The premise of the class today: everybody knows how software ate the world. Software had a near-zero incremental cost of distribution. That is not the case with AI. More users on AI apps require a lot of compute, so the incremental cost is not near zero. That's the topic of our discussion.
We're going to do a presentation by the group, then a fireside chat, and then we'll open it up for questions.
So, without further ado, I'm really excited for today's guests. Our first guest is Brad Gerstner. Brad is the founder and CEO of Altimeter. Brad started Altimeter with a few million dollars from friends and family. Today, 18 years later, Altimeter manages over $15 billion across public and private markets. You know, Brad, I've known you for a little bit, and the single consistent thing I've known about you is that the best investors have invested across supercycles: up markets, down markets, recession, crisis, COVID. Brad has done all of that and more. Brad started his career trained as a lawyer, helped start General Catalyst back in the dotcom era, and started a couple of businesses after that, Altimeter being the fourth. At Altimeter he was early to the internet, early to Google, early with mobile and in Meta and many others, early to cloud and software, and led our investments in Snowflake, Confluent, [ __ ], and GitLab. Now with AI he is one of the largest investors in OpenAI and Anthropic, which I know you guys from last week love, and in Nvidia. He was on the board of Cerebras and led our investment in Groq, which we're going to get into deep today. And outside all of that, perhaps the most important movement Brad has started is Invest America. Can I get a quick show of hands? How many of you have heard about Invest America?
Wow. Look at that.
Yeah, look at that.
Got a lot of opportunity, it looks like.
That's right. That's right. In brief, Invest America is federal legislation creating an investment account at the time of birth for every child born in America. The biggest impact Invest America is going to have, according to me, is independence away from dependence on our state, and making every child in America an owner of our economy.
Brad, I have the great honor of calling you my mentor, coach, and partner. Thank you so much for doing it. Please join us.
It's great to be here. Thanks for having me.
And you know, a special thanks to Dr. Goell for greenlighting the class. I think it's a really important one. And I'm lucky enough to have my son, a junior in high school, sitting here today. In a lot of schools today, in particular colleges on the east coast as well, people don't really know what to do with AI. And I say all the time, you've got to make yourself bionic with AI. You can't consume enough AI today. It used to be that you go to this school, you get a job with a place like Groq or Altimeter; today I don't really care where you went to school. I want somebody who shows up and delivers abnormal value, bionic value, and the way you do that is by leveraging the latest technology. So I'm glad that you are enabling the students to sit at the intersection of such important topics.
I'm going to introduce Sunny, then I'm going to share a couple of slides, and then I'll invite Sunny up. Sunny and I have been great friends for a long time. We're going to play poker tonight in the All-In poker game here in Silicon Valley, so we're buddies inside and outside of work. But I was thinking about the introduction, and I asked ChatGPT and Claude to give me an introduction. ChatGPT's wasn't great, to be perfectly honest, and Claude's blew me away. So I figured I'd just read to you what Claude had to say about Sunny.
So, our next guest is a serial entrepreneur who apparently can't stop getting acquired by bigger and bigger companies. And honestly, the trajectory is incredible. He co-founded Xtreme Labs, a mobile development shop acquired by Pivotal. Then he co-founded Autonomic, a smart mobility platform acquired by Ford, where he became the VP running Ford X, their internal innovation lab. Then he co-founded Definitive Intelligence, which was acquired by Groq, where he became president and helped launch GroqCloud. And then Nvidia bought the platform, of course, recently for $20 billion, their largest acquisition ever. So if you're keeping score at home: Pivotal, Ford, Groq, Nvidia. The man's career is basically a SPAC that only goes up, according to Claude, unlike Chamath's.
Okay.
He has a computer engineering degree from the University of Ottawa, which proves that even Canadians can disrupt things when they put their minds to it. Please welcome Sunny Madra.
So I'd like to share a couple of slides to set the context for the moment we're living in. The conversation we're going to have today about inference is really a subset of this important conversation. This is global GDP per capita over the course of the last 2,000 years. If you look at that, you realize that basically for 1,800 years, nothing happened. It was survival. There was no excess productivity from a fixed amount of labor and capital; it was what we could use to survive. And then all of this stuff starts happening in the 1800s and 1900s. Look at the number of years it takes to double GDP. I like to say GDP is what creates the excess in life for enjoyment; it's beyond survival, the surplus that we all have. The number of years it takes to double GDP has plummeted, and now we're doubling global GDP, or you might think of it as quality of life, every 25 years.
But you may say, "Well, Brad, what does GDP have to do with anything?" Well, it has to do with everything. What happens when you have higher rates of GDP growth? You have lower rates of poverty. You have higher rates of basic education, higher rates of literacy, more democracy, more freedom, higher rates of vaccination, lower child mortality. So it turns out that innovation in and of itself is a societal good, and it happens to be correlated and accelerating. Technology has gone from 5% of global GDP to about 13% of global GDP. And if I asked you, 10 years from now, are we going to be below 13% of global GDP or above it? I think you would all say that technology as a percentage of global GDP is going to be a lot bigger number.
We're sitting in the heart of Silicon Valley. Technology outearns non-technology. The dotted blue line here shows that the NASDAQ has compounded earnings per share at 15% for the last 10 years, compared to 6% for non-tech companies. Why do technology companies tend to be better investments than non-tech companies? Because they compound their earnings per share faster. Again, I think that will be accelerated by AI. And of course, AI is going to massively accelerate all of this, because when we look at all the knowledge work in the world, the TAM for it is measured in the trillions.
Demis said it well: it'll be 10x the impact of the industrial revolution, but happening at 10x the speed, probably unfolding in a decade rather than a century. So I think that is the context for what we're doing here, and the acceleration that will come with AI should be something that's better for all of society. We're going to have to talk about the guardrails and the societal changes we'll have to make for that to be true.
But sitting at the very root of all of this is compute. You guys all know that the atomic unit of AI, of intelligence, is the token. And there's nobody better able to talk about the production of this atomic unit than Sunny. So Sunny, I want to go back in the wayback machine a little bit. Tell us what Groq is, and what your observations were in 2023 and 2024 about what was going to happen with inference.
Yeah. Just a little bit of quick background. Groq was founded by Jonathan Ross. Jonathan Ross was the creator of the TPU at Google, and his background is interesting. He was a high school dropout, not because he couldn't complete it, but because it was probably too boring for him. He went straight from being a high school dropout (he probably completed his GED or something) into a PhD math program at NYU. Then he got recruited into Google and, like every great engineer over the last 20 years, was made to work on ad optimization or ad testing, which is terrible in some ways. But what he did was listen to a talk by Jeff Dean. Jeff Dean had come in and basically said: good news, bad news. Good news, we think we've found an algorithm to solve automatic speech recognition, which could be useful in many places. Bad news, there isn't powerful enough compute, so we can never run it. And Jonathan took it upon himself to come up with a design, coming from a completely different area, using an FPGA for the first version of what became the TPU. Ultimately, Jonathan left Google because he thought the rest of the world should have this; it shouldn't just be embedded inside Google. So he left and started Groq.
So, quickly, what Groq is, and continues to be inside Nvidia as well: it's a chip designed with a dataflow architecture. What makes it significantly different from any other computer architecture is that it's fully deterministic. Hand in hand with the architecture is a compiler, a compiler which predetermines where all the calculations are going to happen.
And that last bit is really important, because the underlying thing in any AI problem, and in token generation, is lots and lots of math. That's why we're seeing this compute explode, and I implore everyone to go look at the following. We talked about one of Brad's great investments, Snowflake. Snowflake is a database retrieval company: you have to go get a record and bring it back. If you look at the number of compute cycles it takes to do that, it's not a really large amount. But look at the number of compute cycles, or FLOPs, it takes to generate a single token. It's mind-blowing. The best way to think about it is that it's usually the parameter size of the model times the context length squared, for each token. And whatever you're doing involves lots and lots of tokens. So we're in this era where we have this incredible technology, but it's incredibly compute intensive, several orders of magnitude larger than any other computing paradigm we've had before.
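The rule of thumb Sunny cites can be sketched in a few lines of Python. The model size and context length below are illustrative assumptions, not figures from the talk, and the formula is only the rough scaling heuristic quoted above, not a precise FLOP count:

```python
# Back-of-envelope sketch of the rule of thumb cited above:
# FLOPs per generated token ~ parameter count x context length squared.
# The example model size and context length are illustrative assumptions.

def flops_per_token(n_params: int, context_len: int) -> int:
    """Rough per-token compute estimate under the stated rule of thumb."""
    return n_params * context_len ** 2

# A hypothetical 1-trillion-parameter model at an 8,192-token context:
est = flops_per_token(n_params=10**12, context_len=8192)
print(f"{est:.1e}")  # -> 6.7e+19 FLOPs for a single token
```

Even read only as a rough scaling law, it makes the contrast with a database record lookup, which costs thousands of cycles rather than ~10^19 operations, concrete.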
So you're at your own startup, and you have a conversation with Jonathan about merging. I was an investor in Cerebras, which is also building a fast inference chip. Groq was building a fast inference chip. These two companies had been in existence for upwards of 10 years, and the extraordinary thing is that in year nine, they're both fighting for survival. They're not thriving, right? They're building for a market that didn't really exist. But you saw something. Jensen came on my podcast, BG2, and he said, "Everything just changed." I said, "What do you mean?" He said, "Inference-time reasoning. We've gone from pre-training models to inference-time reasoning, and inference is about to 1-billion-x." So not 10x, not 100x, not a million-x; it's going to a billion-x. And our systems of compute are not designed for what's coming. I remember you and I had a conversation, and shortly thereafter you helped broker this vision for Jonathan that said: Jonathan, I think I see your future more clearly than you do. So tell us about that moment.
Yeah. At that moment, a couple of things were happening. The market had been dominated by Nvidia, because Nvidia is what the researchers use to create the models. Part of creating a model is inference: it's the forward pass of training (the backprop is what's different). So you're always doing inference when you're creating models, and it's very natural to just run inference on the same hardware you created the model on. One of the things we saw with the Groq architecture was that we could complete inference much more efficiently. If you look at our V1 chip, which we put into the cloud (I'll get back to that in a second), that's silicon designed in 2018, fabbed in 2019 on 14 nanometer, and it was super competitive against Hoppers, which are five generations newer in terms of silicon technology. What we saw was that it would be very difficult to convince people to buy our hardware and use it. But if we built a cloud, developers really don't care: if there's an API, developers, we've seen, are quite fungible there. So our big insight was: take these things, start putting them in the cloud, run the data centers, make them available via an API, and make the best open source models available for everyone, even including OpenAI models; OpenAI had Whisper, which was open source from the beginning. So we put a lot of those models there, and that's what really took off. We launched the cloud, and within a few weeks we went to a couple hundred thousand users. Today it's something like 4 million users, and it took Nvidia almost 17 years to get to 7 million users.
So, effectively, reasoning models come along. Reasoning models are much more voracious in their token consumption. This is even before we get to agents; this is just deeper thinking compared to what one-shot pre-trained models were doing. And when you looked at token consumption curves, they were just going parabolic.
And our hardware, our clouds, were starting to break. You know, OpenAI only had a gigawatt of compute. Anthropic only had a gigawatt of compute. So we had to figure out how to make both more token-efficient models and more token-efficient architectures.
So now remember, Cerebras and Nvidia were big-time competitors. Nvidia and Groq were perceived as big-time competitors. So Sunny, you sent me a text and said, "I have an idea." We were at the time, and still are, major shareholders in Nvidia, and good friends with Jensen. And you had an idea. The reason I want to point this out is that we see these big transactions, but sometimes we don't unpack that it's just one person's idea, one decision on one day, that causes these things to occur. So what was your vision? It was pretty orthogonal thinking about Nvidia. Tell us about how that came to be.
Yeah, and you know, Brad, you did send that email, so that was awesome. But basically, when we were looking at the problem of inference, even as Groq, what became obvious to us is that if you start to dissect how inference works, there's first a split between prefill and decode. Many people were starting to do that: you use a separate set of machines for prefill and another set of machines for decode, and you can get some efficiency, or lots of efficiency, that way. What we further did, and this is a good lesson for everyone: we looked at prefill and decode, and within the decode we realized we could disaggregate the decode itself, because within the decode there are many different functions happening, and some of those functions are compute intensive while some are memory-bandwidth intensive. One of the big differences between Groq and a GPU is that GPUs have lots and lots of compute plus lots of external memory, which is HBM for them, which is slower. We don't have a lot of compute on Groq chips; we have a lot of SRAM, and that SRAM is very high bandwidth, almost more than an order of magnitude faster. Typically on a CPU you'd see that as your L1 cache, but we have lots of it in our chips. So when we looked at the problem, what the email to Jensen was about was basically connecting to their chips via something they call NVLink. Nvidia chips speak to each other via the NVLink protocol, and that allows you to run something not on a single GPU but on lots and lots of GPUs together. I think today you can do 72, and they're scaling up to 576. Groq has a similar protocol, and we've been running thousands of chips together; in fact, we had many models running on four to eight thousand chips at a time. So basically, NVLink Fusion was a way to allow our chips to speak to the Nvidia chips, so we could take the part of the problem we knew the Groq chips were faster and more performant at and run it there. And the net result of all that is that if you take the same footprint of power, you can get two and a half times more tokens out by combining those two systems, which, in today's world of constrained compute, is really valuable.
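The fixed-power arithmetic behind that claim can be sketched as follows. The per-megawatt throughput figure is an invented placeholder; the only number taken from the talk is the roughly 2.5x combined gain:

```python
# Hypothetical "same power in, more tokens out" arithmetic.
# The baseline tokens-per-MW-day figure is an invented placeholder;
# the ~2.5x multiplier for the combined system is the claim from the talk.

def tokens_per_day(power_mw: float, tokens_per_mw_day: float) -> float:
    """Token output for a fixed power budget at a given efficiency."""
    return power_mw * tokens_per_mw_day

BASELINE_EFF = 1.0e12              # assumed tokens per MW-day, GPU-only
COMBINED_EFF = 2.5 * BASELINE_EFF  # ~2.5x from pairing the two systems

budget_mw = 1000.0                 # a one-gigawatt factory

gpu_only = tokens_per_day(budget_mw, BASELINE_EFF)
combined = tokens_per_day(budget_mw, COMBINED_EFF)
print(combined / gpu_only)         # -> 2.5
```

The point of the sketch is that the power budget cancels out: at any fixed footprint, token output scales linearly with tokens-per-watt efficiency, which is why the 2.5x multiplier matters so much in a power-constrained world.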
So Sunny sends me a text, and he said, "I think we can partner with Nvidia." That in and of itself is a pretty big change, because if somebody's your chief competitor, the idea that you can partner with them is a pretty big change. He said, "Would you mind sending Jensen a text?" And I'm thinking to myself, "Man, I'm going to spend some political capital with Jensen, so I need to know this isn't a crazy idea." So I kind of sit on it for a week or something. And then Sunny texts me again: "Have you sent Jensen that text yet?" So I said, "Okay, I'm going to send it to him." And Jensen immediately got back to us and said, "Interesting idea, let's have a chat." And you guys started working with him. What was really compelling to Jensen, I think, was that somebody had obviously built a competitive chip, but they had thought through how we could produce a lot more tokens together. So what Sunny just said is really important.
OpenAI has a fixed footprint, let's call it a gigawatt of Vera Rubins, that they're going to take in September. In one end of the factory goes power and chips (you obviously have the building and all the costs), and out the other end come tokens. When they bought Groq, for the exact same power footprint, for the exact same building, they're now generating two and a half times the number of tokens. And the constraint we have in the world is power and memory. So if you can double or triple the number of tokens for the exact same footprint, it leads to an enormous economic outcome for OpenAI or for Anthropic. And so you've seen, as the demand on inference because of inference-time reasoning (and we'll talk next about agents), as the demand for these tokens of intelligence has exploded, and literally we're consuming tens of trillions of tokens per week around the world, we've had to come up with more power, more chips, more of these inputs in order to produce them. So it's not just about fast chips or fast inference; it's the ability to get more tokens into the world, in a world that is constrained. Yeah. How many days from the time you showed Jensen a working system?
Yeah.
How many days from that until him greasing you with $20 billion?
Probably just over a month.
Yeah.
Yeah.
Yeah. 30 days.
Yeah. And Jensen is like, you know, did they have any competitive efforts going on at NVIDIA?
Yeah. I mean, I think, you know, Nvidia, and you see it, we talked about it at GTC: Nvidia already has an ecosystem of seven chips and five different racks. So Nvidia is no longer making just a GPU, and I think that's one of Nvidia's superpowers: they've started to look at disaggregating the problem in all different ways, whether it's storage, CPUs, compute, or networking chips. So that already exists. They had already thought about building a decode-only chip, something powered by a lot of SRAM.
But, and this is a good lesson for everyone, us sending that email, starting to work together, and building a prototype that was working with their systems was a real proof of concept for them. In these large systems, making these things work together and making them performant, and this across two different companies with two completely different stacks, I think when they saw that we were able to do that, it showed we'd be a good integration. And the last thing I'll say is, we're two very different types of companies. If we were making a better GPU, there'd be a lot of conflict within Nvidia after the type of deal that we did. But because we were making this SRAM-based, deterministic, compiler-driven chip, which is completely different from how GPUs work, it's very complementary for the cultures and the engineering teams to come together as well.
How many people in here have used OpenClaw?
I mean, that's pretty incredible penetration. I saw a stat, Mark Andreessen may have tweeted this today, that most of the people he talks to are now spending somewhere between a hundred and a thousand dollars a day.
Yeah.
On token consumption with OpenClaw.
And he said basically the next 20 years of Silicon Valley are going to be about producing technologies that drive down the cost of intelligence. Right? And so I want to talk about that. Sunny, if we look at the cost of inference, it's dropped by basically 90% over the course of the last year, and by closer to 99% over the last two and a half years. So talk to us about the inputs. What's driving the unit cost of inference? And if I take a like-for-like, let's call it a unit of intelligence, whether it's a basic question I ask or a somewhat more complicated one, do you expect that unit cost to continue to go down, and if so, why? What are the inputs to that unit cost?
So the inputs are, I think, the following three major things. First, the supply chain: what can you do across the supply chain, which is mostly centered around Taiwan today, you know, TSMC and the different packaging technologies and the lithography technologies they buy from others. Second, the innovation your engineers can perform. And third, I would say, the amount of power you have. Those are the kinds of things we're talking about. What we see today is that lithography technology is starting to reach a limit. We're not going as fast as we used to; we're not getting the Moore's law gains, so we have to exceed that. We're exceeding it in a couple of different ways. We're exceeding it by making bigger and bigger chips. These chips become quite large now, which is very exciting, but that also leads to a lot of interesting issues. Cerebras, as Brad's been saying, makes chips about the size of a pizza box, versus the CPUs you guys all would have seen, and there's a lot of energy and technology in how big a package you can make and how much silicon you can pack in there. Then there are the innovations, and the innovations are really where we're seeing most of this work happen, because that goes hand in hand with the models. We're seeing this really interesting force today, and there was a bunch of stuff, I don't know if it was leaked or put out there, I think Elon is at the center of some of it, discussing how these newer models are approaching 1 trillion to 10 trillion parameters. Those 10-trillion-parameter models go back to that first thing I told you: that's in the fundamental FLOP calculation of how much compute it takes to generate a token. So as fast as companies like us are making better and better technology, through lithography upgrades, through memory bandwidth upgrades, through innovation in how we lay out our circuits, through quantization efforts (NVFP4 was another one), the models are getting bigger, and the demand is increasing.
So I'm going to put it back to you. There are these three factors, it's a cube, and it's really difficult to navigate right now, because all the factors are growing in ways that are really challenging. The demand keeps going up, the models keep getting bigger, and as fast as we're innovating, even if we get a 50x over five years, the models and the demand grow faster. And that's why we're seeing this unique phenomenon where H100 prices, if you're building a startup or using them, are going up.
In fact, yes, one of the things I think is important for everybody to understand: when OpenAI and Anthropic started, their gross margins on the businesses were highly negative.
Okay, so that's a scary thing to do: go raise a lot of money, produce a widget for a dollar, and sell it for 20 cents, so you have a big negative gross margin. But why was that? They were going out and charging you all to use ChatGPT, or charging a certain amount for their APIs. The models weren't that capable, so there was only so much money we were willing to pay, and two years ago the cost of inference was a lot higher. But the bet they were making was that the cost of inference would come down a lot, and your willingness to pay would go up a lot, as intelligence got a lot more valuable. So I like to think of the first inning of AI as just getting to a place where we could yield answers. In code generation it was basically autocomplete, tab complete; in the case of ChatGPT, it was basically a slightly better version of Google. But now we're entering this phase of action, where agents do things: go build me an app, go build me a website, figure out how to resolve this customer service problem, sell more of my product, find a cure for cancer, book me a hotel in New York. It starts doing things, and when it does things, the amount of tokens it has to consume in order to do those things explodes by an order of magnitude, but the value delivered to the end consumer as a unit of intelligence goes up by 100x. So your willingness to pay goes up dramatically.
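Putting rough numbers on that margin story: the $1 cost and $0.20 price are the hypothetical widget from above, and the 10x token-cost and 100x value multipliers are the figures from the talk, combined here under my own simplistic assumption that price tracks willingness to pay:

```python
# Simplistic sketch of the unit-economics argument above.
# $1.00 cost vs $0.20 price is the hypothetical widget from the talk;
# the 10x cost and 100x willingness-to-pay multipliers are also from
# the talk, combined under the naive assumption that price tracks
# willingness to pay.

def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (price - cost) / price

# Early days: produce a widget for a dollar, sell it for 20 cents.
early = gross_margin(price=0.20, cost=1.00)
print(f"{early:.0%}")   # -> -400%

# Agentic era: tokens consumed rise ~10x, value delivered rises ~100x.
later = gross_margin(price=0.20 * 100, cost=1.00 * 10)
print(f"{later:.0%}")   # -> 50%
```

The asymmetry is the whole bet: costs scale with tokens consumed, but prices can scale with value delivered, so a 10x cost increase paired with a 100x value increase flips a deeply negative margin positive.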
Can I add one? Yeah, this week we saw Mythos, which is the unreleased model by Anthropic, find a bug in BSD. Think about how many engineers and software developers, and everyone else, you know, PE companies using it, have looked at that code. So we've gone to a place where it's doing things beyond human capability.
Correct.
Which is, and we're in year three.
Exactly.
We're in year three. To give you another one: what's my best evidence to convince you of the value of AI? Well, my best evidence is that Anthropic, in the month of March, just added $10 billion in annualized revenue in a single month. That is the total annual revenue of Databricks plus Palantir combined, and they added it in one month. And they didn't add it because they hired a million salespeople who went out to a million companies and convinced them to buy the product. They added it because their product crossed a threshold of intelligent capability such that millions of customers around the world said, I have to have this product to make my company better (look at the amount that Altimeter is spending). Millions of self-interested actors around the world independently made a judgment: I have to buy a lot of those tokens, a lot of those capabilities, both Claude Code and Cowork. And the same thing is happening at OpenAI, not quite on the same exponential in terms of revenue. But I think for me this was a little bit of an Oppenheimer moment. This was a little bit of the splitting of the atom. We've heard Dario and Sam talk about the exponential, or the end of the exponential, on intelligence. But the big question was, are they going to be able to afford to continue to build the compute in order to keep up with this? I had this somewhat uncomfortable moment with Sam Altman on my podcast, the BG2 Pod, that went a little viral, when I asked Sam, "Hey Sam, you've made $1.4 trillion of spending commitments, but you only have $13 billion of revenue. So explain to me how that works. How can you commit to spending $1.4 trillion when you have $13 billion of revenue?" And I had hoped that Sam would make the case that his revenue was going to go up a lot and that these were kind of call options he could renegotiate. But instead he said to me, "Well, if you don't like your investment, I'll buy back your shares." Which was not exactly the response I was hoping for out of Sam in the moment. But that was the question heading into 2026.
My podcast partner Bill Gurley and a lot of other people were highly skeptical, saying this is an AI bubble; these guys are spending at rates they're never going to be able to pay the bills on, because there aren't people on the other end willing to pay for the products to justify that level of spending. And what happened in January was Anthropic had a $3.5 billion month. In February they have an $8 billion month, and in March they have a $10.5 billion month.
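The growth implied by those three figures is easy to check. A small sketch using only the numbers as quoted in the conversation ($3.5B, $8B, $10.5B for January through March); the calculation is just arithmetic, not a forecast:

```python
# Month-over-month growth implied by the figures quoted above.
# These are the speaker's numbers as transcribed; nothing else is assumed.

months = {"Jan": 3.5e9, "Feb": 8.0e9, "Mar": 10.5e9}

values = list(months.values())
growth = [values[i + 1] / values[i] for i in range(len(values) - 1)]

for label, g in zip(["Jan->Feb", "Feb->Mar"], growth):
    print(f"{label}: {g:.2f}x month over month")

# Even the slower of the two factors compounds enormously if sustained:
slow = min(growth)
print(f"min factor {slow:.2f}x -> ~{slow ** 12:.0f}x over 12 months")
```

The point of the second print is why this revenue curve reads as "scaling on the same exponential as intelligence": even the slower month's ~1.31x factor is roughly 26x if compounded for a year.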
That to me said, oh, everything's changed. The product is now sufficiently good that you have revenue scaling on the same exponential as intelligence. So they can afford to pay the $50 billion per gigawatt to stand up all of these inference factories to produce all this collective intelligence.
Just react to that, Sunny, because our group talks a lot about this. There was a lot of debate in our group, and on the All-In Pod and others, as to whether or not this was a bubble.
Yeah, I'd say there are a couple of things that maybe the broader world doesn't see yet. One,
the models that we see today haven't even been trained on the latest hardware, whether you want it to be Blackwells, or the Veras or Rubins that are just coming out, right? So we haven't even seen that yet, and so we haven't seen the capabilities that you get. We'll start to see that. I think one of the first ones we'll see is the stuff out of Elon's Grok, right? So the capabilities you're seeing here are things that were done on older hardware. So that's A: when you're inside the ecosystem, you know what's capable and what's coming next, right?
I think, B, one of the things that is really starting to take off, and I think Anthropic's done an incredible job here, and I think Codex does an equally incredible job on very hard software problems, is that there's not just a chat interface that the majority of people are interacting with. It's not just an API. They've created a harness around the models, and OpenClaw is just another harness as well. Those harnesses have figured out how to extract more, and continually extract. With Claude Code and Cowork, you can have it just ping you on your phone whenever it's stuck, even if you started somewhere else. And so it can be in this continuous loop, and it's working for you all the time. We've never had anything like that. When it's doing that, you take that token consumption of, you know, you were doing a query before and it was doing some thinking and coming back. Now it's just working all night long and pinging you. You tell it, don't even bother me, keep coming back.
So we're seeing these harnesses really extract more and more tokens out of it as well. And the type of problems that people are solving: we gave the code example, you brought up a bunch of other ones, but inside big businesses too. I tweeted this, so I think it's fair, but inside Nvidia now we have this thing called the Nvidia personal assistant, and it's connected to Slack, it's connected to Teams, which, you know, sucks, but it's connected to our email and it's connected to all our files, wherever they may exist. And so every morning it runs and it figures out all your task items for the day. You can have it answer those things, and it's really incredible.
And so the way we work is changing. We were talking about this earlier with someone: you don't even write email now. Someone else's agent is going in and emailing you, and your agent is looking at it and emailing them back. But a lot more work is getting done, because my time is freed up from basically answering emails all day long and approving things out of all these traditional SaaS systems. The agents handle all that. So the explosion, to your point, is just in the first or second inning. The amount of tokens is really just going up. So we don't fear that. We don't look at that as an overbuild in any way, shape, or form.
I think the facts and evidence on the field are, number one, the cost of both training and inference, but inference in particular, is plummeting and
continues to plummet. That shouldn't be altogether surprising; technology ultimately is highly deflationary. I've never seen something this deflationary this quickly. I think it's a byproduct of extreme co-design. It's not a single chip; it's a factory. And across the factory, there are all sorts of Moore's laws playing out combinatorially. At the same time, when you're able to produce a lot more tokens, the unit of intelligence that you're delivering is much more valuable. So the willingness to pay on the other end goes up a lot. And I'll tell you, for an OpenAI or an Anthropic today: if those guys were at negative gross margins a year and a half or two years ago, they're now at very positive gross margins. Right? So all of a sudden, this business that looked diseconomic looks highly economic today. So that's resolved that question a little bit.
Maybe, Sunny, I want to finish our section with a little forecast, and pre-wire you guys: we're going to open it up to questions. It can be about the
economics of inference or any part of the stack, or any other questions that you all have. But you mentioned Mythos. It's a model that came out this week, was not generally released, but was sandboxed by Anthropic.
Tried to get out a few times.
Tried to escape the sandbox, you know, trained on kind of TPU7. On the other hand, you have Spud or 55 coming out of OpenAI, probably this week or next, which is the first Blackwell-trained model. Elon's going to have one. Meta's just out with a model yesterday, Google, etc. Talk us through — you get to see into the product pipeline at Nvidia. Do you think the pace of the cost-of-inference curve continuing to come down continues for the next several years? Do you think that the step function, or the exponential if you will, of both pre-training and inference-time reasoning, in terms of improving the algorithmic capabilities of intelligence, continues?
Well, I can tell you, having a chance to work with Jensen now, he challenges us in everything we do to not show up unless it's 100x. So whatever, you
know, we bring to him, and I can't get into too many details, but his first challenge back is: is this 100x from what you did before? So he is challenging the engineers to take a look at every part of the problem, all the way down into memory controllers or memory capacity or circuits, whatever it happens to be, to make sure we 100x everything. So on the first part of your question, yes, because he pushes us to do it, he gives us the latitude to do it, and he gives us the resources to go do it. And I can tell you the types of things we've been enabled to do coming in as the Groq team, things we could never do as a startup, but Jensen has enabled us to do those things.
Are you guys harnessing AI yourselves to design the next generation of chips? A ton, right? We were doing that even before, because we needed to, we were a small team, but now we have access to sort of the entire ecosystem of things that are available. So I think that's A, right: we're being pushed to do it. On the related side, though, the more we innovate, the more the model makers innovate, and the bigger the models get. Which means the capabilities that are coming out are better, so we continue to need that buildout. So we'll all look back and we'll be thankful — there are a couple of companies that changed the footprint of the internet for us.
And you could talk more about this than I can even, Brad, but the work that Google did to build the infrastructure they did for video, for search, it really paved the way for the rest of the internet: CDNs, all types of other things, right? And so a lot of this work that's happening to build out this infrastructure will pay benefits. And you need that to continue to happen, because it can't just be the innovation of the chips. You need more and more infrastructure to be built.
I'll wrap with this. You know, we have the great privilege of talking with Jensen or Elon or Sam or Dario, and you guys can all read about the personal battles they have between them. Some days there's not a lot of love lost between them in the race to AGI. But right now, I see amazing uniformity. When I talk to them, they all, in a non-hyperbolic way, say, "We're there, and we got there faster than we thought. We're nearing the end of the exponential." If you ask Dario, "Dario, what is the most surprising thing to you right now?" he says, "We're almost at the end of the exponential, and people don't even seem to realize it." And if you ask Sam, he'll say the same thing. And if you ask Elon, he'll say the same thing.
That shouldn't be scary to any of us. It just means that we're in this recursive place where we have AGI, and the job of everybody in this room, including the folks sitting up here, is going to be: how do we harness this technology for the betterment of all of us? Which is going to require, going back to what Apoorv has said about the Invest America Act, you know, Dario says the accumulation of wealth that's about to occur, people call it the age of abundance we're going to enter into, right? That's going to be easier than ever, but the distribution problems we're going to encounter are going to be harder than ever. And so that was really the inspiration behind the work I've done on Invest America, and the work that I think we're all going to have to do collectively around the social contract, the intersection between public policy and technology. Because when the exponential looks like that, all of a sudden you have agents that are going to have more capability than kind of collective human intelligence, and it's happening at an accelerating rate. Remember, all the stuff we've talked about has occurred with almost no compute.
Anthropic and OpenAI are going to add more compute this year than all the labs put together for the last decade. Okay? And the year after that, they're going to double it again. So, you know, I think the rate of change is fairly parabolic. And that to me is both exciting — I'm an optimist about what's to come — but I'm not Pollyannaish about the challenges that come with that rate of change, right? It's going to require active engagement, like it has in other periods in history, around the industrial revolution, the digital revolution, etc., because it's going to exact a lot of change on the world. But with that, I just want to say it's been extraordinary watching Sunny orchestrate the work that he's done at Groq. He's an incredible thought leader in this whole area. I appreciate you coming in, but maybe let's just open it up to some questions, and hopefully we can cover a lot of territory.
Right here. Given all of this, how do you suggest everyone position themselves, and make sure we're not just wasting our time studying this?
Yeah, I mean, listen, I have to answer this question for my son and for so many others. Humans have a unique way of finding a way to add value to society notwithstanding disruption. Right? In the industrial revolution, if you were a tradesperson or a craftsperson and you built a product beginning to end, almost all of them were displaced by
you know, mass production. And for that person, it didn't feel good: I was really good at making a wheel start to finish, but I was totally disintermediated by the means of production. Okay. But it's not like the world just stopped. Those people found other things to do.
And one of the observations I have is that we used to have 80% of people in manufacturing and farming and other things. Today, we have 70% of people in the service economy, right? We have the luxury of people doing things we couldn't before. We didn't used to hire coaches, as an example, right? You couldn't afford to hire a coach. Today you have coaches and yoga instructors and tons of things in the world that add a lot of value. And I think we have higher-order things that people do, right?
And so for me, one of the things is, if you were well off enough that you could hire a specialized tutor, that was great. But
for 98% of the world who couldn't afford that, now they can get it. Or if you were part of the 2 or 3% that could have concierge medicine, it was really great, but for the other 97% it wasn't. Well, now they can get that same level of care. So I think this is about democratizing intelligence, democratizing access, etc. But that's not to say there aren't going to be different challenges.
My number one thing, again, is: make yourself bionic, be a creator, figure out a way that you add value. So if somebody comes and wants to interview at Altimeter and they say, "Oh, I don't use AI and I don't use Excel spreadsheets, I do everything by hand," that would be a problem, right? I expect somebody to use all the greatest tools at their disposal to be the most effective they can, to add value, to allow us to generate alpha in the world. And so, starting in a place like this, another way of saying it — I'm reserving this for a tweet at some point — is that I think IQ gets commoditized
and EQ becomes super valuable. Okay. What do I mean by EQ? I mean a network of people in this room. I mean the ability to persuade the person sitting next to you, the ability to form your team, the ability to lead people. That is super valuable, right? I think it becomes more valuable in the future. But just being the smartest person in the room and solving the problem on the board faster than all the other humans — I think that's commoditized, and you're not going to be able to beat the machine. Doesn't mean you don't need to learn those things, but I think it'll be hard to beat the machine.
Brad, one, thank you for doing this. We need BG2 back. We miss it. We only get Brad once in a while on the All-In Pod now; you only get your little slot. So let's maybe get you back on.
Yeah, let's get BG2 back. But can I just add one thing to that? There's this other moment that's occurring right now, and I think about this quite often. If you actually look at what's happening in mathematics right now, there are all kinds of new discoveries happening. And I use the following analogy: humanity had to wait for an apple to fall on Newton's head for him to start theorizing about gravity and formulating that. But now we can have something else working and discovering new things. And it goes back to that chart that Brad showed, right? It's really that until we started having more innovation, more intelligence, those curves didn't go up and to the right. We're just about to make that go more vertical. So I think the overall benefit to humanity has already been shown: what happens when you have more intelligence, right? And we don't have to wait for things to happen. We let the agents do it, without us in the loop, which I think will be powerful.
There's one up.
Yeah.
Yeah. I wonder what you think about the integration of hardware and software, particularly given that Apple is following a strategy of letting everybody else build the models and just riding on top of them, while OpenAI is now building a device which might actually rival that. So I'm just curious how you see that.
I think it's high stakes. I mean, listen, I'll tell you, even the people at Apple are nervous about their strategy. Part of it is their challenge around privacy. They have a real challenge, because we don't have the capability yet on the edge, and they don't want to have you sharing information up to the cloud, given that they view consumer privacy as one of their core consumer value propositions. But I think they've put themselves at risk.
The bull case would simply be that we're so sticky to the device, and the device is so good, that they have time, and ultimately the Gemini model they're going to put on the phone is just going to be a much more capable Siri. We can all agree the old Siri was really bad, and this will be a more capable Siri, and for the vast majority of people that will be good enough. So that would be the bull case. I think the bear case is, like you said, that other people come along and build more ambient devices that consumers really like. But for me, frankly, I wish that OpenAI wasn't working on a device. I wish they would just focus on building intelligence. And I think Apple is going to be very formidable in the device world. So I think they're in a reasonably good spot.
One stat: an 8-billion-parameter model, which is quite small, right, can burn out an iPhone in 30 minutes. It goes back to just battery life.
Yeah, battery life. Yeah. So I'd just say you've got to go back and look at how compute-intensive AI is, right? And so that's the real challenge with the stuff that Brad said about pushing so much of that frontier intelligence to the edge over here. Yeah.
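A rough back-of-envelope shows why the on-device stat is plausible directionally. Every constant below is an assumption for illustration (weight precision, memory bandwidth, power draw, battery size), not a measured spec; the quoted "30 minutes" is plausibly about thermals and peak draw rather than the sustained figure computed here:

```python
# Back-of-envelope: why an 8B-parameter model punishes a phone.
# Every constant below is an assumption for illustration, not a measured spec.

params = 8e9
bytes_per_param = 2                       # assumed fp16/bf16 weights
weight_bytes = params * bytes_per_param   # 16 GB: more than most phones' RAM

# Token generation is typically memory-bandwidth-bound: each decoded token
# must stream essentially all weights through the memory system once.
mem_bandwidth = 50e9                      # assumed ~50 GB/s mobile LPDDR
max_tokens_per_s = mem_bandwidth / weight_bytes

# Battery: assume SoC + memory sustain a high draw while decoding.
sustained_watts = 8                       # assumed near-peak mobile draw
battery_wh = 15                           # typical phone battery capacity
minutes = battery_wh / sustained_watts * 60

print(f"weights: {weight_bytes / 1e9:.0f} GB")
print(f"decode ceiling: ~{max_tokens_per_s:.1f} tokens/s")
print(f"battery at {sustained_watts} W: ~{minutes:.0f} minutes")
```

Whatever constants you pick, the two conclusions survive: the weights alone exceed typical phone RAM, and the decode ceiling is single-digit tokens per second — which is the compute-intensity point being made above.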
I wanted to touch on something you were talking about — I think it was on a recent episode of All-In — where you were criticizing various tech CEOs for creating hype and fear around new product launches about what AI would do. I'm curious to what extent CEOs need to revise and rethink that.
Fair.
He and I had that argument on the pod again today. Again, call me naive, but I think Dario's speaking authentically about what he believes. I think Sam's speaking authentically about what he believes. They're staring at this exponential. They believe that they see AGI or ASI. And I think they do have legitimate concerns. Like, listen, I'm glad that we sandboxed Mythos. They tested it internally. They found 26 vulnerabilities in the Safari browser. And like I said to Chamath today, do you want them to just throw it out there, and then all your browser history is out in public? Probably not, right?
At the same time, I don't think it helps going out and fear-mongering, particularly if your real intent is regulatory capture, to prevent everybody else from climbing up the ladder now that you're on top. I have a real problem with that. But I think we have to find that balance and those tradeoffs, right, between reminding people about, you know, the optimistic side of things.
And I encourage you to read both of Dario's essays. His first essay on this is quite optimistic about what can happen. But I think he has the other side of it, too, which is: it doesn't happen without us being very thoughtful about the guardrails and things we need to put in place. One of the things I was really happy about is called Project Glasswing, which is this consortium they put together this week to effectively sandbox Mythos before they release it publicly. You know, Amazon, Microsoft, etc. That seemed to me to be a very pragmatic, market-based solution to the problem. And he and I were just texting before I came over here: they found, and they've hardened, a lot of things already, very quickly. Within 100 days you can do a lot when you're having the AI fix the things that it finds.
And so, you know, I certainly talk optimistically about it, and I think a lot of other people do; I'd encourage them to find a little bit more balance in their commentary. But I also don't want us to ignore the realities that when you split the atom, okay, it can either provide unlimited free energy for the world and totally bring people out of darkness, right? Or it can be used to make a bomb to destroy cities and nations. Powerful technology is powerful technology. We can't just stick our head in the sand and act like it's a one-way street.
Well, thanks for being here. You mentioned, obviously, the cost of inference coming down intrinsically, but I think Dario also said the cost of training keeps rising, something like $100 billion by 2030. So I'm just curious how you're balancing the fact that it's going to be cheaper on one side and more expensive on the other.
I think you see a couple of phenomena. One, the gear that's used for training turns into inference gear, right? Those big clusters — we're seeing that happen all the time. So there's just a natural progression between those two worlds. And then I really do think the innovations that come from those larger
models and those training clusters have such a large benefit, and they tie back to what Brad said, right? I was just reading this thing today by Mustafa from Microsoft, saying, look, we have 50x more powerful compute today than what we did when we did GPT-2, but look at the capabilities, right? And so I think you just have to keep those two things in line with the entire topic of the conversation. Innovation is going to keep happening, because, as Brad touched on a little bit, we're just now unleashing AI into designing these things, and there are optimizations in software and hardware that we see and learn that others don't see. So I continue to believe it'll come down. But yeah, we're just working on a problem that's very, very intensive from a compute standpoint. So those numbers are going to be large.
Maybe one final question. You can dig in with Sunny and me, and we can answer some questions after the fact. But otherwise, I just want to say it's a great privilege for us to get to spend some time with you guys. So thanks for having us. But maybe right here. Yeah.
Hi, thank you for sharing. A question about economics. So what do you think is the long-term sustainable economic model for Nvidia? Because right now it's a gigantic business, right, with roughly 75% margins, and that makes it very difficult for the entire ecosystem, because they take way too much. What do you think will eventually happen? Say a few scenarios: A, their revenue may be limited, but they keep their margin; or B, they somehow lower their margin substantially and are able to keep a much happier ecosystem.
So, you know, maybe just repeat it real quick and then answer it. Well, I'm going to let the investor take that one; I work there, so I shouldn't answer that question.
Yeah, I mean, listen, Nvidia is a $4.5 trillion company that's trading at about 13 times earnings, very cheap, half the market multiple, growing at 70%, obviously dominant in the market today. And I think there's a wall of worry about Nvidia, because everybody says, you know, TPUs and Cerebras and Groq and all these people can come up with inference solutions. They can steal your share. They can compete on price. That's the beautiful thing about capitalism. You know what Nvidia will have to do? Compete. They either deliver a product that people are willing to pay more for, or they have to drop their price, their margins come down, and they'll have to compete in that market.
I would tell you that when I look into the product roadmap for what's going on at Nvidia — and the acquisition of Groq was part of it — I think they're going to be in an incredible position. And they've already announced that they have a trillion dollars, a trillion dollars, of sales over the course of the next eight quarters that are already booked. People have more demand than they can get memory and supply to build all of this compute out. So I think we're so early in this. I was in Silicon Valley,
you know, not so long ago, 16 years ago, when they said there could never be a trillion-dollar company. There would never be a trillion. And we asked the question, well, why? They'd say, well, law of large numbers. I was like, well, what stone tablet is that etched into? Okay, nowhere. Today we have a $4.5 trillion company. I've already said publicly that Nvidia will be the first $10 trillion company. Okay, and it's not because I'm a cheerleader. I can sell my Nvidia and invest in anything that I want to invest in. But that company's leadership, that team, the lead that they have on both training and inference, and the rate at which they're moving, I think puts them in a really great competitive position. And they're doing all this notwithstanding the fact that Trainium's successful, TPU is successful, custom ASICs are very successful — and they're still killing it. And I think that says a lot more about the size of the market for intelligence and the compute that's needed to get us there than it does about the individual company. With that said, we have to wrap. Apoorv, thanks for having us.
Thank you. Thank you.