FULL CES 2026 EVENT: NVIDIA CEO Reveals Physical AI and Autonomous Robots Changing Industries | AI14
By DWS News
Summary
## Key takeaways

- **Dual Simultaneous Platform Shifts**: Every 10 to 15 years the computer industry resets with a platform shift (mainframe to PC, PC to internet, internet to cloud, cloud to mobile), but this time there are two simultaneous platform shifts: applications are now built on top of AI, and the entire software stack is being reinvented, with software trained on GPUs instead of programmed on CPUs. [04:31], [05:19]
- **Physical AI Understands the Laws of Nature**: Physical AI means AIs that understand the laws of nature and interact with the physical world, grasping object permanence, causality, friction, gravity, and inertia. These are common sense to a child but unknown to AI, and because real-world data is scarce, they must be learned in simulation. [09:42], [10:27]
- **Cosmos Generates Physics-Grounded Data**: Cosmos is an open frontier world foundation model for physical AI, pre-trained on internet-scale video, real driving and robotics data, and 3D simulation. It generates realistic, physically coherent video from 3D scenes, driving telemetry, or prompts, creating synthetic data for training AVs and robots. [30:25], [34:35]
- **Alpamayo Reasons in Autonomous Driving**: Alpamayo is billed as the world's first thinking, reasoning autonomous vehicle AI, trained end to end from camera input to actuation on human demonstrations and Cosmos-generated data. It reasons about its actions, explains its decisions, and handles long-tail scenarios by decomposing them into normal circumstances. [35:12], [37:39]
- **Vera Rubin Delivers 5x AI Performance**: The Vera Rubin supercomputer with the Rubin GPU delivers 100 petaflops of AI performance, 5x its predecessor despite only 1.6x the transistors, through extreme co-design of six chips plus NVFP4 tensor cores, enabling roughly a quarter the systems for training 10-trillion-parameter models and about 10x better token cost. [01:00:30], [01:30:34]
- **Open Models Reach the AI Frontier**: Open models advanced last year with DeepSeek R1, the first open reasoning system, activating global innovation; they now reach the frontier about six months behind closed models, with downloads exploding as startups, companies, researchers, and countries join the AI revolution. [10:40], [11:38]
Topics Covered
- Dual Platform Shifts Reshape Computing
- Open Models Reach AI Frontier
- Agentic AI Uses Multi-Model Routing
- Physical AI Demands Synthetic Data
- Vera Rubin Delivers 5x Inference
Full Transcript
Ready, go.
[Music]
[Music] Welcome to the stage, NVIDIA founder and CEO, Jensen Huang.
[Music] Hello, Las Vegas.
Happy New Year.
>> Welcome to CES.
Well, we have about 15 keynotes' worth of material to pack in here. I'm so happy to see all of you. We've got 3,000 people in this auditorium, 2,000 people in a courtyard watching us, another thousand people apparently on the fourth floor, where the NVIDIA show floors were supposed to be, all watching this keynote, and of course millions around the world are going to be watching this to kick off the new year. Well, every 10 to 15 years the computer industry resets. A new platform shift happens: from mainframe to PC, PC to internet, internet to cloud, cloud to mobile. Each time, the world of applications targets a new platform; that's why it's called a platform shift. You write new applications for a new computer. Except this time there are two simultaneous platform shifts happening at the same time. As we move to AI, applications are now going to be built on top of AI.
At first people thought AIs are applications, and in fact AIs are applications, but you're going to build applications on top of AIs. In addition to that, how you run the software and how you develop the software has fundamentally changed. The entire five-layer stack of the computer industry is being reinvented. You no longer program the software; you train the software. You don't run it on CPUs; you run it on GPUs. And whereas applications were pre-recorded, precompiled, and run on your device, now applications understand the context and generate every single pixel, every single token, completely from scratch every single time. Computing has been fundamentally reshaped as a result of accelerated computing and artificial intelligence. Every single layer of that five-layer cake is now being reinvented. What that means is that some $10 trillion or so of the last decade of computing is now being modernized to this new way of doing computing. It means hundreds of billions of dollars, a couple hundred billion dollars in VC funding each year, is going into modernizing and inventing this new world. And it means a hundred trillion dollars of industry, several percent of which is R&D budget, is shifting over to artificial intelligence. People ask where the money is coming from. That's where the money is coming from: the modernization of computing to AI, the shifting of R&D budgets from classical methods to artificial intelligence methods. Enormous amounts of investment are coming into this industry, which explains why we're so busy. And this last year was no different. This last year was incredible.
This last year, there's a slide coming.
This is what happens when you don't practice.
This is the first keynote of the year. I hope it's your first keynote of the year; otherwise, you have been pretty busy. This is our first keynote of the year, so we're going to get the spiderwebs out. And so 2025 was an incredible year. It just seemed like everything was happening all at the same time, and in fact it probably was. The first thing, of course, is scaling laws.
In 2015 the first language model that I thought was really going to make a difference made a huge difference. It was called BERT. In 2017 transformers came. It wasn't until five years later, in 2022, that the ChatGPT moment happened, and it awakened the world to the possibilities of artificial intelligence. Something very important happened a year after that. The first o1 model, the first reasoning model, was completely revolutionary. It invented this idea called test-time scaling, which is a very common-sensical thing. Not only do we pre-train a model to learn, we post-train it with reinforcement learning so that it can learn skills, and now we also have test-time scaling, which is another way of saying thinking. You think in real time. Each one of these phases of artificial intelligence requires an enormous amount of compute, and the computing law continued to scale.
Large language models continued to get better. Meanwhile, another breakthrough happened, and it happened in 2024: agentic systems started to emerge. In 2025 they started to proliferate just about everywhere. Agentic models have the ability to reason, look up information, do research, use tools, plan futures, and simulate outcomes. All of a sudden they started to solve very, very important problems. One of my favorite agentic models is Cursor, which revolutionized the way we do software programming at NVIDIA. Agentic systems are going to really take off from here. Of course, there were other types of AI. We know that large language models aren't the only type. Wherever the universe has information, wherever the universe has structure, we can teach a form of language model to understand that information, to understand its representation, and to turn that into an AI. One of the biggest and most important is physical AI: AIs that understand the laws of nature. And then, of course, physical AI is about AI interacting with the world, but the world itself has encoded information, and that's called AI physics. With physical AI you have AI that interacts with the physical world, and with AI physics you have AI that understands the laws of physics.
And then lastly, one of the most important things that happened last year was the advancement of open models. We now know that AI is going to proliferate everywhere when open source, when open innovation, when innovation across every single company and every industry around the world is activated. At the same time, open models really took off last year. In fact, last year we saw the advance of DeepSeek R1, the first open model that's a reasoning system. It caught the world by surprise and it activated literally this entire movement. Really, really exciting work. We're so happy with it. Now we have open model systems all over the world of all different kinds, and we now know that open models have also reached the frontier. They're still solidly six months behind the frontier models, but every six months a new model emerges, and these models are getting smarter and smarter. Because of that, you can see the number of downloads has exploded. The number of downloads is growing so fast because startups want to participate in the AI revolution. Large companies want to, researchers want to, students want to, just about every single country wants to. How is it possible that intelligence, the digital form of intelligence, would leave anyone behind? And so open models really revolutionized artificial intelligence last year.
This entire industry is going to be reshaped as a result. Now, we had an inkling of this some time ago. You might have heard that several years ago we started to build and operate our own AI supercomputers. We call them DGX Cloud. A lot of people asked, are you going into the cloud business? The answer is no. We're building these DGX supercomputers for our own use. Well, it turns out we have billions of dollars of supercomputers in operation so that we can develop our open models. I am so pleased with the work that we're doing.
It is starting to attract attention all over the world and across industries, because we are doing frontier AI model work in so many different domains: the work we did in proteins, in digital biology. La Proteina, to synthesize and generate proteins. OpenFold3, to understand the structure of proteins. Evo 2, to understand and generate genomic sequences, the beginnings of cellular representation. Earth-2, AI that understands the laws of physics: the work we did with FourCastNet and with CorrDiff really revolutionized the way people do weather prediction. Nemotron, where we're now doing groundbreaking work: the first hybrid Transformer-SSM model, which is incredibly fast and therefore can think for a very long time, or think very quickly when it doesn't need long, and produce very smart, intelligent answers. Nemotron 3 is groundbreaking work, and you can expect us to deliver other versions of Nemotron in the near future. Cosmos, a frontier open world foundation model, one that understands how the world works.
GR00T, a humanoid robotics system: articulation, mobility, locomotion. These models, these technologies, are now being integrated, and in each one of these cases opened to the world: frontier humanoid robotics models, open to the world. And then today we're going to talk a little bit about Alpamayo, the work we've been doing in self-driving cars. Not only do we open-source the models, we also open-source the data we use to train those models, because only in that way can you truly trust how the models came to be. We open-source all the models. We help you make derivatives from them. We have a whole suite of libraries: the NeMo libraries, the PhysicsNeMo libraries, and the Clara libraries. Each one of these libraries is a lifecycle management system for AIs, so that you can process the data, generate data, train the model, create the model, evaluate the model, guardrail the model, all the way to deploying the model. Each one of these libraries is incredibly complex, and all of it is open-sourced. And so on top of this platform, NVIDIA is a frontier AI model builder, and we build in a very special way: we build completely in the open, so that we can enable every company, every industry, every country to be part of this AI revolution. I'm incredibly proud of the work we're doing there. In fact, if you look at the charts, they show that our contribution to this industry is bar none. And you're going to see us continue to do that, if not accelerate.
These models are also world class.
All systems are down.
This never happens in Santa Clara.
Is it because of Las Vegas?
[Applause] Somebody must have won a jackpot outside.
All systems are down.
Okay. I think my system's still down, but that's okay; I make it up as I go. And so not only are these models frontier-capable, not only are they open, they also top the leaderboards. This is an area where we're very proud. They top leaderboards in intelligence. We have important models that understand multimodality and documents, otherwise known as PDFs. The most valuable content in the world is captured in PDFs, but it takes artificial intelligence to find out what's inside, interpret what's inside, and help you read it. And so our PDF retrievers, our PDF parsers, are world-class. Our speech recognition models: absolutely world-class. Our retrieval models, basically semantic search, AI search, the database engine of the modern AI era: world-class. So we're on top of leaderboards constantly. This is an area we're very proud of. And all of that is in service of your ability to build AI agents.
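To make the retrieval idea concrete, here is a minimal sketch of semantic search: documents and a query are embedded, then ranked by cosine similarity. The hashed bag-of-words embedding is an illustrative stand-in, not the learned embedding models NVIDIA ships.

```python
# Minimal semantic-search sketch: embed documents, rank by cosine similarity.
# A real retriever uses a learned embedding model; this hashed bag-of-words
# embedding is only a stand-in to show the mechanics.
import hashlib
import math

DIM = 256

def embed(text: str) -> list[float]:
    """Map text to a fixed-size unit vector via hashed bag-of-words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

docs = [
    "Cosmos generates physically plausible surround video for training",
    "Vera Rubin delivers 100 petaflops per compute board",
    "Alpamayo reasons about driving actions before taking them",
]
index = [(d, embed(d)) for d in docs]

query = "which model creates synthetic driving video?"
qv = embed(query)
for doc, dv in sorted(index, key=lambda p: -cosine(qv, p[1])):
    print(f"{cosine(qv, dv):.3f}  {doc}")
```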
This is really a groundbreaking area of development. You know, at first, when ChatGPT came out, people said, gosh, it produced really interesting results, but it hallucinated greatly. And the reason it hallucinated, of course, is that it could memorize everything in the past, but it can't memorize everything in the future, in the present. And so it needs to be grounded in research. It has to do fundamental research before it answers a question. The ability to reason: do I have to do research? Do I have to use tools? How do I break a problem into steps? Each one of these steps is something the AI model knows how to do, and together it can compose them into a sequence of steps to perform something it's never done before, never been trained to do. This is the wonderful capability of reasoning: we can encounter a circumstance we've never seen before and break it down into circumstances and knowledge or rules we know how to handle, because we've experienced them in the past. And so the ability for AI models to reason is incredibly powerful. The reasoning capability of agents opens the doors to all of these different applications. We no longer have to train an AI model to know everything on day one, just as we don't have to know everything on day one, so long as we can, in every circumstance, reason about how to solve the problem. Large language models have now made this fundamental leap. The ability to use reinforcement learning and chain of thought and search and planning, all these different techniques in reinforcement learning, has made it possible for us to have this basic capability, and it is also now completely open-sourced.
But the thing that's really terrific is another breakthrough that happened, and the first time I saw it was with Aravind's Perplexity. Perplexity, the AI search company, a really innovative company. The first time I realized they were using multiple models at the same time, I thought it was completely genius. Of course we would do that. Of course an AI would call upon all of the world's great AIs to solve the problem it wants to solve, at any part of the reasoning chain. And this is the reason why AIs are really multimodal, meaning they understand speech and images and text and videos and 3D graphics and proteins. It's multimodal.
It's also multi-model, meaning it should be able to use any model that best fits the task. It is multi-cloud by definition, because these AI models are sitting in all these different places. And it's also hybrid cloud, because if you're an enterprise company, or you've built a robot, or whatever that device is, sometimes it's at the edge, sometimes a radio cell tower, maybe it's in an enterprise, or maybe it's a place like a hospital where you need to have the data in real time right next to you. Whatever those applications are, we now know this is what an AI application looks like in the future. Or, another way to think about it, because future applications are built on AIs: this is the basic framework of future applications. This basic framework, this basic structure of agentic AIs that can do the things I'm talking about, that is multi-model, has now turbocharged AI startups of all kinds. And because of all the open models and all the tools we've provided, you can also customize your AIs: teach your AI skills that nobody else is teaching, that nobody else is building into their AI. You can do it for yourself, and that's what the work we do with Nemotron, NeMo, and all of the open models is intended to enable. You put a smart router in front of it, and that router is essentially a manager that decides, based on the intention of the prompt you give it, which one of the models is best fit for that task, for solving that problem.
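A minimal sketch of that router idea, in Python: score the prompt's intent, then dispatch to whichever model is registered for it. The keyword scoring and the two stub model calls are illustrative assumptions, standing in for a learned intent classifier and real model endpoints.

```python
# Intent-based model router sketch: score a prompt against intents,
# then dispatch to the model registered for the winning intent.
# The stub handlers stand in for real local / frontier model calls.
from typing import Callable

def local_code_model(prompt: str) -> str:
    return f"[local code model] {prompt}"      # e.g. a fine-tuned open model

def frontier_general_model(prompt: str) -> str:
    return f"[frontier API] {prompt}"          # e.g. a hosted frontier model

ROUTES: dict[str, tuple[set[str], Callable[[str], str]]] = {
    "coding":  ({"code", "bug", "function", "compile"}, local_code_model),
    "general": (set(), frontier_general_model),  # fallback intent
}

def route(prompt: str) -> str:
    words = {w.strip("?.,!") for w in prompt.lower().split()}
    # Pick the intent whose keyword set overlaps the prompt the most;
    # fall back to the general frontier model when nothing matches.
    intent = max(ROUTES, key=lambda k: len(ROUTES[k][0] & words))
    if not ROUTES[intent][0] & words:
        intent = "general"
    return ROUTES[intent][1](prompt)

print(route("why does this function not compile?"))  # -> local code model
print(route("plan a trip to Las Vegas"))             # -> frontier model
```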
Okay. So now, when you think about this architecture, what do you have?
All of a sudden you have an AI that is, on the one hand, completely customizable by you: something you can teach your own skills for your company, something that's domain-secret, something where you have deep domain expertise, where maybe you've got all of the data you need to train that AI model. On the other hand, your AI is always at the frontier, by definition. You're always at the frontier on one hand, always customized on the other, and it should just run. And so we thought we would make the simplest of examples available to you. This entire framework we call a blueprint, and we have blueprints integrated into enterprise SaaS platforms all over the world; we're really pleased with the progress. But what we'll do is show you a short example of something that anybody can do.
Let's build a personal assistant.
I want it to help me with my calendar, emails, to-do lists, and even keep an eye on my home. I use Brev to turn my DGX Spark into a personal cloud, so I can use the same interface whether I'm using a cloud GPU or a DGX Spark. I use a frontier model API to easily get started.
I want him to help me with my emails.
So, I create an email tool for my agent to call.
I want my emails to stay private. So,
I'll add an open model that's running locally on the Spark.
[Music] Now, for any job, I want the agent to use the right model for the right task.
So, I'll use an intent-based model router.
This way, prompts that need email will stay on my Spark, and everything else can call the frontier model. I want my assistant to interact with my world, so I'll hook it up to Hugging Face's Reachy Mini robot. My agent controls the head, ears, and camera of the Reachy with tool calls. I want to give Reachy a voice, and I really like ElevenLabs, so I'll hook up their API.
>> Hi, I'm Reachy, running on DGX Spark.
>> Hey Reachy, what's on my to-do list today?
>> On your to-do list today:
Grab groceries, eggs, milk, butter, and send Jensen the new script.
>> Okay, let's send Jensen that update.
Tell him we'll have it for him by the end of the day.
>> Will do.
>> Reachy, there's a sketch, too. Can you
turn it into an architectural rendering?
>> Sure.
[Music] >> Nice. Now make a video and show me around the room.
>> Here you go.
>> That's great.
>> With Brev, I can share access to my Spark and Reachy, so I'm going to share it with Anna.
>> Hey Reachy, what's Potato up to?
>> He's on the couch.
I remember you don't like this. I'll
tell him to get off. Potato, off the couch.
With all the progress in open source, it's incredible to see what you can build. I'd love to see what you create.
[Music] >> Isn't that incredible? Now, the amazing thing is that that is utterly trivial now. And yet, just a couple of years ago, all of that would have been impossible. Absolutely unimaginable.
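Here is a hedged sketch of the agent loop just demonstrated: a privacy rule keeps email prompts on a local model, and plain functions act as callable tools. All names and endpoints are illustrative stand-ins, not the actual Brev, DGX Spark, or Reachy Mini APIs.

```python
# Sketch of the demo's agent loop: tools are plain functions the agent
# can call, and a privacy rule keeps email prompts on the local model.
def read_inbox() -> str:
    return "1 new email from Jensen: 'script?'"        # stand-in mail API

def look_at_speaker() -> str:
    return "Reachy turns its head toward the speaker"  # stand-in robot tool

TOOLS = {"read_inbox": read_inbox, "look_at_speaker": look_at_speaker}

def local_model(prompt: str) -> str:
    return "calling tool: read_inbox"                  # private, runs locally

def frontier_model(prompt: str) -> str:
    return "Here's a plan for your day..."             # hosted frontier model

def agent(prompt: str) -> str:
    private = "email" in prompt.lower()
    reply = (local_model if private else frontier_model)(prompt)
    # If the model asked for a tool, execute it and return the result.
    if reply.startswith("calling tool: "):
        tool = TOOLS[reply.removeprefix("calling tool: ")]
        return tool()
    return reply

print(agent("summarize my email"))   # stays local, triggers the inbox tool
print(agent("plan my day"))          # routed to the frontier model
```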
Well, this basic framework, this basic way of building applications: take language models that are pre-trained, proprietary, frontier, combine them with customized language models in an agentic framework, a reasoning framework that lets you access tools and files and maybe even connect to other agents. This is basically the architecture of AI applications, of applications in the modern age, and the ability for us to create these applications is incredibly fast.
fast. And notice if you give it this application um information that it's never seen before or in a structure that has is not
represented exactly as you thought. It can still reason through it and make as best effort to reason through the data the information to try to understand how to
solve the problem artificial intelligence. Okay. So this basic
intelligence. Okay. So this basic framework is now being integrated and everything that I just described. We
have the benefit of working with some of the world's leading enterprise platform companies. uh Palunteer for example
companies. uh Palunteer for example um their their entire AI and data processing platform is being integrated accelerated by NVIDIA today. Service Now
the world's leading customer service and um employee service platform. Snowflake
the world's top data platform in the cloud uh incredible work that that is being done there. Uh Code Rabbit or we're using Code Rabbit all over Nvidia.
uh CrowdStrike creating AIs to detect to find AI threats. uh NetApp their AI their data platform now has Nvidia's semantic AI on top of it and agentic
systems on top of it uh uh to for uh for them to do customer service but the important thing is this not only is this the way that you develop applications now this is going to be the user
interface of your platform so whether it's Palanteer or Service Now or Snowflake and many other companies that we're working with the agentic system is
the interface It's no longer Excel with a bunch of you know squares that you enter enter information into maybe it's no longer
could just command line the all of that multimodality information is now possible and the way you interact with your platform is much more well if you
will simple like you're interacting with people and so that's enterprise AI being revolutionized by angentic systems the next thing is physical AI This is an
area that you've seen me talk about for several years. In fact, we've been
several years. In fact, we've been working on this for eight years. The
The question is: how do you take something that is intelligent inside a computer, that interacts with you through screens and speakers,
to something that can interact with the world, meaning it can understand the common sense of how the world works.
Object permanence. If I look away and I look back, that object is still there.
Causality: if I push it, it tips over. It understands friction and gravity. It understands inertia: that a heavy truck rolling down the road is going to need a little more time to stop; that a ball is going to keep on rolling.
These ideas are common sense to even a little child, but for AI, they're completely unknown. And so we have to create a system that allows AIs to learn the common sense of the physical world, learn its laws, but also to learn from data, and the data is quite scarce, and to be able to evaluate whether that AI is working, meaning it has to simulate in an environment. How does an AI know that the actions it's performing are consistent with what it should do, if it doesn't have the ability to simulate the response of the physical world to its actions? The response to its actions is really important to simulate. Otherwise, there's no way to evaluate it; it's different every time.
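As a toy illustration of what simulating the world's response means, here is a minimal world model of exactly one of those facts: a pushed ball keeps rolling while friction slowly stops it. The constants are arbitrary; a real simulator models vastly richer dynamics.

```python
# Toy world model: a ball pushed with some speed rolls on, decelerating
# under friction (a = -mu * g). An AI with no physics prior must learn
# facts like "the ball keeps rolling" from exactly this kind of rollout.
MU, G, DT = 0.05, 9.81, 0.1   # friction coefficient, gravity, timestep (s)

def roll(v0: float, steps: int = 50) -> list[float]:
    positions, x, v = [], 0.0, v0
    for _ in range(steps):
        v = max(0.0, v - MU * G * DT)  # friction stops the ball, never reverses it
        x += v * DT
        positions.append(x)
    return positions

track = roll(v0=3.0)
print(f"after 1 s the ball is at {track[9]:.2f} m, still rolling")
print(f"after 5 s it has rolled to {track[-1]:.2f} m")
```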
And so this basic system requires three computers.
One computer, of course, is the one we know NVIDIA builds for training the AI models. Another computer is the inference computer: inferencing the model is essentially a robotics computer that runs in a car, or in a robot, or in a factory, anywhere at the edge. But there has to be another computer designed for simulation, and simulation is at the heart of almost everything NVIDIA does. This is where we are most comfortable, and simulation was really the foundation of almost everything we've done with physical AI. So we have three computers, and multiple stacks that run on these computers, libraries to make them useful. Omniverse is our digital twin, physically based simulation world.
Cosmos, as I mentioned earlier, is our foundation model: not a foundation model for language, but a foundation model of the world, also aligned with language. You could ask something like, what's happening to the ball? And it'll tell you the ball's rolling down the street. So: a world foundation model. And then of course the robotics models; we have two of them. One is called GR00T; the other is called Alpamayo, which I'm going to tell you about. Now, one of the most important things we have to do with physical AI is create the data to train the AI in the first place. Where does that data come from? With language, we created a mountain of text that we consider ground truth for the AI to learn from. But how do we teach an AI the ground truth of physics? There are lots and lots of videos, but hardly enough to capture the diversity and the types of interactions we need. And so this is where great minds came together and transformed what used to be compute into data. Using synthetic data generation that is grounded and conditioned by the laws of physics, grounded and conditioned by ground truth, we can now selectively, cleverly generate data that we can then use to train the AI. So for example, what comes into this Cosmos AI world model, on the left over here, is the output of a traffic simulator. Now, this traffic simulator output is hardly enough for an AI to learn from. We can take it, put it into a Cosmos foundation model, and generate surround video that is physically based and physically plausible, that the AI can now learn from. And there are so many examples of this. Let me show you what Cosmos can do.
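Before the demo rolls, here is a structural sketch of the pipeline just described: a coarse traffic-simulator scene goes in, a physics-grounded training clip comes out. The class and method names are hypothetical placeholders, not the real Cosmos SDK.

```python
# Structural sketch of the synthetic-data loop described above.
# `WorldModel` is a hypothetical stand-in for a Cosmos-style model:
# it conditions on a coarse traffic-simulator scene and emits a
# physically plausible surround-video clip for training.
from dataclasses import dataclass

@dataclass
class SimScene:
    num_vehicles: int
    weather: str
    maneuver: str          # e.g. "unprotected left turn"

@dataclass
class Clip:
    frames: int
    description: str

class WorldModel:
    def generate(self, scene: SimScene, cameras: int = 6) -> Clip:
        # A real world model would run video generation here; we fake it.
        desc = (f"{cameras}-camera surround video: {scene.num_vehicles} "
                f"vehicles, {scene.weather}, {scene.maneuver}")
        return Clip(frames=240, description=desc)

model = WorldModel()
training_set = [
    model.generate(SimScene(12, "night rain", "unprotected left turn")),
    model.generate(SimScene(3, "fog", "pedestrian crossing mid-block")),
]
for clip in training_set:
    print(clip.frames, clip.description)
```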
The ChatGPT moment for physical AI is nearly here, but the challenge is clear.
The physical world is diverse and unpredictable.
Collecting real world training data is slow and costly and it's never enough.
The answer is synthetic data. It starts with NVIDIA Cosmos, an open frontier world foundation model for physical AI, pre-trained on internet-scale video, real driving and robotics data, and 3D simulation. Cosmos learned a unified representation of the world, able to align language, images, 3D, and action.
It performs physical AI skills like generation, reasoning, and trajectory prediction from a single image. Cosmos generates realistic video from 3D scene descriptions: physically coherent motion. From driving telemetry and sensor logs: surround video. From planning simulators: multi-camera environments. And from scenario prompts, it brings edge cases to life.
Developers can run interactive closed loop simulations in Cosmos. When actions
are made, the world responds.
[Music] Cosmos reasons.
It analyzes edge scenarios, breaks them down into familiar physical interactions, and reasons about what could happen next.
Cosmos turns compute into data: training AVs for the long tail, and robots how to adapt to every scenario.
[Music] I know, it's incredible. Cosmos is the world's leading world foundation model. It's been downloaded millions of times, used all over the world, getting the world ready for this new era of physical AI. We use it ourselves as well; we use it to create our self-driving car.
Using it for scenario generation and for evaluation, we have something that allows us to effectively travel billions, trillions of miles, but inside a computer. And we've made enormous progress. Today we're announcing Alpamayo, the world's first thinking, reasoning autonomous vehicle AI. Alpamayo is trained end to end, literally from camera in to actuation out: lots and lots of miles driven by humans, using human demonstration, and lots and lots of miles generated by Cosmos. In addition to that, hundreds of thousands of examples are labeled very, very carefully so that we can teach the car how to drive.
Alpamayo does something really special. Not only does it take sensor input and actuate the steering wheel, brakes, and acceleration, it also reasons about the action it is about to take. It tells you what action it's going to take, the reasons by which it arrived at that action, and then of course the trajectory.
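A minimal sketch of that output structure: camera-derived observations in, a coupled action, stated reasoning, and trajectory out. The types and the trivial rule-based policy are illustrative only; the real Alpamayo is a learned end-to-end network.

```python
# Sketch of a reasoning AV policy's interface: observations in, a coupled
# (action, reasoning, trajectory) triple out. The rule-based body is a
# placeholder for the learned end-to-end model.
from dataclasses import dataclass

@dataclass
class Action:
    steer: float       # radians, positive is left
    throttle: float    # 0..1
    brake: float       # 0..1

@dataclass
class Decision:
    action: Action
    reasoning: str     # the model explains the action it is about to take
    trajectory: list[tuple[float, float]]  # planned (x, y) waypoints, meters

def drive(obstacle_ahead_m: float) -> Decision:
    if obstacle_ahead_m < 20.0:
        return Decision(
            action=Action(steer=0.0, throttle=0.0, brake=0.6),
            reasoning=f"Stopped vehicle {obstacle_ahead_m:.0f} m ahead; "
                      "braking and holding lane.",
            trajectory=[(0, 0), (4, 0), (7, 0)],
        )
    return Decision(
        action=Action(steer=0.0, throttle=0.3, brake=0.0),
        reasoning="Lane clear; maintaining speed.",
        trajectory=[(0, 0), (10, 0), (20, 0)],
    )

print(drive(obstacle_ahead_m=15.0).reasoning)
```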
All of these are coupled directly and trained very specifically on a large combination of human demonstrations as well as Cosmos-generated data. The result is just really incredible. Not only does your car drive as you would expect it to, and it drives so naturally because it learned directly from human demonstrators, but in every single scenario it comes up against, it reasons about it, tells you what it's going to do, and reasons about what's about to happen. Now, the reason this is so important is the long tail of driving. It's impossible for us to simply collect every single possible scenario for everything that could ever happen, in every single country, in every single circumstance, for all of the population. However, it's very likely that every scenario, if decomposed into a whole bunch of smaller scenarios, is quite normal to understand. And so these long tails will be decomposed into quite normal circumstances that the car knows how to deal with; it just needs to reason about it. And so let's take a look. Everything you're about to see is one shot, no hands.
>> Routing to your destination.
Buckle up.
[Music]
[Music] You have arrived.
[Music] [Applause] We started working on self-driving cars eight years ago. And the reason for that is because we reasoned early on that deep learning and artificial intelligence were going to reinvent the entire computing stack, and if we were ever going to understand how to navigate ourselves and guide the industry toward this new future, we had to get good at building the entire stack. Well, as I mentioned earlier, AI is a five-layer cake. The lowest layer is land, power, and shell. In the case of robotics, the lowest layer is the car. The next layer above it is chips: GPUs, networking chips, CPUs, all that kind of stuff. The next layer above that is the infrastructure. That infrastructure, in this particular case, as I mentioned with physical AI, is Omniverse and Cosmos. And then above that are the models. In this case, the model I just showed you is called Alpamayo.
And Alpamayo today is open-sourced. This is an incredible body of work; it took several thousand people. Our AV team is several thousand people. Just to put it in perspective: our partner, Ola, I think, is here in the audience somewhere. Mercedes agreed to partner with us five years ago to make all of this possible. We imagine that someday a billion cars on the road will all be autonomous. You could have a robotaxi that you're orchestrating and renting from somebody, or you could own it and have it drive by itself, or you could decide to drive it yourself. But every single car will have autonomous vehicle capability. Every single car will be AI-powered. And so the model layer in this case is Alpamayo, and the application above that is the Mercedes-Benz. Okay. And so this entire stack is NVIDIA's first full-stack endeavor. We've been working on it this entire time, and I'm just so happy that the first AV car from NVIDIA is going to be on the road in Q1, here in the United States, then Europe in Q2, and I think Asia in Q3 and Q4. And the powerful thing is that we're going to keep updating it with the next versions of Alpamayo, and versions after that.
There's no question in my mind now that this is going to be one of the largest robotics industries, and I'm so happy we worked on it. It taught us an enormous amount about how to help the rest of the world build robotic systems: the deep understanding that comes from building it ourselves, building the entire infrastructure ourselves, and knowing what kind of chips a robotic system would need. In this particular case, dual Orins, and in the next generation, dual Thors. These processors are designed for robotic systems, and were designed for the highest level of safety capability. This car just got rated; it just went to production. The Mercedes-Benz CLA was just rated by Euro NCAP the world's safest car.
It is the only system I know of that has every single line of code, the chip, the system, every line of code, safety certified. The entire system's sensors are diverse and redundant, and so is the self-driving car stack. The Alpamayo stack is trained end to end and has incredible skills. However, nobody knows, until you've driven it forever, that it's going to be perfectly safe. And so the way we guardrail it is with another software stack, an entire AV stack underneath. That entire AV stack is built to be fully traceable, and it's taken us some five years, six or seven years actually, to build that second stack. These two software stacks mirror each other, and then we have a policy and safety evaluator that decides: is this something I'm very confident about and can reason about driving very safely? If so, I'm going to have Alpamayo do it. If it's a circumstance I'm not very confident in, the safety policy evaluator decides we go back to a simpler, safer guardrail system, the classical AV stack. It's the only car in the world with both of these AV stacks running; all safety systems should have diversity and redundancy.
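A minimal sketch of that arbitration pattern, under stated assumptions: the confidence scorer, the 0.9 threshold, and both stack bodies below are illustrative stand-ins, not the production system.

```python
# Sketch of the guardrail architecture: two mirrored driving stacks and
# a safety evaluator that picks between them per scenario.
from typing import Callable

def learned_stack(scene: dict) -> str:
    return "Alpamayo maneuver: nudge left around double-parked van"

def classical_stack(scene: dict) -> str:
    return "fallback maneuver: slow and hold lane"

def confidence(scene: dict) -> float:
    # A real evaluator scores the learned stack's competence on this scene.
    return 0.95 if scene["conditions"] == "nominal" else 0.4

def arbiter(scene: dict, threshold: float = 0.9) -> str:
    stack: Callable[[dict], str] = (
        learned_stack if confidence(scene) >= threshold else classical_stack
    )
    return stack(scene)

print(arbiter({"conditions": "nominal"}))       # learned stack drives
print(arbiter({"conditions": "heavy sleet"}))   # classical stack takes over
```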
Well, our vision is that someday every single car, every single truck will be autonomous, and we've been working toward that future. This entire stack is vertically integrated. Of course, in the case of Mercedes-Benz, we built the entire stack together. We're going to deploy the car, we're going to operate the stack, and we're going to maintain the stack for as long as we shall live. However, like everything else we do as a company, we build the entire stack, but the entire stack is open for the ecosystem. And the ecosystem working with us to build L4 and robotaxis is expanding, and it's going everywhere.
I fully expect this to be, well, this is already a giant business for us. It's a giant business because partners use it for training, for data processing, and for training their models. They use it for synthetic data generation. In some cases, for some cars and some companies, they pretty much just use the computers, the chips that are inside the car; some companies work with us full stack; some work with us on some part of that. Okay, so it doesn't matter how much you decide to use. You know, my only request is: use a little bit of NVIDIA wherever you can. But the entire thing is open. Now, this is going to be the first large-scale, mainstream physical AI market, and I think we can all agree that this inflection point, going from non-autonomous vehicles to autonomous vehicles, is probably happening right about now.
In the next 10 years, I'm fairly certain a very, very large percentage of the world's cars will be autonomous or highly autonomous. But this basic technique I just described, using the three computers, using synthetic data generation and simulation, applies to every form of robotic system. It could be a robot that is just an articulator, a manipulator; maybe it's a mobile robot; maybe it's a fully humanoid robot. And so the next journey, the next era for robotic systems, is going to be, you know, robots. And these robots are going to come in all kinds of different sizes. And I invited some friends. Did they come?
>> Hey guys, hurry up. I've got a lot of stuff to cover. Come on, hurry.
Did you tell R2-D2 you're going to be here?
>> Did you? And C-3PO.
>> Okay. All right. Come here. Now, you have Jetsons; they have little Jetson computers inside them. They're trained inside Omniverse. And how about this: let's show everybody the simulator that you guys learned how to be robots in. You guys want to look at that? Okay, let's look at that. Run it, please.
[Music]
Isn't that amazing? That's how you learn to be a robot; you did it all inside Omniverse. And the robot simulator is called Isaac: Isaac Sim and Isaac Lab, for anybody who wants to build a robot. You know, nobody's going to be as cute as you.
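To make "learning to be a robot in simulation" concrete, here is a toy version of the loop: a policy improved purely against a simulated environment, via random-search hill climbing, before ever touching hardware. It stands in for the large-scale reinforcement learning done in Isaac Sim and Isaac Lab and uses none of the actual Isaac APIs.

```python
# Toy sim-to-real loop: improve a 1-D "reach the target" policy purely in
# simulation via random-search hill climbing. Stands in for the RL that
# Isaac Sim / Isaac Lab run at vastly larger scale (not the real API).
import random

TARGET = 5.0

def simulate(gain: float) -> float:
    """Roll out a proportional controller in the sim; return -|final error|."""
    x = 0.0
    for _ in range(20):
        x += gain * (TARGET - x)       # policy: step toward the target
    return -abs(TARGET - x)            # reward: closeness at episode end

gain, best = 0.01, simulate(0.01)
for _ in range(200):                   # training happens entirely in sim
    candidate = gain + random.gauss(0, 0.05)
    reward = simulate(candidate)
    if reward > best:
        gain, best = candidate, reward

print(f"learned gain {gain:.3f}, final distance to target {-best:.4f} m")
```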
But look at all these friends we have building robots. We're building big ones. No, like I said, nobody's as cute as you guys are. But we have Neurobot, and we have Aubot over there. You know, we have LG over here; they just announced a new robot. Caterpillar: they've got the largest robots ever. That one delivers food to your house; it's connected to Uber Eats. That's Serve Robotics, I love those guys. Agility, Boston Dynamics, incredible. You've got surgical robots, you've got manipulator robots from Franka, you've got Universal Robots.
Incredible number of different robots.
And so this is the next chapter. We're going to talk a lot more about robotics in the future, but it's not just about the robots in the end. I know, everything's about you guys. It's about getting there. And one of the most important industries in the world that will be revolutionized by physical AI and AI physics is the industry that started all of us.
At NVIDIA, it wouldn't be possible if not for the companies I'm about to talk about, and I'm so happy that all of them, starting with Cadence, are going to accelerate everything. Cadence: CUDA-X integrated into all of their simulations and solvers. They've got NVIDIA physical AIs that they're going to use for different physical plants and plant simulations. You've got AI physics being integrated into these systems. So whether it's EDA or CAE, and in the future robotic systems, we're going to have basically the same technology that made you guys possible now completely revolutionize these design stacks. Synopsys: you know, Synopsys and Cadence are completely indispensable in the world of chip design. Synopsys leads in logic design and IP. In the case of Cadence, they lead in physical design, the place and route, and emulation and verification; Cadence is incredible at emulation and verification. Both of them are moving into the world of system design and system simulation. And so in the future, we're going to design your chips inside Cadence and inside Synopsys. We're going to design your systems and emulate the whole thing and simulate everything inside these tools. That's your future.
Yeah, you're going to be born inside these platforms. Pretty amazing, right? And so we're so happy to be working with these industries. Just as we've integrated NVIDIA into Palantir and ServiceNow, we're integrating NVIDIA into the most computationally intensive simulation industries, Synopsys and Cadence. And today we're announcing that Siemens is also doing the same thing. We're going to integrate CUDA-X, physical AI, agentic AI, NeMo, and Nemotron deeply into the world of Siemens. And the reason is this: first, we design the chips, and all of it in the future will be accelerated by NVIDIA; you're going to be very happy about that. We're going to have agentic chip designers and system designers working with us, helping us do design, just as we have agentic software engineers helping our software engineers code today. And so we'll have agentic chip designers and system designers. We're going to create you inside this.
But then we have to build you. We have to build the plants, the factories that manufacture you. We have to design the manufacturing lines that assemble all of you. And these manufacturing plants are going to be essentially gigantic robots. Incredible, isn't that right?
>> I know. I know. And so you're going to be designed in a computer. You're going to be made in a computer. You're going to be tested and evaluated in a computer, long before you have to spend any time dealing with gravity. I know. Do you know how to deal with gravity? Can you jump? Okay. All right. Don't show off. Okay.
So now, the industry that made NVIDIA possible: I'm just so happy that the technology we're creating is at a level of sophistication and capability that we can now help them revolutionize their industry. What started with them, we now have the opportunity to go back and help them revolutionize. Let's take a look at the stuff we're going to do with Siemens. Come on.
Breakthroughs in physical AI are letting AI move from screens to our physical world.
And just in time, as the world builds factories of every kind for chips, computers, life-saving drugs, and AI, as the global labor shortage worsens, we
need automation powered by physical AI and robotics more than ever.
This, where AI meets the world's largest physical industries, is the foundation of the NVIDIA and Siemens partnership. For nearly two centuries, Siemens has built the world's industries, and now it is reinventing them for the age of AI. Siemens is integrating NVIDIA CUDA-X libraries, AI models, and Omniverse into its portfolio of EDA, CAE, and digital twin tools and platforms. Together, we're bringing physical AI to
the full industrial life cycle.
From design and simulation to production and operations, we stand at the beginning of a new industrial revolution: the age of physical AI, built by NVIDIA and Siemens for the next age of industries.
Incredible, right, guys? What do you think? All right, hang on tight. Just hang on tight. And so, you know, if you look at the world's models, there's no question OpenAI is the leading token generator today; more OpenAI tokens are generated than just about anything else. The second largest group is probably open models. And my guess is that over time, because there are so many companies, so many researchers, so many different types of domains and modalities, open-source models will be by far the largest. Let's talk about somebody really special. You guys want to do that? Let's talk about Vera Rubin.
Vera Rubin. Yeah, go ahead. She was an American astronomer. She was the first to observe, she noticed, that the tails of galaxies were moving about as fast as the centers of the galaxies. I know, it makes no sense. Newtonian physics would say that, just like the solar system, planets further from the sun circle the sun more slowly than planets closer to the sun. And therefore it makes no sense that this happens, unless there are invisible bodies. She discovered what we call dark matter, which occupies space even though we don't see it. And so Vera Rubin is the person we named our next computer after. Isn't that a good idea?
I know.
Okay. Vera Rubin is designed to address this fundamental challenge that we have: the amount of computation necessary for AI is skyrocketing. The demand for NVIDIA GPUs is skyrocketing. It's skyrocketing because models are increasing by a factor of 10, an order of magnitude, every single year. And not to mention, as I said, o1's introduction was an inflection point for AI. Instead of a one-shot answer, inference is now a thinking process. And in order to teach the AI how to think, reinforcement learning and very significant computation were introduced into post-training. It's no longer supervised fine-tuning, otherwise known as imitation learning. You now have reinforcement learning: essentially the computer trying different iterations itself, learning how to perform a task. The amount of computation for pre-training, for post-training, for test-time scaling has exploded as a result. And now, for every single inference, instead of just one shot, you can watch the AI think, which we appreciate; the longer it thinks, oftentimes the better the answer it produces. And so test-time scaling causes the number of tokens generated to increase by 5x every single year.
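Those growth rates compound brutally. A quick back-of-envelope calculation, using only the figures quoted in the talk and treating model size times tokens as a naive proxy for compute demand, shows the gap that co-design has to close:

```python
# Back-of-envelope using the keynote's numbers: model size grows ~10x/yr
# and tokens generated ~5x/yr, while transistor budgets grow ~1.6x/yr.
# The widening gap is what extreme co-design has to close.
model_growth, token_growth, transistor_growth = 10.0, 5.0, 1.6

for year in range(1, 4):
    demand = (model_growth * token_growth) ** year   # naive compute demand
    supply = transistor_growth ** year               # transistor budget
    print(f"year {year}: demand x{demand:,.0f} vs transistors x{supply:.2f} "
          f"-> gap x{demand / supply:,.0f}")
```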
Not to mention, meanwhile, the race is on for AI. Everybody's trying to get to the next level; everybody's trying to get to the next frontier. And every time they get to the next frontier, the cost of the last generation's AI tokens starts to decline, by about a factor of 10x every year. That 10x decline every year is actually telling you something: the race is so intense, everybody's trying to get to the next level, and somebody is getting to the next level. And so all of it is a computing problem. The faster you compute, the sooner you can get to the next level, the next frontier. All of these things are happening simultaneously. And so we decided that we had to advance the state of the art of computation every single year, not one year left behind.
We started shipping GB200s a year and a half ago. Right now we're in full-scale manufacturing of GB300. And if Vera Rubin is going to be in time for this year, it must be in production by now. And so today I can tell you that Vera Rubin is in full production.
You guys want to take a look at Vera Rubin?
>> All right. Come on.
>> Play it, please.
Vera Rubin arrives just in time for the next frontier of AI.
This is the story of how we built it.
The architecture: a system of six chips engineered to work as one, born from extreme co-design. It begins with Vera, a custom-designed CPU with double the performance of the previous generation. And the Rubin GPU: Vera and Rubin are co-designed from the start to bidirectionally and coherently share data faster and with lower latency.
Then 17,000 components come together on a Vera Rubin compute board. High-speed robots place components with micron precision before the Vera CPU and two Rubin GPUs complete the assembly, capable of delivering 100 petaflops of AI, five times that of its predecessor.
AI needs data fast.
ConnectX-9 delivers 1.6 terabits per second of scale-out bandwidth to each GPU.
The BlueField-4 DPU offloads storage and security so compute stays fully focused on AI.
The Vera Rubin compute tray is completely redesigned, with no cables, hoses, or fans. Featuring one BlueField-4 DPU, eight ConnectX-9 NICs, two Vera CPUs, and four Rubin GPUs, it is the compute building block of the Vera Rubin AI supercomputer.
Next, the sixth-generation NVLink switch, moving more data than the global internet, connecting 18 compute nodes and scaling up to 72 Rubin GPUs operating as one.
Then Spectrum-X Ethernet Photonics, the world's first Ethernet switch with 512 lanes and 200-gigabit-capable co-packaged optics, scales out thousands of racks into an AI factory.
15,000 engineer-years since design began, the first Vera Rubin NVL72 rack comes online: six breakthrough chips, 18 compute trays, nine NVLink switch trays, 220 trillion transistors, weighing nearly two tons. One giant leap to the next frontier of AI. Rubin is here.
What do you guys think?
This is a Rubin pod: 1,152 GPUs in 16 racks. Each one of the racks, as you know, has 72 Rubins, and each Rubin is two actual GPU dies connected together. I'm going to show it to you, but there are several things that, well, I'll tell you later.
I can't tell you everything right away.
Well, we designed six different chips.
First of all, we have a rule inside our company, and it's a good rule: no new generation should have more than one or two chips changed. But the problem is this. As you saw, we were describing the total number of transistors in each one of the chips, and we know that Moore's Law has largely slowed. So the number of transistors we can get year after year can't possibly keep up with models that are 10 times larger every year. It can't possibly keep up with five times more tokens generated per year. It can't possibly keep up with token costs declining so aggressively. It is impossible to keep up with those kinds of rates, for the industry to continue to advance, unless we deploy aggressive, extreme co-design: basically innovating across all of the chips, across the entire stack, all at the same time. Which is the reason we decided that this generation, we had no choice but to design every chip over again. Now, every single chip we just described could be a press conference all in itself; back in the old days there were entire companies dedicated to building just one of them. Each one of them is completely revolutionary and the best of its kind.
The Vera CPU: I'm so proud of it. In a power-constrained world it is two times the performance of Grace, twice the performance per watt of the world's most advanced CPUs, and its data rate is insane. It was designed for supercomputing. Grace was an incredible CPU, and now Vera increases single-threaded performance, increases memory capacity, increases everything dramatically. It's a giant chip. This is the Vera CPU. This is one CPU.
And this is connected to the Rubin GPU. Look at that thing: it's a giant chip. Now, the thing that's really special, and I'll go through these (it's going to take three hands, I think four hands, to do this): this is the Vera CPU. It's got 88 CPU cores, and the cores are designed to be multi-threaded, but the multi-threaded nature of Vera was designed so that each one of the 176 threads gets its full performance. So there are essentially 176 logical cores but only 88 physical cores; these cores were designed using a technology called spatial multi-threading. And the I/O performance is incredible. This is the Rubin GPU.
It's 5x Blackwell in floating-point performance. But the important thing is the bottom line: it's only 1.6 times the number of transistors of Blackwell. That tells you something about the state of semiconductor physics today. If we don't do co-design, extreme co-design at the level of basically every single chip across the entire system, how is it possible to deliver performance beyond, at best, 1.6 times each year? Because that's the total number of transistors you have. And even if you got a little more performance per transistor, say 25%, it's impossible to extract 100% of the potential of the transistors you get. So 1.6x puts a ceiling on how far performance can go each year, unless you do something extreme. And we call it extreme co-design.
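To make that ceiling concrete, here is the back-of-envelope arithmetic, using only the figures stated on stage (1.6x transistors, 5x performance) plus an assumed, generous 25% per-transistor gain:

```python
# Why 1.6x transistors alone cannot deliver a 5x generational leap.
transistor_ratio = 1.6        # Rubin vs. Blackwell, stated on stage
per_transistor_gain = 1.25    # assumed generous 25% improvement per transistor

naive_ceiling = transistor_ratio * per_transistor_gain
print(f"Naive generational ceiling: {naive_ceiling:.1f}x")        # ~2.0x

claimed_speedup = 5.0         # 5x Blackwell, stated on stage
print(f"Gap co-design must close: {claimed_speedup / naive_ceiling:.1f}x")  # ~2.5x
```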
Well, one of the things that we did, and it was a great invention, is called the NVFP4 tensor core. The transformer engine inside our chip is not just a 4-bit floating-point number that we somehow put into the data path. It is an entire processing unit that understands how to dynamically, adaptively adjust its precision and structure across the different levels of the transformer, so that you achieve higher throughput wherever it's possible to lose precision, and go back to the highest possible precision wherever you need to. You can't do that in software, because obviously it's running too fast; you have to do it adaptively inside the processor. That's what NVFP4 is. When somebody says FP4 or FP8, it almost means nothing by itself, because it's the tensor core structure and all of the algorithms that make it work. NVFP4, we've published papers on this already: the level of throughput and precision it's able to retain is completely incredible. This is groundbreaking work. I would not be surprised if the industry wants us to make this format and this structure an industry standard in the future. This is completely revolutionary. This is how we were able to deliver such a gigantic step up in performance even though we only have 1.6 times the number of transistors. Okay.
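To give a feel for the general idea (this is not NVIDIA's hardware design, which uses more sophisticated multi-level scaling), low-bit formats like NVFP4 preserve accuracy by attaching a shared scale factor to each small block of values, so the 4-bit codes only have to cover a narrow local range. A toy sketch, with the FP4 E2M1 value grid and a block size of 16 as illustrative choices:

```python
import numpy as np

# Representable magnitudes of an FP4 E2M1 element (sign stored separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_blocks(x, block_size=16):
    """Toy block-scaled FP4: each block of values shares one scale factor."""
    x = x.reshape(-1, block_size)
    # Choose each block's scale so its largest magnitude maps to 6.0.
    scales = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales[scales == 0] = 1.0                          # avoid divide-by-zero
    scaled = x / scales
    # Round each value to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(scaled) * FP4_GRID[idx], scales

x = np.random.randn(1024).astype(np.float32)
q, s = quantize_fp4_blocks(x)
print(f"mean abs error: {np.abs((q * s).ravel() - x).mean():.4f}")
```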
So now, once you have a great processing node, and this is the processor node, here, let me do this.
Wow, this is super heavy. You have to be a CEO in really good shape to do this job.
Okay. All right. So, this thing is, I'm gonna guess, probably, I don't know, a couple of hundred pounds.
I thought that was funny, too.
Come on, it could have been. Everybody's gone? No, I don't think so.
All right. So look at this. This is the last one. We revolutionized the entire MGX chassis. This node went from 43 cables down to zero cables, with six tubes. It used to take two hours to assemble; if you're lucky, it takes two hours, and of course you're probably going to assemble it wrong, so you have to test it, retest it, reassemble it. The assembly process was incredibly complicated, which was understandable for one of our first supercomputers deconstructed in this way. This goes from two hours to five minutes, and it's 100% liquid cooled. Really, really a breakthrough.
Okay. So this is the new compute chassis, and what connects all of these to the top-of-rack switches, the east-west traffic, is the Spectrum-X NIC. This is the world's best NIC, unquestionably. NVIDIA's Mellanox, the Mellanox acquisition that joined us a long time ago now: their networking technology for high-performance computing is the world's best, bar none. The algorithms, the chip design, all of the interconnects, all the software stacks that run on top of it, their RDMA: absolutely the world's best. And now it has programmable RDMA and a data-path accelerator, so that partners like AI labs can create their own algorithms for how they want to move data around the system. This is completely world-class. ConnectX-9 and the Vera CPU were co-designed, and we never revealed Vera until CX9 came along, because we co-designed it for a new type of processor.
You know, ConnectX-8 and Spectrum-X revolutionized how Ethernet is done for artificial intelligence. Ethernet traffic for AI is much more intense and requires much lower latency; the instantaneous surge of traffic is unlike anything Ethernet normally sees. And so we created Spectrum-X, which is AI Ethernet.
Two years ago, we announced Spectrum-X.
NVIDIA today is the largest networking company the world has ever seen. Spectrum-X has been so successful, used in so many different installations; it is just sweeping the AI landscape. The performance is incredible, especially when you have a 200-megawatt data center, or a gigawatt data center. These are billions of dollars. Let's say a gigawatt data center is $50 billion. If the networking performance allows you to deliver an extra 10% of throughput (and in the case of Spectrum-X, 25% higher throughput is not uncommon), even just 10% is worth $5 billion. The networking is effectively free, which is why everybody uses Spectrum-X. It's just an incredible thing.
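The arithmetic behind that claim, using only the figures stated on stage:

```python
# Throughput-to-dollars arithmetic from the figures stated on stage.
factory_cost = 50e9                  # $50B gigawatt AI factory
for uplift in (0.10, 0.25):          # 10% hypothetical; 25% "not uncommon"
    print(f"{uplift:.0%} more throughput = "
          f"${factory_cost * uplift / 1e9:.1f}B of effective capacity")
```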
And now we're going to invent a new type of data processing. Spectrum-X is for east-west traffic; we now have a new processor called BlueField-4. BlueField-4 allows us to take a very large data center and isolate different parts of it, so that different users can use different parts of it, and make sure everything can be virtualized if they decide it should be. You offload a lot of the virtualization software, the security software, and the networking software for your north-south traffic. BlueField-4 comes standard with every single one of these compute nodes, and it has a second application I'm going to talk about in just a second. This is a revolutionary processor, and I'm so excited about it.
This is the NVLink 6 switch, and it's right here. There are four of these switch chips inside the NVLink switch.
Each one of these switch chips has the fastest SerDes in history. The world is barely getting to 200 gigabits per second; this runs at 400 gigabits per second. The reason this is so important is so that every single GPU can talk to every other GPU at exactly the same time. This switch, on the backplane of one of these racks, lets us move the equivalent of all the world's internet traffic at more than twice its speed: the cross-sectional bandwidth of the entire planet's internet is about 100 terabytes per second, and this is 240 terabytes per second. That kind of puts it in perspective. It's so that every single GPU can work with every single other GPU at exactly the same time.
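A quick sanity check on what that means per GPU, assuming the stated 240 TB/s is the aggregate across the rack's 72 GPUs:

```python
# Per-GPU share of the rack's NVLink cross-sectional bandwidth,
# assuming 240 TB/s is the aggregate across 72 GPUs.
rack_tb_s, gpus = 240, 72
print(f"~{rack_tb_s / gpus:.1f} TB/s of all-to-all bandwidth per GPU")  # ~3.3
```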
Okay. Then, on top of that. So this is one rack. As you can see, the number of transistors in this one rack is 1.7 times that of the previous generation.
Yeah, could you do this for me? So this is... it's usually about two tons, but today it's two and a half tons, because when they shipped it, they forgot to drain the water out of it. So we shipped a lot of water from California.
Can you hear it squealing?
You know, when you're rotating two and a half tons, you're going to squeal a little.
Oh, you could do it.
Wow.
Okay. We just we won't make you do that twice. All right. So So um so behind
twice. All right. So So um so behind behind this are the MVLink spines.
Basically two miles of copper cables.
Copper is the best conductor we know.
And these are all shielded copper cables, structured copper cables, the most the world's ever used in computing systems ever. and and um our certis
systems ever. and and um our certis drive the copper cables from the top of the rack all the way to the bottom of the rack at 400 gigabits per second.
It's incredible. And so uh this has two miles of total copper cables, 5,000 copper cables, and this makes the MVLink uh spine possible. This is the
revolution that that really started the NGX system. Now, we we decided that we
NGX system. Now, we we decided that we would create an industry standard system. So that the entire ecosystem,
system. So that the entire ecosystem, all of our supply chain could standardize on these components. There
some 80,000 different components that make up this these NGX systems and it's a total waste if we're to change it every single year.
Every single major computer company, from Foxconn to Quanta to Wistron, and the list goes on, to HP and Dell and Lenovo: everybody knows how to build these systems. And the fact that we could squeeze Vera Rubin into this, even though the performance is so much higher and, very importantly, the power is twice as high (the power of Vera Rubin is twice that of Grace Blackwell), and yet, and this is the miracle, the airflow into it is about the same. Very importantly, the water that goes into it is the same temperature: 45 degrees C. At 45 degrees C, no water chillers are necessary for the data center. We're basically cooling this supercomputer with hot water. It is so incredibly efficient. And so this is the new rack: 1.7 times more transistors, but five times more peak inference performance and three and a half times more peak training performance.
Okay, they're connected on top using Spectrum-X. Oh, thank you.
[Applause] This is the world's first chip manufactured using TSMC's new process that we co-innovated, called COUPE. It's an integrated silicon photonics process technology, and it allows us to take silicon photonics directly to the chip. This is 512 ports at 200 gigabits per second. And this is the new Ethernet AI switch, the Spectrum-X Ethernet switch. Look at this giant chip. What's really amazing is that it has silicon photonics directly connected to it. Lasers come in through here, the optics are here, and they connect out to the rest of the data center. I'll show you in a second, but this sits on top of the rack. And this is the new Spectrum-X silicon photonics switch. Okay.
And we have something new I want to tell you about. As I mentioned, a couple of years ago we introduced Spectrum-X so that we could reinvent the way networking is done. Ethernet is really easy to manage; everybody has an Ethernet stack, and every data center in the world knows how to deal with Ethernet. The only thing we were using at the time was InfiniBand, which is used for supercomputers. InfiniBand is very low latency, but of course its software stack and its entire manageability are alien to the people who use Ethernet. So we decided to enter the Ethernet switch market for the very first time. Spectrum-X just took off, and it made us the largest networking company in the world, as I mentioned. This next-generation Spectrum-X is going to carry on that tradition. But just as I said earlier, AI has reinvented the whole computing stack, every layer of it. It stands to reason that when AI starts to get deployed in the world's enterprises, it's also going to reinvent the way storage is done. AI doesn't use SQL; AI uses semantic information. And when AI is being used, it creates temporary knowledge, a temporary memory called the KV cache: key-value combinations. Basically, the cache of the AI, the working memory of the AI.
And the working memory of the AI is stored in HBM. For every single token, the GPU reads in the entire model, reads in the entire working memory, produces one token, and stores that token back into the KV cache. Then the next time, it reads in the entire memory again, streams it through the GPU, and generates another token. It does this repeatedly, token after token after token. And obviously, if you have a long conversation with that AI, over time that context memory is going to grow tremendously. Not to mention, the models are growing, and the number of turns we use the AI for is increasing. We would like this AI to stay with us our entire life and remember every single conversation we've ever had with it, every bit of research I've ever asked it for. And of course, the number of people sharing these supercomputers is going to continue to grow.
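What that loop looks like in the abstract: a toy sketch of autoregressive decoding with a KV cache, plus the memory arithmetic that makes context growth a problem. `ToyModel` is a hypothetical stand-in for a real inference engine, and the model shapes are assumptions, not Rubin figures:

```python
import numpy as np

class ToyModel:
    """Hypothetical stand-in for a real inference engine."""
    def prefill(self, tokens):
        return [np.zeros(8) for _ in tokens]      # one KV entry per prompt token
    def forward_one(self, token, kv_cache):
        # A real model streams ALL of its weights and the ENTIRE kv_cache
        # through the GPU here; that read traffic is the point.
        return (token + 1) % 50_000, np.zeros(8)

def decode(model, prompt, n_new):
    kv_cache = model.prefill(prompt)
    tokens = list(prompt)
    for _ in range(n_new):
        nxt, kv = model.forward_one(tokens[-1], kv_cache)
        kv_cache.append(kv)                       # cache grows one entry per token
        tokens.append(nxt)
    return tokens

print(decode(ToyModel(), [1, 2, 3], 4))

# Why HBM runs out: per-token KV footprint for assumed large-model shapes.
layers, kv_heads, head_dim, bytes_per = 96, 8, 128, 2
per_token = 2 * layers * kv_heads * head_dim * bytes_per    # K and V planes
print(f"{per_token / 1e6:.2f} MB per token -> "
      f"{per_token * 1e6 / 1e12:.2f} TB per 1M-token context")
```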
And so this context memory, which started out fitting inside HBM, is no longer large enough. Last year we created Grace Blackwell's very fast memory, which we called fast context memory; that's why we connected Grace directly to Blackwell, so that we could expand the context memory. But even that is not enough. The next solution, of course, is to go off onto the network, the north-south network, to the company's storage. But if you have a whole lot of AIs running at the same time, that network is no longer fast enough. So the answer, very clearly, is to do it differently. We created BlueField-4 so that we could have a very fast KV cache context memory store right in the rack.
I'll show you in just one second, but this is a whole new category of storage system, and the industry is so excited, because this is a pain point for just about everybody who does a lot of token generation today. The AI labs, the cloud service providers: they're really suffering from the amount of network traffic caused by KV cache moving around. So the idea that we would create a new platform, a new processor, to run the entire Dynamo KV cache context memory management system, and put it very close to the rest of the rack, is completely revolutionary. So this is it. It sits right here.
So these are all the compute nodes. Each one of these is NVLink 72; this is Vera Rubin NVLink 72, 144 Rubin GPU dies. The context memory is stored here: behind each one of these are four BlueFields, and behind each BlueField is 150 terabytes of context memory. Once you allocate it across the GPUs, each GPU gets an additional 16 terabytes. Inside this node, each GPU essentially has one terabyte; now, with this backing store sitting directly on the same east-west fabric, at exactly the same data rate, 200 gigabits per second across literally the entire fabric of this compute node, you're going to get an additional 16 terabytes of memory. Okay. And this is the management plane. These are the Spectrum-X switches that connect all of them together. And over here, these switches at the end connect them to the rest of the data center.
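Putting the stated numbers together: a back-of-envelope on the context-memory tiers, where the per-token KV footprint is the assumption carried over from the earlier sketch, not a figure from the talk:

```python
# Context-memory tiers per GPU, using the numbers stated on stage.
mb_per_token = 0.39                                   # assumed KV footprint
tiers = {"in-node fast memory": 1.0,                  # ~1 TB per GPU
         "BlueField-4 backing store": 16.0}           # +16 TB per GPU
for name, tb in tiers.items():
    print(f"{name}: ~{tb * 1e6 / mb_per_token / 1e6:.1f}M tokens of context")

# Pulling a cold 16 GB slice of KV cache over the 200 Gb/s fabric:
print(f"16 GB fetch: ~{16e9 * 8 / 200e9:.2f} s")      # ~0.64 s
```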
And so this is Vera Rubin. Now, there are several things that are really incredible about it. The first, as I mentioned, is that this entire system is twice the energy efficiency: even though the power, the amount of energy used, is twice as high, the amount of computation is many times higher than that, and the liquid that goes into it is still 45 degrees C. That enables us to save about 6% of the world's data center power. So that's a very big deal.
The second very big deal is that this entire system is now confidential-computing safe, meaning everything is encrypted in transit, at rest, and during compute. Every single bus is now encrypted: every PCI Express link, every NVLink (CPU to GPU, GPU to GPU), everything is encrypted. So it's confidential-computing safe. This allows companies to feel safe that when their models are deployed by somebody else, they will never be seen by anybody else. Okay? So this particular system is not only incredibly energy efficient; there's one other thing that's incredible. Because of the nature of AI workloads, the amount of current, the amount of energy used, spikes instantaneously during collective phases like all-reduce; it's really off the charts. Oftentimes it'll spike up 25%. We now have power smoothing across the entire system, so you don't have to overprovision by 25% and leave 25% of the energy squandered or unused. Now you can fill up the entire power budget; you don't have to provision beyond it.
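A minimal sketch of the idea behind power smoothing: an energy buffer absorbs the spikes so provisioned power can be sized near the average load rather than the peak. This is entirely illustrative, not NVIDIA's mechanism, and the buffer size is an assumption:

```python
# Toy power smoothing: a buffer shaves all-reduce spikes off the supply.
supply_mw = 100.0             # provisioned power, sized at the average
buffer_mwh = 0.5              # assumed on-rack energy buffer
stored = buffer_mwh
for minute in range(8):
    # Collective phases spike demand ~25% above average, per the talk.
    demand_mw = 125.0 if minute % 3 == 0 else 95.0
    gap_mw = demand_mw - supply_mw            # >0 drains, <0 recharges
    stored = min(max(stored - gap_mw / 60, 0.0), buffer_mwh)
    print(f"t={minute}min  demand={demand_mw:5.1f} MW  buffer={stored:.2f} MWh")
```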
And then the last thing, of course, is performance. So let's take a look at it. These are charts that only people who build AI supercomputers would love. It took every single one of these chips, a complete redesign of every one of the systems, and a rewrite of the entire stack to make this possible.
Basically, this first column is training the AI model. The faster you train AI models, the faster you can get the next frontier out to the world. This is your time to market, your technology leadership, your pricing power. In the case of the green, this is essentially a 10-trillion-parameter model (we scaled it up from DeepSeek, which is why we call it DeepSeek++) trained on 100 trillion tokens. This is our simulation projection of what it would take to build the next frontier model. Elon has already mentioned that the next version of Grok, Grok 5, is I think 7 trillion parameters; so this is 10. The green is Blackwell, and in the case of Rubin, notice the throughput is so much higher, and therefore it only takes a quarter as many of these systems to train the model in the time we gave it here, which is one month. Okay. And time is the same for everybody.
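A rough way to sanity-check the scale of that run is the standard ~6·N·D training-FLOPs estimate, using the stated parameter and token counts; the per-rack throughput and utilization below are purely illustrative assumptions:

```python
# Back-of-envelope budget for the 10T-parameter / 100T-token training run.
params, tokens = 10e12, 100e12
flops = 6 * params * tokens                  # standard ~6*N*D estimate
print(f"total: {flops:.1e} FLOPs")           # 6.0e27

rack_exaflops = 2.0     # assumed sustained low-precision EFLOP/s per rack
utilization = 0.4       # assumed end-to-end efficiency
month_s = 30 * 24 * 3600
racks = flops / (rack_exaflops * 1e18 * utilization * month_s)
print(f"~{racks:.0f} racks for a one-month run under these assumptions")
```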
Now, how fast you can train that model, and how large a model you can train, is how you're going to get to the frontier first. The second part is your factory throughput.
Blackwell is green again. Factory throughput is important because your factory, in the case of a gigawatt, is $50 billion, and a $50 billion data center can only consume one gigawatt of power. So whether your throughput per watt is very good or quite poor directly translates to your revenues: the revenue of your data center is directly related to this second column. In the case of Blackwell, it was about 10 times over Hopper; in the case of Rubin, it's going to be about 10 times higher again. Okay? And then the cost of tokens, how cost-effectively you can generate a token: with Rubin it's about one-tenth, just as before. So this is how we're going to get everybody to the next frontier, push AI to the next level, and of course build these data centers energy-efficiently and cost-efficiently.
So this is it. This is NVIDIA today. We mentioned that we build chips, but as you know, NVIDIA builds entire systems now, and AI is a full stack. We're reinventing AI across everything from chips to infrastructure to models to applications, and our job is to create the entire stack so that all of you can create incredible applications for the rest of the world. Thank you all for coming. Have a great CES.
Now, before I let you guys go: there were a whole bunch of slides we had to leave on the cutting-room floor, so we have some outtakes here. I think it'll be fun for you. Have a great CES, guys.
[Applause] And cut.
Nvidia live at CES. Take four marker.
>> Boom. Mike
action.
>> Sorry guys. Platform shift, huh?
>> That should do it.
>> And let's roll camera.
[Music] >> A shade of green. A bright happy green.
World's most powerful AI supercomputer you can plug into the wall next to my toaster.
>> Hey guys, I'm I'm stuck again. I'm so
sorry.
>> This slide is never going to work. Let's
just cut it.
>> Hello. Can you hear me?
So, like I was saying, the router.
Because not every problem needs the biggest, smartest model. Just the right one.
>> No. No, don't lose any of them.
>> This new six-chip Rubin platform makes one amazing AI supercomputer.
>> There you go, little guy.
>> Oh no, no, not the scaling laws.
>> There is a squirrel on the car. Be ready
to make the squirrel go away. Ask the
squirrel gently to move away.
>> Did you know the best models today are all mixture of experts?
>> Hey [Music] Where'd everybody go?