
Nvidia CEO Jensen Huang talks about his company's latest innovations at CES 2026

By Yahoo Finance

Summary

Topics Covered

  • Dual Platform Shifts Reinvent Computing
  • Agentic AI Solves Novel Problems
  • Open Models Reach AI Frontier
  • Physical AI Needs Synthetic Data
  • Vera Rubin Delivers 5x Inference

Full Transcript


Please take your seats. Our event is about to begin.

[music]

>> [music] >> Welcome to the stage, NVIDIA founder and CEO, Jensen Huang.

Hello, Las Vegas.

Happy New Year.

>> Welcome to CES.

Well, we have about 15 kilos' worth of material to pack in here. I'm so happy to see all of you. We've got 3,000 people in this auditorium. There are 2,000 people in a courtyard watching us. There are another thousand people, apparently, on the fourth floor, where the NVIDIA show floors were supposed to be, all watching this keynote. And of course, millions around the world are going to be watching this to kick off the new year. Well, every 10 to 15 years, the computer industry resets.

A new platform shift happens: from mainframe to PC, PC to internet, internet to cloud, cloud to mobile. Each time, the world of applications targets a new platform; that's why it's called a platform shift. You write new applications for a new computer.

Except this time, there are two simultaneous platform shifts happening at the same time.

We're now moving to AI; applications are now going to be built on top of AI. At first, people thought AIs are applications. And in fact, AIs are applications. But you're going to build applications on top of AIs.

But in addition to that, how you run the software and how you develop the software have fundamentally changed. The entire stack of the computer industry is being reinvented.

You no longer program the software, you train the software. You don't run it on CPUs, you run it on GPUs. And whereas applications were pre-recorded, precompiled, and run on your device, now applications understand the context and generate every single pixel, every single token, completely from scratch, every single time.

Computing has been fundamentally reshaped as a result of accelerated computing, as a result of artificial intelligence. Every single layer of that five-layer cake is now being reinvented.

Well, what that means is that some 10 trillion dollars or so of the last decade of computing is now being modernized to this new way of doing computing. What that means is that hundreds of billions of dollars, a couple hundred billion dollars in VC funding each year, is going into modernizing and inventing this new world. And what it means is that a hundred trillion dollars of industry, several percent of which is R&D budget, is shifting over to artificial intelligence. People ask where the money is coming from. That's where the money is coming from: the modernization of IT to AI, the shifting of R&D budgets from classical methods to artificial intelligence methods. Enormous amounts of investment are coming into this industry, which explains why we're so busy. And this last year was no different. This last year was incredible.

This last year, there's a slide coming.

This is what happens when you don't practice.

It's the first keynote of the year. I hope it's your first keynote of the year; otherwise, you have been pretty, pretty busy. This is our first keynote of the year. We're going to get the spiderwebs out. And so, 2025 was an incredible year.

It just seemed like everything was happening all at the same time. And in fact, it probably was. The first thing, of course, is scaling laws.

In 2015, the first language model that I thought was really going to make a difference made a huge difference. It was called BERT. In 2017, Transformers came. It wasn't until five years later, in 2022, that the ChatGPT moment happened, and it awakened the world to the possibilities of artificial intelligence.

Something very important happened a year after that.

The first o1 model from OpenAI, the first reasoning model, completely revolutionary, invented this idea called test-time scaling, which is a very common-sensical thing. Not only do we pre-train a model to learn, we post-train it with reinforcement learning so that it can learn skills. And now we also have test-time scaling, which is another way of saying thinking. You think in real time. Each one of these phases of artificial intelligence requires an enormous amount of compute, and compute continued to scale.
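To make the idea concrete, here is a minimal, hypothetical Python sketch of test-time scaling as best-of-N sampling. The `generate` and `score` functions are placeholders, not any specific NVIDIA or OpenAI API; more samples and a larger thinking budget simply stand in for more inference compute, which typically yields better answers.

```python
import random

def generate(prompt: str, thinking_budget: int) -> str:
    # Placeholder for a reasoning-model call; a larger thinking budget
    # stands in for a longer chain of thought at inference time.
    return f"candidate answer ({thinking_budget} reasoning tokens)"

def score(prompt: str, answer: str) -> float:
    # Placeholder verifier or reward model that rates a candidate answer.
    return random.random()

def answer_with_test_time_scaling(prompt: str, n_samples: int, budget: int) -> str:
    """Best-of-N sampling: spend more inference compute (more samples,
    bigger thinking budget) instead of relying on a one-shot answer."""
    candidates = [generate(prompt, budget) for _ in range(n_samples)]
    return max(candidates, key=lambda a: score(prompt, a))

# One-shot answer vs. a 'thinking' answer that uses far more compute.
print(answer_with_test_time_scaling("plan a route", n_samples=1, budget=256))
print(answer_with_test_time_scaling("plan a route", n_samples=8, budget=2048))
```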

Large language models continue to get better. Meanwhile, another breakthrough happened, and this breakthrough happened in 2024. Agentic systems started to emerge. In 2025, they started to proliferate just about everywhere. Agentic models that have the ability to reason, look up information, do research, use tools, plan futures, simulate outcomes, all of a sudden started to solve very, very important problems. One of my favorite agentic models is called Cursor, which revolutionized the way we do software programming at NVIDIA. Agentic systems are going to really take off from here.

Of course, there were other types of AI. We know that large language models aren't the only type. Wherever the universe has information, wherever the universe has structure, we can teach a large language model, a form of language model, to go understand that information, to understand its representation, and to turn that into an AI. One of the biggest, most important ones is physical AI: AI that understands the laws of nature. And then, of course, physical AI is about AIs interacting with the world. But the world itself has encoded information, and that's called AI physics. In the case of physical AI, you have AI that interacts with the physical world; with AI physics, you have AI that understands the laws of physics.

And then lastly, one of the most important things that happened last year: the advancement of open models. We now know that AI is going to proliferate everywhere when open source, when open innovation, when innovation across every single company and every industry around the world is activated. At the same time, open models really took off last year.

In fact, last year we saw the advance of DeepSeek R1, the first open model that's a reasoning system. It caught the world by surprise, and it activated literally this entire movement. Really, really exciting work. We're so happy with it. Now we have open model systems all over the world, of all different kinds, and we now know that open models have also reached the frontier. They're still solidly six months behind the frontier models, but every six months a new model is emerging, and these models are getting smarter and smarter. Because of that, you can see the number of downloads has exploded. The number of downloads is growing so fast because startups want to participate in the AI revolution. Large companies want to, researchers want to, students want to, just about every single country wants to. How is it possible that intelligence, the digital form of intelligence, would leave anyone behind? And so open models really revolutionized artificial intelligence last year. This entire industry is going to be reshaped as a result of that.

Now, we had this inkling some time ago. You might have heard that several years ago, we started to build and operate our own AI supercomputers.

We call them DGX Cloud. A lot of people asked, are you going into the cloud business? The answer is no. We're building these DGX supercomputers for our own use. Well, it turns out we have billions of dollars of supercomputers in operation so that we can develop our open models. I am so pleased with the work that we're doing. It is starting to attract attention all over the world and across industries, because we are doing frontier AI model work in so many different domains. The work that we did in proteins, in digital biology: La Proteina, to be able to synthesize and generate proteins. OpenFold3, to understand the structure of proteins. Evo 2, to understand and generate, beyond proteins, the beginnings of cellular representation.

Earth-2: AI that understands the laws of physics. The work that we did with FourCastNet and the work that we did with CorrDiff really revolutionized the way people do weather prediction. Nemotron: we're now doing groundbreaking work there, the first hybrid Transformer-SSM model. It's incredibly fast, and therefore it can think for a very long time, or it can think very quickly for not a very long time, and produce very, very smart, intelligent answers. Nemotron 3 is groundbreaking work, and you can expect us to deliver other versions of Nemotron in the near future. Cosmos: a frontier open world foundation model, one that understands how the world works. GR00T: a humanoid robotics system; articulation, mobility, locomotion. These models, these technologies, are now being integrated, and in each one of these cases, opened to the world. Frontier humanoid robotics models, open to the world.

And then today, we're going to talk a little bit about Alpamayo, the work that we've been doing in self-driving cars. Not only do we open source the models, we also open source the data that we use to train those models, because in that way, and only in that way, can you truly trust how the models came to be. We open source all the models. We help you make derivatives from them. We have a whole suite of libraries: the NeMo libraries, the PhysicsNeMo libraries, the Clara libraries, the BioNeMo libraries. Each one of these libraries is a life-cycle management system for AIs, so that you can process the data, generate data, train the model, create the model, evaluate the model, guardrail the model, all the way to deploying the model. Each one of these libraries is incredibly complex, and all of it is open sourced. And so, on top of this platform, NVIDIA is a frontier AI model builder, and we build in a very special way: we build completely in the open, so that we can enable every company, every industry, every country to be part of this AI revolution. I'm incredibly proud of the work that we're doing there. In fact, if you look at the charts, they show that our contribution to this industry is bar none, and you're going to see us continue to do that, if not accelerate.

These models are also world class.

All systems are down.

This never happens in Santa Clara.

Is it because of Las Vegas?

Somebody must have won a jackpot outside.

[clears throat] All systems are down.

Okay, I think my system's still down, but that's okay. I make it up as I go. And so, not only are these models frontier capable, not only are they open, they also top the leaderboards. This is an area where we're very proud. They top leaderboards in intelligence. We have important models that understand multimodal documents, otherwise known as PDFs. The most valuable content in the world is captured in PDFs, but it takes artificial intelligence to find out what's inside, interpret what's inside, and help you read it. And so our PDF retrievers, our PDF parsers, are world class. Our speech recognition models: absolutely world class. Our retrieval models, basically search, semantic search, AI search, the database engine of the modern AI era: world class. So we're on top of leaderboards constantly. This is an area we're very proud of.

And all of that is in service of your ability to build AI agents. This is really a groundbreaking area of development. You know, at first, when ChatGPT came out, people said, gosh, it produced really interesting results, but it hallucinated greatly. And the reason why it hallucinated, of course, is that it could memorize everything in the past, but it can't memorize everything in the future, in the current. And so it needs to be grounded in research. It has to do fundamental research before it answers a question.

The ability to reason: do I have to do research? Do I have to use tools? How do I break up a problem into steps? Each one of these steps is something the AI model knows how to do, and together, it is able to compose them into a sequence of steps to perform something it's never done before, never been trained to do. This is the wonderful capability of reasoning. We can encounter a circumstance we've never seen before and break it down into circumstances and knowledge, or rules, that we know how to handle, because we've experienced them in the past. And so the ability for AI models to reason is now incredibly powerful.

The reasoning capability of agents opens the doors to all of these different applications. We no longer have to train an AI model to know everything on day one, just as we don't have to know everything on day one; in every circumstance, it should be able to reason about how to solve that problem. Large language models have now made this fundamental leap. The ability to use reinforcement learning and chain of thought, and search and planning, and all these different techniques in reinforcement learning, has made it possible for us to have this basic capability, and it is also now completely open sourced. But the thing that's really terrific is another breakthrough that happened, and the first time I saw it was with Aravind's Perplexity.

Perplexity, the search company, the AI search company, a really innovative company. The first time I realized they were using multiple models at the same time, I thought it was completely genius. Of course we would do that. Of course an AI would also call upon all of the world's great AIs to solve the problem it wants to solve, at any part of the reasoning chain. And this is the reason why AIs are really multimodal, meaning they understand speech and images and text and videos and 3D graphics and proteins. It's multimodal. It's also multi-model, meaning it should be able to use any model that best fits the task. It is multicloud by definition, because these AI models are sitting in all these different places. And it is also hybrid cloud, because if you're an enterprise company, or you've built a robot, or whatever that device is, sometimes it's at the edge, sometimes at a radio cell tower, and sometimes it's in an enterprise, or maybe a place like a hospital, where you need to have the data in real time right next to you. Whatever those applications are, we now know this is what an AI application looks like in the future. Or, another way to think about it, because future applications are built on AIs:

this is the basic framework of future applications.

This basic framework, this basic structure of agentic AIs that can do the things I'm talking about, that is multi-model, has now turbocharged AI startups of all kinds. And because of all of the open models and all the tools that we provide you, you can also customize your AIs, teaching your AI skills that nobody else is teaching. Nobody else is causing their AI to become intelligent or smart in that way. You can do it for yourself. And that's what the work we do with Nemotron, NeMo, and all of the things we do with open models is intended for. You put a smart router in front of it. And that router is essentially a manager that decides, based on the intention of the prompt that you give it, which one of the models is best fit for solving that problem.
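As an illustration only, here is a minimal sketch of the kind of intent-based model router just described, in plain Python. The intents, model names, and the keyword-based `classify_intent` heuristic are hypothetical stand-ins, not an NVIDIA API.

```python
from typing import Callable

# Hypothetical model endpoints: a private local model and a frontier cloud model.
MODELS: dict[str, Callable[[str], str]] = {
    "local-private": lambda p: f"[local model answers] {p}",
    "frontier-cloud": lambda p: f"[frontier model answers] {p}",
}

# Routing table: which model is best fit for each intent.
ROUTES = {"email": "local-private", "general": "frontier-cloud"}

def classify_intent(prompt: str) -> str:
    # Toy intent classifier; a real router might use a small LLM here.
    return "email" if "email" in prompt.lower() else "general"

def route(prompt: str) -> str:
    """Send the prompt to the model best suited to its intent."""
    model = MODELS[ROUTES[classify_intent(prompt)]]
    return model(prompt)

print(route("Summarize my unread email"))  # stays on the private model
print(route("Plan a trip to Las Vegas"))   # goes to the frontier model
```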

Okay. So now, when you think about this architecture, what do you have?

When you think about this architecture, all of a sudden you have an AI that is, on the one hand, completely customizable by you: something you can teach your very own skills for your company, something that's domain secret, something where you have deep domain expertise. Maybe you've got all the data that you need to train that AI model. On the other hand, your AI is always at the frontier, by definition. On the one hand, you're always customized; on the other hand, you're always at the frontier, and it should just run. And so we thought we would make the simplest of examples and make it available to you. This entire framework we call a blueprint, and we have blueprints that are integrated into enterprise SaaS platforms all over the world, and we're really pleased with the progress. But what we'll do is show you a short example of something that anybody can do.

Let's build a personal assistant.

I want it to help me with my calendar, emails, to-do lists, and even keep an eye on my home. I use Brev to turn my DGX Spark into a personal cloud, so I can use the same interface whether I'm using a cloud GPU or a DGX Spark. I use a frontier model API to easily get started.

I want it to help me with my emails, so I create an email tool for my agent to call. I want my emails to stay private, so I'll add an open model that's running locally on the Spark.

Now, for any job, I want the agent to use the right model for the right task, so I'll use an intent-based model router. This way, prompts that need email will stay on my Spark, and everything else can call the frontier model. I want my assistant to interact with my world, so I'll hook it up to Hugging Face's Reachy Mini robot.

My agent controls the head, ears, and camera of the Reachy with tool calls. I want to give Reachy a voice, and I really like ElevenLabs, so I'll hook up their API.

>> Hi, I'm Reachy, running on DGX Spark.

>> Hey Reachy, what's on my to-do list today?

>> Your to-do list today: grab groceries, eggs, milk, butter, and send Jensen the new script.

>> Okay, let's send Jensen an update. Tell him we'll have it for him by the end of the day.

>> Will do.

>> Reachy, there's a sketch, too. Can you turn it into an architectural rendering?

>> Sure.

>> Nice. Now make a video and show me around the room.

>> Here you go.

>> That's great.

>> With Brev, I can share access to my Spark and Reachy, so I'm going to share it with Anna.

>> Hey Reachy, what's Potato up to?

>> He's on the couch. I remember you don't like this. I'll tell him to get off. Potato, off the couch.

>> With all the progress in open source, it's incredible to see what you can build. I'd love to see what you create.

>> Isn't that incredible?

Now, the amazing thing is that that is utterly trivial now. That is utterly trivial now. And yet, just a couple of years ago, all of that would have been impossible. Absolutely unimaginable.
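For a sense of what "utterly trivial" looks like in practice, here is a hedged sketch of the tool-calling pattern from the demo: register an email tool, and let a toy agent step decide when to call it. The tool, the `agent_step` heuristic, and the data are illustrative assumptions, not the demo's actual code or any specific SDK.

```python
def read_unread_email(max_items: int = 5) -> list[str]:
    # Hypothetical local email tool; in the demo, email stays private
    # on the DGX Spark rather than going to a cloud model.
    return ["From Anna: script draft attached", "From vendor: invoice due"][:max_items]

# A minimal tool registry the agent can call into.
TOOLS = {"read_unread_email": read_unread_email}

def agent_step(user_msg: str) -> str:
    """One toy agent step: if the request looks like it needs the email
    tool, call it and fold the result into the reply."""
    if "email" in user_msg.lower():
        messages = TOOLS["read_unread_email"]()
        return f"You have {len(messages)} unread emails: " + "; ".join(messages)
    return "No tool needed; answering directly."

print(agent_step("Anything important in my email?"))
print(agent_step("What's the weather like?"))
```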

Well, this basic framework, this basic way of building applications: you take language models that are pre-trained and proprietary, that are frontier, and combine them with customized language models in an agentic framework, a reasoning framework that allows you to access tools and files and maybe even connect to other agents. This is basically the architecture of AI applications, of applications in the modern age, and our ability to create these applications is incredibly fast.

fast. And notice if you give it this application um information that it's never seen before or in a structure that has is not

represented exactly as you thought it can still reason through it and make it best effort to reason through the data the information to try to understand how to solve the problem artificial

intelligence. Okay. Okay, so this basic

intelligence. Okay. Okay, so this basic framework is now being integrated and everything that I just described, we had the benefit of working with some of the world's leading enterprise platform

Palantir, for example: their entire AI and data processing platform is being accelerated by NVIDIA today. ServiceNow, the world's leading customer service and employee service platform. Snowflake, the world's top data platform in the cloud; incredible work is being done there. CodeRabbit: we're using CodeRabbit all over NVIDIA. CrowdStrike, creating AIs to detect, to find, AI threats. NetApp: their data platform now has NVIDIA's semantic AI on top of it, and agentic systems on top of that, for them to do customer service. But the important thing is this: not only is this the way that you develop applications now, this is going to be the user interface of your platform. So whether it's Palantir or ServiceNow or Snowflake, or many other companies that we're working with, the agentic system is the interface. It's no longer Excel with a bunch of squares that you enter information into. Maybe it's no longer just the command line. All of that multimodal information is now possible, and the way you interact with your platform is much more, well, if you will, simple, like you're interacting with people. And so that's enterprise AI, being revolutionized by agentic systems.

The next thing is physical AI. This is an area that you've seen me talk about for several years; in fact, we've been working on this for eight years. The question is: how do you take something that is intelligent inside a computer, and interacts with you through screens and speakers, and turn it into something that can interact with the world?

Meaning, it can understand the common sense of how the world works. Object permanence: if I look away and I look back, that object is still there. Causality: if I push it, it tips over. It understands friction and gravity. It understands inertia: that a heavy truck rolling down the road is going to need a little bit more time to stop, that a ball is going to keep on rolling.

These ideas are common sense to even a little child, but for AI, they're completely unknown. And so we have to create a system that allows AIs to learn the common sense of the physical world, learn its laws, but also to learn from data, and the data is quite scarce, and to be able to evaluate whether that AI is working, meaning it has to simulate in an environment. How does an AI know that the actions it's performing are consistent with what it should do, if it doesn't have the ability to simulate the response of the physical world to its actions? The response to its actions is really important to simulate; otherwise, there's no way to evaluate it. It's different every time. And so this basic system requires three computers.

One computer, of course, is the one that we know NVIDIA builds for training the AI models. Another computer, as we know, is to inference the models. Inferencing the model is essentially a robotics computer that runs in a car, or runs in a robot, or runs in a factory; it runs anywhere at the edge. But there has to be another computer that's designed for simulation, and simulation is at the heart of almost everything NVIDIA does. This is where we are most comfortable, and simulation was really the foundation of almost everything that we've done with physical AI. So we have three computers, and multiple stacks that run on these computers, these libraries, to make them useful. Omniverse is our digital twin, our physically based simulation world. Cosmos, as I mentioned earlier, is our foundation model: not a foundation model for language, but a foundation model of the world, and it is also aligned with language. You can say something like, you know, what's happening to the ball, and it'll tell you the ball's rolling down the street. And so, a world foundation model. And then, of course, the robotics models. We have two of them. One of them is called GR00T. The other one's called Alpamayo, which I'm going to tell you about.

Now, one of the most important things that we have to do with physical AI is to create the data to train the AI in the first place. Where does that data come from? With languages, we had created a bunch of text that we consider ground truth that the AI can learn from. How do we teach an AI the ground truth of physics? There are lots and lots of videos, but hardly enough to capture the diversity and the type of interactions that we need. And so this is where great minds came together and transformed what used to be compute into data.

Using synthetic data generation that is grounded and conditioned by the laws of physics, grounded and conditioned by ground truth, we can now selectively, cleverly generate data that we can then use to train the AI. So, for example, what comes into this Cosmos AI world model, on the left over here, is the output of a traffic simulator. Now, this traffic simulator output is hardly enough for an AI to learn from. We can take it, put it into a Cosmos foundation model, and generate surround video that is physically based and physically plausible, which the AI can now learn from. And there are so many examples of this. Let me show you what Cosmos can do.
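Before the video, a hedged sketch of the data flow just described: coarse simulator output in, physically plausible surround video out, appended as training pairs. The function names (`run_traffic_sim`, `cosmos_generate_video`) are placeholders for illustration, not the actual Cosmos API.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    description: str  # e.g. "pedestrian crossing at dusk, wet road"
    sim_layout: str   # coarse traffic-simulator output (placeholder)

def run_traffic_sim(description: str) -> Scenario:
    # Placeholder: a traffic simulator emits a coarse, low-fidelity layout.
    return Scenario(description, sim_layout=f"layout({description})")

def cosmos_generate_video(scenario: Scenario) -> str:
    # Placeholder for a world-foundation-model call that upgrades the
    # coarse layout into physically plausible surround video.
    return f"surround_video({scenario.sim_layout})"

# Turn compute into data: sweep long-tail cases a real fleet rarely sees.
long_tail = ["tire debris on highway", "cyclist running a red light at night"]
dataset = []
for case in long_tail:
    scenario = run_traffic_sim(case)
    video = cosmos_generate_video(scenario)
    dataset.append((video, case))  # (generated video, scenario label) pair

print(f"{len(dataset)} synthetic training examples generated")
```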

The ChatGPT moment for physical AI is nearly here, but the challenge is clear. The physical world is diverse and unpredictable. Collecting real-world training data is slow and costly, and it's never enough.

The answer is synthetic data. It starts with NVIDIA Cosmos, an open frontier world foundation model for physical AI, pre-trained on internet-scale video, real driving and robotics data, and 3D simulation.

Cosmos learned a unified representation of the world, able to align language, images, 3D, and action. It performs physical AI skills like generation, reasoning, and trajectory prediction from a single image. Cosmos generates realistic video from 3D scene descriptions; physically coherent motion from driving telemetry and sensor logs; surround video from planning simulators, multi-camera environments, or scenario prompts. It brings edge cases to life.

Developers can run interactive closed-loop simulations in Cosmos. When actions are made, the world responds. Cosmos reasons: it analyzes edge scenarios, breaks them down into familiar physical interactions, and reasons about what could happen next.

Cosmos turns compute into data, training AVs for the long tail, and robots how to adapt to every scenario.

I know, it's incredible.

Cosmos is the world's leading foundation model, the world's leading world foundation model. It's been downloaded millions of times, used all over the world, getting the world ready for this new era of physical AI. We use it ourselves as well. We use it to create our self-driving car, using it for scenario generation and for evaluation. It allows us to effectively travel billions, trillions of miles, but doing it inside a computer. And we've made enormous progress. Today, we're announcing Alpamayo, the world's first thinking, reasoning autonomous vehicle AI.

Alpamayo is trained end to end, literally from camera in to actuation out: lots and lots of miles driven using human demonstration, and lots and lots of miles generated by Cosmos. In addition to that, hundreds of thousands of examples are labeled very, very carefully so that we can teach the car how to drive.

Alpamayo does something that's really special. Not only does it take sensor input and actuate the steering wheel, brakes, and acceleration, it also reasons about what action it is about to take. It tells you what action it's going to take, the reasoning by which it came to that action, and then, of course, the trajectory.

All of these are coupled directly and trained very specifically on a large combination of human demonstrations as well as Cosmos-generated data. The result is just really incredible. Not only does your car drive as you would expect it to drive, and it drives so naturally because it learned directly from human demonstrators, but in every single scenario, when it comes up to that scenario, it reasons about it, tells you what it's going to do, and reasons about what's about to happen.
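To make "camera in, reasoning and actuation out" concrete, here is a hypothetical sketch of what such a model's output interface might look like. The field names and the `drive_step` stub are illustrative assumptions, not Alpamayo's actual API.

```python
from dataclasses import dataclass

@dataclass
class DrivingDecision:
    action: str                            # e.g. "slow to 15 mph and yield"
    rationale: str                         # stated reasoning behind the action
    trajectory: list[tuple[float, float]]  # planned (x, y) waypoints

def drive_step(camera_frames: list[bytes]) -> DrivingDecision:
    # Placeholder for an end-to-end reasoning AV model: camera frames in;
    # an action, its stated reasoning, and a trajectory out.
    return DrivingDecision(
        action="slow to 15 mph and yield",
        rationale="A ball rolled into the street; a child may follow it.",
        trajectory=[(0.0, 0.0), (4.8, 0.1), (8.9, 0.2)],
    )

decision = drive_step(camera_frames=[b"<frame>"])
print(decision.action, "|", decision.rationale)
```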

Now, the reason why this is so important is the long tail of driving. It's impossible for us to simply collect every single possible scenario for everything that could ever happen, in every single country, in every single circumstance, for all the population. However, it is very likely that every scenario, if decomposed into a whole bunch of smaller scenarios, is quite normal for you to understand. And so these long tails will be decomposed into quite normal circumstances that the car knows how to deal with. It just needs to reason about it. And so, let's take a look. Everything you're about to see is one shot. It's all no hands.

>> Routing to your destination.

Buckle up.

[music]

You have arrived.

>> [applause] >> We started working on self-driving cars eight years ago. And the reason is that we reasoned, early on, that deep learning and artificial intelligence were going to reinvent the entire computing stack. And if we were ever going to understand how to navigate ourselves, and how to guide the industry towards this new future, we had to get good at building the entire stack. Well, as I mentioned earlier, AI is a five-layer cake. The lowest layer is land, power, and shell. In the case of robotics, the lowest layer is the car.

The next layer above it is chips: GPUs, networking chips, CPUs, all that kind of stuff. The next layer above that is the infrastructure. That infrastructure, in this particular case, as I mentioned with physical AI, is Omniverse and Cosmos. And then above that are the models. In the case I just showed you, the model here is called Alpamayo. And Alpamayo, today, is open sourced.

This is an incredible body of work. It took several thousand people; our AV team is several thousand people. Just to put it in perspective, our partner, Ola, I think Ola's here in the audience somewhere; Mercedes agreed to partner with us five years ago to go make all of this possible. We imagine that someday a billion cars on the road will all be autonomous. You could have it be a robotaxi that you're orchestrating and renting from somebody, or you could own it and have it drive by itself, or you could decide to drive for yourself. But every single car will have autonomous vehicle capability.

Every single car will be AI powered. And so the model layer in this case is Alpamayo, and the application above that is the Mercedes-Benz.

Okay. And so this entire stack is NVIDIA's first entire-stack endeavor. We've been working on it this entire time, and I'm just so happy that the first AV car from NVIDIA is going to be on the road in Q1, here in the United States, then Europe in Q2, and I think Asia in Q3 and Q4. And the powerful thing is that we're going to keep on updating it with the next versions of Alpamayo, and versions after that. There's no question in my mind now that this is going to be one of the largest robotics industries, and I'm so happy that we worked on it. It taught us an enormous amount about how to help the rest of the world build robotic systems: that deep understanding in knowing how to build it ourselves, building the entire infrastructure ourselves, and knowing what kind of chips a robotic system would need. In this particular case, dual Orins; in the next generation, dual Thors. These processors are designed for robotic systems, and were designed for the highest level of safety capability.

This car just got rated. It just went to production. The Mercedes-Benz CLA was just rated by NCAP as the world's safest car.

>> [applause] >> It is the only system that I know of that has every single line of code, the chip, the system, every line of code, safety certified. The entire system's sensors are diverse and redundant, and so is the self-driving car stack. The Alpamayo stack is trained end to end and has incredible skills. However, nobody knows, until you've driven it forever, that it's going to be perfectly safe. And so the way we guardrail that is with another software stack, an entire AV stack, underneath. That entire AV stack is built to be fully traceable, and it's taken us some five years to build that second stack, some six, seven years actually. These two software stacks mirror each other, and then we have a policy and safety evaluator to decide: is this something that I'm very confident in, that I can reason about driving very safely? If so, I'm going to have Alpamayo do it. If it's a circumstance that I'm not very confident in, and the safety policy evaluator decides that we're going to go back to a simpler, safer guardrail system, then it goes back to the classical AV stack. We're the only car in the world with both of these AV stacks running, and all safety systems should have diversity and redundancy.
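A hedged sketch of the arbitration pattern described here: a confidence-gated fallback from the learned stack to the classical, traceable stack. The threshold, names, and confidence plumbing are illustrative assumptions, not the certified system's actual logic.

```python
CONFIDENCE_THRESHOLD = 0.95  # illustrative value, not the real calibration

def learned_stack(scene: dict) -> tuple[str, float]:
    # Placeholder end-to-end model: returns (action, self-reported confidence).
    return "proceed through intersection", scene.get("confidence", 0.5)

def classical_stack(scene: dict) -> str:
    # Placeholder rule-based, fully traceable fallback stack.
    return "follow conservative rule-based plan"

def safety_evaluator(scene: dict) -> str:
    """Let the learned stack act only when the policy and safety
    evaluator is confident; otherwise fall back to the classical stack."""
    action, confidence = learned_stack(scene)
    if confidence >= CONFIDENCE_THRESHOLD:
        return action
    return classical_stack(scene)

print(safety_evaluator({"confidence": 0.99}))  # learned stack acts
print(safety_evaluator({"confidence": 0.60}))  # falls back to classical stack
```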

Well, our vision is that someday every single car, every single truck, will be autonomous. And we've been working towards that future. This entire stack is vertically integrated. Of course, in the case of Mercedes-Benz, we built the entire stack together. We're going to deploy the car. We're going to operate the stack. We're going to maintain the stack for as long as we shall live. However, like everything else we do as a company, we build the entire stack, but the entire stack is open to the ecosystem. And the ecosystem working with us to build L4 and robotaxis is expanding, and it's going everywhere.

I fully expect this to be, well, this is already a giant business for us. It's a giant business for us because companies use it for training: for training data, processing data, and training their models. They use it for synthetic data generation in some cases. Some companies pretty much just build the computers, the chips that go inside the car. Some companies work with us full stack; some companies work with us on some part of that. Okay? So it doesn't matter how much you decide to use. You know, my only request is: use a little bit of NVIDIA wherever you can. But the entire thing is open.

Now, this is going to be the first large-scale, mainstream physical AI market, and this is now, I think we can all agree, fully here. This inflection point of going from non-autonomous vehicles to autonomous vehicles is probably happening right about this time. In the next 10 years, I'm fairly certain a very, very large percentage of the world's cars will be autonomous or highly autonomous. But this basic technique that I just described, using the three computers, using synthetic data generation and simulation, applies to every form of robotic system. It could be a robot that is just an articulator, a manipulator; maybe it's a mobile robot; maybe it's a fully humanoid robot. And so the next journey, the next era for robotic systems, is going to be, you know, robots. And these robots are going to come in all kinds of different sizes. And I invited some friends. Did they come?

>> Hey guys, hurry up. I've got a lot of stuff to cover.

>> Come on, hurry.

Did you tell R2-D2 you were going to be here?

>> Did you? And C-3PO?

Okay. All right. Come here. Now, one of the things that's really special: you have Jetsons. They have little Jetson computers inside them. They're trained inside Omniverse. And how about this? Let's show everybody the simulator that you guys learned how to be robots in. You guys want to look at that?

Okay, let's look at that. Run it, please.

See? Okay.

Isn't that amazing?

That's how you learn to be a robot. You did it all inside Omniverse. And the robot simulator is called Isaac: Isaac Sim and Isaac Lab. And anybody who wants to build a robot... you know, nobody's going to be as cute as you. But now look at all these friends that we have building robots. We're building big ones. No, like I said, nobody's as cute as you guys are. But we have Neura, and we have AgiBot, AgiBot over there, you know. We have LG over here; they just announced a new robot. Caterpillar, they've got the largest robots ever. That one delivers food to your house; it's connected to Uber Eats. That's Serve Robotics. I love those guys. Agility, Boston Dynamics, incredible. You've got surgical robots, you've got manipulator robots from Franka, you've got Universal Robots, an incredible number of different robots.

And so this is the next chapter. We're going to talk a lot more about robotics in the future, but it's not just about the robots in the end. I know, everything's about you guys. It's about getting there. And one of the most important industries in the world that will be revolutionized by physical AI and AI physics is the industry that started all of us at NVIDIA. It wouldn't be possible if not for the companies that I'm about to talk about. And I'm so happy that all of them, starting with Cadence, are going to accelerate everything. Cadence: CUDA-X integrated into all of their simulations and solvers. They've got NVIDIA physical AIs that they're going to use for different physical plants and plant simulations. You've got AI physics being integrated into these systems. So whether it's EDA or STA, and in the future robotic systems, we're going to have basically the same technology that made you guys possible now completely revolutionize these design stacks.

technology that made you guys possible now completely revolutionize these design stacks. Synopsis without synopsis

design stacks. Synopsis without synopsis you know synopsis and cadence are completely completely indispensable in

the world of chip design. Synopsis is uh leads in uh and uh logic design and and IP uh in the case of cadence they lead

physical design the place and route uh and emulation and verification. Cadence

is incredible at emulation and verification. Both of them are moving

verification. Both of them are moving into the world of system design and system simulation. And so in the future,

system simulation. And so in the future, we're going to design your chips inside Cadence and inside Synopsis. We're going

to design your systems and emulate the whole thing and simulate everything inside these tools. That's your future.

Yeah, you're going to be born inside these platforms. Pretty amazing, right? And so we're so happy that we're working with these industries. Just as we've integrated NVIDIA into Palantir and ServiceNow, we're integrating NVIDIA into the most computationally intensive simulation industries, Synopsys and Cadence. And today, we're announcing that Siemens is also doing the same thing. We're going to integrate CUDA-X, physical AI, agentic AI, and Nemotron deeply into the world of Siemens.

And the reason for that is this. First, we design the chips, and all of it, in the future, will be accelerated by NVIDIA. You're going to be very happy about that. We're going to have agentic chip designers and system designers working with us, helping us do design, just as we have agentic software engineers helping our software engineers code today. And so we'll have agentic chip designers and system designers. We're going to create you inside this. But then we have to build you. We have to build the plants, the factories that manufacture you. We have to design the manufacturing lines that assemble all of you. And these manufacturing plants are going to be, essentially, gigantic robots. Incredible, isn't that right?

I know. I know. And so you're going to be designed in a computer. You're going to be made in a computer. You're going to be tested and evaluated in a computer, long before you have to spend any time dealing with gravity.

I know.

Do you know how to deal with gravity?

Can you jump?

Can you jump?

>> [laughter] >> Okay. All right. Don't show off. Okay.

So now, the industry that made NVIDIA possible: I'm just so happy that the technology we're creating is at a level of sophistication and capability that we can now help them revolutionize their industry. What started with them, we now have the opportunity to go back and help them revolutionize.

Let's take a look at the stuff that we're going to do with Siemens.

Come on.

Breakthroughs in physical AI are letting AI move from screens to our physical world. And just in time: as the world builds factories of every kind, for chips, computers, life-saving drugs, and AI, and as the global labor shortage worsens, we need automation powered by physical AI and robotics more than ever.

This, where AI meets the world's largest physical industries, is the foundation of NVIDIA and Siemens' partnership. For nearly two centuries, Siemens has built the world's industries. And now it is reinventing them for the age of AI.

Siemens is integrating NVIDIA CUDA-X libraries, AI models, and Omniverse into its portfolio of EDA, CAE, and digital twin tools and platforms. Together, we're bringing physical AI to the full industrial life cycle, from design and simulation to production and operations. We stand at the beginning of a new industrial revolution: the age of physical AI, built by NVIDIA and Siemens for the next age of industries.

Incredible, right, guys? What do you think? All right, hang on tight. Just hang on tight. And so, this is, you know, if you look at the world's models, there's no question OpenAI is the leading token generator today. More OpenAI tokens are generated than just about anything else. The second largest group is probably open models. And my guess is that, over time, because there are so many companies, so many researchers, so many different types of domains and modalities, open-source models will be by far the largest. Let's talk about somebody really special. You guys want to do that? Let's talk about Vera Rubin.

Vera Rubin. Yeah, go ahead. She was an American astronomer.

She was the first to observe, she noticed, that the tails of galaxies were moving about as fast as the centers of the galaxies. Well, I know, it makes no sense. It makes no sense. Newtonian physics would say that, just like in the solar system, the planets further away from the sun circle the sun more slowly than the planets closer to the sun. And therefore, it makes no sense that this happens, unless there are invisible bodies, as we call them. She discovered dark matter, which occupies space even though we don't see it. And so Vera Rubin is the person we named our next computer after. Isn't that a good idea?

I know.

Okay. Okay, Vera Rubin is designed to address this fundamental challenge that we have: the amount of computation necessary for AI is skyrocketing. The demand for NVIDIA GPUs is skyrocketing. It's skyrocketing because models are increasing by a factor of 10, an order of magnitude, every single year. And not to mention, as I said, o1's introduction was an inflection point for AI. Instead of a one-shot answer, inference is now a thinking process. And in order to teach the AI how to think, reinforcement learning and very significant computation were introduced into post-training. It's no longer supervised fine-tuning, otherwise known as imitation learning, or supervised training.

You now have reinforcement learning: essentially, the computer trying different iterations itself, learning how to perform a task. The amount of computation for pre-training, for post-training, for test-time scaling has exploded as a result. And now, in every single inference that we do, instead of just one shot, you can just see the AI think, which we appreciate. The longer it thinks, oftentimes, the better the answer it produces. And so test-time scaling causes the number of tokens generated to increase by 5x every single year. Not to mention, meanwhile, the race is on for AI.

Everybody's trying to get to the next level. Everybody's trying to get to the next frontier. And every time they get to the next frontier, the cost of the last generation's AI tokens starts to decline by about a factor of 10x every year. The 10x decline every year is actually telling you something different. It's saying that the race is so intense: everybody's trying to get to the next level, and somebody is getting to the next level. And so, therefore, all of it is a computing problem. The faster you compute, the sooner you get to the next level, the next frontier. All of these things are happening simultaneously, at the same time. And so we decided that we have to advance the state of the art of computation every single year. Not one year left behind. And now: we've been shipping GB200s since a year and a half ago. Right now, we're in full-scale manufacturing of GB300.

And if Vera Rubin is going to be in time for this year, it must be in production by now. And so today, I can tell you that Vera Rubin is in full production.

You guys want to take a look at Vera Rubin?

>> All right. Come on.

>> Play it, please.

Vera Rubin arrives just in time for the next frontier of AI. This is the story of how we built it.

The architecture: a system of six chips engineered to work as one, born from extreme co-design. It begins with Vera, a custom-designed CPU, double the performance of the previous generation. And the Rubin GPU. Vera and Rubin are co-designed from the start to bidirectionally and coherently share data, faster and with lower latency.

Then, 17,000 components come together on a Vera Rubin compute board. High-speed robots place components with micron precision before the Vera CPU and two Rubin GPUs complete the assembly, capable of delivering 100 petaflops of AI, five times that of its predecessor.

AI needs data fast. ConnectX-9 delivers 1.6 terabits per second of scale-out bandwidth to each GPU. The BlueField-4 DPU offloads storage and security, so compute stays fully focused on AI.

The Vera Rubin compute tray: completely redesigned, with no cables, hoses, or fans. Featuring a BlueField-4 DPU, eight ConnectX-9 NICs, two Vera CPUs, and four Rubin GPUs, it is the compute building block of the Vera Rubin AI supercomputer.

Next, the sixth-generation NVLink switch: moving more data than the global internet, connecting 18 compute nodes, scaling up to 72 Rubin GPUs operating as one. Then Spectrum-X Ethernet Photonics, the world's first Ethernet switch with 512 lanes and 200-gigabit co-packaged optics, scaling out thousands of racks into an AI factory.

15,000 engineer-years since design began, the first Vera Rubin NVL72 rack comes online: six breakthrough chips, 18 compute trays, nine NVLink switch trays, 220 trillion transistors, weighing nearly two tons. One giant leap to the next frontier of AI.

Rubin is here.

What do you guys think?

This is a Rubin pod: 1,152 GPUs in 16 racks. Each one of the racks, as you know, has 72 Vera Rubins, 72 Rubins. Each one of the Rubins is two actual GPU dies connected together. I'm going to show it to you, but there are several things that... well, I'll tell you later. I can't tell you everything right away.

Well, we designed six different chips. First of all, we have a rule inside our company, and it's a good rule: no new generation should have more than one or two chips changed. But the problem is this. As you could see, we were describing the total number of transistors in each one of the chips. And we know that Moore's law has largely slowed. And so the number of transistors we can get, year after year, can't possibly keep up with the 10-times-larger models. It can't possibly keep up with five times more tokens generated per year. It can't possibly keep up with a cost decline in tokens that is going to be so aggressive. It is impossible to keep up with those kinds of rates, for the industry to continue to advance, unless we deploy aggressive, extreme co-design: basically, innovating across all of the chips, across the entire stack, all at the same time. Which is the reason why we decided that this generation, we had no choice but to design every chip over again. Now, every single chip that we were describing just now could be a press conference all in itself, and back in the old days, there was probably an entire company dedicated to each one. Each one of them is completely revolutionary and the best of its kind.

The Vera CPU: I'm so proud of it. In a power-constrained world, it is two times the performance of the Grace CPU; it's twice the performance per watt of the world's most advanced CPUs. Its data rate is insane. It was designed for supercomputers, and Grace was an incredible CPU. Now Vera increases the single-threaded performance, increases the capacity of the memory, increases everything, just dramatically. It's a giant chip. This is the Vera CPU.

This is one CPU.

And this is connected to the Rubin GPU. Look at that thing. It's a giant chip. Now, the thing that's really special, and I'll go through these... it's going to take three hands, I think four hands, to do this. Okay. So, this is the Vera CPU. It's got 88 CPU cores. And the CPU cores are designed to be multi-threaded. But the multi-threaded nature of Vera was designed so that each one of the 176 threads could get its full performance. So, essentially, there are 176 logical cores, but only 88 physical cores. These cores were designed using a technology called spatial multi-threading. And the I/O performance is incredible.

is incredible. This is the Reuben GPU.

It's 5x blackwell in floating performance. But the important thing is

performance. But the important thing is go to the bottom line. The bottom line it's only 1.6 times the number of transistors in black wall. That kind of tells you something about the the levels

of semiconductor physics today. If we

don't do code design, if we do don't do extreme code design at the level of basically every single chip across the entire system, how is it

possible we deliver performance levels that is, you know, at best one point 1 1.6 times each year? Because that's the total number of transistors you have.

And even if you were to have a little bit more performance per transistor, say 25%, you're this impossible to get a 100% yield out of the number of

transistors you get. And so 1.6x kind of puts a ceiling on how far performance can go each year unless you do something extreme. And we call it extreme code
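As a back-of-envelope check on that reasoning (the 1.6x, 25%, and 5x figures come from the talk; treating them as simple multipliers is a simplification):

```python
# Why transistor scaling alone caps the generational gain: even stacking an
# optimistic 25% per-transistor efficiency bump on 1.6x more transistors
# falls well short of the claimed 5x, so the remainder must come from
# co-design across chips, systems, and software.
transistor_growth = 1.6        # Rubin vs. Blackwell transistor count
per_transistor_gain = 1.25     # optimistic architecture-only improvement

scaling_only = transistor_growth * per_transistor_gain
print(f"scaling alone: ~{scaling_only:.1f}x")                       # ~2.0x

claimed_gain = 5.0             # stated inference uplift over Blackwell
print(f"left for co-design: ~{claimed_gain / scaling_only:.1f}x")   # ~2.5x
```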

Well, one of the things that we did, and it was a great invention, is called the NVFP4 tensor core. The transformer engine inside our chip is not just a 4-bit floating-point number that we somehow put into the data path. It is an entire processor, a processing unit that understands how to dynamically, adaptively adjust its precision and structure to deal with different levels of the transformer, so that you can achieve higher throughput wherever it's possible to lose precision, and go back to the highest possible precision wherever you need to. That ability to do it dynamically, you can't do in software, because obviously it's just running too fast. And so you have to be able to do it adaptively inside the processor. That's what NVFP4 is.

When somebody says FP4 or FP8, it almost means nothing to us, and the reason is that it's the tensor core structure and all of the algorithms that make it work. NVFP4, we've published papers on this already. The level of throughput and precision it is able to retain is completely incredible. This is groundbreaking work. I would not be surprised if the industry would like us to make this format and this structure an industry standard in the future. This is completely revolutionary. This is how we were able to deliver such a gigantic step up in performance even though we only have 1.6 times the number of transistors. Okay.
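The talk doesn't describe how the hardware decides when precision can be dropped; as a purely software analogy, a per-tensor precision picker might look like the sketch below. The block size, tolerance, and both function names are illustrative assumptions, not NVIDIA's scheme.

```python
import numpy as np

def quantize_fp4_blockwise(x: np.ndarray, block: int = 16) -> np.ndarray:
    """Toy 4-bit quantization: per-block scale plus a 15-level signed grid."""
    out = np.empty_like(x, dtype=np.float32)
    flat, res = x.ravel(), out.ravel()
    for i in range(0, flat.size, block):
        chunk = flat[i:i + block]
        scale = float(np.abs(chunk).max()) or 1.0   # per-block scale factor
        res[i:i + block] = np.round(chunk / scale * 7) / 7 * scale
    return out

def choose_precision(x: np.ndarray, tol: float = 1e-2) -> str:
    """Keep FP4 where the relative error stays small; otherwise back off."""
    err = float(np.mean(np.abs(quantize_fp4_blockwise(x) - x)))
    rel = err / (float(np.mean(np.abs(x))) or 1.0)
    return "fp4" if rel < tol else "higher-precision"

weights = np.random.randn(256, 256).astype(np.float32)
print(choose_precision(weights))   # varies with the tensor's statistics
```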

Okay. So now, once you have a great processing node, and this is the processor node... here, let me do this. This is... wow, it's super heavy. You have to be a CEO in really good shape to do this job. Okay. All right. So, this thing is, I'm going to guess, probably, I don't know, a couple of hundred pounds. [laughter] I thought that was funny, too. Come on, it could have been. Everybody's gone? No, I don't think so. All right. [clears throat]

So, look at this. This is the last one. We revolutionized the entire MGX chassis. This node: 43 cables down to zero cables, six tubes, just two of them here. It takes two hours to assemble this. If you're lucky, it takes two hours. And of course, you're probably going to assemble it wrong; you're going to have to test it, reassemble it, retest it. So the assembly process is incredibly complicated, and that was understandable for one of the first supercomputers to be deconstructed in this way. This goes from two hours to five minutes. And it's 100% liquid-cooled. Yeah, really a breakthrough.

Okay. So this is the new compute chassis. And what connects all of these to the top-of-rack switches, the east-west traffic, is called the Spectrum-X NIC. This is the world's best NIC, unquestionably. NVIDIA's Mellanox, the acquisition that joined us a long time ago now, their networking technology for high-performance computing is the world's best, bar none: the algorithms, the chip design, all of the interconnects, all the software stacks that run on top of it, their RDMA, absolutely bar none, the world's best. And now it has the ability to do programmable RDMA and a data path accelerator, so that our partners, like the AI labs, can create their own algorithms for how they want to move data around the system. This is completely world-class. ConnectX-9 and the Vera CPU were co-designed, and we never revealed it, never released it, until CX9 came along, because we co-designed it for a new type of processor.

You know, ConnectX-9, or CX8, and Spectrum-X revolutionized how Ethernet was done for artificial intelligence. Ethernet traffic for AI is much, much more intense and requires much lower latency. The instantaneous surge of traffic is unlike anything Ethernet sees. And so we created Spectrum-X, which is AI Ethernet. Two years ago we announced Spectrum-X. NVIDIA today is the largest networking company the world has ever seen. It's been so successful and used in so many different installations; it is just sweeping the AI landscape.

The performance is incredible, especially when you have a 200-megawatt data center, or a gigawatt data center. These are billions of dollars. Let's say a gigawatt data center is $50 billion. The networking performance allows you to deliver extra throughput; in the case of Spectrum-X, delivering 25% higher throughput is not uncommon. If we were to deliver just 10%, that's worth $5 billion. The networking is effectively free, which is the reason why everybody uses Spectrum-X. It's just an incredible thing.
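That arithmetic is simple enough to write down (both figures as quoted on stage):

```python
# A gigawatt-class AI factory costs ~$50B; even a modest networking-driven
# throughput gain is worth more than the network itself.
datacenter_cost = 50e9     # dollars, per the talk
throughput_gain = 0.10     # conservative 10%; the talk cites up to 25%

print(f"extra effective capacity: ${datacenter_cost * throughput_gain / 1e9:.0f}B")
# -> extra effective capacity: $5B
```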

And now we're going to invent a new type of data processing. So Spectrum-X is for east-west traffic. We now have a new processor called BlueField-4. BlueField-4 allows us to take a very large data center and isolate different parts of it, so that different users can use different parts of it, and make sure that everything can be virtualized if they decide it should be virtualized. So you offload a lot of the virtualization software, the security software, the networking software for your north-south traffic. And so BlueField-4 comes standard with every single one of these compute nodes. BlueField-4 has a second application I'm going to talk about in just a second.

This is a revolutionary processor, and I'm so excited about it. This is the NVLink 6 switch, and it's right here. This is the switch chip; there are four of them inside the NVLink switch here. Each one of these switch chips has the fastest SerDes in history. The world is barely getting to 200 gigabits; this is 400 gigabits per second. The reason this is so important is so that we can have every single GPU talk to every other GPU at exactly the same time. This switch, on the backplane of one of these racks, enables us to move twice the amount of the global internet's data. You take the cross-sectional bandwidth of the entire planet's internet: it's about 100 terabytes per second. This is 240 terabytes per second. So that kind of puts it in perspective. This is so that every single GPU can work with every single other GPU at exactly the same time.
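In round numbers (both figures as quoted on stage):

```python
# One rack's NVLink backplane versus the internet's cross-sectional bandwidth.
internet_tbps = 100   # rough planet-wide cross-section, per the talk
rack_tbps = 240       # NVLink 6 backplane of a single rack

print(f"{rack_tbps / internet_tbps:.1f}x the internet, in one rack")  # 2.4x
```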

Okay. Then on top of that, this is one rack. In each one of these racks, as you can see, the number of transistors is 1.7 times. Yeah. Could you do this for me? So, this is usually about two tons, but today it's two and a half tons, because when they shipped it, they forgot to drain the water out of it. So we shipped a lot of water from California. [clears throat] Can you hear it squealing? You know, when you're rotating two and a half tons, you're going to squeal a little. Oh, you could do it. Wow. Okay, we won't make you do that twice. All right.

So behind this are the NVLink spines. Basically two miles of copper cables. Copper is the best conductor we know. And these are all shielded copper cables, structured copper cables, the most the world's ever used in a computing system, ever. And our SerDes drive the copper cables from the top of the rack all the way to the bottom of the rack at 400 gigabits per second. It's incredible. And so this has two miles of total copper cable, 5,000 copper cables, and this makes the NVLink spine possible. This is the revolution that really started the MGX system. Now, we decided that we would create an industry-standard system, so that the entire ecosystem, all of our supply chain, could standardize on these components. There are some 80,000 different components that make up these MGX systems, and it's a total waste if we were to change it every single year.

Every single major computer company, from Foxconn to Quanta to Wistron, you know, the list goes on and on, to HP and Dell and Lenovo, everybody knows how to build these systems. And so the fact that we could squeeze Vera Rubin into this, even though the performance is so much higher and, very importantly, the power is twice as high. The power of Vera Rubin is twice as high as Grace Blackwell. And yet, and this is the miracle, the air that goes into it, the airflow, is about the same. And very importantly, the water that goes into it is the same temperature, 45°C. With 45°C, no water chillers are necessary for the data centers. We're basically cooling this supercomputer with hot water. It's so incredibly efficient. And so this is the new rack: 1.7 times more transistors, but five times more peak inference performance, and three and a half times more peak training performance.

Okay.

They're connected on top using Spectrum-X. Oh, thank you. This is the world's first chip manufactured using TSMC's new process that we co-innovated, called COUPE. It's an integrated silicon photonics process technology. And this allows us to take silicon photonics directly to the chip. This is 512 ports at 200 gigabits per second. And this is the new Ethernet AI switch, the Spectrum-X Ethernet switch. And look at this giant chip. But what's really amazing: it's got silicon photonics directly connected to it. And lasers come in through here. The optics are here, and they connect out to the rest of the data center. I'll show you in a second, but this is on top of the rack. And this is the new Spectrum-X silicon photonics switch. Okay.

And we have something new I want to tell you about. Just as I mentioned, a couple of years ago we introduced Spectrum-X so that we could reinvent the way networking is done. Ethernet is really easy to manage; everybody has an Ethernet stack, and every data center in the world knows how to deal with Ethernet. The only thing we were using at the time was called InfiniBand, which is used for supercomputers. InfiniBand is very low latency, but of course the software stack, the entire manageability of InfiniBand, is very alien to the people who use Ethernet. So we decided to enter the Ethernet switch market for the very first time. Spectrum-X just took off, and it made us the largest networking company in the world, as I mentioned. This next generation of Spectrum-X is going to carry on that tradition.

But just as I said earlier, AI has reinvented the whole computing stack, every layer of the computing stack. It stands to reason that when AI starts to get deployed in the world's enterprises, it's also going to reinvent the way storage is done. Well, AI doesn't use SQL. AI uses semantic information. And when AI is being used, it creates this temporary knowledge, this temporary memory, called the KV cache, key-value combinations. But it's a KV cache, basically the cache of the AI, the working memory of the AI. And the working memory of the AI is stored in HBM memory. For every single token, the GPU reads in the entire model, it reads in the entire working memory, it produces one token, and it stores that one token back into the KV cache. And then the next time, it does that again: it reads in the entire memory, streams it through our GPU, and generates another token. It does this repeatedly, token after token after token. And obviously, if you have a long conversation with that AI over time, that memory, that context memory, is going to grow tremendously. Not to mention, the models are growing, and the number of turns for which we're using the AI is increasing. We would like this AI to stay with us our entire lives and remember every single conversation we've ever had with it, right? Every single lick of research I've asked it for. And of course, the number of people sharing the supercomputers is going to continue to grow. And so this context memory, which started out fitting inside HBM, is no longer large enough.
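In software terms, the loop being described is ordinary autoregressive decoding with a KV cache. Here is a schematic sketch; the model itself is stubbed out, but the traffic pattern, re-reading the weights and the ever-growing cache on every step, is the point:

```python
# Schematic autoregressive decoding with a KV cache. Every step re-reads the
# full weights plus the accumulated cache, emits one token, and appends one
# new cache entry, so context memory grows with every token generated.

def decode_step(weights, kv_cache, last_token):
    """Stub forward pass: returns (next_token, new key/value entry)."""
    next_token = (last_token + len(kv_cache) + 1) % 50_000   # placeholder logic
    return next_token, ("key", "value")

weights = object()   # stands in for the entire model, streamed in every step
kv_cache = []        # the AI's working memory
token = 1            # start-of-sequence token

for _ in range(8):
    token, entry = decode_step(weights, kv_cache, token)
    kv_cache.append(entry)            # one more entry per generated token

print(f"generated 8 tokens; KV cache holds {len(kv_cache)} entries")
```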

Last year, we created Grace Blackwell's very fast memory; we called it fast context memory. That's the reason why we connected Grace directly to Hopper, and why we connected Grace directly to Blackwell: so that we could expand the context memory. But even that is not enough. And so the next solution, of course, is to go off onto the network, the north-south network, off to the storage of the company. But if you have a whole lot of AIs running at the same time, that network is no longer going to be fast enough. So the answer, very clearly, is to do it differently. And so we created BlueField-4, so that we could essentially have a very fast KV-cache context memory store right in the rack. I'll show you in just one second, but this is a whole new category of storage systems. And the industry is so excited, because this is a pain point for just about everybody who does a lot of token generation today: the AI labs, the cloud service providers, they're really suffering from the amount of network traffic being caused by KV cache moving around. And so the idea that we would create a new platform, a new processor, to run the entire Dynamo KV-cache context memory management system, and to put it very close to the rest of the rack, is completely revolutionary. So this is it. It sits right here.

So these are all the compute nodes. Each one of these is NVLink 72; this is Vera Rubin NVLink 72, with the Rubin GPUs. This is the context memory that's stored here. Behind each one of these are four BlueFields. Behind each BlueField is 150 terabytes of context memory. And once you allocate it across the GPUs, each GPU will get an additional 16 terabytes. Now, inside this node, each GPU essentially has one terabyte. And with this backing store here, directly on the same east-west traffic, at exactly the same data rate, 200 gigabits per second, across literally the entire fabric of this compute node, you're going to get an additional 16 terabytes of memory. Okay.
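A toy sketch of what that tiering means for software: keep hot KV blocks in the GPU's local memory and spill the rest to the rack-level store behind the BlueFields. The capacities, names, and LRU policy here are illustrative assumptions, not the Dynamo design:

```python
from collections import OrderedDict

# Illustrative two-tier KV-cache store: a small, fast "HBM" tier with LRU
# eviction into a large rack-level tier (the BlueField-backed store).
HBM_CAPACITY = 4            # tiny so the eviction is visible
hbm_tier = OrderedDict()    # hot tier: ~1 TB per GPU in the real system
rack_tier = {}              # cold tier: ~16 TB extra per GPU in the real system

def put_kv(seq_id, kv_block):
    hbm_tier[seq_id] = kv_block
    hbm_tier.move_to_end(seq_id)
    while len(hbm_tier) > HBM_CAPACITY:          # spill coldest to the rack
        cold_id, cold_block = hbm_tier.popitem(last=False)
        rack_tier[cold_id] = cold_block

def get_kv(seq_id):
    if seq_id in hbm_tier:                       # hit in local HBM
        hbm_tier.move_to_end(seq_id)
        return hbm_tier[seq_id]
    block = rack_tier.pop(seq_id)                # fetch over east-west fabric
    put_kv(seq_id, block)                        # promote back to the hot tier
    return block

for i in range(6):
    put_kv(f"conversation-{i}", f"kv-{i}")
print(sorted(hbm_tier), sorted(rack_tier))       # 4 hot, 2 spilled
```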

And this is the management plane: these are the Spectrum-X switches that connect all of them together. And over here, these switches at the end connect them to the rest of the data center. Okay? And so this is Vera Rubin.

Now, there are several things that are really incredible about it. The first thing, as I mentioned, is that this entire system is twice the energy efficiency, in the sense that even though the power is twice as high, the amount of energy used is twice as high, the amount of computation is many times higher than that, and the liquid that goes into it is still 45°C. That enables us to save about 6% of the world's data center power. So that's a very big deal.

The second very big deal is that this entire system is now confidential-computing safe, meaning everything is encrypted in transit, at rest, and during compute, and every single bus is now encrypted: every PCI Express, every NVLink, the NVLink between CPU and GPU, between GPU and GPU, everything is now encrypted. And so it's confidential-computing safe. This allows companies to feel safe that when their models are deployed by somebody else, they will never be seen by anybody else. Okay?

And so this particular system is not only incredibly energy-efficient; there's one other thing that's incredible. Because of the nature of the AI workload, it spikes instantaneously. With this computation phase called all-reduce, the amount of current, the amount of energy used simultaneously, is really off the charts; oftentimes it'll spike up 25%. We now have power smoothing across the entire system, so you don't have to overprovision by 25%, or, if you do overprovision by 25%, you don't have to leave 25% of the energy squandered and unused. And so now you can fill up the entire power budget, and you don't have to provision beyond that.
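A toy model of why smoothing matters for provisioning (the 25% surge figure is from the talk; all other numbers are illustrative):

```python
# Toy model of the provisioning problem: all-reduce phases surge draw ~25%
# above baseline, so without smoothing you provision for the spike and strand
# the headroom the rest of the time.
base_mw = 800.0
trace = [base_mw * (1.25 if step % 10 == 0 else 1.0) for step in range(100)]

peak = max(trace)                # provisioning target without smoothing
avg = sum(trace) / len(trace)    # roughly the smoothed target
print(f"peak provisioning: {peak:.0f} MW")
print(f"average draw:      {avg:.0f} MW")
print(f"stranded headroom: {100 * (1 - avg / peak):.0f}%")
```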

And then the last thing, of course, is performance. So let's take a look at the performance of this.

These are charts that only people who build AI supercomputers would love. It took every single one of these chips, a complete redesign of every single one of the systems, and a rewrite of the entire stack, to make this possible. Basically, this first column is training the AI model. The faster you train AI models, the faster you can get the next frontier out to the world. This is your time to market. This is technology leadership. This is your pricing power. And so, in the case of the green, this is essentially a 10-trillion-parameter model. We scaled it up from DeepSeek; that's why we call it DeepSeek++. Training a 10-trillion-parameter model on 100 trillion tokens. Okay. This is our simulation projection of what it would take for us to build the next frontier model. The next frontier model: Elon's already mentioned that the next version of Grok, Grok 5, I think, is 7 trillion parameters. This is 10. And the green is Blackwell, and here, in the case of Rubin, notice the throughput is so much higher, and therefore it only takes one-fourth as many of these systems to train the model in the time that we gave it here, which is one month. Okay.
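The "one-fourth as many systems" claim is fixed-budget arithmetic; here is a sketch with made-up per-rack rates (only the 100-trillion-token budget and the one-month window come from the talk):

```python
# Fixed training budget: 100T tokens in one month. Racks needed scales
# inversely with per-rack token throughput; the rates below are placeholders,
# chosen only to show the 4x-throughput -> 1/4-racks math.
tokens = 100e12
month_s = 30 * 24 * 3600

def racks_needed(tokens_per_sec_per_rack: float) -> float:
    return tokens / (tokens_per_sec_per_rack * month_s)

print(f"Blackwell-class: {racks_needed(1.0e6):,.0f} racks")
print(f"Rubin-class:     {racks_needed(4.0e6):,.0f} racks")   # one quarter
```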

And so time is the same for everybody. Now, how fast you can train that model, and how large a model you can train, is how you're going to get to the frontier first. The second part is your factory throughput. Blackwell is green again. And factory throughput is important because your factory, in the case of a gigawatt, is $50 billion. A $50 billion data center can only consume one gigawatt of power. And so whether your throughput per watt is very good versus quite poor directly translates to your revenues. The revenues of your data center are directly related to that second column. And in the case of Blackwell, it was about ten times over Hopper. In the case of Rubin, it's going to be about ten times higher again. Okay? And now, in the case of the cost of the tokens, how cost-effectively you can generate a token: Rubin is about one-tenth, just as in the case of... Yep. [clears throat]

>> [applause] >> So that's how we're going to get everybody to the next frontier, to push AI to the next level, and of course to build these data centers energy-efficiently and cost-efficiently. So this is it. This is NVIDIA today. You know, we mentioned that we build chips, but as you know, NVIDIA builds entire systems now. And AI is a full stack. We're reinventing AI across everything, from chips to infrastructure to models to applications. And our job is to create the entire stack so that all of you can create incredible applications for the rest of the world. Thank you all for coming. Have a great CES.

Now, before... [applause and cheering] Before I let you guys go, there were a whole bunch of slides we had to leave on the cutting-room floor, and so we have some outtakes here. I think it'll be fun for you. Have a great CES, guys. And cut.

Nvidia live at CES. Take four. Marker

>> boom mic action.

>> Sorry guys. Platform shift, huh?

>> That should do it.

>> And let's [music] roll camera.

>> A shade of green. A bright happy green.

>> World's most powerful AI supercomputer you can plug into the wall. Next to my toaster.

>> Hey guys, I'm stuck again. I'm so sorry.

>> This slide is never going to work. Let's just cut it.

>> Hello. Can you hear me?

>> So, like [music] I was saying, the router. Because not every problem needs the biggest, smartest model. Just the right one.

>> No, no, don't lose any of them. This new six-chip Rubin [music] platform makes one amazing AI supercomputer.

>> There you go, little guy.

>> Oh no, no, not the scaling laws.

>> There is a squirrel on the car. Be ready to make the squirrel [music] go away. Ask the squirrel gently to move away.

>> Did you know the best models today are all mixture of experts?

>> Hey. [music] Where'd everybody go?
Hey, hey, hey. [music]
Hey. Hey.
Hey.
