[Chinese-English Subtitles] NVIDIA CEO Jensen Huang's NVIDIA GTC Keynote

By 電腦王阿達

Summary

Topics Covered

Accelerated Computing Ends Dennard Scaling
CUDA Libraries Unlock Domain Revolutions
Nvidia Arc Reclaims 6G Leadership
AI Factories Produce Tokens Not Files
Extreme Codesign Delivers 10x AI Leaps

Full Transcript

Welcome to the stage NVIDIA founder and CEO, Jensen Huang.

Washington, DC. Welcome to GTC.

[Music] It's hard not to be sentimental and proud of America, I've got to tell you that. Was that video amazing?

Thank you.

NVIDIA's creative team does an amazing job. Welcome to GTC. We have a lot to cover with you today. GTC is where we talk about industry, science, computing, the present, and the future. So, I've got a lot to cover with you today, but before I start, I want to thank all of our partners who helped sponsor this great event. You'll see all of them around the show. They're here to meet with you, and it's really great. We couldn't do what we do without all of our ecosystem partners.

This is the Super Bowl of AI, people say. And therefore, every Super Bowl should have an amazing pregame show. What do you guys think about the pregame show, and our all-star athletes and all-star cast? Look at these guys.

Somehow I turned out the buffest. What do you guys think?

I don't know if I had something to do with that.

NVIDIA invented a new computing model for the first time in 60 years. As you saw in the video, a new computing model rarely comes about. It takes an enormous amount of time and the right set of conditions. We invented this computing model because we wanted to solve problems that general-purpose computers, normal computers, could not. We also observed that someday the number of transistors would continue to grow, but the performance and the power of transistors would slow down; that Moore's law would not continue, limited by the laws of physics. And that moment has now arrived. It's called Dennard scaling. Dennard scaling stopped nearly a decade ago, and in fact transistor performance and its associated power have slowed tremendously, and yet the number of transistors continued to grow.

We made this observation a long time ago, and for 30 years we've been advancing this form of computing we call accelerated computing. We invented the GPU. We invented the programming model called CUDA. And we observed that if we could add a processor that takes advantage of more and more transistors, apply parallel computing, and add that to a sequential-processing CPU, we could extend the capabilities of computing well beyond. And that moment has really come. We have now seen that inflection point. Accelerated computing: its moment has now arrived.

However, accelerated computing is a fundamentally different programming model. You can't just take CPU software, software written by hand, executing sequentially, and put it onto a GPU and have it run properly. In fact, if you just did that, it actually runs slower. And so you have to reinvent new algorithms. You have to create new libraries. You have to, in fact, rewrite the application, which is the reason why it's taken so long. It's taken us nearly 30 years to get here. But we did it one domain at a time.
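To make the point about reinventing algorithms concrete, here is a minimal sketch, written for this transcript and not taken from the keynote or from any NVIDIA library. It contrasts a sequential running total with the Hillis-Steele scan, a textbook parallel formulation of the same computation. Both function names are illustrative, and plain Python is used only to show the change in structure; it does not run on a GPU.

```python
def sequential_prefix_sum(values):
    """CPU-style running total: every step depends on the previous step."""
    totals = []
    running = 0
    for v in values:
        running += v
        totals.append(running)
    return totals


def parallel_style_prefix_sum(values):
    """Hillis-Steele scan: about log2(n) rounds. Within one round, every
    position reads only the previous round's data, so all positions could
    be updated at the same time by independent threads."""
    data = list(values)
    offset = 1
    while offset < len(data):
        # Build the next round entirely from the previous round's values.
        data = [
            data[i] + data[i - offset] if i >= offset else data[i]
            for i in range(len(data))
        ]
        offset *= 2
    return data


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9, 2, 6]
    print(sequential_prefix_sum(sample))
    print(parallel_style_prefix_sum(sample))
```

The two functions return the same result, but the second does more total additions in exchange for far fewer dependent steps. That trade is typical of algorithms redesigned for parallel hardware.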

This is the treasure of our company. Most people talk about the GPU. The GPU is important, but it would not be enough without a programming model that sits on top of it, and without dedication to that programming model, keeping it compatible over generations. We're now at CUDA 13, coming up on CUDA 14, with hundreds of millions of GPUs running in every single computer, perfectly compatible. If we didn't do that, then developers wouldn't target this computing platform. If we didn't create these libraries, then developers wouldn't know how to use the algorithm and use the architecture to its fullest. One application after another. I mean, this is really the treasure of our company.

cuLitho, computational lithography. It took us nearly seven years to get here with cuLitho, and now TSMC uses it, Samsung uses it, ASML uses it. This is an incredible library for computational lithography, the first step of making a chip. Sparse solvers for CAE applications. cuOpt, numerical optimization, has broken just about every single record: the traveling salesperson problem, how to connect millions of products with millions of customers in the supply chain. Warp, a Python solver for CUDA for simulation. cuDF, a dataframe approach, basically accelerating SQL and dataframe databases. This library is the one that started AI altogether: cuDNN. The library on top of it, called Megatron Core, made it possible for us to simulate and train extremely large language models. The list goes on. MONAI, really, really important, the number one medical imaging AI framework in the world. By the way, we're not going to talk a lot about healthcare today, but be sure to see Kimberly's keynote. She's going to talk a lot about the work that we do in healthcare. And the list goes on. Genomics processing. Aerial, pay attention: we're going to do something really important here today. cuQuantum, quantum computing.

This is just a representative sample of 350 different libraries in our company. Each one of these libraries redesigned the algorithm necessary for accelerated computing. Each one of these libraries made it possible for all of the ecosystem partners to take advantage of accelerated computing. And each one of these libraries opened new markets for us. Let's take a look at what CUDA-X can do.

Ready, go.

[Music] [Applause]

Is that amazing?

Everything you saw was a simulation. There was no art, no animation. This is the beauty of mathematics. This is deep computer science, deep math. And it's just incredible how beautiful it is. Every industry was covered, from healthcare and life sciences to manufacturing, robotics, autonomous vehicles, computer graphics, even video games. That first shot that you saw was the first application NVIDIA ever ran. And that's where we started in 1993. And we kept believing in what we were trying to do. It's hard to imagine that you could see that first Virtua Fighter scene come alive, and that same company believed that we would be here today. It's just a really, really incredible journey. I want to thank all the NVIDIA employees for everything that you've done. It's really incredible.

We have a lot of industries to cover today. I'm going to cover AI, 6G, quantum, models, enterprise computing, robotics, and factories. Let's get started. We have a lot to cover, a lot of big announcements to make, a lot of new partners that would very much surprise you.

Telecommunications is the backbone, the lifeblood of our economy, our industries, our national security. And yet, ever since the beginning of wireless, where we defined the technology, we defined the global standards, and we exported American technology all around the world so that the world could build on top of American technology and standards, it has been a long time since that's happened. Wireless technology around the world is largely deployed today on foreign technologies. Our fundamental communication fabric is built on foreign technologies.

That has to stop, and we have an opportunity to do that, especially during this fundamental platform shift. As you know, computer technology is at the foundation of literally every single industry. It is the single most important instrument of science. It's the single most important instrument of industry.

And I just said we're going through a platform shift. That platform shift should be the once-in-a-lifetime opportunity for us to get back into the game, for us to start innovating with American technology. Today we're announcing that we're going to do that. We have a big partnership with Nokia. Nokia is the second-largest telecommunications maker in the world. It's a $3 trillion industry. Infrastructure is hundreds of billions of dollars. There are millions of base stations around the world.

If we could partner, we could build on top of this incredible new technology, fundamentally based on accelerated computing and AI, for the United States, for America, to be at the center of the next revolution in 6G. So today we're announcing that NVIDIA has a new product line. It's called NVIDIA ARC, the Aerial Radio Network Computer. Aerial RAN Computer, ARC. ARC is built from three fundamental new technologies: the Grace CPU, the Blackwell GPU, and our Mellanox ConnectX networking designed for this application. And all of that makes it possible for us to run this library, this CUDA-X library that I mentioned earlier, called Aerial. Aerial is essentially a wireless communication system running on top of CUDA-X. We're going to create, for the first time, a software-defined programmable computer that's able to communicate wirelessly and do AI processing at the same time. This is completely revolutionary. We call it NVIDIA ARC, and Nokia is going to work with us to integrate our technology and rewrite their stack.

This is a company with 7,000 fundamental, essential 5G patents. Hard to imagine any greater leader in telecommunications. So we're going to partner with Nokia. They're going to make NVIDIA ARC their future base station. NVIDIA ARC is also compatible with AirScale, the current Nokia base stations. So what that means is we're going to take this new technology and we'll be able to upgrade millions of base stations around the world with 6G and AI.

Now, 6G and AI is really quite fundamental in the sense that, for the first time, we'll be able to use AI technology, AI for RAN, to make radio communications more spectrally efficient, using artificial intelligence and reinforcement learning, adjusting the beamforming in real time, in context, depending on the surroundings and the traffic and the mobility and the weather. All of that could be taken into account so that we could improve spectral efficiency. Spectral efficiency consumes about one and a half to 2% of the world's power. So improving spectral efficiency improves the amount of data we can put through wireless networks without increasing the amount of energy necessary.

The other thing that we could do, beyond AI for RAN, is AI on RAN. This is a brand-new opportunity. Remember, the internet enabled communications, but amazingly smart companies like AWS built a cloud computing system on top of the internet.

We are now going to do the same thing on top of the wireless telecommunications network. This new cloud will be an edge industrial robotics cloud. This is where AI on RAN comes in. The first is AI for RAN, to improve radio spectrum efficiency. The second is AI on RAN, essentially cloud computing for wireless telecommunications. Cloud computing will be able to go right out to the edge, where data centers are not, because we have base stations all over the world. This announcement is really exciting. Justin Hotard, the CEO, I think he's somewhere in the room. Thank you for your partnership. Thank you for helping the United States bring telecommunication technology back to America. This is really a fantastic, fantastic partnership. Thank you very much.

That's the best way to celebrate Nokia.

Let's talk about quantum computing.

In 1981, particle physicist, quantum physicist Richard Feynman imagined a new type of computer that can simulate nature directly, because nature is quantum. He called it a quantum computer. Forty years later, just last year, the industry made a fundamental breakthrough. It is now possible to make one logical qubit that's coherent, stable, and error corrected. Now, that one logical qubit consists of sometimes tens, sometimes hundreds of physical qubits all working together. As you know, qubits, these particles, are incredibly fragile. They can become unstable very easily. Any observation, any sampling of it, any environmental condition causes it to decohere. And so it takes extraordinarily well-controlled environments, and now also a lot of different physical qubits working together, for us to do error correction using what are called auxiliary or syndrome qubits, to error correct them and infer what that logical qubit state is.

There are all kinds of different types of quantum computers: superconducting, photonic, trapped ion, stable atom, all kinds of different ways to create a quantum computer. Well, we now realize that it's essential for us to connect a quantum computer directly to a GPU supercomputer so that we could do the error correction, so that we could do the artificial intelligence calibration and control of the quantum computer, and so that we could do simulations collectively, working together: the right algorithms running on the GPUs, the right algorithms running on the QPUs, and the two processors, the two computers, working side by side. This is the future of quantum computing. Let's take a look.

There are many ways to build a quantum computer. Each uses qubits, quantum bits, as its core building block. But no matter the method, all qubits, whether superconducting qubits, trapped ions, neutral atoms, or photons, share the same challenge. They're fragile and extremely sensitive to noise. Today's qubits remain stable for only a few hundred operations. But solving meaningful problems requires trillions of operations. The answer is quantum error correction. Measuring disturbs a qubit, which destroys the information inside it. The trick is to add extra qubits, entangled so that measuring them gives us enough information to calculate where errors occurred without damaging the qubits we care about. It's brilliant, but needs beyond-state-of-the-art conventional compute.
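The idea of locating an error from extra parity information, without reading the protected value itself, can be sketched with a classical analogue: the three-bit repetition code. This is an illustrative toy written for this transcript, not NVIDIA's decoder or any quantum SDK. Real quantum error correction measures parities through ancilla qubits, must also handle phase errors, and uses much larger codes. All names below are assumptions made for the example.

```python
def encode(bit):
    """Protect one logical bit by storing three copies of it."""
    return [bit, bit, bit]


def measure_syndrome(block):
    """Two parity checks on neighbouring pairs. They reveal whether the
    copies disagree, but not what the logical value is."""
    return (block[0] ^ block[1], block[1] ^ block[2])


# Each single-bit flip produces a unique syndrome, so the decoder can
# look up which position to repair.
SYNDROME_TO_FLIPPED_INDEX = {
    (0, 0): None,  # no error detected
    (1, 0): 0,
    (1, 1): 1,
    (0, 1): 2,
}


def correct(block):
    """Return a repaired copy of the block, assuming at most one flip."""
    flipped = SYNDROME_TO_FLIPPED_INDEX[measure_syndrome(block)]
    repaired = list(block)
    if flipped is not None:
        repaired[flipped] ^= 1
    return repaired


if __name__ == "__main__":
    noisy = encode(1)
    noisy[2] ^= 1  # simulate a single bit-flip error
    print(measure_syndrome(noisy), correct(noisy))
```

The decoder here is a three-entry lookup table. For codes protecting many logical qubits, the equivalent decoding step is far more expensive and must be repeated continuously, which is the computational load the narration refers to.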

That's why we built NVQLink, a new interconnect architecture that directly connects quantum processors with NVIDIA GPUs. Quantum error correction requires reading out information from qubits, calculating where errors occur, and sending data back to correct them. NVQLink is capable of moving terabytes of data to and from quantum hardware the thousands of times every second needed for quantum error correction.

At its heart is CUDA-Q, our open platform for quantum-GPU computing. Using NVQLink and CUDA-Q, researchers will be able to do more than just error correction. They will also be able to orchestrate quantum devices and AI supercomputers to run quantum-GPU applications. Quantum computing won't replace classical systems. They will work together, fused into one accelerated quantum supercomputing platform.

Wow, this is a really long stage.

You know, CEOs, we don't just sit at our desks typing. This is a physical job. So today we're announcing NVQLink, and it's made possible by two things. Of course, this interconnect that does quantum computer control and calibration and quantum error correction, as well as connecting two computers, the QPU and our GPU supercomputers, to do hybrid simulations. It is also completely scalable. It doesn't just do error correction for today's small number of qubits. It does error correction for tomorrow, where we're going to essentially scale up these quantum computers from the hundreds of qubits we have today to tens of thousands of qubits, hundreds of thousands of qubits in the future. So we now have an architecture that can do control, co-simulation, quantum error correction, and scale into that future.

The industry support has been incredible since the invention of CUDA-Q. Remember, CUDA was designed for GPU-CPU accelerated computing, basically using both processors to use the right tool to do the right job. Now CUDA-Q has been extended beyond CUDA so that we could support the QPU and have the two processors, the QPU and the GPU, work together and have computation move back and forth within just a few microseconds, the essential latency to be able to cooperate with the quantum computer. So now CUDA-Q is such an incredible breakthrough, adopted by so many different developers. We are announcing today 17 different quantum computer industry companies supporting NVQLink and, I'm so excited about this, eight different DOE labs: Berkeley, Brookhaven, Fermilab in Chicago, Lincoln Laboratory, Los Alamos, Oak Ridge, Pacific Northwest, Sandia National Lab. Just about every single DOE lab has engaged us, working with our ecosystem of quantum computer companies and these quantum controllers so that we could integrate quantum computing into the future of science.

Well, I have one more announcement to make. Today, we're announcing that the Department of Energy is partnering with NVIDIA to build seven new AI supercomputers to advance our nation's science.

I have to have a shout-out for Secretary Chris Wright. He has brought so much energy to the DOE, a surge of energy, a surge of passion to make sure that America leads science. Again, as I mentioned, computing is the fundamental instrument of science, and we are going through several platform shifts. On the one hand, we're going to accelerated computing. That's why every future supercomputer will be a GPU-based supercomputer. We're going to AI, so that AI and principled solvers work together: principled simulation, principled physics simulation, is not going to go away, but it could be augmented, enhanced, and scaled using surrogate models, AI models working together. We also know that principled solvers, classical computing, could be enhanced to understand the state of nature using quantum computing. We also know that in the future, we have so much signal, so much data we have to sample from the world, that remote sensing is more important than ever. And these laboratories are impossible to experiment in at the scale and speed we need unless they're robotic factories, robotic laboratories. So all of these different technologies are coming into science at exactly the same time. Secretary Wright understands this, and he wants the DOE to take this opportunity to supercharge themselves and make sure the United States stays at the forefront of science. I want to thank all of you for that. Thank you.

Let's talk about AI.

What is AI? Most people would say that AI is a chatbot, and rightfully so. There's no question that ChatGPT is at the forefront of what people would consider AI. However, just as you see right now, these scientific supercomputers are not going to run chatbots. They're going to do basic science. The world of AI is much, much more than a chatbot. Of course, the chatbot is extremely important and AGI is fundamentally critical. Deep computer science, incredible computing, great breakthroughs are still essential for AGI. But beyond that, AI is a lot more. In fact, I'm going to describe AI in a couple of different ways. The first way you think about AI is that it has completely reinvented the computing stack.

The way we used to do software was hand coding: hand-coded software running on CPUs. Today, AI is machine learning, training, data-intensive programming, if you will, trained and learned by AI that runs on a GPU. In order to make that happen, the entire computing stack has changed.

Notice you don't see Windows up here. You don't see CPU up here. You see a whole fundamentally different stack. Everything from the need for energy, and this is another area where our administration, President Trump, deserves enormous credit: his pro-energy initiative, his recognition that this industry needs energy to grow. It needs energy to advance, and we need energy to win. His recognition of that, and putting the weight of the nation behind pro-energy growth, completely changed the game. If this didn't happen, we could have been in a bad situation. And I want to thank President Trump for that.

On top of energy are these GPUs. And these GPUs are connected into, built into, infrastructure that I'll show you later. This infrastructure consists of giant data centers, easily many times the size of this room, and an enormous amount of energy, which is then transformed through this new machine called GPU supercomputers to generate numbers. These numbers are called tokens: the language, if you will, the computational unit, the vocabulary of artificial intelligence. You can tokenize almost anything. You can tokenize, of course, the English word. You can tokenize images. That's the reason why you're able to recognize images or generate images. Tokenize video, tokenize 3D structures. You could tokenize chemicals and proteins and genes. You could tokenize cells, tokenize almost anything with structure, anything with information content.
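As a rough illustration of what it means to turn content into numbers, here is a toy word-level tokenizer. It is an assumption-laden sketch written for this transcript: production language models generally use subword schemes such as byte-pair encoding, and images, proteins, or motion need their own encoders. The function names and the `<unk>` placeholder are hypothetical choices for the example.

```python
def build_vocabulary(corpus):
    """Assign an integer ID to every distinct word seen in the corpus.
    ID 0 is reserved for words the vocabulary has never seen."""
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def tokenize(text, vocab):
    """Map text to a list of token IDs."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def detokenize(token_ids, vocab):
    """Map token IDs back to words."""
    id_to_word = {token_id: word for word, token_id in vocab.items()}
    return " ".join(id_to_word[token_id] for token_id in token_ids)


if __name__ == "__main__":
    vocab = build_vocabulary(["AI factories produce tokens"])
    ids = tokenize("factories produce AI", vocab)
    print(ids, detokenize(ids, vocab))
```

Once any kind of data is expressed as sequences of IDs like these, the same model machinery can be trained on it, which is the sense in which the talk says almost anything with structure can be tokenized.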

Once you could tokenize it, AI can learn that language and the meaning of it. Once it learns the meaning of that language, it can translate. It can respond, just like you interact with ChatGPT. And it could generate, just as ChatGPT can generate. So for all of the fundamental things that you see ChatGPT do, all you have to do is imagine: what if it was a protein, what if it was a chemical, what if it was a 3D structure like a factory, what if it was a robot and the token was understanding behavior, and tokenizing motion and action? All of those concepts are basically the same, which is the reason why AI is making such extraordinary progress. And on top of these models are applications.

Transformers is not a universal model. It's an incredibly effective model. But there's no one universal model. It's just that AI has universal impact. There are so many different types of models. In the last several years we enjoyed the invention and went through the innovation breakthroughs of multimodality. There are CNN models, convolutional neural network models, there are state space models, there are graph neural network models, multimodal models, and of course all the different tokenizations and token methods that I just described. You could have models that are spatial in their understanding, optimized for spatial awareness. You could have models that are optimized for long sequences, recognizing subtle information over a long period of time. There are so many different types of models.

On top of these model architectures are applications, the software of the past. And this is a profound understanding, a profound observation of artificial intelligence: that the software industry of the past was about creating tools. Excel is a tool. Word is a tool. A web browser is a tool. The reason why I know these are tools is because you use them. The tools industry, just as with screwdrivers and hammers, is only so large. In the case of IT tools, they could be database tools. These IT tools are about a trillion dollars or so. But AI is not a tool.

AI is work.

That is the profound difference. AI is in fact workers that can actually use tools. One of the things I'm really excited about is the work that Aravind's doing at Perplexity: Perplexity using web browsers to book vacations or do shopping. Basically an AI using tools. Cursor is an agentic AI system that we use at NVIDIA. Every single software engineer at NVIDIA uses Cursor. That's improved our productivity tremendously. It's basically a partner for every one of our software engineers to generate code, and it uses a tool, and the tool it uses is called VS Code. So Cursor is an agentic AI system that uses VS Code.

Well, all of these different industries, whether it's chatbots or digital biology, where we have AI assistant researchers, or what is a robotaxi? Inside a robotaxi, of course, it's invisible, but obviously there's an AI chauffeur. That chauffeur is doing work, and the tool that it uses to do that work is the car. And so everything that we've made up until now, the whole world, everything that we've made up until now, are tools. Tools for us to use. For the very first time, technology is now able to do work and help us be more productive. The list of opportunities goes on and on, which is the reason why AI addresses a segment of the economy that it has never addressed. It is a few trillion dollars that sits underneath the tools of a hundred-trillion-dollar global economy. Now, for the first time, AI is going to engage that hundred-trillion-dollar economy and make it more productive, make it grow faster, make it larger. We have a severe shortage of labor. Having AI that augments labor is going to help us grow.

Now, what's interesting about this from a technology industry perspective also is that, in addition to the fact that AI is new technology that addresses new segments of the economy, AI in itself is also a new industry. These tokens, as I was explaining earlier, these numbers, after you tokenize all these different modalities of information, there's a factory that needs to produce these numbers, unlike the computer industry and the chip industry of the past.

Notice, if you look at the chip industry of the past, the chip industry represents about 5 to 10%, maybe less, 5% or so, of a multi-trillion-dollar, few-trillion-dollar IT industry. And the reason for that is because it doesn't take that much computation to use Excel. It doesn't take that much computation to use browsers. It doesn't take that much computation to use Word. We do the computation. But in this new world, there needs to be a computer that understands context all the time. It can't precompute that, because every time you use the computer for AI, every time you ask the AI to do something, the context is different. So it has to process all of that information. Environmental, for example: in the case of a self-driving car, it has to process the context of the car. Context processing. What is the instruction you're asking the AI to do? Then it's got to go and break down the problem step by step, reason about it, and come up with a plan and execute it. Every single one of those steps requires an enormous number of tokens to be generated, which is the reason why we need a new type of system, and I call it an AI factory. It's an AI factory for sure. It's unlike a data center of the past. It's an AI factory because this factory produces one thing, unlike the data centers of the past that do everything: store files for all of us, run all kinds of different applications. You could use that data center like you can use your computer, for all kinds of applications. You could use it to play a game one day. You could use it to browse the web. You could use it, you know, to do your accounting. And so that is a computer of the past, a universal general-purpose computer.

The computer I'm talking about here is a factory. It runs basically one thing. It runs AI, and its purpose is to produce tokens that are as valuable as possible, meaning they have to be smart. And you want to produce these tokens at incredible rates, because when you ask an AI for something, you would like it to respond. And notice, during peak hours, these AIs are now responding slower and slower because they've got a lot of work to do for a lot of people. And so you want to produce valuable tokens at incredible rates, and you want to produce them cost-effectively. Every single word that I used is consistent with an AI factory, with a car factory, or any factory. It is absolutely a factory. And these factories never existed before. And inside these factories are mountains and mountains of chips.

Which brings us to today.

What happened in the last couple years?

And in fact, what happened this last year? Something fairly profound happened

year? Something fairly profound happened this year. Actually, if you look in the

this year. Actually, if you look in the beginning of the year, everybody has some attitude about AI. That attitude is generally this is going to be big. It's

going to be the future. And somehow a few months ago, it kicked into turbocharge. And the reason for that is

turbocharge. And the reason for that is several things.

The first is that we, in the last couple of years, have figured out how to make AI much, much smarter, rather than just pre-training. Pre-training basically says: let's take all of the information that humans have ever created, and let's give it to the AI to learn from. It's essentially memorization and generalization.

It's not unlike going to school back when we were kids: the first stage of learning. Pre-training was never meant, just as preschool was never meant, to be the end of education. Pre-training, preschool, was simply teaching you the basic skills of intelligence so that you can understand how to learn everything else. Without vocabulary, without understanding of language and how to communicate, how to think, it's impossible to learn everything else.

The next is post-training. Post-training, after pre-training, is teaching you skills: skills to solve problems, to break down problems and reason about them; how to solve math problems, how to code, how to think about these problems step by step, to use first-principles reasoning. And then after that is where computation really kicks in. As you know, for many of us, we went to school, and that's in my case decades ago, but ever since, I've learned more and thought about more, and the reason for that is that we're constantly grounding ourselves in new knowledge. We're constantly doing research, and we're constantly thinking. Thinking is really what intelligence is all about.

And so now we have three fundamental technology scales. We have pre-training, which still requires an enormous amount of computation. We now have post-training, which uses even more computation. And now thinking puts an incredible computation load on the infrastructure, because it's thinking on our behalf, for every single human.

So the amount of computation necessary for AI to think, inference, is really quite extraordinary. Now, I used to hear people say that inference is easy; Nvidia should do training. Nvidia is really good at this, so they're going to do training; inference is easy. How could thinking be easy? Regurgitating memorized content is easy. Regurgitating the multiplication tables is easy. Thinking is hard. Which is the reason why these three scales, these three new scaling laws, all of them at full steam, have put so much pressure on the amount of computation.

Now another thing has happened. From these three scaling laws, we get smarter models, and these smarter models need more compute. But when you get smarter models, you get more intelligence. People use it. If anything happens, I want to be the first one out. Just kidding. I'm sure it's fine. Probably just lunch. My stomach. Was that me?

And so, where was I? The smarter your models are, the more people use them. It's now more grounded. It's able to reason. It's able to solve problems it never learned how to solve before, because it could do research: go learn about it, come back, break it down, reason about how to answer your question, how to solve your problem, and go off and solve it. The amount of thinking is making the models more intelligent. The more intelligent it is, the more people use it. The more intelligent it is, the more computation is necessary. But here's what happened.

This last year, the AI industry turned the corner, meaning that the AI models are now smart enough. They're worthy to pay for. Nvidia pays for every license of Cursor, and we gladly do it. We gladly do it because Cursor is helping a several-hundred-thousand-dollar employee, a software engineer or AI researcher, be many, many times more productive. So of course we'd be more than happy to do that. These AI models have become good enough that they are worthy to be paid for: Cursor, ElevenLabs, Synthesia, Abridge, OpenEvidence, the list goes on. Of course OpenAI, of course Claude. These models are now so good that people are paying for them. And because people are paying for them and using more of them, and every time they use more, you need more compute, we now have two exponentials.

These two exponentials: one is the exponential compute requirement of the three scaling laws. And the second exponential: the smarter it is, the more people use it; the more people use it, the more computing it needs. Two exponentials are now putting pressure on the world's computational resources, at exactly the time when, as I told you earlier, Moore's law has largely ended. And so the question is, what do we do? If we have these two exponential demands growing, and if we don't find a way to drive the cost down, then this positive feedback system, this circular feedback system, essentially called the virtuous cycle, cannot keep going. It is essential for almost any industry, essential for any platform industry.

It was essential for Nvidia. We have now reached the virtuous cycle of CUDA. The more applications people create, the more valuable CUDA is; the more valuable CUDA is, the more CUDA computers are purchased; the more CUDA computers are purchased, the more developers want to create applications for it. That virtuous cycle for Nvidia has now been achieved. After 30 years, we have achieved that. Also, 15 years later, we've achieved that for AI.

AI has now reached the virtuous cycle. And so the more you use it, because the AI is smart and we pay for it, the more profit is generated; the more profit is generated, the more compute is put on the grid. The more compute is put into AI factories, the smarter the AI becomes; the smarter it is, the more people use it, the more applications use it, the more problems we can solve. This virtuous cycle is now spinning. What we need to do is drive the cost down tremendously, so that, one, the user experience is better: when you prompt the AI, it responds to you much faster. And two, so that we keep this virtuous cycle going by driving its cost down, so that it could get smarter, so that more people use it, and so on and so forth.

That virtuous cycle is now spinning. But how do we do that when Moore's law has really reached its limit? Well, the answer is called co-design.

You can't just design chips and hope that things on top of them are going to go faster. The best you could do in designing chips is add, I don't know, 50% more transistors every couple of years. And if you added more transistors, you know, we can add more transistors, and TSMC's got a lot of transistors, incredible company, we just keep adding more transistors. However, that's all in percentages, not exponentials.
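To put the percentages-versus-exponentials point in numbers, here is a minimal sketch. The 50% per two years figure is the one quoted above; the 10x per generation figure is the one claimed later in the talk for Grace Blackwell, and the two-year generation interval used to compare them is an assumption made only for illustration.

```python
def compound(factor_per_step: float, steps: int) -> float:
    """Total gain after `steps` repetitions of the same multiplicative gain."""
    return factor_per_step ** steps


YEARS = 6
STEP_YEARS = 2  # assumption: both curves advance once every two years
steps = YEARS // STEP_YEARS

# "50% more transistors every couple of years" (figure from the talk)
transistor_gain = compound(1.5, steps)

# Hypothetical: a 10x system-level gain per step, the per-generation
# figure claimed for extreme co-design elsewhere in the talk.
codesign_gain = compound(10.0, steps)

print(f"transistors only: {transistor_gain:.3f}x over {YEARS} years")
print(f"10x per step:     {codesign_gain:.0f}x over {YEARS} years")
```

Under these assumptions the transistor-only curve reaches a little over 3x in six years, while a 10x-per-step curve reaches 1000x, which is the gap the talk attributes to co-design across the whole stack.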

We need to compound exponentials to keep this virtuous cycle going.

Extreme co-design. Nvidia is the only company in the world today that literally starts from a blank sheet of paper and can think about new fundamental architecture: computer architecture, new chips, new systems, new software, new model architecture, and new applications, all at the same time.

So many of the people in this room are here because you're different parts of that layer, different parts of that stack, working with Nvidia.

We fundamentally rearchitect everything from the ground up, and then, because AI is such a large problem, we scale it up. We created a whole computer, a computer that for the first time has scaled up into an entire rack. That's one computer, one GPU. And then we scale it out by inventing a new AI Ethernet technology we call Spectrum-X Ethernet. Everybody will say Ethernet is Ethernet. Ethernet is hardly Ethernet. Spectrum-X Ethernet is designed for AI performance, and that's the reason why it's so successful. And even that's not big enough. We'll fill this entire room with AI supercomputers and GPUs. That's still not big enough, because the number of applications and the number of users for AI is continuing to grow exponentially. And so we connect multiple of these data centers together, and we call that scale-across: Spectrum-XGS, gigascale, Spectrum-X gigascale, XGS.

By doing so, we do co-design at such an enormous level, such an extreme level, that the performance benefits are shocking. Not 50% better each generation, not 25% better each generation, but much, much more. This is the most extreme co-designed computer we've ever made and, quite frankly, made in modern times. Since the IBM System/360, I don't think a computer has been reinvented from the ground up like this.

This system was incredibly hard to create. I'll show you the benefits in just a second. But essentially what we've done, what we've created...

Hey, Janine, you can come out. You have to meet me, like, halfway.

All right. So, this is kind of like Captain America's shield.

So, NVLink 72. If we were to create one giant chip, one giant GPU, this is what it would look like. This is the level of wafer-scale processing we would have to do. It's incredible. All of these chips are now put into one giant rack.

Did I do that, or did somebody else do that? Into that one giant rack. You know, sometimes I don't feel like I'm up here by myself.

Just this one giant rack makes all of these chips work together as one. It's actually completely incredible. And I'll show you the benefits of that. The way it looks is this. So, thanks, Janine. I like this. All right, ladies and gentlemen, Janine Paul.

I got it. In the future, I'm just going to go like Thor. It's like when you're at home and you can't reach the remote, and you just go like this and somebody brings it to you. Yeah, same idea. It never happens to me. I'm just dreaming about it. I'm just saying.

Okay. So, anyhow, this is what we created in the past. This is NVLink 8. Now, these models are so gigantic. The way we solve it is we turn this gigantic model into a whole bunch of experts. It's a little bit like a team. These experts are good at certain types of problems, and we collect a whole bunch of experts together. And so this giant multi-trillion-dollar AI model has all these different experts, and we put all these different experts on a GPU.

Now, this is NVLink 72. We could put all of the chips into one giant fabric, and every single expert can talk to each other. So the primary expert could talk to all of them and pass along all of the work: all of the necessary context and prompts, the bunch of tokens that we have to send to all of the experts. Whichever of the experts are selected to solve the answer would then go off and try to respond, and it would do that layer after layer after layer. Sometimes there are eight experts, sometimes 16, sometimes 64, sometimes 256. But the point is, there are more and more and more experts. Well, here, with NVLink 72, we have 72 GPUs. And because of that, we could put four experts in one GPU.

The most important thing you need to do for each GPU is to generate tokens, which depends on the amount of bandwidth that you have in HBM memory. We have one GPU generating, thinking, for four experts, versus here, because each one of the computers can only hold eight GPUs, we have to put 32 experts into one GPU. So this one GPU has to think for 32 experts, versus in this system, where each GPU only has to think for four. And because of that, the speed difference is incredible.
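The 32-versus-4 figures above follow from simple division if one assumes a model with 256 experts, the largest count mentioned in the talk. The sketch below makes that assumption explicit; the function name is hypothetical, and an even spread of experts across GPUs is assumed.

```python
import math


def experts_per_gpu(num_experts: int, num_gpus: int) -> int:
    """Largest number of experts any one GPU must host under an even spread."""
    return math.ceil(num_experts / num_gpus)


NUM_EXPERTS = 256  # assumption: the largest expert count mentioned in the talk

nvlink8 = experts_per_gpu(NUM_EXPERTS, 8)    # one 8-GPU NVLink domain
nvlink72 = experts_per_gpu(NUM_EXPERTS, 72)  # one 72-GPU NVLink domain

print(nvlink8, nvlink72)  # 32 4
```

Because token generation is described as bound by HBM bandwidth, hosting fewer experts per GPU means each GPU has less expert weight data to stream, which is the mechanism the talk points to for the speed difference.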

And this just came out. This is the benchmark done by SemiAnalysis. They do a really, really thorough job, and they benchmarked all of the GPUs that are benchmarkable, and it turns out it's not that many. If you look at the list of GPUs you could actually benchmark, it's like 90% Nvidia. Okay. So we're comparing ourselves to ourselves, but the second-best GPU in the world is the H200, and it runs all the workloads.

Grace Blackwell, per GPU, is 10 times the performance. Now, how do you get 10 times the performance when it's only twice the number of transistors? Well, the answer is extreme co-design.

And by understanding the nature of the future of AI models, and thinking across that entire stack, we can create architectures for the future. This is a big deal. It says we can now respond a lot faster. But this is an even bigger deal, this next one. Look at this. This says that the lowest-cost tokens in the world are generated by Grace Blackwell NVLink 72, the most expensive computer. On the one hand, GB200 is the most expensive computer. On the other hand, its token generation capability is so great that it produces tokens at the lowest cost, because the tokens per second divided by the total cost of ownership of Grace Blackwell is so good. It is the lowest-cost way to generate tokens. By doing so, delivering incredible performance, 10 times the performance, and delivering 10 times lower cost, that virtuous cycle can continue.
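The claim that the most expensive computer can still produce the cheapest tokens is a statement about a ratio: cost of ownership divided by tokens generated. The sketch below illustrates that ratio. The dollar and throughput numbers are hypothetical, chosen only to show the effect, and are not benchmark data from the talk.

```python
def cost_per_million_tokens(tco_per_hour_usd: float, tokens_per_second: float) -> float:
    """Cost of ownership attributed to one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return tco_per_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical systems: the newer one costs twice as much per hour
# but generates twenty times as many tokens per second.
older = cost_per_million_tokens(tco_per_hour_usd=2.0, tokens_per_second=100)
newer = cost_per_million_tokens(tco_per_hour_usd=4.0, tokens_per_second=2000)

print(f"older: ${older:.2f} per million tokens")
print(f"newer: ${newer:.2f} per million tokens")
print(f"cost ratio: {older / newer:.1f}x")
```

With these made-up inputs, doubling the hourly cost while raising throughput twentyfold cuts the cost per token by a factor of ten, which is the shape of the argument being made for Grace Blackwell.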

Which then brings me to this one. I just saw this literally yesterday. This is the CSP capex. People are asking me about capex these days, and this is a good way to look at it. In fact, this is the capex of the top six CSPs: Amazon, CoreWeave, Google, Meta, Microsoft, and Oracle. Okay, these CSPs together are going to invest this much in capex. And I would tell you the timing couldn't be better. The reason for that is that now we have the Grace Blackwell NVLink 72 in volume production; the supply chain everywhere in the world is manufacturing it. So we can now deliver to all of them this new architecture, so that the capex goes into instruments, computers, that deliver the best TCO.

Now, underneath this, there are two things going on. So when you look at this, it's actually fairly extraordinary, and it's fairly extraordinary anyhow. But what's happening underneath is this: there are two platform shifts happening at the same time.

One platform shift is going from general-purpose computing to accelerated computing. Remember, accelerated computing, as I mentioned to you before, does data processing, it does image processing, computer graphics, it does computation of all kinds. It runs SQL, it runs Spark, it runs, you know, you tell us what you need to have run, and I'm fairly certain we have an amazing library for you. You could be, you know, a data center trying to make masks to manufacture semiconductors; we have a great library for you. And so underneath, irrespective of AI, the world is moving from general-purpose computing to accelerated computing. And in fact, many of the CSPs already have services that have been here long before AI. Remember, they were invented in the era of machine learning. Classical machine learning algorithms like XGBoost, algorithms like data frames that are used for recommender systems, collaborative filtering, content filtering: all of those technologies were created in the old days of general-purpose computing. Even those algorithms, even those architectures, are now better with accelerated computing. And so even without AI, the world's CSPs are going to invest in acceleration. Nvidia's GPU is the only GPU that can do all of that plus AI. An ASIC might be able to do AI, but it can't do any of the others. Nvidia could do all of that, which explains why it is so safe to just lean into Nvidia's architecture.

We have now reached our virtuous cycle, our inflection point. And this is quite extraordinary. I have many partners in the room, and all of you are part of our supply chain, and I know how hard you guys are working. I want to thank all of you for how hard you are working. Thank you very much.

Now I'm going to show you why; this is what's going on in our company's business. We're seeing extraordinary growth for Grace Blackwell, for all the reasons that I just mentioned. It's driven by two exponentials. We now have visibility. I think we're probably the first technology company in history to have visibility into half a trillion dollars of cumulative Blackwell and early ramps of Rubin through 2026. And as you know, 2025 is not over yet and 2026 hasn't started. This is how much business is on the books: half a trillion dollars' worth so far.

Now, out of that, we've already shipped 6 million of the Blackwells in the first several quarters, I guess the first four quarters of production, three and a half quarters of production. We still have one more quarter to go for 2025, and then we have four quarters. So the next five quarters, there's $500 billion, half a trillion dollars. That's five times the growth rate of Hopper. That kind of tells you something. This is Hopper's entire life. This doesn't include China and Asia, so this is just the West. Okay, we're excluding China. So Hopper, in its entire life: 4 million GPUs. Blackwell: each one of the Blackwells has two GPUs in it, in one large package. 20 million GPUs of Blackwell and the early parts of Rubin. Incredible growth.

So, I want to thank all of our supply chain partners. Everybody, I know how hard you guys are working. I made a video to celebrate your work. Let's play it.
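The "five times" comparison with Hopper can be checked directly from the GPU counts quoted above. This is a small arithmetic sketch using only those figures; the package count assumes every one of the 20 million GPUs ships two to a package, which the talk states for Blackwell and which is applied here to the whole total as a simplification.

```python
HOPPER_LIFETIME_GPUS = 4_000_000         # from the talk
BLACKWELL_AND_RUBIN_GPUS = 20_000_000    # from the talk
GPUS_PER_PACKAGE = 2                     # from the talk (stated for Blackwell)

# Ratio of GPU counts, matching the "five times" figure
growth_multiple = BLACKWELL_AND_RUBIN_GPUS / HOPPER_LIFETIME_GPUS

# Assumption: two GPUs per package across the whole 20 million
packages = BLACKWELL_AND_RUBIN_GPUS // GPUS_PER_PACKAGE

print(growth_multiple)  # 5.0
print(packages)         # 10000000
```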

The age of AI has begun. Blackwell is its engine, an engineering marvel.

In Arizona, it starts as a blank silicon wafer. Hundreds of chip processing and ultraviolet lithography steps build up each of the 200 billion transistors, layer by layer, on a 12-inch wafer.

In Indiana, HBM stacks will be assembled in parallel. HBM memory dies with 1,024 I/Os are fabricated using advanced EUV technology. Through-silicon vias are used in the back end to connect 12 stacks of HBM memory and base die to produce HBM.

Meanwhile, the wafer is scribed into individual Blackwell dies, tested and sorted, separating the good dies to move forward. The chip-on-wafer-on-substrate process attaches 32 Blackwell dies and 128 HBM stacks on a custom silicon interposer wafer. Metal interconnect traces are etched directly into it, connecting Blackwell GPUs and HBM stacks into each system-in-package unit, locking everything into place. Then the assembly is baked, molded, and cured, creating the GB300 Blackwell Ultra Superchip.

In Texas, robots will work around the clock to pick and place over 10,000 components onto the Grace Blackwell PCB.

In California, ConnectX-8 SuperNICs for scale-out communications and BlueField-3 DPUs for offloading and accelerating networking, storage, and security are carefully assembled into GB300 compute trays.

[Music]

NVLink is the breakthrough high-speed link that Nvidia invented to connect multiple GPUs and scale up into a massive virtual GPU. The NVLink switch tray is constructed with NVLink switch chips providing 14.4 terabytes per second of all-to-all bandwidth. NVLink spines form a custom blind-mated backplane with 5,000 copper cables, connecting all 72 Blackwells, or 144 GPU dies, into one giant GPU, delivering 130 terabytes per second of all-to-all bandwidth, nearly the global internet's peak traffic.

Skilled technicians assemble each of these parts into a rack-scale AI supercomputer.

[Music]

In total: 1.2 million components, 2 miles of copper cable, 130 trillion transistors, weighing nearly 2 tons.

From silicon in Arizona and Indiana to systems in Texas, Blackwell and future Nvidia AI factory generations will be built in America, writing a new chapter in American history and industry. America's return to making and reindustrialization, reignited by the age of AI.

The age of AI has begun. Made in America. Made for the world.

We are manufacturing in America again. It is incredible. The first thing that President Trump asked me for is: bring manufacturing back. Bring manufacturing back because it's necessary for national security. Bring manufacturing back because we want the jobs and we want that part of the economy. And nine months later, we are now manufacturing Blackwell in full production in Arizona.

Extreme co-design: GB200, Grace Blackwell NVLink 72. Extreme co-design gives us 10x generationally. It's utterly incredible. Now, the part that's really incredible is this. This is the first AI supercomputer we made. This is in 2016, when I delivered it to a startup in San Francisco, which turned out to have been OpenAI. This was the computer. And in order to create that computer, we designed one chip. We designed one new chip. In order for us to do co-design now, look at all of the chips we have to do. This is what it takes. You're not going to take one chip and make a computer 10 times faster. That's not going to happen. The way to make computers 10 times faster, so that we can keep increasing the performance exponentially and keep driving cost down exponentially, is extreme co-design, working on all these different chips at the same time.

We now have Rubin back home. This is Rubin. This is the Vera Rubin. Ladies and gentlemen, Rubin. This is our third-generation NVLink 72 rack-scale computer. Third generation. GB200 was the first one. All of our partners around the world, I know how hard you guys worked. It was insanely hard to do. Second generation, so much smoother. And this generation, look at this: completely cableless. And this is all back in the lab now. This is the next generation, Rubin. While we're shipping GB300s, we're preparing Rubin to be in production, you know, this time next year, maybe slightly earlier. And so, every single year, we are going to come up with the most extreme co-design system, so that we can keep driving up performance and keep driving down the token generation cost. Look at this. This is just an incredibly beautiful computer.

Now, so this is amazing. This is 100 petaflops. I know it doesn't mean anything. 100 petaflops. But compared to the DGX-1 I delivered to OpenAI 10 years ago, nine years ago, it's 100 times the performance, right here, versus 100 of that supercomputer. A hundred of those, let's see, a hundred of those would be like 25 of these racks, all replaced by this one thing: one Vera Rubin.

Okay. So this is the Vera Rubin Superchip. And this is the compute tray, right here. It's incredibly easy to install. Just flip these things open, shove it in. Even I could do it. Okay. And this is the Vera Rubin compute tray.

If you decide you wanted to add a special processor, we've added another processor. It's called a context processor, because the amount of context that we give AIs is larger and larger. We want it to read a whole bunch of PDFs before it answers a question, to read a whole bunch of arXiv papers, watch a whole bunch of videos: go learn all this before you answer a question for me. All of that context processing could be added. And so you see, on the bottom, eight ConnectX-9 new SuperNICs; you have CPXs, eight of them; you have BlueField-4, this new data processor; two Vera CPUs; and four Rubin packages, or eight Rubin GPUs. All of that in this one node, completely cableless.

100% liquid cooled. And then this new processor, I won't talk too much about it today, I don't have enough time, but this is completely revolutionary. The reason for that is that your AIs need to have more and more memory. You're interacting with them more. You want them to remember your last conversation: everything that you've learned on my behalf, please don't forget it when I come back next time. And so all of that memory is going to create this thing called KV caching, and retrieving that KV cache. You might have noticed that every time you go into your AIs these days, it takes longer and longer to refresh and retrieve all of the previous conversations. And the reason for that is that we need a revolutionary new processor, and that's called BlueField-4.
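For readers unfamiliar with the term, KV caching in the general transformer sense means keeping the key and value vectors of every earlier token so that each new token only computes attention against stored entries instead of reprocessing the whole conversation. The sketch below is a toy single-head version in plain Python. It is not a description of how BlueField-4 or any Nvidia product handles the cache, and the class and function names are hypothetical.

```python
import math


class KVCache:
    """Stores one key vector and one value vector per token seen so far."""

    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def append(self, key: list[float], value: list[float]) -> None:
        self.keys.append(key)
        self.values.append(value)


def attend(query: list[float], cache: KVCache) -> list[float]:
    """Scaled dot-product attention of one query over all cached entries."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in cache.keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return [
        sum(w / total * value[i] for w, value in zip(weights, cache.values))
        for i in range(dim)
    ]


cache = KVCache()
for step in range(3):
    # Toy key/value vectors; a real model would compute these from the token.
    cache.append([float(step), 1.0], [float(step), float(step) * 2])
    output = attend([1.0, 0.0], cache)
    print(len(cache.keys), output)
```

The cache grows by one entry per token, so long conversations mean more memory to hold and more data to retrieve on every step, which is the cost the talk is pointing at.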

Next is the ConnectX switch, excuse me, the NVLink switch, which is right here. Okay, this is the NVLink switch. This is what makes it possible for us to connect all of the computers together. And this switch is now several times the bandwidth of the entire world's peak internet traffic. And so that spine is going to communicate and carry all of that data simultaneously to all of the GPUs. On top of that, this is the Spectrum-X switch. And this Ethernet switch was designed so that all of the processors could talk to each other at the same time and not gum up the network. Gum up the network: that's very technical. Okay. So these are the three combined. And then this is the Quantum switch. This is for InfiniBand. This is Ethernet. We don't care what language you would like to use. Whatever standard you like to use, we have great scale-out fabrics for you, whether it's InfiniBand, Quantum, or Spectrum Ethernet. This one uses silicon photonics and is completely co-packaged optics. Basically, the laser comes right up to the silicon and connects it to our chips. Okay, so this is the Spectrum-X Ethernet.

And so now let's talk about... Thank you. Oh, this is what it looks like. This is a rack. This is two tons, 1.5 million parts. And the spine, this spine right here, carries the entire internet's traffic in one second. The same speed moves across all of these different processors. 100% liquid cooled, all for the, you know, fastest token generation rate in the world. Okay, so that's what a rack looks like.

Now, that's one rack. A gigawatt data center would have, you know, call it, let's see, 16 racks would be a thousand, and then 500 of those. So whatever 500 times 16 is, call it 9,000 of these, 8,000 of these, would be a 1-gigawatt data center. Okay, and so that's a future AI factory.
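The back-of-the-envelope count above works out as follows. This is a rough sketch: the 72 GPUs per rack figure is taken from the NVLink 72 description earlier in the talk, reading "a thousand" as a GPU count is an interpretation, and dividing the full gigawatt evenly across racks is an assumption that folds cooling, networking, and everything else into one per-rack budget.

```python
RACKS_PER_GROUP = 16                   # from the talk
GROUPS = 500                           # from the talk
GPUS_PER_RACK = 72                     # assumption: one NVLink 72 rack
FACILITY_POWER_WATTS = 1_000_000_000   # 1 gigawatt

total_racks = RACKS_PER_GROUP * GROUPS            # the "8,000 of these"
gpus_per_group = RACKS_PER_GROUP * GPUS_PER_RACK  # roughly "a thousand"
total_gpus = total_racks * GPUS_PER_RACK

# Assumption: the whole facility budget is split evenly across racks.
kw_per_rack = FACILITY_POWER_WATTS / total_racks / 1000

print(total_racks)     # 8000
print(gpus_per_group)  # 1152
print(total_gpus)      # 576000
print(kw_per_rack)     # 125.0
```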

Now, as you notice, Nvidia started out by designing chips, and then we started to design systems, and we designed AI supercomputers. Now we're designing entire AI factories. Every single time we move out and integrate more of the problem to solve, we come up with better solutions. We now build entire AI factories. This AI factory is what we're building for Vera Rubin, and we created a technology that makes it possible for all of our partners to integrate into this factory digitally. Let me show it to you.

[Music]

The next industrial revolution is here, and with it, a new kind of factory. AI infrastructure is an ecosystem-scale challenge, requiring hundreds of companies to collaborate. NVIDIA Omniverse DSX is a blueprint for building and operating gigascale AI factories. For the first time, the building, power, and cooling are co-designed with NVIDIA's AI infrastructure stack.

It starts in the Omniverse digital twin. Jacobs Engineering optimizes compute density and layout to maximize token generation according to power constraints. They aggregate SimReady OpenUSD assets from Siemens, Schneider Electric, Trane, and Vertiv into PTC's product lifecycle management, then simulate thermals and electricals with CUDA-accelerated tools from ETAP and Cadence.

Once designed, Nvidia partners like Bechtel and Vertiv deliver prefabricated modules, factory-built, tested, and ready to plug in. This shrinks build time significantly, achieving faster time to revenue.

When the physical AI factory comes online, the digital twin acts as an operating system. Engineers prompt AI agents from Phaidra and Emerald AI, previously trained in the digital twin, to optimize power consumption and reduce strain on both the AI factory and the grid.

In total, for a 1-gigawatt AI factory, DSX optimizations can deliver billions of dollars in additional revenue per year.

Across Texas, Georgia, and Nevada, NVIDIA's partners are bringing DSX to life. In Virginia, Nvidia is building an AI factory research center, using DSX to test and productize Vera Rubin, from infrastructure to software.

With DSX, NVIDIA partners around the world can build and bring up AI infrastructure faster than ever.

completely completely in digital long long before Vera Rubin exists as a real computer we've been using it as a digital twin computer now long before

these AI factories exist we will use it we will design it we'll plan it we'll optimize it and we'll operate it as a digital twin and so all of our partners that are working with us I'm incredibly

happy for all of you supporting us And Gio is here and G ver Vernova is here.

Schneider, I think Olivier is here. Olivier Blum is here. Siemens, incredible partners. Okay. Roland Busch, I think he's watching. Hi, Roland. And so anyways, really, really great partners working with us.

In the beginning we had CUDA, and we have all these different ecosystems of software partners. Now we have Omniverse DSX and we're building AI factories, and again we have this incredible ecosystem of partners working with us. Let's talk about models.

Open source models. Particularly in the last couple of years, several things have happened. One, open source models have become quite capable because of reasoning capabilities. They have become quite capable because of their multimodality, and they're incredibly efficient because of distillation. So all these different capabilities have made open source models, for the very first time, incredibly useful for developers. They are now the lifeblood of startups.

Lifeblood of startups in different industries because, obviously, as I mentioned before, each one of the industries has its own use cases, its own data, its own flywheels. All of that capability, that domain expertise, needs to have the ability to be embedded into a model. Open source makes that possible.

Researchers need open source. Developers need open source. Companies around the world, we need open source. Open source models are really, really important.

The United States has to lead in open source as well. We have amazing proprietary models. We have amazing proprietary models. We need also amazing open source models. Our country depends on it. Our startups depend on it. And so NVIDIA is dedicating ourselves to go do that. We are now the largest, we lead in open source contribution. We have 23 models in leaderboards. We have all these different domains, from language models, to physical AI models I'm going to talk about, robotics models, to biology models. Each one of these models has enormous teams, and that's one of the reasons why we built supercomputers for ourselves, to enable all these models to be created. We have the number one speech model, number one reasoning model, number one physical AI model. The number of downloads is really, really terrific. We are dedicated to this, and the reason for that is because science needs it, researchers need it, startups need it, and companies need it.

I'm delighted that AI startups build on NVIDIA. They do so for several reasons. One, of course, our ecosystem is rich. Our tools work great. All of our tools work on all of our GPUs. Our GPUs are everywhere. It's literally in every single cloud. It's available on-prem. You could build it yourself. You could, you know, build up an enthusiast gaming PC with multiple GPUs in it, and you could download our software stack and it just works. We have the benefit of rich developers who are making this ecosystem richer and richer and richer. So I'm really pleased with all of the startups that we're working with. I'm thankful for that. It is also the case that many of these startups are now starting to create even more ways to enjoy our GPUs.

The CoreWeaves, Nscale, Nebius, Lambda, Crusoe, all of these companies are building these new GPU clouds to serve the startups, and I really appreciate that. This is all possible because NVIDIA is everywhere.

We integrate our libraries, all of the CUDA-X libraries I talked to you about, all the open source AI models that I talked about, all of the models that I talked about, we integrated into AWS, for example. Really love working with Matt. We integrated into Google Cloud, for example. Really love working with Thomas. Each one of these clouds integrates NVIDIA GPUs and our computing, our libraries, as well as our models. Love working with Satya over at Microsoft Azure. Love working with Clay at Oracle. Each one of these clouds integrates the NVIDIA stack. As a result, wherever you go, whichever cloud you use, it works incredibly. We also integrate NVIDIA libraries into the world's SaaS so that each one of these SaaS will eventually become agentic SaaS.

I love Bill McDermott's vision for ServiceNow. There. Yeah, there you go. I think that might have been Bill.

Hi, Bill. And so, ServiceNow, what is it? 85% of the world's enterprise workflows. SAP, 80% of the world's commerce. Christian Klein and I are working together to integrate NVIDIA libraries, CUDA-X, NeMo, and Nemotron, all of our AI systems, into SAP. Working with Sassine over at Synopsys, accelerating the world's CAE, CAD, and EDA tools so that they could be faster and could scale, helping them create AI agents. One of these days I would love to hire AI agent ASIC designers to work with our ASIC designers, essentially the Cursor of Synopsys, if you will. We're working with Anirudh. Anirudh is here, I saw him earlier today. He was part of the pregame show. Cadence is doing incredible work accelerating their stack, creating AI agents so that we can have Cadence AI ASIC designers and system designers working with us.

Today we're announcing a new one. AI will supercharge productivity. AI will transform just about every industry. But AI will also supercharge cybersecurity challenges, the bad AIs.

And so we need an incredible defender. I can't imagine a better defender than CrowdStrike. George is here. He was here. Yeah, I saw him earlier.

We are partnering with CrowdStrike to make cybersecurity speed of light, to create a system that has cybersecurity AI agents in the cloud but also incredibly good AI agents on-prem or at the edge. This way, whenever there's a threat, you are moments away from detecting it. We need speed, and we need fast agentic AI, super smart AIs.

I have a second announcement. This is the single fastest enterprise company in the world. Probably the single most important enterprise stack in the world today.

Palantir Ontology.

Anybody from Palantir here? I was just talking to Alex earlier.

This is Palantir Ontology. They take information, they take data, they take human judgment, and they turn it into business insight.

We work with Palantir to accelerate everything Palantir does so that we could do data processing at a much, much larger scale and with more speed, whether it's structured data of the past, and of course we'll have structured data, human-recorded data, unstructured data, and process that data for our government, for national security, and for enterprises around the world. Process that data at the speed of light and find insight from it. This is what it's going to look like in the future.

Palantir is going to integrate NVIDIA so that we could process at the speed of light and at extraordinary scale.

Okay, NVIDIA and Palantir.

Let's talk about physical AI. Physical AI requires three computers. Just as it takes two computers for a language model, one to train and evaluate it, and then one to inference it. Okay, so that's the large GB200 that you see. In order to do it for physical AI, you need three computers. You need the computer to train it. This is GB, the Grace Blackwell NVL72. We need a computer that does all of the simulations that I showed you earlier with Omniverse DSX. It basically is a digital twin for the robot to learn how to be a good robot, and for the factory to essentially be a digital twin. That computer is the second computer, the Omniverse computer.

This computer has to be incredibly good at generative AI, and it has to be good at computer graphics, sensor simulation, ray tracing, signal processing. This computer is called the Omniverse computer. And once we train the model, simulate that AI inside a digital twin, and that digital twin could be a digital twin of a factory as well as a whole bunch of digital twins of robots.

Then you need to operate that robot. And this is the robotic computer. This one goes into a self-driving car. Half of it could go into a robot. Okay? Or you could actually have, you know, robots that are quite agile and quite fast in operations, and it might take two of these computers. And so this is the Thor, Jetson Thor robotics computer. These three computers all run CUDA, and that makes it possible for us to advance physical AI: AI that understands the physical world, understands the laws of physics, causality, permanence, you know, physical AI. We have incredible partners working with us to create the physical AI of factories. We're using it ourselves to create our factory in Texas. Now, once we create the robotic factory, we have a bunch of robots that are inside it. And these robots also need physical AI, apply physical AI, and work inside the digital twin. Let's take a look at it.

America is re-industrializing, reshoring manufacturing across every industry. In Houston, Texas, Foxconn is building a state-of-the-art robotic facility for manufacturing NVIDIA AI infrastructure systems.

With labor shortages and skills gaps, digitalization, robotics, and physical AI are more important than ever.

The factory is born digital in Omniverse.

Foxconn engineers assemble their virtual factory in a Siemens digital twin solution developed on Omniverse technologies. Every system, mechanical, electrical, plumbing, is validated before construction.

Siemens Plant Simulation runs design space exploration optimizations to identify the ideal layout.

When a bottleneck appears, engineers update the layout with changes managed by Siemens Teamcenter.

In Isaac Sim, the same digital twin is used to train and simulate robot AIs.

In the assembly area, FANUC manipulators build GB300 tray modules. Bimanual manipulators from FII and Skild AI install bus bars into the trays, and AMRs shuttle the trays to the test pods.

Then Foxconn uses Omniverse for large-scale sensor simulation, where robot AIs learn to work as a fleet.

In Omniverse, vision AI agents built on NVIDIA Metropolis and Cosmos watch the fleets of robots and workers from above to monitor operations and alert Foxconn engineers of anomalies, safety violations, or even quality issues.

And to train new employees, agents power interactive AI coaches for easy worker onboarding.

[Music] The age of US re-industrialization is here with people and robots working together.

[Music] That's the future of manufacturing, the future of factories. I want to thank our partner Foxconn. Young Liu, the CEO, is here. But all of these ecosystem partners make it possible for us to create the future of robotic factories.

The factory is essentially a robot that's orchestrating robots to build things that are robotic. You know, the amount of software necessary to do this is so intense that unless you could do it inside a digital twin, to plan it, to design it, to operate it inside a digital twin, getting this to work is nearly impossible. I'm so happy to see also that Caterpillar, my friend Joe Creed and his hundred-year-old company, is also incorporating digital twins in the way they manufacture.

These factories will have future robotic systems, and one of the most advanced is Figure. Brett Adcock is here today. He founded a company three and a half years ago. They're worth almost $40 billion. Today we're working together in training the AI, training the robot, simulating the robot, and of course the robotic computer that goes into Figure. Really amazing. I had the benefit of seeing it. It's really quite extraordinary. It is very likely that humanoid robots, and my friend Elon is also working on this, are going to be one of the largest new consumer electronics markets and surely one of the largest industrial equipment markets.

Peggy Johnson and the folks at Agility are working with us on robots for warehouse automation. The folks at Johnson & Johnson are working with us, again, training the robot, simulating it in digital twins, and also operating the robot. These Johnson & Johnson surgical robots are even going to perform surgery that is completely non-invasive, at a precision the world's never seen before.

And of course, the cutest robot ever, the cutest robot ever, the Disney robot. And this is something really close to our heart. We're working with Disney Research on an entirely new framework and simulation platform based on revolutionary technology called Newton. And that Newton simulator makes it possible for the robot to learn how to be a good robot inside a physically aware, physically based environment. Let's take a look at it.

[Music]

[Music]

Okay.

[Music] Blue. Ladies and gentlemen, Disney Blue.

Tell me that's not adorable. He's not

adorable.

We all want one. We all want one. Now, remember, everything you were just seeing, that is not animation. It's not a movie. It's a simulation. That simulation is in Omniverse. Omniverse, the digital twin. So these digital twins of factories, digital twins of warehouses, digital twins of surgical rooms, digital twins where Blue could learn how to manipulate and navigate and, you know, interact with the world. All completely done in real time. This is going to be the largest consumer electronics product line in the world. Some of them are just really working incredibly well now. This is the future of humanoid robotics, and of course, Blue.

Okay.

Now, humanoid robots are still in development. But meanwhile, there's one robot that is clearly at an inflection point, and it is basically here. And that is a robot on wheels. This is a robotaxi. A robotaxi is essentially an AI chauffeur. Now, one of the things that we're doing today, we're announcing the NVIDIA DRIVE Hyperion.

This is a big deal.

We created this architecture so that every car company in the world could create cars, vehicles that could be commercial, could be passenger, could be dedicated to robotaxi. Create vehicles that are robotaxi-ready. The sensor suite, with surround cameras and radars and lidar, makes it possible for us to achieve the highest level of surround cocoon sensor perception and redundancy necessary for the highest level of safety.

DRIVE Hyperion is now designed into Lucid, Mercedes-Benz, my friend Ola Källenius, the folks at Stellantis, and there are many other cars coming. And once you have a basic standard platform, then developers of AV systems, and there are so many talented ones, Wayve, Waabi, Aurora, Momenta, Nuro, WeRide, there are so many of them, can then take their AV system and run it on the standard chassis. Basically, the standard chassis has now become a computing platform on wheels. And because it's standard and the sensor suite is comprehensive, all of them could deploy their AI to it. Let's take a quick look.

[Music] Okay, that's beautiful San Francisco. And as you could see, the robotaxi inflection point is about to get here. And in the future, a trillion miles a year that are driven, a 100 million cars made each year. There's some 50 million taxis around the world. It's going to be augmented by a whole bunch of robotaxis. So it's going to be a very large market. To connect it and deploy it around the world, today we're announcing a partnership with Uber. Dara, we're working together to connect these NVIDIA DRIVE Hyperion cars into a global network. And now, in the future, you'll, you know, be able to hail one of these cars, and the ecosystem is going to be incredibly rich, and we'll have Hyperion or robotaxi cars all over the world. This is going to be a new computing platform for us, and I'm expecting it to be quite successful.

Okay, so this is what we talked about today.

We talked about a large number of things. Remember, at the core of this are two platform transitions. From general-purpose computing to accelerated computing: NVIDIA CUDA and that suite of libraries called CUDA-X have enabled us to address practically every industry, and we're at the inflection point. It is now growing as a virtuous cycle would suggest. The second inflection point is now upon us. The second platform transition, AI: from classical handwritten software to artificial intelligence. Two platform transitions happening at the same time, which is the reason why we're feeling such incredible growth.

Quantum computing. We spoke about open models. We spoke about enterprise, with CrowdStrike and Palantir accelerating their platforms. We spoke about robotics, potentially one of the largest consumer electronics and industrial manufacturing sectors. And of course we spoke about 6G. NVIDIA has a new platform for 6G. We call it ARC. We have a new platform for robotic cars. We call that Hyperion.

We have new platforms even for factories. Two types of factories. The AI factory, we call that DSX. And then factories with AI, we call that Mega. And so now we're also manufacturing in America. Ladies and gentlemen, thank you for joining us today, and thank you for allowing us to bring GTC to Washington, DC. We're going to do it hopefully every year. And thank you all for your service and making America great again. Thank you.

[Music] We start with a handshake, solid and true. One step at a time, we're breaking through. Brick by brick, we're stacking dreams high. Side by side, we'll touch the sky. Handshakes and high hopes, we're making our way. Shoulder to shoulder, come what may. Shared vision brighter than the sun. Friendship and business rolling as one.

Plans on paper, but hearts in sync. Building together faster than you think. Laughter's the glue in the grind we share. We've got the spark. We're going somewhere. Handshakes and high hopes, we're making our way. Shoulder to shoulder, come what may. Shared vision brighter than the sun. Friendship and business.

Handshakes and high hopes, we're making our way. Shoulder to shoulder, come what may. Shared vision brighter than the sun. Friendship and business rolling as one.

[Music]
