How Nvidia GPUs Compare To Google’s And Amazon’s AI Chips

By CNBC

Summary

Key takeaways

  • **Nvidia shipped 6M Blackwell GPUs**: Nvidia has shipped 6 million Blackwell GPUs over the last year, with rack-scale systems connecting all 72 GPUs to act as a single GPU powering advanced AI workloads. [00:02], [00:32]
  • **ASICs gaining on GPUs for inference**: As models mature, inference needs outpace training, favoring smaller, cheaper custom ASICs over GPUs; the ASIC market is expected to grow faster than the GPU market. [00:32], [05:04]
  • **Google TPUs birthed transformers**: Google's first TPU in 2015 helped lead to the invention of the transformer architecture in 2017, which powers almost all modern AI. [06:21], [06:49]
  • **AWS Trainium 30-40% better price performance**: Trainium provides 30 to 40% better price performance than other hardware vendors in AWS and serves both training and inference workloads. [07:40], [08:09]
  • **Broadcom dominates ASIC design market**: Broadcom wins 70-80% of the custom ASIC design market for hyperscalers like Google, Meta, and OpenAI, a market growing at a mid double-digit CAGR. [09:07], [09:32]
  • **Edge NPUs enable on-device privacy**: NPUs in phones and laptops run AI locally to preserve data privacy and deliver responsive performance without round trips to the cloud. [10:28], [11:48]

Topics Covered

  • GPUs Excel at Parallel AI Math
  • ASICs Trade Flexibility for Efficiency
  • Inference Drives Edge AI Shift
  • TSMC Monopolizes AI Chip Fabrication
  • Energy Bottleneck Threatens US AI Lead

Full Transcript

Nvidia graphics processing units, like these latest Blackwell GPUs, are inside server racks all over the world. Nvidia has catapulted from gaming giant to the very core of generative AI, training the models, running the workloads and sending Nvidia's valuation soaring, with 6 million Blackwell GPUs shipped over the last year. This connects all 72 GPUs, allowing them to act as a single GPU to power the most advanced AI workloads. GPUs are the general purpose workhorse stars of Nvidia

and its top competitor, AMD. But another big category of AI chip is gaining ground: the custom ASIC, or application specific integrated circuit, now being designed by all the major hyperscalers, Google, Amazon, Meta, Microsoft and OpenAI, alongside Broadcom. These AI chips are smaller, cheaper, accessible and, as the name suggests, specifically built for one purpose. But we see that growing even faster than the GPU market over the next few years. AI chips also include FPGAs,

field programmable gate arrays, and an entire group of chips used to power edge AI on device instead of in the cloud. In short, there's a lot of AI chips and the space is only getting more crowded. So let's take a pause to break it all down: the various categories of AI chips, what's different, good and bad about each of them, and a brief look at all the major companies getting in on the AI chip making trend.

For a brief moment in October, Nvidia was the first company ever to reach a $5 trillion valuation, thanks to its original gaming engine, the GPU, becoming king of AI chips. So what exactly optimizes a chip for AI? The GPU, in this case, was really purpose built to deliver parallel programming. Because when you think about rendering an image or a scene, you need to calculate all those pixels at once. And so AI really lends itself to taking advantage of that capability.
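
To make that parallelism concrete, here is a minimal NumPy sketch, our own illustration rather than anything from the video, contrasting a pixel-by-pixel loop with a single data-parallel operation of the kind a GPU applies across thousands of cores at once.

```python
import numpy as np

# A toy 1080p "image": one brightness value per pixel.
image = np.random.rand(1080, 1920)

# Sequential approach: visit every pixel one at a time (CPU-style),
# noticeably slow in pure Python for ~2 million pixels.
brightened_loop = np.empty_like(image)
for row in range(image.shape[0]):
    for col in range(image.shape[1]):
        brightened_loop[row, col] = min(image[row, col] * 1.2, 1.0)

# Data-parallel approach: one operation applied to all pixels at once --
# the same pattern a GPU exploits for graphics and for AI math.
brightened_vectorized = np.minimum(image * 1.2, 1.0)

assert np.allclose(brightened_loop, brightened_vectorized)
```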

To understand how GPUs became synonymous with AI, let's go back to 2012 and what many consider AI's big bang moment: AlexNet. It was a new, incredibly accurate neural network that obliterated the competition during a prominent image recognition contest. Turns out, the same parallel processing used by Nvidia GPUs to render lifelike graphics is also great for training neural networks, where a computer learns from data rather than relying on a programmer's code.

The researchers took a GPU and said, I'm going to hack it to get it to expose the parallel computation capabilities to unlock that performance for the deep learning use case. While AI workloads are usually accelerated on a GPU or custom ASIC, they still often need a host central processing unit, or CPU, like the Grace CPU in Nvidia's Grace Blackwell server rack system. But while a CPU has a small number of powerful cores running sequential general purpose tasks,

GPUs have thousands of smaller cores more narrowly focused on parallel math, like matrix multiplication, used to process multidimensional data structures known as tensors. This ability to perform many operations simultaneously makes GPUs ideal for the two main phases of AI computation, training and inference. Training is teaching the AI model to learn from patterns in large amounts of data, while inference uses the AI to make decisions based on new information.
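
As a toy illustration of those two phases, the NumPy sketch below (our own example with made-up data, not something shown in the video) trains a tiny one-layer model with matrix multiplication at its core, then reuses the learned weights for inference on a new input. The same matrix products are exactly the kind of work a GPU spreads across its thousands of cores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 64 examples with 8 features each, plus a target value per example.
X = rng.normal(size=(64, 8))          # a 2-D tensor of inputs
true_w = rng.normal(size=(8, 1))
y = X @ true_w                        # targets generated from a hidden rule

# --- Training: learn weights from patterns in the data ---
w = np.zeros((8, 1))
learning_rate = 0.1
for step in range(200):
    predictions = X @ w               # matrix multiplication, the core workload
    gradient = X.T @ (predictions - y) / len(X)
    w -= learning_rate * gradient     # nudge the weights toward the data

# --- Inference: apply the learned weights to new, unseen input ---
x_new = rng.normal(size=(1, 8))
print("prediction:", (x_new @ w).item())
print("truth:     ", (x_new @ true_w).item())
```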

The way AI shows up in everyday applications, whether it's the way you buy your coffee on your Starbucks app, how you interact with Salesforce for work, or speaking to us through our EarPods, that is being done through inference. Nvidia sells its GPUs directly to AI companies, like in a recent deal to sell at least 4 million to OpenAI, and to foreign governments including South Korea, Saudi Arabia and the U.K. It also sells GPUs to cloud providers like Amazon,

Microsoft and Google, who go on to rent them out by the hour, minute or second. Nvidia also has a GPU rental program. Today, systems that can handle AI workloads are in such high demand that one of these 72-GPU Blackwell server racks sells for around $3 million. And Nvidia told us it's shipping 1,000 each week. And for Nvidia, they're looking to sell the entire system, not just the chip. Because when you think of that system level, you get more efficiencies in terms of speed and power and

performance. Of course, Nvidia isn't the only one making rack scale GPU systems optimized for AI. Its top competitor, Advanced Micro Devices, AMD, has seen major gains with its Instinct GPU line and major commitments from OpenAI and Oracle. A big difference in AMD GPUs is that they use a largely open source software ecosystem, while Nvidia GPUs are tightly optimized around CUDA, Nvidia's proprietary software platform. Nvidia says its next generation Rubin GPU will be

in full production next year. This is just an incredibly beautiful computer.

In the early boom days of large language models, compute-intensive training has been key. A perfect job for GPUs. But as the models mature, more and more inference is needed. Now, post-training techniques have been introduced to make these models very capable. Now, to extract the value you need to go to inference. And so that's how people actually use AI, through inference. Inference can happen on less powerful chips programmed for more specific tasks.

Enter custom ASICs, application specific integrated circuits, with every major cloud provider designing their own. While a GPU is like a Swiss Army knife, able to do many kinds of parallel math for different AI workloads, think of an ASIC like a single-purpose tool. Very efficient and fast, but hard wired to do the exact math for one type of job. They can be much, much more efficient in running those workloads, but when you specialize

them, you can't change them once they're already carved into silicon. And so there's a trade off in terms of flexibility. While Nvidia GPUs cost upwards of $40,000 and can be hard to get, startups rely on them because it's even more costly to make your own custom ASIC for AI. It is hugely expensive, costing a minimum of tens, and often hundreds, of millions of dollars. But for the biggest cloud providers who can afford them, custom ASICs pay off because they're more power

efficient and reduce reliance on Nvidia. They want to bring cost of AI down. They want to have a little bit more control over the workloads that they build. But at the same time, they're going to continue to work very closely with Nvidia, with AMD, because they also need the capacity. Google was the first to make a custom ASIC for AI acceleration, coining the term tensor processing unit when its first ASIC came out in 2015. The TPU also helped lead to the invention of the

transformer at Google in 2017, the architecture powering almost all modern AI. We got a tour at Google's Chip lab in 2024. There's actually four chips inside there. It's connected to, actually, two of those are connected to a host machine that has CPUs in it. And then all these colorful cables are actually linking together all of the Trillium chips to work as one large supercomputer.

A decade after the first TPU, Google just released Ironwood, its seventh generation, and with it a big deal with Anthropic, to train its LLM Claude on up to a million TPUs. Some people even think that they're technically on par or even superior to Nvidia, but traditionally Google has only used them for in-house purposes. There's a lot of speculation that in the longer run, Google might open up access to TPUs more broadly. Amazon Web Services was the next cloud provider to

design its own AI chips, after acquiring Israeli chip startup Annapurna Labs in 2015, where Ron Diamant has worked since day one. AWS announced Inferentia in 2018 and launched Trainium in 2022, now approaching its third generation. Think of Trainium like a cluster of workshops with many small, flexible tensor engines, while Google's TPU is like one big factory conveyor belt, with one big rigid grid specialized for matrix math. The difference between the names is quite telling, but

over time we've seen that Trainium chips can serve both inference and training workloads quite well. On average, we see that Trainium provides between 30 and 40% better price performance compared to other hardware vendors in AWS. In October, we went to Northern Indiana for the first on-camera tour of Amazon's biggest AI data center, where Anthropic is training its models on half-a-million Trainium2 chips. Behind me right now, they are working on delivering these towers.

Two of them together is going to make up one ultra-server. And what's especially interesting is there are no Nvidia GPUs in there. Although AWS is filling its other data centers with lots of Nvidia GPUs to feed hungry AI customers like OpenAI. The investment they're making on our platform is still significant and significantly more than you're seeing even in some of the upcoming ASIC deployments. And we can't talk about custom ASICs without mentioning Broadcom and its top competitor,

Marvell, which act as backend partners so clients don't have to hire full silicon teams to handle everything in-house. All of the big hyperscalers that have ASIC programs actually partner with at least one chip design company, Broadcom being the most important, which helps provide the IP and the know-how and the networking. And so you've seen Broadcom in particular be one of the biggest beneficiaries of the AI boom. Broadcom helped build Google's TPUs and

Meta's training and inference accelerator, launched in 2023, and now has a huge new deal to help OpenAI build its own custom ASICs starting in 2026. We see Broadcom winning 70, maybe even 80% of this market. We see this market accelerating, you know, at a mid double digit CAGR over the next five years. Microsoft is also getting into the ASIC game, hoping to primarily use its own Maia chips in its Azure data centers, although little has been

revealed since plans were announced in 2023 and its next Maia chip is facing delays. Intel has its own Gaudi line of custom ASICs for AI. Tesla has announced an ASIC, and Qualcomm is breaking into data center chips with the AI200. Of course, there's also a slew of startups going all in on custom AI chips, too. Like Cerebras, making huge wafer-scale AI chips, and Groq, with inference focused language processing units.

The final big category of AI chips are those made for running edge AI, on device, instead of in the cloud. You might want to be able to run an AI model locally on your personal device, even a smartphone, and so there may be, as part of the processor in that chip, a module that has AI processing capability. So you don't have to have communication all the way back to a data center and you can preserve privacy of your data on your phone. An NPU, or neural processing unit, is a major edge AI

chip, a dedicated AI accelerator that's integrated into a phone or laptop's primary chip, what's called an SoC, a system on a chip, which is kind of a chip that has many different modules in it that do different aspects of the computation. Because obviously in a phone, you need to have a very compressed system that is able to do lots of things. And so it's not kind of a separately packaged AI chip. They take up less silicon and therefore they cost

less, often substantially less than a big data center chip would. NPUs enabling AI in PCs are primarily made by Qualcomm, Intel, and AMD. For MacBooks, Apple's in-house M-series chips include a dedicated neural engine, too, although Apple doesn't use the term NPU. Similar neural accelerators, tiny processors dedicated to AI math, are now built into the latest iPhone A-series chips, too, which we got to see in September. Why is on-device AI so important?

We know that when we can do things on device, we are able to manage people's privacy in the best way. The other thing about it is, it is efficient for us, it is responsive. We know that we are much more in control over the experience.
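
As a rough sketch of what running a model locally can look like, the snippet below uses ONNX Runtime, one common way to execute a pre-exported model on a laptop or phone without any network call; the library choice, model file name and input shape are our own placeholders, not details from the video.

```python
import numpy as np
import onnxruntime as ort

# Load a small, pre-exported model from local storage -- no network call,
# so the input data never leaves the device.
session = ort.InferenceSession("tiny_model.onnx")   # placeholder file name

input_name = session.get_inputs()[0].name
sample = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape

# Run inference entirely on-device; on supported hardware an execution
# provider can route this work to an NPU instead of the CPU.
outputs = session.run(None, {input_name: sample})
print(outputs[0].shape)
```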

The latest Android phones also have NPUs for AI built into their primary Qualcomm Snapdragon chips. And Samsung has its own NPU on its Galaxy phones, too. NPUs also power AI in cars, robots, cameras, smart home devices, and more made by companies like NXP and Nvidia. And so right now, most of the attention, most of the dollars, are going towards the data center. But over time that's going to change because we'll have AI deployed in our phones and our cars and

on wearables, all sorts of other applications to a much greater degree than today. Then there's FPGAs, field programmable gate arrays, which can be used in data centers or embedded on devices. These chips can be reconfigured with software after they're made, for use in all sorts of applications, like signal processing, networking and, yes, AI. Although far more flexible than NPUs or ASICs, FPGAs have lower raw performance and lower energy efficiency for AI workloads.

You would choose an FPGA because you didn't want to have to design your own ASIC. But if you're going to operate thousands and thousands of them, ASICs are cheaper. AMD became the largest FPGA maker after acquiring Xilinx for $49 billion in 2022, with Intel in second thanks to its $16.7 billion purchase of Altera in 2015.
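
The FPGA-versus-ASIC economics described above come down to a simple break-even calculation. The sketch below uses entirely hypothetical cost figures, chosen only to illustrate the shape of the trade-off: a big one-time design cost for the ASIC versus a higher per-unit cost for the off-the-shelf FPGA.

```python
# Hypothetical numbers purely for illustration -- not figures from the video.
fpga_unit_cost = 8_000          # off-the-shelf part, no design cost to you
asic_nre_cost = 100_000_000     # one-time design/tape-out cost ("carved into silicon")
asic_unit_cost = 2_000          # cheaper per chip once the design exists

def total_cost_fpga(n_chips: int) -> int:
    return fpga_unit_cost * n_chips

def total_cost_asic(n_chips: int) -> int:
    return asic_nre_cost + asic_unit_cost * n_chips

# Break-even: the volume at which the ASIC's upfront cost is paid back
# by its lower per-unit cost.
break_even = asic_nre_cost // (fpga_unit_cost - asic_unit_cost)
print(f"ASIC becomes cheaper after about {break_even:,} chips")

for n in (1_000, 10_000, 100_000):
    print(n, total_cost_fpga(n), total_cost_asic(n))
```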

Now that we've laid out all the types of AI chips and the biggest players designing them, it's important to talk about where they're actually manufactured, because giants like Nvidia, Google and Amazon almost all rely on a single company to make them: Taiwan Semiconductor Manufacturing Company. Saif Khan was an AI and semiconductor policy adviser for the Biden administration. They were largely manufactured in Taiwan, and that was kind of a new geopolitical issue in the

semiconductor industry. Fast forward through the CHIPS Act to today, and TSMC has a giant new chip fabrication plant in Arizona, where we got the first on camera tour in December. Apple has committed to moving some chip production to TSMC Arizona, although its latest iPhone A19 Pro chip is made on TSMC's three nanometer node, currently only possible in Taiwan. Nvidia's Blackwell, however, is made with TSMC's four nanometer node. Which means we are now manufacturing, in full

production, Blackwell in Arizona. With help from a major investment from the US government, Intel has revived its foundry business and is also making chips on its advanced 18A node at a new Arizona fab. Fair to say that AI might be bringing the silicon back to Silicon Valley? That's a great, that's a great take, yes. You know, it creates an incredible opportunity, not just in manufacturing chips here, but actually leveraging, you know, physical AI and leveraging all these other capabilities

that help advance, you know, a lot of the manufacturing that will be coming back here to make sure that the US stays globally competitive. Huawei, ByteDance and Alibaba are some major Chinese players making custom ASICs, although they are limited by export controls on the most advanced equipment and AI chips like Nvidia's Blackwell. The other big differentiator will be who can secure enough power for all this huge AI data center build out. If the US wants to continue to lead in AI,

we continue to be fraught with energy risk. China has done that much better than us, for instance. And while we do have the best chips in the world, and I believe so by multiple generations, I believe that our need to build out energy is critical. But with Nvidia's lead as the most valuable company in the world, companies are not slowing down in the race to make their own AI chips. Although dethroning Nvidia won't come easily. They have that position because they've earned it

and they've spent the years building it, and they've won that developer ecosystem. But that market's going to get so big that we're going to continue to see new entrants.
