
Leopold Aschenbrenner — 2027 AGI, China/US super-intelligence race, & the return of history

By Dwarkesh Patel

Summary

## Key takeaways

- **Trillion-Dollar Clusters by 2030**: AI training clusters are rapidly scaling, projected to cost hundreds of billions of dollars and consume over 20% of US electricity production by 2030, requiring millions of GPUs. [04:02:00], [04:18:00]
- **AGI by 2027-2028 with Unhobbling**: By 2027-2028, AI systems are expected to reach expert-level intelligence and function as "drop-in remote workers" thanks to advances in "unhobbling" capabilities, moving beyond simple chatbots. [08:01:00], [08:46:00]
- **Geopolitical Stakes: Superintelligence Race**: The race to superintelligence is a critical geopolitical factor; a lead of even a few years could be decisive in military competition, akin to the technological advantage in the first Gulf War. [25:06:00], [27:11:00]
- **US vs. China: AI Cluster Location Matters**: Building AI clusters in authoritarian regimes like the UAE poses security risks, including theft of AGI weights or seizure of compute, making it crucial to keep these facilities in the US or allied democracies. [42:37:00], [43:07:00]
- **Historical Parallels: WWII Mobilization**: The rapid industrial mobilization of WWII, despite obstacles like labor strikes, offers a precedent for how a nation can rapidly scale production for critical technology. [50:36:00], [51:01:00]

Topics Covered

  • AI Compute Growth: A Decade of Exponential Scaling
  • AGI by 2027: Beyond Chatbots to Remote Workers
  • Scaling AI from preschooler to remote worker by 2027
  • OpenAI's alleged plan: AGI bidding war between US, China, Russia
  • Stealing AI weights: The atomic bomb analogy

Full Transcript

What will be at stake will not just be cool products, but whether liberal democracy survives, whether the CCP survives, what the world order for the next century is going to be. The CCP is going to have an all-out effort to infiltrate American AI labs. Billions of dollars, thousands of people. The CCP is going to try to out-build us. People don't realize how intense state-level espionage can be. When we have literal superintelligence, they can Stuxnet the Chinese data centers. You really think that will be a private company, and the government wouldn't be like, "oh my God, what is going on?" I do think it is incredibly important that these clusters are in the United States. I mean, would you do the Manhattan Project in the UAE?

2023 was the moment for me when it went from AGI as this sort of theoretical, abstract thing, where you'd make the models, to: I see it, I feel it. I can see the cluster it's trained on, the rough combination of algorithms, the people, how it's happening. I think most of the world is not there; most of the people who feel it are right here.

Today I'm chatting with my friend Leopold

Aschenbrenner. He grew up in Germany and graduated  as valedictorian of Columbia when he was 19. After  

that, he had a very interesting gap year which  we’ll talk about. Then, he was on the OpenAI  

superalignment team, may it rest in peace. Now, with some anchor investments — from  

Patrick and John Collison, Daniel Gross, and Nat  Friedman — he is launching an investment firm. 

Leopold, you’re off to a slow start but  life is long. I wouldn’t worry about it  

too much. You’ll make up for it in due  time. Thanks for coming on the podcast. 

Thank you. I first discovered your  podcast when your best episode had  

a couple of hundred views. It’s been amazing to  follow your trajectory. It’s a delight to be on. 

In the Sholto and Trenton episode, I mentioned  that a lot of the things I’ve learned about AI  

I’ve learned from talking with them. The  third, and probably most significant, part  

of this triumvirate has been you. We’ll  get all the stuff on the record now. 

Here’s the first thing I want to get on the  record. Tell me about the trillion-dollar cluster. 

I should mention this for the context of  the podcast. Today you’re releasing a series  

called Situational Awareness. We’re going to  get into it. First question about that is,  

tell me about the trillion-dollar cluster. Unlike most things that have recently come  

out of Silicon Valley, AI is an industrial  process. The next model doesn’t just require  

some code. It’s building a giant new cluster.  It’s building giant new power plants. Pretty soon,  

it’s going to involve building giant new fabs. Since ChatGPT, this extraordinary techno-capital  

acceleration has been set into  motion. Exactly a year ago today,  

Nvidia had their first blockbuster earnings call.  It went up 25% after hours and everyone was like,  

"oh my God, AI is a thing." Within a year, Nvidia  data center revenue has gone from a few billion a  

quarter to $25 billion a quarter and continues  to go up. Big Tech capex is skyrocketing. 

It’s funny. There’s this crazy scramble going on,  but in some sense it’s just the continuation of  

straight lines on a graph. There’s this long-run  trend of almost a decade of training compute  

for the largest AI systems growing by about  half an order of magnitude, 0.5 OOMs a year. 

Just play that forward. GPT-4 was reported  to have finished pre-training in 2022.  

On SemiAnalysis, it was rumored to have a cluster  size of about 25,000 A100s. That’s roughly a $500  

million cluster. Very roughly, it's 10 megawatts. Just play that forward two years. By 2024,

that’s a cluster that’s 100 MW and 100,000  H100 equivalents with costs in the billions. 

Play it forward two more years. By 2026, that’s  a gigawatt, the size of a large nuclear reactor.  

That’s like the power of the Hoover Dam.  That costs tens of billions of dollars  

and requires a million H100 equivalents. By 2028, that’s a cluster that’s ten GW.  

That’s more power than most US states.  That’s 10 million H100 equivalents,  

costing hundreds of billions of dollars. By 2030, you get the trillion-dollar cluster  

using 100 gigawatts, over 20% of US electricity  production. That’s 100 million H100 equivalents. 
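A minimal sketch tabulating the trajectory he just walked through, with a check that each step tracks the ~0.5 OOM/year trend. The A100-to-H100-equivalent conversion in the 2022 row is an assumption; the other figures are as quoted.

```python
import math

# (year, H100-equivalents, power, cost) as quoted in the conversation.
# 2022 row: ~25k A100s, taken here as ~10k H100-equivalents (assumed ratio).
trajectory = [
    (2022, 1e4, "10 MW",  "$500M"),
    (2024, 1e5, "100 MW", "billions"),
    (2026, 1e6, "1 GW",   "tens of billions"),
    (2028, 1e7, "10 GW",  "hundreds of billions"),
    (2030, 1e8, "100 GW", "~$1 trillion"),
]

# Each step should come out to ~0.5 OOM/year (10^0.5 ≈ 3.16x per year,
# i.e. ~10x every two years).
for (y0, c0, *_), (y1, c1, power, cost) in zip(trajectory, trajectory[1:]):
    ooms_per_year = math.log10(c1 / c0) / (y1 - y0)
    print(f"{y1}: {c1:.0e} H100e, {power}, {cost} (~{ooms_per_year:.1f} OOM/yr)")
```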

That’s just the training cluster. There are more  inference GPUs as well. Once there are products,  

most of them will be inference GPUs. US power  production has barely grown for decades.  

Now we’re really in for a ride. When I had Zuck on the podcast,  

he was claiming not a plateau per se, but that AI  progress would be bottlenecked by this constraint  

on energy. Specifically, he was like, "oh,  gigawatt data centers, are we going to build  

another Three Gorges Dam or something?" According to public reports, there  

are companies planning things on the scale of  a 1 GW data center. With a 10 GW data center,  

who’s going to be able to build that? A 100 GW  center is like a state project. Are you going  

to pump that into one physical data center? How  is it going to be possible? What is Zuck missing? 

Six months ago, 10 GW was the talk of the town.  Now, people have moved on. 10 GW is happening.  

There’s The Information report on OpenAI and  Microsoft planning a $100 billion cluster. 

Is that 1 GW? Or is that 10 GW? I don’t know but if you try to map  

out how expensive the 10 GW cluster would be,  that’s a couple of hundred billion. It’s sort  

of on that scale and they’re planning it.  It’s not just my crazy take. AMD forecasted  

a $400 billion AI accelerator market by 2027. AI  accelerators are only part of the expenditures. 

We’re very much on track for a $1 trillion of  total AI investment by 2027. The $1 trillion  

cluster will take a bit more acceleration. We  saw how much ChatGPT unleashed. Every generation,  

the models are going to be crazy  and shift the Overton window. 

Then the revenue comes in. These are  forward-looking investments. The question is,  

do they pay off? Let’s estimate the GPT-4 cluster  at around $500 million. There’s a common mistake  

people make, saying it was $100 million for GPT-4.  That’s just the rental price. If you’re building  

the biggest cluster, you have to build and pay  for the whole cluster. You can’t just rent it  

for three months. Can’t you? 

Once you’re trying to get into the  hundreds of billions, you have to get  

to like $100 billion a year in revenue. This  is where it gets really interesting for the  

big tech companies because their revenues  are on the order of hundreds of billions. 

$10 billion is fine. It’ll pay off  the 2024 size training cluster.  

It’ll really be gangbusters with Big Tech when  it costs $100 billion a year. The question is  

how feasible is $100 billion a year from  AI revenue? It’s a lot more than right  

now. If you believe in the trajectory of  AI systems as I do, it’s not that crazy. 

There are like 300 million Microsoft Office  subscribers. They have Copilot now. I don’t  

know what they’re selling it for. Suppose  you sold some AI add-on for $100/month to  

a third of Microsoft Office subscribers. That’d  be $100 billion right there. $100/month is a lot. 
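The arithmetic behind that estimate, using the subscriber count and price point floated in the conversation:

```python
# ~300M Microsoft Office subscribers; suppose a third buy a $100/month AI add-on.
paying_users = 300e6 / 3                    # 100M subscribers
annual_revenue = paying_users * 100 * 12    # $100/month, annualized
print(f"${annual_revenue / 1e9:.0f}B/year")  # ~$120B/year, i.e. roughly the $100B scale
```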

That’s a lot for a third of Office subscribers. For the average knowledge worker, it’s a few  

hours of productivity a month. You have  to be expecting pretty lame AI progress to  

not hit a few hours of productivity a month. Sure, let’s assume all this. What happens in  

the next few years? What can the AI trained  on the 1 GW data center do? What about the  

one on the 10 GW data center? Just map out  the next few years of AI progress for me. 

The 10 GW range is my best guess  for when you get true AGI. Compute  

is actually overrated. We’ll talk about that. By 2025-2026, we’re going to get models that  

are basically smarter than most college  graduates. A lot of the economic usefulness  

depends on unhobbling. The models are smart  but limited. There are chatbots and then there  

are things like being able to use a computer  and doing agentic long-horizon tasks. 

By 2027-2028, it’ll get as smart as the smartest  experts. The unhobbling trajectory points to it  

becoming much more like an agent than a chatbot.  It’ll almost be like a drop-in remote worker. 

This is the question around the economic returns.  Intermediate AI systems could be really useful,  

but it takes a lot of schlep to  integrate them. There’s a lot you  

could do with GPT-4 or GPT-4.5 in a business  use case, but you really have to change your  

workflows to make them useful. It’s a very Tyler  Cowen-esque take. It just takes a long time to  

diffuse. We’re in SF and so we miss that. But in some sense, the way these systems  

want to be integrated is where you get this kind  of sonic boom. Intermediate systems could have  

done it, but it would have taken schlep. Before  you do the schlep to integrate them, you’ll get  

much more powerful systems that are unhobbled. They’re agents, drop-in remote workers. You’re  

interacting with them like coworkers. You  can do Zoom calls and Slack with them. You  

can ask them to do a project and they go off and  write a first draft, get feedback, run tests on  

their code, and come back. Then you can tell them  more things. That’ll be much easier to integrate. 

You might need a bit of overkill to make  the transition easy and harvest the gains. 

What do you mean by overkill?  Overkill on model capabilities? 

Yeah, the intermediate models could do it but  it would take a lot of schlep. The drop-in  

remote worker AGI can automate cognitive  tasks. The intermediate models would have  

made the software engineer more productive.  But will the software engineer adopt it? 

With the 2027 model, you just don’t need the  software engineer. You can interact with it  

like a software engineer, and it’ll  do the work of a software engineer. 

The last episode I did was with John Schulman. I was asking about this. We have these models  

that have come out in the last year and none seem  to have significantly surpassed GPT-4, certainly  

not in an agentic way where they interact  with you as a coworker. They’ll brag about  

a few extra points on MMLU. Even with GPT-4o,  it’s cool they can talk like Scarlett Johansson  

(I guess not anymore) but  it’s not like a coworker. 

It makes sense why they’d be good at answering  questions. They have data on how to complete  

Wikipedia text. Where is the equivalent training  data to understand a Zoom call? Referring back to  

your point about a Slack conversation,  how can it use context to figure out the  

cohesive project you’re working on?  Where is that training data coming from? 

A key question for AI progress in the next few  years is how hard it is to unlock the test time  

compute overhang. Right now, GPT-4 can do a few  hundred tokens with chain-of-thought. That’s  

already a huge improvement. Before, answering a  math question was just shotgun. If you tried to  

answer a math question by saying the first thing  that comes to mind, you wouldn’t be very good. 

GPT-4 thinks for a few hundred tokens. If  I think at 100 tokens a minute, that’s like  

what GPT-4 does. It’s equivalent to me thinking  for three minutes. Suppose GPT-4 could think  

for millions of tokens. That’s +4 OOMs on test  time compute on one problem. It can’t do it now.  

It gets stuck. It writes some code. It can do a  little bit of iterative debugging, but eventually  

gets stuck and can’t correct its errors. There’s a big overhang. In other areas of ML,  

there’s a great paper on AlphaGo, where you can  trade off train time and test time compute. If  

you can use 4 OOMs more test time compute, that's almost like a 3.5-OOM bigger model.
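A quick sketch of both numbers here, assuming the ~100 tokens/minute human thinking speed used in this conversation and the roughly 4-OOMs-of-test-compute-for-3.5-OOMs-of-model-size exchange rate from the AlphaGo result:

```python
# Convert a test-time token budget into human-equivalent thinking time,
# assuming ~100 tokens/minute as in the conversation.
def thinking_time(tokens, tokens_per_minute=100):
    minutes = tokens / tokens_per_minute
    if minutes < 60:
        return f"~{minutes:.0f} minutes"
    return f"~{minutes / 60 / 24 / 30:.1f} months"

print(thinking_time(300))         # a few hundred tokens -> ~3 minutes
print(thinking_time(10_000_000))  # millions of tokens   -> a few months

# AlphaGo-style trade-off quoted above: +4 OOMs of test-time compute
# is worth roughly +3.5 OOMs of model size.
print(f"+4 OOMs test compute ≈ {10 ** 3.5:,.0f}x bigger model")
```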

Again, if it’s 100 tokens a minute, a few million  tokens is a few months of working time. There’s a  

lot more you can do in a few months of working  time than just getting an answer right now. The  

question is how hard is it to unlock that? In the short timelines AI world,  

it’s not that hard. The reason it might not be  that hard is that there are only a few extra  

tokens to learn. You need to learn things like  error correction tokens where you’re like “ah,  

I made a mistake, let me think about that again.”  You need to learn planning tokens where it’s like  

“I’m going to start by making a plan. Here’s my  plan of attack. I’m going to write a draft and  

now I’m going to critique my draft and think about  it.” These aren’t things that models can do now,  

but the question is how hard it is. There are two paths to agents. When  

Sholto was on your podcast, he talked about  scaling leading to more nines of reliability.  

That’s one path. The other path is the  unhobbling path. It needs to learn this  

System 2 process. If it can learn that, it can  use millions of tokens and think coherently. 

Here’s an analogy. When you drive, you’re  on autopilot most of the time. Sometimes you  

hit a weird construction zone or intersection.  Sometimes my girlfriend is in the passenger seat  

and I’m like “ah, be quiet for a moment,  I need to figure out what’s going on.” 

You go from autopilot to System 2 and  you’re thinking about how to do it. Scaling  

improves that System 1 autopilot. The brute force  way to get to agents is improving that system. If  

you can get System 2 working, you can  quickly jump to something more agentified  

and test time compute overhang is unlocked. What’s the reason to think this is an easy  

win? Is there some loss function that easily  enables System 2 thinking? There aren’t many  

animals with System 2 thinking. It took a long  time for evolution to give us System 2 thinking. 

Pre-training has trillions of tokens of Internet  text, I get that. You match that and get  

all of these free training capabilities. What’s  the reason to think this is an easy unhobbling? 

First of all, pre-training is magical.  It gave us a huge advantage for models of  

general intelligence because you can predict the  next token. But there’s a common misconception.  

Predicting the next token lets the model learn  incredibly rich representations. Representation  

learning properties are the magic of  deep learning. Rather than just learning  

statistical artifacts, the models learn models  of the world. That’s why they can generalize,  

because it learned the right representations. When you train a model, you have this raw bundle  

of capabilities that’s useful. The unhobbling  from GPT-2 to GPT-4 took this raw mass and RLHF’d  

it into a good chatbot. That was a huge win. In the original InstructGPT paper, comparing  

RLHF vs. non-RLHF models, it's like a 100x model-size win on human preference ratings. It started

to be able to do simple chain-of-thought and  so on. But you still have this advantage of all  

these raw capabilities, and there’s still  a huge amount you’re not doing with them. 

This pre-training advantage is also the difference  to robotics. People used to say it was a hardware  

problem. The hardware is getting solved,  but you don’t have this huge advantage  

of bootstrapping with pre-training. You don’t  have all this unsupervised learning you can do.  

You have to start right away with RL self-play. The question is why RL and unhobbling might work.  

Bootstrapping is an advantage. Your Twitter bio is "being pre-trained." You're not being

pre-trained anymore. You were pre-trained in  grade school and high school. At some point,  

you transition to being able to learn by yourself.  You weren’t able to do it in elementary school.  

High school is probably where it started and by  college, if you’re smart, you can teach yourself.  

Models are just starting to enter that regime. It’s a little bit more scaling and then you figure  

out what goes on top. It won’t be trivial. A lot  of deep learning seems obvious in retrospect.  

There’s some obvious cluster of ideas. There  are some ideas that seem a little dumb but  

work. There are a lot of details you have to  get right. We’re not going to get this next  

month. It’ll take a while to figure out. A while for you is like half a year. 

I don’t know, between six months and three  years. But it's possible. It’s also very  

related to the issue of the data wall. Here’s one  intuition on learning by yourself. Pre-training  

is kind of like the teacher lecturing to  you and the words are flying by. You’re  

just getting a little bit from it. That's not what you do when you learn  

by yourself. When you learn by yourself,  say you're reading a dense math textbook,  

you're not just skimming through it once.  Some wordcels just skim through and reread  

and reread the math textbook and they memorize. What you do is you read a page, think about it,  

have some internal monologue going on, and have  a conversation with a study buddy. You try a  

practice problem and fail a bunch of times.  At some point it clicks, and you're like,  

"this made sense." Then you read a few more pages. We've kind of bootstrapped our way to  

just starting to be able to do that  now with models. The question is,  

can you use all this sort of self-play, synthetic  data, RL to make that thing work. Right now,  

there's in-context learning, which is super  sample efficient. In the Gemini paper, it just  

learns a language in-context. Pre-training, on  the other hand, is not at all sample efficient. 

What humans do is a kind of in-context  learning. You read a book, think about it,  

until eventually it clicks. Then you somehow  distill that back into the weights. In some sense,  

that's what RL is trying to do. RL is super  finicky, but when it works it's kind of magical. 

It's the best possible data for the model.  It’s when you try a practice problem, fail,  

and at some point figure it out in a way  that makes sense to you. That's the best  

possible data for you because it's the  way you would have solved the problem,  

rather than just reading how somebody else solved  the problem, which doesn't initially click. 

By the way, if that take sounds familiar it's  because it was part of the question I asked  

John Schulman. It goes to illustrate the thing  I said in the intro. A bunch of the things I've  

learned about AI come from these dinners we do before the interviews with me, you, Sholto, and a couple of others. We're like, "what should I ask John Schulman, what should I ask Dario?"

Suppose this is the way things  go and we get these unhobblings— 

And the scaling. You have this baseline of this  enormous force of scaling. GPT-2 was amazing. It  

could string together plausible sentences, but it  could barely do anything. It was kind of like a  

preschooler. GPT-4, on the other hand, could write  code and do hard math, like a smart high schooler.  

This big jump in capability is explored in the  essay series. I count the orders of magnitude  

of compute and scale-up of algorithmic progress. Scaling alone by 2027-2028 is going to do another  

preschool to high school jump on top of GPT-4. At  a per token level, the models will be incredibly  

smart. They'll gain more reliability, and with the  addition of unhobblings, they'll look less like  

chatbots and more like agents or drop-in remote  workers. That's when things really get going. 

I want to ask more questions about this but let's  zoom out. Suppose you're right about this. This is  

because of the 2027 cluster which is at 10 GW? 2028 is 10 GW. Maybe it'll be pulled forward. 

Something like a GPT-5.5 level by 2027, whatever that's called. What does the world look like at

that point? You have these remote workers who can  replace people. What is the reaction to that in  

terms of the economy, politics, and geopolitics? 2023 was a really interesting year to experience  

as somebody who was really following the AI stuff. What were you doing in 2023? 

OpenAI. When you were at OpenAI in 2023, it was a  weird thing. You almost didn't want to talk about  

AI or AGI. It was kind of a dirty word. Then  in 2023, people saw ChatGPT for the first time,  

they saw GPT-4, and it just exploded. It triggered huge capital expenditures  

from all these firms and an explosion in revenue  from Nvidia and so on. Things have been quiet  

since then, but the next thing has been in the  oven. I expect every generation these g-forces  

to intensify. People will see the models.  They won’t have counted the OOMs so they're  

going to be surprised. It'll be kind of crazy. Revenue is going to accelerate. Suppose you  

do hit $10 billion by the end of this year.  Suppose it just continues on the trajectory  

of revenue doubling every six months. It's not  actually that far from $100 billion, maybe by  

2026. At some point, what happened to Nvidia  is going to happen to Big Tech. It's going to  

explode. A lot more people are going to feel it. 2023 was the moment for me where AGI went from  

being this theoretical, abstract thing. I see  it, I feel it, and I see the path. I see where  

it's going. I can see the cluster it's trained on,  the rough combination of algorithms, the people,  

how it's happening. Most of the world is not there  yet. Most of the people who feel it are right  

here. A lot more of the world is going to start  feeling it. That's going to start being intense. 

Right now, who feels it? You can go on Twitter  and there are these GPT wrapper companies, like,  

"whoa, GPT-4 is going to change our business." I'm so bearish on the wrapper companies because  

they're betting on stagnation. They're  betting that you have these intermediate  

models and it takes so much schlep to integrate  them. I'm really bearish because we're just  

going to sonic boom you. We're going to get the  unhobblings. We're going to get the drop-in remote  

worker. Your stuff is not going to matter. So that's done. SF, this crowd, is paying  

attention now. Who is going to be paying  attention in 2026 and 2027? Presumably,  

these are years in which hundreds of  billions of capex is being spent on AI. 

The national security state is  going to start paying a lot of  

attention. I hope we get to talk about that. Let’s talk about it now. What happens? What  

is the immediate political reaction? Looking  internationally, I don't know if Xi Jinping  

sees the GPT-4 news and goes, "oh, my God,  look at the MMLU score on that. What are  

we doing about this, comrade?" So what happens when he sees a  

remote worker replacement and it has $100  billion in revenue? There’s a lot of businesses  

that have $100 billion in revenue, and people  aren't staying up all night talking about it. 

The question is, when does the CCP and when does  the American national security establishment  

realize that superintelligence is going to be  absolutely decisive for national power? This  

is where the intelligence explosion stuff  comes in, which we should talk about later. 

You have AGI. You have this drop-in  remote worker that can replace you or me,  

at least for remote jobs. Fairly quickly, you  turn the crank one or two more times and you  

get a thing that's smarter than humans. Even more than just turning the crank a  

few more times, one of the first jobs to  be automated is going to be that of an  

AI researcher or engineer. If you can automate  AI research, things can start going very fast. 

Right now, there's already at this trend  of 0.5 OOMs a year of algorithmic progress.  

At some point, you're going to have GPU fleets  in the tens of millions for inference or more.  

You’re going to be able to run 100 million human  equivalents of these automated AI researchers. 
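A back-of-envelope version of that 100-million figure; the per-GPU serving throughput below is an assumed illustrative number, not one from the conversation:

```python
# Rough sketch: how a fleet of tens of millions of inference GPUs could
# translate into "human-equivalent" automated AI researchers.
gpus = 20e6                    # tens of millions of GPUs (per the conversation)
tokens_per_gpu_per_min = 500   # ASSUMPTION: illustrative serving throughput
human_tokens_per_min = 100     # the human "thinking speed" used earlier

human_equivalents = gpus * tokens_per_gpu_per_min / human_tokens_per_min
print(f"~{human_equivalents / 1e6:.0f} million human-equivalent researchers")
# -> ~100 million, matching the figure in the conversation
```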

If you can do that, you can maybe do a decade's  worth of ML research progress in a year. You  

get some sort of 10x speed up. You can make  the jump to AI that is vastly smarter than  

humans within a year, a couple of years. That broadens from there. You have this  

initial acceleration of AI research. You apply  R&D to a bunch of other fields of technology. At  

this point, you have a billion super intelligent  researchers, engineers, technicians, everything.  

They’re superbly competent at all things. They're going to figure out robotics. We  

talked about that being a software problem. Well,  you have a billion super smart — smarter than the  

smartest human researchers — AI researchers  in your cluster. At some point during the  

intelligence explosion, they're going to be able  to figure out robotics. Again, that’ll expand. 

If you play this picture forward, it is fairly  unlike any other technology. A couple years of  

lead could be utterly decisive in say, military  competition. If you look at the first Gulf War,  

Western coalition forces had a 100:1 kill ratio.  They had better sensors on their tanks. They had  

better precision missiles, GPS, and stealth.  They had maybe 20-30 years of technological  

lead. They just completely crushed them. Superintelligence applied to broad fields  

of R&D — and the industrial explosion that comes  from it, robots making a lot of material — could  

compress a century’s worth of technological  progress into less than a decade. That means  

that a couple years could mean a Gulf War 1-style  advantage in military affairs. That’s including a  

decisive advantage that even preempts nukes. How do you find nuclear stealth submarines?  

Right now, you have sensors and software to  detect where they are. You can do that. You  

can find them. You have millions or billions  of mosquito-sized drones, and they take out the  

nuclear submarines. They take out the mobile  launchers. They take out the other nukes. 

It’s potentially enormously destabilizing  and enormously important for national power.  

At some point people are going to realize  that. Not yet, but they will. When they do,  

it won’t just be the AI researchers in charge. The CCP is going to have an all-out effort to  

infiltrate American AI labs. It’ll involve  billions of dollars, thousands of people,  

and the full force of the Ministry of State  Security. The CCP is going to try to outbuild us. 

They added as much power in the last decade as an  entire US electric grid. So the 100 GW cluster,  

at least the 100 GW part of it, is going to be  a lot easier for them to get. By this point,  

it's going to be an extremely  intense international competition. 

One thing I'm uncertain about in this picture  is if it’s like what you say, where it's more  

of an explosion. You’ve developed an AGI. You  make it into an AI researcher. For a while,  

you're only using this ability to make hundreds  of millions of other AI researchers. The thing  

that comes out of this really frenetic process  is a superintelligence. Then that goes out in the  

world and is developing robotics and helping  you take over other countries and whatever. 

It's a little bit more gradual. It's an  explosion that starts narrowly. It can do  

cognitive jobs. The highest ROI use for  cognitive jobs is to make the AI better  

and solve robotics. As you solve robotics, now  you can do R&D in biology and other technology. 

Initially, you start with the factory workers.  They're wearing the glasses and AirPods, and the  

AI is instructing them because you can make any  worker into a skilled technician. Then you have  

the robots come in. So this process expands. Meta's Ray-Bans are a complement to Llama. 

With the fabs in the US, their constraint is  skilled workers. Even if you don't have robots,  

you have the cognitive superintelligence  and can kind of make them all into skilled  

workers immediately. That's a very  brief period. Robots will come soon. 

Suppose this is actually how the tech  progresses in the United States, maybe  

because these companies are already generating hundreds of billions of dollars of AI revenue.

At this point, companies are borrowing hundreds  of billions or more in the corporate debt markets. 

Why is a CCP bureaucrat, some 60-year-old  guy, looking at this and going,  

"oh, Copilot has gotten better now" and now— This is much more than Copilot has gotten  

better now. It’d require  

shifting the production of an entire country,  dislocating energy that is otherwise being used  

for consumer goods or something, and feeding all  that into the data centers. Part of this whole  

story is that you realize superintelligence is  coming soon. You realize it and maybe I realize  

it. I'm not sure how much I realize it. Will the national security apparatus in  

the United States and the CCP realize it? This is a really key question. We have a few  

more years of mid-game. We have a few more  2023s. That just starts updating more and  

more people. The trend lines will become clear. You will see some amount of the COVID dynamic.  

COVID in February of 2020 honestly feels a lot  like today. It feels like this utterly crazy thing  

is coming. You see the exponential and yet most  of the world just doesn't realize it. The mayor of  

New York is like, "go out to the shows," and "worrying about this is just anti-Asian racism." At some point, people saw

it and then crazy, radical reactions came. By the way, what were you doing during  

COVID? Was it your freshman or sophomore year? Junior. 

Still, you were like a 17-year-old junior or  something right? Did you short the market or  

something? Did you sell at the right time? Yeah. 

So there will be a March 2020 moment. You can make the analogy you make in the  

series that this will cause a reaction like, “we  have to do the Manhattan Project again for America  

here.” I wonder what the politics of this will  be like. The difference here is that it’s not  

just like, “we need the bomb to beat the Nazis.” We'll be building this thing that makes all our  

energy prices go up a bunch and it's automating a  lot of our jobs. The climate change stuff people  

are going to be like, "oh, my God, it's making  climate change worse and it's helping Big Tech." 

Politically, this doesn't seem like a dynamic  where the national security apparatus or the  

president is like, "we have to step on  the gas here and make sure America wins." 

Again, a lot of this really depends on how  much people are feeling it and how much people  

are seeing it. Our generation is so used to  peace, American hegemony and nothing matters.  

The historical norm is very much one of extremely  intense and extraordinary things happening in the  

world with intense international competition. There's a 20-year very unique period. In World  

War II, something like 50% of GDP went  to war production. The US borrowed over  

60% of GDP. With Germany and Japan I think  it was over 100%. In World War I, the UK,  

France, and Germany all borrowed over 100% of GDP. Much more was on the line. People talk about  

World War II being so destructive, with 20 million Soviet soldiers dying and 20% of

Poland. That happened all the time. During  the Seven Years' War something like 20-30%  

of Prussia died. In the Thirty Years' War,  up to 50% of a large swath of Germany died. 

Will people see that the stakes here are  really high and that history is actually  

back? The American national security state  thinks very seriously about stuff like this.  

They think very seriously about competition with  China. China very much thinks of itself on this  

historical mission of the rejuvenation of the  Chinese nation. They think a lot about national  

power. They think a lot about the world order. There's a real question on timing. Do they  

start taking this seriously only when the intelligence explosion is already happening, which would be quite late? Do they

start taking this seriously two years earlier?  That matters a lot for how things play out. 

At some point they will and they will realize  that this will be utterly decisive for not  

just some proxy war but for major questions.  Can liberal democracy continue to thrive? Can  

the CCP continue existing? That will activate  forces that we haven't seen in a long time. 

The great power conflict definitely seems  compelling. All kinds of different things  

seem much more likely when you think  from a historical perspective. You  

zoom out beyond the liberal democracy that  we’ve had the pleasure to live in America  

for say the last 80 years. That includes  things like dictatorships, war, famine, etc. 

I was reading The Gulag Archipelago, and one of the chapters begins with Solzhenitsyn saying that if you had told a Russian citizen under the tsars that the result of the 20th century would not be some great Russian revival, with Russia becoming a great power and its citizens made wealthy by all these new technologies, but instead tens of millions of Soviet citizens tortured by millions of beasts in the worst possible ways, they wouldn't have believed you. They'd have called you a slanderer.

The possibilities for dictatorship with  superintelligence are even crazier as well.  

Imagine you have a perfectly loyal military  and security force. No more rebellions. No  

more popular uprisings. You have perfect lie  detection. You have surveillance of everybody.  

You can perfectly figure out who's the dissenter  and weed them out. No Gorbachev who had some  

doubts about the system would have ever risen to  power. No military coup would have ever happened. 

There's a real way in which part of why things  have worked out is that ideas can evolve. There's  

some sense in which time heals a lot of wounds and  solves a lot of debates. Throughout time, a lot of  

people had really strong convictions, but a lot  of those have been overturned over time because  

there's been continued pluralism and evolution. Imagine applying a CCP-like approach to  

truth where truth is what the party says. When  you supercharge that with superintelligence, that  

could just be locked in and enshrined for a long  time. The possibilities are pretty terrifying. 

To your point about history and living in America for the past 80 years, this is one of the

things I took away from growing up in Germany. A  lot of this stuff feels more visceral. My mother  

grew up in the former East, my father in the  former West. They met shortly after the Wall fell.  

The end of the Cold War was this extremely pivotal  moment for me because it's the reason I exist. 

I grew up in Berlin with the former Wall.  My great-grandmother, who is still alive,  

is very important in my life. She was born in 1934  and grew up during the Nazi era. In World War II,  

she saw the firebombing of Dresden from  this country cottage where they were as  

kids. Then she spent most of her life in  the East German communist dictatorship. 

She'd tell me about how Soviet tanks came  when there was the popular uprising in 1954.  

Her husband was telling her to get home  really quickly and get off the streets.  

She had a son who tried to ride a motorcycle  across the Iron Curtain and then was put in  

a Stasi prison for a while. Finally, when  she's almost 60, it was the first time she  

lived in a free country, and a wealthy country. When I was a kid, the thing she always really  

didn't want me to do was get involved in  politics. Joining a political party had  

very bad connotations for her. She raised  me when I was young. So it doesn't feel  

that long ago. It feels very close. There’s one thing I wonder about when  

we're talking today about the CCP. The people  in China who will be doing their version of this  

project will be AI researchers who are somewhat  Westernized. They’ll either have gotten educated  

in the West or have colleagues in the West. Are they going to sign up for the CCP  

project that's going to hand over control to Xi  Jinping? What's your sense of that? Fundamentally,  

they're just people, right? Can't you convince  them about the dangers of superintelligence? 

Will they be in charge though? In some  sense, this is also the case in the  

US. This is like the rapidly depreciating  influence of the lab employees. Right now,  

the AI lab employees have so much power. You saw this with the November board events. It's so much power.

Both are going to get automated and they're  going to lose all their power. It'll just be  

a few people in charge with their armies of  automated AIs. It’s also the politicians and  

the generals and the national security  state. There are some of these classic  

scenes from the Oppenheimer movie. The  scientists built it and then the bomb was  

shipped away and it was out of their hands. It's good for lab employees to be aware of  

this. You have a lot of power now, but  maybe not for that long. Use it wisely.  

I do think they would benefit from some  more organs of representative democracy. 

What do you mean by that? In the OpenAI board events,  

employee power is exercised in a very direct  democracy way. How some of that went about  

really highlighted the benefits of representative  democracy and having some deliberative organs. 

Interesting. Let's go back to the $100 billion  revenue question. The companies are trying to  

build clusters that are this big. Where  are they building it? Say it's the amount  

of energy that would be required for a small or  medium-sized US state. Does Colorado then get no  

power because it's happening in the United  States? Is it happening somewhere else? 

This is the thing that I always find funny,  when you talk about Colorado getting no power.  

The easy way to get the power would be to  displace less economically useful stuff.  

Buy up the aluminum smelting plant that has a  gigawatt. We're going to replace it with the  

data center because that's important. That's not  actually happening because a lot of these power  

contracts are really locked in long-term.  Also, people don't like things like this. 

In practice what it requires, at least  right now, is building new power. That  

might change. That's when things get really  interesting, when it's like, “no, we're just  

dedicating all of the power to the AGI.” So right now it's building new power. 10  

GW is quite doable. It's like a few percent of US  natural gas production. When you have the 10 GW  

training cluster, you have a lot more inference.  100 gigawatts is where it starts getting pretty  

wild. That's over 20% of US electricity  production. It's pretty doable, especially  

if you're willing to go for natural gas. It is incredibly important that these  

clusters are in the United States. Why does it matter that it's in the US? 

There are some people who are trying to  build clusters elsewhere. There's a lot of  

free-flowing Middle Eastern money that's trying  to build clusters elsewhere. This comes back to  

the national security question we talked about.  Would you do the Manhattan Project in the UAE? 

You can put the clusters in the US and you can  put them in allied democracies. Once you put them  

in authoritarian dictatorships, you create this  irreversible security risk. Once the cluster is  

there, it's much easier for them to exfiltrate  the weights. They can literally steal the AGI,  

the superintelligence. It’s like they got a direct  copy of the atomic bomb. It makes it much easier  

for them. They have weird ties to China. They  can ship that to China. That's a huge risk. 

Another thing is they can just seize the compute.  The issue here is people right now are thinking  

of this as ChatGPT, Big Tech product  clusters. The clusters being planned now,  

three to five years out, may well be the AGI,  superintelligence clusters. When things get hot,  

they might just seize the compute. Suppose we put 25% of the compute  

capacity in these Middle Eastern dictatorships.  Say they seize that. Now it's a ratio of compute  

of 3:1. We still have more, but even with  only 25% of compute there it starts getting  

pretty hairy. 3:1 is not that great of a ratio.  You can do a lot with that amount of compute. 
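The arithmetic, spelled out:

```python
# If 25% of compute sits in a seizable location, the remaining ratio:
ours, theirs = 0.75, 0.25
print(f"{ours / theirs:.0f}:1")  # -> 3:1
```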

Say they don't actually do this. Even if  they don't actually seize the compute,  

even if they actually don't steal the weights,  there's just a lot of implicit leverage you  

get. They get seats at the AGI table. I  don't know why we're giving authoritarian  

dictatorships the seat at the AGI table. There's going to be a lot of compute  

in the Middle East if these deals go through. First of all, who is it? Is it just every  

single Big Tech company trying  to figure it out over there? 

It’s not everybody, some. There are reports, I think  

Microsoft. We'll get into it. So say the UAE gets a bunch of  

compute because we're building the clusters there.  Let's say they have 25% of the compute. Why does a  

compute ratio matter? If it's about them being  able to kick off the intelligence explosion,  

isn't it just some threshold where you have  100 million AI researchers or you don't? 

You can do a lot with 33 million extremely  smart scientists. That might be enough  

to build the crazy bio weapons. Then  you're in a situation where they stole  

the weights and they seized the compute. Now they can make these crazy new WMDs that  

will be possible with superintelligence.  Now you've just proliferated the stuff  

that’ll be really powerful. Also, 3x  on compute isn't actually that much. 

The riskiest situation is if we're in some sort  of really neck and neck, feverish international  

struggle. Say we're really close with the CCP  and we're months apart. The situation we want  

to be in — and could be in if we play our cards  right — is a little bit more like the US building  

the atomic bomb versus the German project years  behind. If we have that, we just have so much  

more wiggle room to get safety right. We're going to be building these crazy  

new WMDs that completely undermine nuclear  deterrence. That's so much easier to deal  

with if you don't have somebody right on your  tails and you have to go at maximum speed.  

You have no wiggle room. You're worried  that at any time they can overtake you. 

They can also just try to outbuild you. They  might literally win. China might literally win  

if they can steal the weights, because they  can outbuild you. They may have less caution,  

both good and bad caution in terms of  whatever unreasonable regulations we have. 

If you're in this really tight race, this  sort of feverish struggle, that's when  

there's the greatest peril of self-destruction. Presumably the companies that are trying to build  

clusters in the Middle East realize this.  Is it just that it’s impossible to do this  

in America? If you want American companies  to do this at all, do you have to do it in  

the Middle East or not at all? Then you just  have China build a Three Gorges Dam cluster. 

There’s a few reasons. People  aren’t thinking about this as the  

AGI superintelligence cluster. They’re just  like, “ah, cool clusters for my ChatGPT.” 

If you’re doing ones for inference, presumably  you could spread them out across the country or  

something. The ones they’re building,  they’re going to do one training run  

in a single thing they’re building. It’s just hard to distinguish between  

inference and training compute. People  can claim it’s inference compute,  

but they might realize that actually this is  going to be useful for training compute too. 

Because of synthetic data and things like that? RL looks a lot like inference, for example. Or  

you just end up connecting them in time. It's a  lot like raw materials. It's like placing your  

uranium refinement facilities there. So there are a few reasons. One,  

they don't think about this as the AGI  cluster. Another is just that there’s  

easy money coming from the Middle East. Another one is that some people think  

that you can't do it in the US. We actually  face a real system competition here. Some  

people think that only autocracies can do

capacity and the power to get stuff done fast. Again, this is the sort of thing we haven't faced  

in a while. But during the Cold War, there was  this intense system competition. East vs. West  

Germany was this. It was West Germany as liberal  democratic capitalism vs. state-planned communism. 

Now it's obvious that the free world  would win. But even as late as 1961,  

Paul Samuelson was predicting that the Soviet  Union would outgrow the United States because  

they were able to mobilize industry better. So there are some people who shitpost about  

loving America, but then in private they're  betting against America. They're betting against  

the liberal order. Basically, it's just a bad  bet. This stuff is really possible in the US. 

To make it possible in the US, to some degree we  have to get our act together. There are basically  

two paths to doing it in the US. One is you just  have to be willing to do natural gas. There's  

ample natural gas. You put your cluster in West  Texas. You put it in southwest Pennsylvania by  

the Marcellus Shale. The 10 GW cluster is super  easy. The 100 GW cluster is also pretty doable.  

I think natural gas production in the United  States has almost doubled in a decade. You do  

that one more time over the next seven years, you  could power multiple trillion-dollar data centers. 
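A hedged back-of-envelope check on those gas numbers; the heat rate, energy content, and production figures below are my own rough assumptions (approximately right for US combined-cycle plants and recent US dry gas production), not figures from the conversation:

```python
# How much US natural gas would a 10 GW or 100 GW cluster consume?
HEAT_RATE_BTU_PER_KWH = 7_000      # ASSUMPTION: combined-cycle heat rate
BTU_PER_CF = 1_030                 # ASSUMPTION: energy per cubic foot of gas
US_PRODUCTION_BCF_PER_DAY = 100    # ASSUMPTION: ~100 Bcf/day US production

def share_of_us_gas(gigawatts):
    kwh_per_day = gigawatts * 1e6 * 24                  # GW -> kWh/day
    bcf_per_day = kwh_per_day * HEAT_RATE_BTU_PER_KWH / BTU_PER_CF / 1e9
    return bcf_per_day / US_PRODUCTION_BCF_PER_DAY

print(f"10 GW:  ~{share_of_us_gas(10):.0%} of US gas production")   # a few percent
print(f"100 GW: ~{share_of_us_gas(100):.0%} of US gas production")  # ~16%
```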

The issue there is that a lot of people made these  climate commitments, not just the government. It's  

actually the private companies themselves,  Microsoft, Amazon, etc., that have made  

these climate commitments. So they won't do  natural gas. I admire the climate commitments,  

but at some point the national interest  and national security is more important. 

The other path is doing green energy  megaprojects. You do solar and batteries  

and SMRs and geothermal. If we want to do that,  there needs to be a broad deregulatory push.  

You can't have permitting take a decade. You  have to reform FERC. You have to have blanket  

NEPA exemptions for this stuff. There are inane state-level regulations. You can  

build the solar panels and batteries next to your  data center, but it'll still take years because  

you actually have to hook it up to the state  electrical grid. You have to use governmental  

powers to create rights of way to have multiple  clusters and connect them and have the cables. 

Ideally we do both. Ideally we do natural gas  and the broader deregulatory green agenda.  

We have to do at least one. Then this  stuff is possible in the United States. 

Before the conversation I was reading a good book  about World War II industrial mobilization in the  

United States called Freedom's Forge. I’m thinking  back on that period, especially in the context of  

reading Patrick Collison's Fast and the progress studies stuff. There's this narrative out there that

we had state capacity back then and people just  got shit done but that now it's a clusterfuck. 

It wasn’t at all the case! It was really interesting. You had  

people from the Detroit auto industry side, like  William Knudsen, who were running mobilization for  

the United States. They were extremely competent.  At the same time you had labor organization and  

agitation, which is very analogous to the climate  change pledges and concerns we have today. 

They would literally have these strikes,  into 1941, costing millions of man-hours  

worth of time when we're trying to make tens  of thousands of planes a month. They would just  

debilitate factories for trivial concessions  from capital that were pennies on the dollar. 

There were concerns that the auto companies  were trying to use the pretext of a potential  

war to avoid paying labor the money it deserved. So given where climate change concerns are today, you might think, "ah, America's fucked. We're not going to be able to build this shit if you look at NEPA or something." But I didn't realize how debilitating labor was in World War II.

It wasn't just that. Before 1939, the American military was in total shambles. You read about

it and it reads a little bit like the German  military today. Military expenditures were I think  

less than 2% of GDP. All the European countries  had gone, even in peacetime, above 10% of GDP. 

It was rapid mobilization starting  from nothing. We were making no planes.  

There were no military contracts. Everything  had been starved during the Great Depression.  

But there was this latent capacity. At some  point the United States got its act together. 

This applies the other way around too with China.  Sometimes people count them out a little bit with  

the export controls and so on. They're able to  make 7-nanometer chips now. There's a question  

of how many they could make. There's at least  a possibility that they're going to mature that  

ability and make a lot of 7-nanometer chips. There's a lot of latent industrial capacity  

in China. They are able to build a lot of power  fast. Maybe that isn't activated for AI yet. At  

some point, the same way the United States and  a lot of people in the US government are going  

to wake up, the CCP is going to wake up. Companies realize that scaling is a thing.  

Obviously their whole plans are contingent on  scaling. So they understand that in 2028 we're  

going to be building 10 GW data centers. At that point, the people who can keep up  

are Big Tech, potentially at the edge of their  capabilities, sovereign wealth fund-funded things,  

and also major countries like America and China.  What's their plan? With the AI labs, what's their  

plan given this landscape? Do they not want  the leverage of being in the United States? 

The Middle East does offer capital, but America  has plenty of capital. We have trillion-dollar  

companies. What are these Middle Eastern  states? They're kind of like trillion-dollar  

oil companies. We have trillion-dollar companies  and very deep financial markets. Microsoft could  

issue hundreds of billions of dollars of  bonds and they can pay for these clusters. 

Another argument being made, which is worth  taking seriously, is that if we don't work  

with the UAE or with these Middle Eastern  countries, they're just going to go to China.  

They're going to build data centers and  pour money into AI regardless. If we don't  

work with them, they'll just support China. There's some merit to the argument in the  

sense that we should be doing benefit-sharing  with them. On the road to AGI, there should be  

two tiers of coalitions. There should be a narrow  coalition of democracies that's developing AGI.  

Then there should be a broader coalition of other  countries, including dictatorships, and we should  

offer them some of the benefits of AI. If the UAE wants to use AI products,  

run Meta recommendation engines,  or run the last-generation models,  

that's fine. By default, they just wouldn't  have had this seat at the AGI table. So they  

have some money, but a lot of people have money. The only reason they're getting this seat at the  

AGI table and giving these dictators this leverage  over this extremely important national security  

technology, is because we're getting  them excited and offering it to them. 

Who specifically is doing this? Who are the  companies who are going there to fundraise? 

It’s been reported that Sam Altman is trying  to raise $7 trillion or whatever for a chip  

project. It's unclear how many of the clusters  will be there, but definitely stuff is happening. 

There’s another reason I'm a little suspicious  of this argument that if the US doesn't work  

with them, they'll go to China. I've heard from  multiple people — not from my time at OpenAI,  

and I haven't seen the memo — that at some  point several years ago, OpenAI leadership  

had laid out a plan to fund and sell AGI by  starting a bidding war between the governments  

of the United States, China, and Russia. It's surprising to me that they're willing to  

sell AGI to the Chinese and Russian governments.  There's also something that feels eerily familiar  

about starting this bidding war and then  playing them off each other, saying, "well,  

if you don't do this, China will do it." Interesting. That's pretty fucked up. 

Suppose you're right. We ended up in this place  because, as one of our friends put it, the Middle  

East has billions or trillions of dollars up  for persuasion like no other place in the world. 

With little accountability. There’s no  Microsoft board. It's only the dictator. 

Let's say you're right, that you shouldn't  have gotten them excited about AGI in the  

first place. Now we're in a place where they  are excited about AGI and they're like, "fuck,  

we want to have GPT-5 while you're going to be off  building superintelligence. This Atoms for Peace  

thing doesn't work for us." If you're in this  place, don't they already have the leverage? 

The UAE on its own is not competitive. They're  already export-controlled. You're not supposed  

to ship Nvidia chips over there. It's  not like they have any of the leading  

AI labs. They have money, but it's hard  to just translate money into progress. 

But I want to go back to other things you've  been saying in laying out your vision. There's  

this almost industrial process of putting in  the compute and algorithms, adding that up,  

and getting AGI on the other end. If it's  something more like that, then the case for  

somebody being able to catch up rapidly seems  more compelling than if it's some bespoke... 

Well, if they can steal the algorithms and if they  can steal the weights, that’s really important. 

How easy would it be for an actor to steal  the things that are not the trivial released  

things, like Scarlett Johansson's voice, but the  RL things we're talking about, the unhobblings? 

It’s all extremely easy. They don’t make the claim  that it’s hard. DeepMind put out their Frontier  

Safety Framework and they lay out security levels,  zero to four. Four is resistant to state activity.  

They say, we're at level zero. Just recently,  there was an indictment of a guy who stole a bunch  

of really important AI code and went to China  with it. All he had to do to steal the code was  

copy it, put it into Apple Notes, and export  it as a PDF. That got past their monitoring. 

Google has the best security of any of the AI  labs probably, because they have the Google  

infrastructure. I would think of the security of  a startup. What does security of a startup look  

like? It's not that good. It's easy to steal. Even if that's the case, a lot of your  

post is making the argument for why we are going  to get the intelligence explosion. If we have  

somebody with the intuition of an Alec Radford to  come up with all these ideas, that intuition is  

extremely valuable and you can scale that up. If it's just intuition, then that's not going  

to be just in the code, right? Also because  of export controls, these countries are going  

to have slightly different hardware. You're going  to have to make different trade-offs and probably  

rewrite things to be compatible with that. Is it just a matter of getting the right pen  

drive and plugging it into the gigawatt  data center next to the Three Gorges Dam  

and then you're off to the races? There are a few different things,  

right? One threat model is just them stealing  the weights themselves. The weights one is  

particularly insane because they can just steal the literal end product — it's like making a replica of the atomic bomb — and then they're ready to go. That one is extremely important around the time

we have AGI and superintelligence because China  can build a big cluster by default. We'd have a  

big lead because we have the better scientists,  but if we make the superintelligence and they  

just steal it, they're off to the races. Weights are a little bit less important right now  

because who cares if they steal the GPT-4 weights.  We still have to get started on weight security  

now because if we think there’s AGI by 2027, this  stuff is going to take a while. It's not just  

going to be like, "oh, we do some access control."  If you actually want to be resistant to Chinese  

espionage, it needs to be much more intense. The thing that people aren't paying enough  

attention to is the secrets. The compute stuff is sexy, but people underrate the secrets. Algorithmic progress is half an order of magnitude a year just by default. That's huge. If we have a few years of lead and we protect the secrets, that's by default equivalent to a 10-30x, even 100x, bigger cluster.
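
To make that arithmetic concrete, here's a minimal back-of-the-envelope sketch, assuming the half-OOM-per-year rate of algorithmic progress quoted above (the figures are illustrative, not exact):

```python
# Compound ~0.5 orders of magnitude (OOMs) per year of algorithmic progress
# into an effective-compute multiplier (assumed rate, for illustration only).
for years_of_lead in (1, 2, 3, 4):
    ooms = 0.5 * years_of_lead
    print(f"{years_of_lead}-year lead: {ooms:.1f} OOMs -> {10 ** ooms:.0f}x effective compute")
```

Two protected years of secrets come out to roughly a 10x effective-compute edge, and four years to the 100x figure.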

There's this additional layer of the data wall.  We have to get through the data wall. That means  

we actually have to figure out some sort of  basic new paradigm. So it’s the “AlphaGo step  

two.” “AlphaGo step one” learns from human  imitation. “AlphaGo step two” is the kind  

of self-play RL thing that everyone's working  on right now. Maybe we're going to crack it.  

If China can't steal that, then they're stuck.  If they can steal it, they're off to the races. 
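
As a toy illustration of that "step two" idea, here is a sketch of a policy learned purely from self-play, with no human imitation data. The game ("race to 10": players alternate adding 1 or 2, and whoever reaches 10 wins) and the tabular value-learning scheme are invented stand-ins for the frontier self-play RL being discussed:

```python
# Toy self-play RL: learn state values for "race to 10" purely by playing
# against yourself, with no human data. Invented for illustration.
import random

TARGET = 10  # whoever brings the running total to exactly TARGET wins
V = {}       # V[s]: learned value of state s for the player about to move

def moves(s):
    """Legal moves: add 1 or 2 without overshooting TARGET."""
    return [m for m in (1, 2) if s + m <= TARGET]

def pick(s, eps):
    """Epsilon-greedy choice against the learned values."""
    if random.random() < eps:
        return random.choice(moves(s))
    # a move is good if it wins outright or leaves the opponent a bad state
    return max(moves(s), key=lambda m: 1.0 if s + m == TARGET else 1.0 - V.get(s + m, 0.5))

def play(eps=0.2):
    """One self-play game; returns the states each move was made from."""
    s, history = 0, []
    while s < TARGET:
        history.append(s)
        s += pick(s, eps)
    return history  # the player who moved from history[-1] won

for _ in range(20000):
    hist = play()
    for i, s in enumerate(reversed(hist)):  # i = 0 is the winner's last move
        outcome = 1.0 if i % 2 == 0 else 0.0
        V[s] = V.get(s, 0.5) + 0.1 * (outcome - V.get(s, 0.5))

print({s: round(v, 2) for s, v in sorted(V.items())})
# States 1, 4, and 7 (losing for the player to move) end up with clearly low values.
```

The point of the toy: the learner bootstraps entirely from its own games, which is the property that lets "step two" get past the data wall.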

Whatever that thing is, can I literally write it  down on the back of a napkin? If it's that easy,  

then why is it so hard for them to figure  it out? If it's more about the intuitions,  

then don't you just have to hire Alec  Radford? What are you copying down? 

There are a few layers to this. At the top is the fundamental approach. For pre-training, it might be

unsupervised learning, next token prediction,  training on the entire Internet. You actually  

get a lot of juice out of that already.  That one's very quick to communicate. 
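
To make "next-token prediction" concrete, here's a toy sketch where a bigram count model stands in for a transformer; the corpus is invented for illustration, but the training objective, predict the next token given what came before, is the same idea:

```python
# Toy next-token predictor: count how often each token follows another, then
# predict the most frequent successor. A stand-in for transformer pre-training,
# where the same objective is optimized with gradients at Internet scale.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()  # stand-in for "the entire Internet"

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # tally successors of each token

def predict_next(token: str) -> str:
    """Return the most likely next token under the learned counts."""
    return counts[token].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat"
```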

Then there are a lot of details that matter, and you were talking about this earlier. It's probably going to be somewhat obvious in retrospect, or there's going to be some not-too-complicated thing that'll work, but there are going to be a lot of details to get right.

If that's true, then again, why do we  think that getting state-level security  

in these startups will prevent China  from catching up? It’s just like, "oh,  

we know some sort of self-play RL will  be required to get past the data wall." 

It's going to be solved by  2027, right? It's not that hard. 

The US, and the leading labs in the United States,  have this huge lead. By default, China actually  

has some good LLMs because they're just using open  source code, like Llama. People really underrate  

both the divergence in algorithmic progress and the lead the US would have by default, because all this stuff was published until recently. Look at the Chinchilla scaling laws, MoE papers,

transformers. All that stuff was published.  That's why open source is good and why China  

can make some good models. Now, they're  not publishing it anymore. If we actually  

kept it secret, it would be a huge edge. To your point about tacit knowledge and  

Alec Radford, there's another layer at the  bottom that is something about large-scale  

engineering work to make these big training  runs work. That is a little bit more like  

tacit knowledge, but China will be able to figure  that out. It's engineering schlep, and they're  

going to figure out how to do it. Why can they figure that out,

but not how to get the RL thing working? I don't know. Germany during World War II  

went down the wrong path with heavy  water. There's an amazing anecdote  

in The Making of the Atomic Bomb about this. Secrecy was one of the most contentious issues  

early on. Leo Szilard really thought a nuclear  chain reaction and an atomic bomb were possible.  

He went around saying, "this is going to be of  enormous strategic and military importance."  

A lot of people didn't believe it or thought,  "maybe this is possible, but I'm going to act  

as though it's not, and science should be open." In the early days, there had been some incorrect  

measurements made on graphite as a moderator.  Germany thought graphite wasn't going to work,  

so they had to do heavy water. But then Enrico  Fermi made new measurements indicating that  

graphite would work. This was really important. Szilard assaulted Fermi with another secrecy  

appeal and Fermi was pissed off, throwing a  temper tantrum. He thought it was absurd, saying,  

"come on, this is crazy." But Szilard persisted,  and they roped in another guy, George Pegram.  

In the end, Fermi didn't publish it. That was just in time. Fermi not publishing  

meant that the Nazis didn't figure out graphite  would work. They went down the path of heavy  

water, which was the wrong path.  This is a key reason why the German  

project didn't work out. They were way behind. We face a similar situation now. Are we just  

going to instantly leak how to get past the data  wall and what the next paradigm is? Or are we not? 

The reason this would matter is if being one year ahead is a huge advantage. In the world where you deploy AI gradually over time, they're just going to catch up anyway.

I interviewed Richard  Rhodes, the guy who wrote The  

Making of the Atomic Bomb. One of the anecdotes  he had was when the Soviets realized America had  

the bomb. Obviously, we dropped it on Japan. Lavrentiy Beria — the guy who ran the NKVD,

a famously ruthless and evil guy — goes to the  Soviet scientist who was running their version  

of the Manhattan Project. He says, "comrade, you  will get us the American bomb." The guy says,  

"well, listen, their implosion device actually is  not optimal. We should make it a different way."  

Beria says, "no, you will get us the American  bomb, or your family will be camp dust." 

The thing that's relevant about that anecdote is  that the Soviets would have had a better bomb if  

they hadn't copied the American design, at least  initially. That suggests something about history,  

not just for the Manhattan Project. There's  often this pattern of parallel invention  

because the tech tree implies that a certain thing is next — in this case, self-play

RL — and people work on that and are going to  figure it out around the same time. There's not  

going to be that much gap in who gets it first. Famously, a bunch of people invented the light  

bulb around the same time. Even if that's true, is it the case that the one year or six months makes the difference? Two years makes all the difference.

I don't know if it'll be two years though. If we lock down the labs, we have much better  

scientists. We're way ahead. It would  be two years. Even six months, a year,  

would make a huge difference. This gets back  to the intelligence explosion dynamics. A year  

might be the difference between a system that's  sort of human-level and a system that is vastly  

superhuman. It might be like five OOMs. Look at the current pace. Three years ago,  

on the MATH benchmark — these are really difficult high school competition math problems — we were at a few percent; we couldn't solve anything. Now it's solved. That was at

the normal pace of AI progress. You didn't  have a billion superintelligent researchers. 

A year is a huge difference, particularly after  superintelligence. Once this is applied to many  

elements of R&D, you get an industrial  explosion with robots and other advanced  

technologies. A couple of years might yield decades' worth of progress. Again, it's like the technological lead the U.S. had in the first Gulf War, when a 20-30 year technological lead proved totally decisive. It really matters.

Here’s another reason it really matters. Suppose  they steal the weights, suppose they steal the  

algorithms, and they're close on our tails.  Suppose we still pull out ahead. We're a  

little bit faster and we're three months ahead. The world in which we're really neck and neck, where we only have a three-month lead, is incredibly dangerous. We're in this feverish struggle where

if they get ahead, they get to dominate, maybe  they get a decisive advantage. They're building  

clusters like crazy. They're willing to throw  all caution to the wind. We have to keep up. 

There are crazy new WMDs popping up. Then we're going to be in a situation where crazy new military technology and crazy new WMDs keep appearing, and deterrence, mutually assured destruction, keeps changing every few weeks. It's a completely unstable, volatile situation that is incredibly dangerous.

So you also have to look at it from the alignment point of view: these technologies are dangerous. It might be really important during the intelligence explosion to have six months of wiggle room to be able to say, "look, we're going to dedicate more compute to alignment during this period because we have to get it right. We're feeling uneasy about how it's going."

One of the most important inputs to  whether we will destroy ourselves or  

whether we will get through this incredibly  crazy period is whether we have that buffer. 

Before we go further, it's very much worth noting  that almost nobody I talk to thinks about the  

geopolitical implications of AI. I have some  object-level disagreements that we'll get into,  

things I want to iron out. I  may not disagree in the end. 

The basic premise is that if you keep scaling, if  people realize that this is where intelligence is  

headed, it's not just going to be the same old  world. It won't just be about what model we're  

deploying tomorrow or what the latest thing  is. People on Twitter are like, "oh, GPT-4 is  

going to shake your expectations" or whatever. COVID is really interesting because when March  

2020 hit, it became clear to the world — presidents, CEOs, media, the average person — that other things were happening, but the main thing we as a world were dealing with was COVID. Soon it will be AGI. This is the quiet period.

Maybe you want to go on vacation. Maybe now is the  last time you can have some kids. My girlfriend  

sometimes complains when I’m off doing work that  I don’t spend enough time with her. She threatens  

to replace me with GPT-6 or whatever. I'm like,  “GPT-6 will also be too busy doing AI research.” 

Why aren't other people talking  about national security? 

I made this mistake with COVID. In February  of 2020, I thought it was going to sweep  

the world and all the hospitals would  collapse. It would be crazy, and then  

it'd be over. A lot of people thought this kind  of thing at the beginning of COVID. They shut  

down their office for a month or whatever. The thing I just really didn't price in was the societal reaction. Within weeks, Congress spent over 10% of GDP on COVID measures. The entire country was shut down. It was crazy.

Why do people underrate it? Being in the  trenches actually gives you a less clear  

picture of the trend lines. You don’t have  to zoom out that much, only a few years. 

When you're in the trenches, you're trying  to get the next model to work. There's always  

something that's hard. You might underrate  algorithmic progress because you're like,  

"ah, things are hard right now," or "data wall"
