Leopold Aschenbrenner — 2027 AGI, China/US super-intelligence race, & the return of history
By Dwarkesh Patel
Summary
## Key takeaways

- **Trillion-Dollar Clusters by 2030**: AI training clusters are scaling rapidly, projected to cost hundreds of billions of dollars, consume over 20% of US electricity production by 2030, and require on the order of 100 million H100-equivalent GPUs. [04:02:00], [04:18:00]
- **AGI by 2027-2028 via Unhobbling**: By 2027-2028, AI systems are expected to reach expert-level intelligence and function as "drop-in remote workers," moving beyond simple chatbots thanks to advances in "unhobbling." [08:01:00], [08:46:00]
- **Geopolitical Stakes: The Superintelligence Race**: The race to superintelligence is a critical geopolitical factor; a lead of even a few years could be decisive in military competition, akin to the technological advantage in the first Gulf War. [25:06:00], [27:11:00]
- **US vs. China: AI Cluster Location Matters**: Building AI clusters in authoritarian regimes like the UAE creates security risks, including potential theft of AGI weights or seizure of compute, making it crucial to keep these facilities in the US or allied democracies. [42:37:00], [43:07:00]
- **Historical Parallels: WWII Mobilization**: The rapid industrial mobilization of WWII, despite obstacles like labor strikes, offers a historical precedent for how a nation can quickly scale production for critical technology. [50:36:00], [51:01:00]
Topics Covered
- AI Compute Growth: A Decade of Exponential Scaling
- AGI by 2027: Beyond Chatbots to Remote Workers
- Scaling AI from preschooler to remote worker by 2027
- OpenAI's alleged plan: AGI bidding war between US, China, Russia
- Stealing AI weights: The atomic bomb analogy
Full Transcript
What will be at stake will not just be cool products, but whether liberal democracy survives, whether the CCP survives, what the world order for the next century is going to be. The CCP is going to have an all-out effort to infiltrate American AI labs: billions of dollars, thousands of people. The CCP is going to try to out-build us. People don't realize how intense state-level espionage can be. When we have literal superintelligence, they can Stuxnet the Chinese data centers. You really think that will be a private company, and the government wouldn't be like, "oh my god, what is going on?" I do think it is incredibly important that these clusters are in the United States. I mean, would you do the Manhattan Project in the UAE?
2023 was the moment for me when it went from AGI as this sort of theoretical, abstract thing, and you'd make the models, to: I see it, I feel it. I can see the cluster it's trained on, the rough combination of algorithms, the people, how it's happening. And I think most of the world is not there; most of the people who feel it are right here.
Today I’m chatting with my friend Leopold
Aschenbrenner. He grew up in Germany and graduated as valedictorian of Columbia when he was 19. After
that, he had a very interesting gap year which we’ll talk about. Then, he was on the OpenAI
superalignment team, may it rest in peace. Now, with some anchor investments — from
Patrick and John Collison, Daniel Gross, and Nat Friedman — he is launching an investment firm.
Leopold, you’re off to a slow start but life is long. I wouldn’t worry about it
too much. You’ll make up for it in due time. Thanks for coming on the podcast.
Thank you. I first discovered your podcast when your best episode had
a couple of hundred views. It’s been amazing to follow your trajectory. It’s a delight to be on.
In the Sholto and Trenton episode, I mentioned that a lot of the things I’ve learned about AI
I’ve learned from talking with them. The third, and probably most significant, part
of this triumvirate has been you. We’ll get all the stuff on the record now.
Here’s the first thing I want to get on the record. Tell me about the trillion-dollar cluster.
I should mention this for the context of the podcast. Today you’re releasing a series
called Situational Awareness. We’re going to get into it. First question about that is,
tell me about the trillion-dollar cluster. Unlike most things that have recently come
out of Silicon Valley, AI is an industrial process. The next model doesn’t just require
some code. It’s building a giant new cluster. It’s building giant new power plants. Pretty soon,
it’s going to involve building giant new fabs. Since ChatGPT, this extraordinary techno-capital
acceleration has been set into motion. Exactly a year ago today,
Nvidia had their first blockbuster earnings call. It went up 25% after hours and everyone was like,
"oh my God, AI is a thing." Within a year, Nvidia data center revenue has gone from a few billion a
quarter to $25 billion a quarter and continues to go up. Big Tech capex is skyrocketing.
It’s funny. There’s this crazy scramble going on, but in some sense it’s just the continuation of
straight lines on a graph. There’s this long-run trend of almost a decade of training compute
for the largest AI systems growing by about half an order of magnitude, 0.5 OOMs a year.
Just play that forward. GPT-4 was reported to have finished pre-training in 2022.
On SemiAnalysis, it was rumored to have a cluster size of about 25,000 A100s. That’s roughly a $500
million cluster. Very roughly, it’s 10 megawatts. Now play that forward at half an OOM a year. By 2024,
that’s a cluster that’s 100 MW and 100,000 H100 equivalents with costs in the billions.
Play it forward two more years. By 2026, that’s a gigawatt, the size of a large nuclear reactor.
That’s like the power of the Hoover Dam. That costs tens of billions of dollars
and requires a million H100 equivalents. By 2028, that’s a cluster that’s 10 GW.
That’s more power than most US states. That’s 10 million H100 equivalents,
costing hundreds of billions of dollars. By 2030, you get the trillion-dollar cluster
using 100 gigawatts, over 20% of US electricity production. That’s 100 million H100 equivalents.
That’s just the training cluster. There are more inference GPUs as well. Once there are products,
most of them will be inference GPUs. US power production has barely grown for decades.
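To make the arithmetic behind those numbers explicit, here is a minimal back-of-envelope sketch in Python. It assumes the rough figures quoted above (a 0.5-OOM-per-year compute trend and a ~10 MW, ~10,000-H100-equivalent GPT-4-era baseline); these are rumored estimates from the conversation, not official figures.

```python
# Back-of-envelope sketch of the cluster scaling trend described above.
# Assumptions (from the conversation, not official figures):
#   - training compute grows ~0.5 OOMs (10^0.5 ≈ 3.16x) per year
#   - GPT-4-era baseline (2022): ~10 MW, ~10,000 H100-equivalents
#   - power and GPU count scale together (ignoring efficiency gains)

BASELINE_YEAR = 2022
BASELINE_MW = 10          # rough GPT-4 cluster power
BASELINE_H100E = 10_000   # ~25,000 A100s, very roughly 10k H100-equivalents
OOMS_PER_YEAR = 0.5

def cluster_in(year: int) -> tuple[float, float]:
    """Project (power in MW, H100-equivalents) for a given year."""
    growth = 10 ** (OOMS_PER_YEAR * (year - BASELINE_YEAR))
    return BASELINE_MW * growth, BASELINE_H100E * growth

for year in (2024, 2026, 2028, 2030):
    mw, gpus = cluster_in(year)
    print(f"{year}: ~{mw:,.0f} MW, ~{gpus:,.0f} H100-equivalents")

# Output:
# 2024: ~100 MW, ~100,000 H100-equivalents
# 2026: ~1,000 MW, ~1,000,000 H100-equivalents      (a gigawatt)
# 2028: ~10,000 MW, ~10,000,000 H100-equivalents    (10 GW)
# 2030: ~100,000 MW, ~100,000,000 H100-equivalents  (100 GW, vs
#        ~480 GW average US electricity generation, i.e. over 20%)
```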
Now we’re really in for a ride. When I had Zuck on the podcast,
he was claiming not a plateau per se, but that AI progress would be bottlenecked by this constraint
on energy. Specifically, he was like, "oh, gigawatt data centers, are we going to build
another Three Gorges Dam or something?" According to public reports, there
are companies planning things on the scale of a 1 GW data center. With a 10 GW data center,
who’s going to be able to build that? A 100 GW center is like a state project. Are you going
to pump that into one physical data center? How is it going to be possible? What is Zuck missing?
Six months ago, 10 GW was the talk of the town. Now, people have moved on. 10 GW is happening.
There’s The Information report on OpenAI and Microsoft planning a $100 billion cluster.
Is that 1 GW? Or is that 10 GW? I don’t know but if you try to map
out how expensive the 10 GW cluster would be, that’s a couple of hundred billion. It’s sort
of on that scale and they’re planning it. It’s not just my crazy take. AMD forecasted
a $400 billion AI accelerator market by 2027. AI accelerators are only part of the expenditures.
We’re very much on track for $1 trillion of total AI investment by 2027. The $1 trillion
cluster will take a bit more acceleration. We saw how much ChatGPT unleashed. Every generation,
the models are going to be crazy and shift the Overton window.
Then the revenue comes in. These are forward-looking investments. The question is,
do they pay off? Let’s estimate the GPT-4 cluster at around $500 million. There’s a common mistake
people make, saying it was $100 million for GPT-4. That’s just the rental price. If you’re building
the biggest cluster, you have to build and pay for the whole cluster. You can’t just rent it
for three months. Can’t you?
Once you’re trying to get into the hundreds of billions, you have to get
to like $100 billion a year in revenue. This is where it gets really interesting for the
big tech companies because their revenues are on the order of hundreds of billions.
$10 billion is fine. It’ll pay off the 2024 size training cluster.
It’ll really be gangbusters with Big Tech when it costs $100 billion a year. The question is
how feasible is $100 billion a year from AI revenue? It’s a lot more than right
now. If you believe in the trajectory of AI systems as I do, it’s not that crazy.
There are like 300 million Microsoft Office subscribers. They have Copilot now. I don’t
know what they’re selling it for. Suppose you sold some AI add-on for $100/month to
a third of Microsoft Office subscribers. That’d be $100 billion right there. $100/month is a lot.
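As a quick sanity check on that figure, the multiplication (round numbers from the conversation; illustrative, not a forecast):

```python
# Rough revenue math from the conversation (round numbers, not a forecast).
office_subscribers = 300_000_000   # ~300M Microsoft Office subscribers
adoption = 1 / 3                   # suppose a third buy an AI add-on
price_per_month = 100              # $100/month add-on

annual_revenue = office_subscribers * adoption * price_per_month * 12
print(f"${annual_revenue / 1e9:,.0f}B per year")
# → $120B per year, i.e. the ~$100 billion scale quoted above
```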
That’s a lot for a third of Office subscribers. For the average knowledge worker, it’s a few
hours of productivity a month. You have to be expecting pretty lame AI progress to
not hit a few hours of productivity a month. Sure, let’s assume all this. What happens in
the next few years? What can the AI trained on the 1 GW data center do? What about the
one on the 10 GW data center? Just map out the next few years of AI progress for me.
The 10 GW range is my best guess for when you get true AGI. Compute
is actually overrated. We’ll talk about that. By 2025-2026, we’re going to get models that
are basically smarter than most college graduates. A lot of the economic usefulness
depends on unhobbling. The models are smart but limited. There are chatbots and then there
are things like being able to use a computer and doing agentic long-horizon tasks.
By 2027-2028, it’ll get as smart as the smartest experts. The unhobbling trajectory points to it
becoming much more like an agent than a chatbot. It’ll almost be like a drop-in remote worker.
This is the question around the economic returns. Intermediate AI systems could be really useful,
but it takes a lot of schlep to integrate them. There’s a lot you
could do with GPT-4 or GPT-4.5 in a business use case, but you really have to change your
workflows to make them useful. It’s a very Tyler Cowen-esque take. It just takes a long time to
diffuse. We’re in SF and so we miss that. But in some sense, the way these systems
want to be integrated is where you get this kind of sonic boom. Intermediate systems could have
done it, but it would have taken schlep. Before you do the schlep to integrate them, you’ll get
much more powerful systems that are unhobbled. They’re agents, drop-in remote workers. You’re
interacting with them like coworkers. You can do Zoom calls and Slack with them. You
can ask them to do a project and they go off and write a first draft, get feedback, run tests on
their code, and come back. Then you can tell them more things. That’ll be much easier to integrate.
You might need a bit of overkill to make the transition easy and harvest the gains.
What do you mean by overkill? Overkill on model capabilities?
Yeah, the intermediate models could do it but it would take a lot of schlep. The drop-in
remote worker AGI can automate cognitive tasks. The intermediate models would have
made the software engineer more productive. But will the software engineer adopt it?
With the 2027 model, you just don’t need the software engineer. You can interact with it
like a software engineer, and it’ll do the work of a software engineer.
The last episode I did was with John Schulman. I was asking about this. We have these models
that have come out in the last year and none seem to have significantly surpassed GPT-4, certainly
not in an agentic way where they interact with you as a coworker. They’ll brag about
a few extra points on MMLU. Even with GPT-4o, it’s cool they can talk like Scarlett Johansson
(I guess not anymore) but it’s not like a coworker.
It makes sense why they’d be good at answering questions. They have data on how to complete
Wikipedia text. Where is the equivalent training data to understand a Zoom call? Referring back to
your point about a Slack conversation, how can it use context to figure out the
cohesive project you’re working on? Where is that training data coming from?
A key question for AI progress in the next few years is how hard it is to unlock the test time
compute overhang. Right now, GPT-4 can do a few hundred tokens with chain-of-thought. That’s
already a huge improvement. Before, answering a math question was just shotgun. If you tried to
answer a math question by saying the first thing that comes to mind, you wouldn’t be very good.
GPT-4 thinks for a few hundred tokens. If I think at 100 tokens a minute, that’s like
what GPT-4 does. It’s equivalent to me thinking for three minutes. Suppose GPT-4 could think
for millions of tokens. That’s +4 OOMs on test time compute on one problem. It can’t do it now.
It gets stuck. It writes some code. It can do a little bit of iterative debugging, but eventually
gets stuck and can’t correct its errors. There’s a big overhang. In other areas of ML,
there’s a great paper on AlphaGo, where you can trade off train time and test time compute. If
you can use 4 OOMs more test time compute, that’s almost like a 3.5-OOM-bigger model.
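A rough conversion of those token budgets into human working time, taking the stylized 100-tokens-per-minute figure above at face value:

```python
# Token budgets → human-equivalent thinking time, at ~100 tokens/minute
# (the stylized figure from the conversation, not a measurement).
TOKENS_PER_MINUTE = 100
WORK_HOURS_PER_MONTH = 160  # assumed full-time working month

def thinking_time(tokens: int) -> str:
    minutes = tokens / TOKENS_PER_MINUTE
    hours = minutes / 60
    months = hours / WORK_HOURS_PER_MONTH
    if months >= 1:
        return f"~{months:.1f} working months"
    if hours >= 1:
        return f"~{hours:.1f} hours"
    return f"~{minutes:.0f} minutes"

print(thinking_time(300))        # today's chain-of-thought: ~3 minutes
print(thinking_time(3_000_000))  # +4 OOMs: ~3.1 working months
```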
Again, if it’s 100 tokens a minute, a few million tokens is a few months of working time. There’s a
lot more you can do in a few months of working time than just getting an answer right now. The
question is how hard is it to unlock that? In the short timelines AI world,
it’s not that hard. The reason it might not be that hard is that there are only a few extra
tokens to learn. You need to learn things like error correction tokens where you’re like “ah,
I made a mistake, let me think about that again.” You need to learn planning tokens where it’s like
“I’m going to start by making a plan. Here’s my plan of attack. I’m going to write a draft and
now I’m going to critique my draft and think about it.” These aren’t things that models can do now,
but the question is how hard it is. There are two paths to agents. When
Sholto was on your podcast, he talked about scaling leading to more nines of reliability.
That’s one path. The other path is the unhobbling path. It needs to learn this
System 2 process. If it can learn that, it can use millions of tokens and think coherently.
Here’s an analogy. When you drive, you’re on autopilot most of the time. Sometimes you
hit a weird construction zone or intersection. Sometimes my girlfriend is in the passenger seat
and I’m like “ah, be quiet for a moment, I need to figure out what’s going on.”
You go from autopilot to System 2 and you’re thinking about how to do it. Scaling
improves that System 1 autopilot. The brute force way to get to agents is improving that system. If
you can get System 2 working, you can quickly jump to something more agentified
and test time compute overhang is unlocked. What’s the reason to think this is an easy
win? Is there some loss function that easily enables System 2 thinking? There aren’t many
animals with System 2 thinking. It took a long time for evolution to give us System 2 thinking.
Pre-training has trillions of tokens of Internet text, I get that. You match that and get
all of these free training capabilities. What’s the reason to think this is an easy unhobbling?
First of all, pre-training is magical. It gave us a huge advantage for models of
general intelligence because you can predict the next token. But there’s a common misconception that next-token prediction just picks up surface statistics. In fact, predicting the next token forces the model to learn incredibly rich representations. Representation learning is the magic of deep learning. Rather than just learning
statistical artifacts, the models learn models of the world. That’s why they can generalize,
because it learned the right representations. When you train a model, you have this raw bundle
of capabilities that’s useful. The unhobbling from GPT-2 to GPT-4 took this raw mass and RLHF’d
it into a good chatbot. That was a huge win. In the original InstructGPT paper, comparing
RLHF vs. non-RLHF models, it’s like a 100x model-size win on human preference rating. It started
to be able to do simple chain-of-thought and so on. But you still have this advantage of all
these raw capabilities, and there’s still a huge amount you’re not doing with them.
This pre-training advantage is also the difference to robotics. People used to say it was a hardware
problem. The hardware is getting solved, but you don’t have this huge advantage
of bootstrapping with pre-training. You don’t have all this unsupervised learning you can do.
You have to start right away with RL self-play. The question is why RL and unhobbling might work.
Bootstrapping is an advantage. Your Twitter bio says “being pre-trained.” You’re not being
pre-trained anymore. You were pre-trained in grade school and high school. At some point,
you transition to being able to learn by yourself. You weren’t able to do it in elementary school.
High school is probably where it started and by college, if you’re smart, you can teach yourself.
Models are just starting to enter that regime. It’s a little bit more scaling and then you figure
out what goes on top. It won’t be trivial. A lot of deep learning seems obvious in retrospect.
There’s some obvious cluster of ideas. There are some ideas that seem a little dumb but
work. There are a lot of details you have to get right. We’re not going to get this next
month. It’ll take a while to figure out. A while for you is like half a year.
I don’t know, between six months and three years. But it's possible. It’s also very
related to the issue of the data wall. Here’s one intuition on learning by yourself. Pre-training
is kind of like the teacher lecturing to you and the words are flying by. You’re
just getting a little bit from it. That's not what you do when you learn
by yourself. When you learn by yourself, say you're reading a dense math textbook,
you're not just skimming through it once. Some wordcels just skim through and reread
and reread the math textbook and they memorize. What you do is you read a page, think about it,
have some internal monologue going on, and have a conversation with a study buddy. You try a
practice problem and fail a bunch of times. At some point it clicks, and you're like,
"this made sense." Then you read a few more pages. We've kind of bootstrapped our way to
just starting to be able to do that now with models. The question is,
can you use all this sort of self-play, synthetic data, and RL to make that thing work? Right now,
there's in-context learning, which is super sample efficient. In the Gemini paper, it just
learns a language in-context. Pre-training, on the other hand, is not at all sample efficient.
What humans do is a kind of in-context learning. You read a book, think about it,
until eventually it clicks. Then you somehow distill that back into the weights. In some sense,
that's what RL is trying to do. RL is super finicky, but when it works it's kind of magical.
It's the best possible data for the model. It’s when you try a practice problem, fail,
and at some point figure it out in a way that makes sense to you. That's the best
possible data for you because it's the way you would have solved the problem,
rather than just reading how somebody else solved the problem, which doesn't initially click.
By the way, if that take sounds familiar it's because it was part of the question I asked
John Schulman. It goes to illustrate the thing I said in the intro. A bunch of the things I've
learned about AI comes from these dinners we do before the interviews with me, you, Sholto,
and a couple of others. We’re like, “what should I ask John Schulman, what should I ask Dario?”
Suppose this is the way things go and we get these unhobblings—
And the scaling. You have this baseline of this enormous force of scaling. GPT-2 was amazing. It
could string together plausible sentences, but it could barely do anything. It was kind of like a
preschooler. GPT-4, on the other hand, could write code and do hard math, like a smart high schooler.
This big jump in capability is explored in the essay series. I count the orders of magnitude
of compute and scale-up of algorithmic progress. Scaling alone by 2027-2028 is going to do another
preschool to high school jump on top of GPT-4. At a per token level, the models will be incredibly
smart. They'll gain more reliability, and with the addition of unhobblings, they'll look less like
chatbots and more like agents or drop-in remote workers. That's when things really get going.
I want to ask more questions about this but let's zoom out. Suppose you're right about this. This is
because of the 2027 cluster which is at 10 GW? 2028 is 10 GW. Maybe it'll be pulled forward.
Something like a GPT-5.5 level by 2027, whatever that's called. What does the world look like at
that point? You have these remote workers who can replace people. What is the reaction to that in
terms of the economy, politics, and geopolitics? 2023 was a really interesting year to experience
as somebody who was really following the AI stuff. What were you doing in 2023?
OpenAI. Before 2023 at OpenAI, it was a weird thing. You almost didn't want to talk about
AI or AGI. It was kind of a dirty word. Then in 2023, people saw ChatGPT for the first time,
they saw GPT-4, and it just exploded. It triggered huge capital expenditures
from all these firms and an explosion in revenue from Nvidia and so on. Things have been quiet
since then, but the next thing has been in the oven. I expect every generation these g-forces
to intensify. People will see the models. They won’t have counted the OOMs so they're
going to be surprised. It'll be kind of crazy. Revenue is going to accelerate. Suppose you
do hit $10 billion by the end of this year. Suppose it just continues on the trajectory
of revenue doubling every six months. It's not actually that far from $100 billion, maybe by
2026. At some point, what happened to Nvidia is going to happen to Big Tech. It's going to
explode. A lot more people are going to feel it. 2023 was the moment for me where AGI went from
being this theoretical, abstract thing. I see it, I feel it, and I see the path. I see where
it's going. I can see the cluster it's trained on, the rough combination of algorithms, the people,
how it's happening. Most of the world is not there yet. Most of the people who feel it are right
here. A lot more of the world is going to start feeling it. That's going to start being intense.
Right now, who feels it? You can go on Twitter and there are these GPT wrapper companies, like,
"whoa, GPT-4 is going to change our business." I'm so bearish on the wrapper companies because
they're betting on stagnation. They're betting that you have these intermediate
models and it takes so much schlep to integrate them. I'm really bearish because we're just
going to sonic boom you. We're going to get the unhobblings. We're going to get the drop-in remote
worker. Your stuff is not going to matter. So that's done. SF, this crowd, is paying
attention now. Who is going to be paying attention in 2026 and 2027? Presumably,
these are years in which hundreds of billions of capex is being spent on AI.
The national security state is going to start paying a lot of
attention. I hope we get to talk about that. Let’s talk about it now. What happens? What
is the immediate political reaction? Looking internationally, I don't know if Xi Jinping
sees the GPT-4 news and goes, "oh, my God, look at the MMLU score on that. What are
we doing about this, comrade?" So what happens when he sees a
remote worker replacement and it has $100 billion in revenue? There’s a lot of businesses
that have $100 billion in revenue, and people aren't staying up all night talking about it.
The question is, when does the CCP and when does the American national security establishment
realize that superintelligence is going to be absolutely decisive for national power? This
is where the intelligence explosion stuff comes in, which we should talk about later.
You have AGI. You have this drop-in remote worker that can replace you or me,
at least for remote jobs. Fairly quickly, you turn the crank one or two more times and you
get a thing that's smarter than humans. Even more than just turning the crank a
few more times, one of the first jobs to be automated is going to be that of an
AI researcher or engineer. If you can automate AI research, things can start going very fast.
Right now, there's already this trend of 0.5 OOMs a year of algorithmic progress.
At some point, you're going to have GPU fleets in the tens of millions for inference or more.
You’re going to be able to run 100 million human equivalents of these automated AI researchers.
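As an illustration of how that headcount might be arrived at, here is the back-of-envelope version. Every input is an assumption chosen to make the arithmetic land in the stated range, not a known quantity:

```python
# Illustrative only: how "tens of millions of GPUs" might translate into
# ~100 million human-equivalent automated researchers.
# Every input here is an assumption for the sake of the arithmetic.
inference_gpus = 20_000_000      # assumed fleet, "tens of millions"
tokens_per_gpu_per_sec = 10      # assumed throughput for a very large,
                                 # expensive-to-serve future model
human_tokens_per_sec = 100 / 60  # the ~100 tokens/minute figure from earlier

fleet_tokens_per_sec = inference_gpus * tokens_per_gpu_per_sec
human_equivalents = fleet_tokens_per_sec / human_tokens_per_sec
print(f"~{human_equivalents / 1e6:.0f} million human-equivalent researchers")
# → ~120 million: the right order of magnitude for the claim above
```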
If you can do that, you can maybe do a decade's worth of ML research progress in a year. You
get some sort of 10x speed up. You can make the jump to AI that is vastly smarter than
humans within a year, a couple of years. That broadens from there. You have this
initial acceleration of AI research. You apply R&D to a bunch of other fields of technology. At
this point, you have a billion super intelligent researchers, engineers, technicians, everything.
They’re superbly competent at all things. They're going to figure out robotics. We
talked about that being a software problem. Well, you have a billion super smart — smarter than the
smartest human researchers — AI researchers in your cluster. At some point during the
intelligence explosion, they're going to be able to figure out robotics. Again, that’ll expand.
If you play this picture forward, it is fairly unlike any other technology. A couple years of
lead could be utterly decisive in say, military competition. If you look at the first Gulf War,
Western coalition forces had a 100:1 kill ratio. They had better sensors on their tanks. They had
better precision missiles, GPS, and stealth. They had maybe 20-30 years of technological
lead. They just completely crushed them. Superintelligence applied to broad fields
of R&D — and the industrial explosion that comes from it, robots making a lot of material — could
compress a century’s worth of technological progress into less than a decade. That means
that a couple of years could mean a Gulf War 1-style advantage in military affairs, possibly including a decisive advantage that even preempts nukes. How do you find nuclear stealth submarines?
Right now, you have sensors and software to detect where they are. You can do that. You
can find them. You have millions or billions of mosquito-sized drones, and they take out the
nuclear submarines. They take out the mobile launchers. They take out the other nukes.
It’s potentially enormously destabilizing and enormously important for national power.
At some point people are going to realize that. Not yet, but they will. When they do,
it won’t just be the AI researchers in charge. The CCP is going to have an all-out effort to
infiltrate American AI labs. It’ll involve billions of dollars, thousands of people,
and the full force of the Ministry of State Security. The CCP is going to try to outbuild us.
They added as much power in the last decade as an entire US electric grid. So the 100 GW cluster,
at least the 100 GW part of it, is going to be a lot easier for them to get. By this point,
it's going to be an extremely intense international competition.
One thing I'm uncertain about in this picture is if it’s like what you say, where it's more
of an explosion. You’ve developed an AGI. You make it into an AI researcher. For a while,
you're only using this ability to make hundreds of millions of other AI researchers. The thing
that comes out of this really frenetic process is a superintelligence. Then that goes out in the
world and is developing robotics and helping you take over other countries and whatever.
It's a little bit more gradual. It's an explosion that starts narrowly. It can do
cognitive jobs. The highest ROI use for cognitive jobs is to make the AI better
and solve robotics. As you solve robotics, now you can do R&D in biology and other technology.
Initially, you start with the factory workers. They're wearing the glasses and AirPods, and the
AI is instructing them because you can make any worker into a skilled technician. Then you have
the robots come in. So this process expands. Meta's Ray-Bans are a complement to Llama.
With the fabs in the US, their constraint is skilled workers. Even if you don't have robots,
you have the cognitive superintelligence and can kind of make them all into skilled
workers immediately. That's a very brief period. Robots will come soon.
Suppose this is actually how the tech progresses in the United States, maybe
because these companies are already generating hundreds of billions of dollars of AI revenue.
At this point, companies are borrowing hundreds of billions or more in the corporate debt markets.
Why is a CCP bureaucrat, some 60-year-old guy, looking at this and going,
"oh, Copilot has gotten better now" and now— This is much more than Copilot has gotten
better now. It’d require
shifting the production of an entire country, dislocating energy that is otherwise being used
for consumer goods or something, and feeding all that into the data centers. Part of this whole
story is that you realize superintelligence is coming soon. You realize it and maybe I realize
it. I'm not sure how much I realize it. Will the national security apparatus in
the United States and the CCP realize it? This is a really key question. We have a few
more years of mid-game. We have a few more 2023s. That just starts updating more and
more people. The trend lines will become clear. You will see some amount of the COVID dynamic.
COVID in February of 2020 honestly feels a lot like today. It feels like this utterly crazy thing
is coming. You see the exponential and yet most of the world just doesn't realize it. The mayor of
New York is like, "go out to the shows," and "worrying about it is just anti-Asian racism." At some point, people saw
it and then crazy, radical reactions came. By the way, what were you doing during
COVID? Was it your freshman or sophomore year? Junior.
Still, you were like a 17-year-old junior or something, right? Did you short the market or
something? Did you sell at the right time? Yeah.
So there will be a March 2020 moment. You can make the analogy you make in the
series that this will cause a reaction like, “we have to do the Manhattan Project again for America
here.” I wonder what the politics of this will be like. The difference here is that it’s not
just like, “we need the bomb to beat the Nazis.” We'll be building this thing that makes all our
energy prices go up a bunch and it's automating a lot of our jobs. The climate change stuff people
are going to be like, "oh, my God, it's making climate change worse and it's helping Big Tech."
Politically, this doesn't seem like a dynamic where the national security apparatus or the
president is like, "we have to step on the gas here and make sure America wins."
Again, a lot of this really depends on how much people are feeling it and how much people
are seeing it. Our generation is so used to peace, American hegemony, and nothing mattering.
The historical norm is very much one of extremely intense and extraordinary things happening in the
world with intense international competition. We've just lived through a very unique 20-year period. In World
War II, something like 50% of GDP went to war production. The US borrowed over
60% of GDP. With Germany and Japan, I think it was over 100%. In World War I, the UK, France, and Germany all borrowed over 100% of GDP. Much more was on the line. People talk about World War II being so destructive, with 20 million Soviet soldiers dying and 20% of Poland's population killed. But that kind of thing happened all the time. During the Seven Years' War, something like 20-30%
of Prussia died. In the Thirty Years' War, up to 50% of a large swath of Germany died.
Will people see that the stakes here are really high and that history is actually
back? The American national security state thinks very seriously about stuff like this.
They think very seriously about competition with China. China very much thinks of itself on this
historical mission of the rejuvenation of the Chinese nation. They think a lot about national
power. They think a lot about the world order. There's a real question on timing. Do they
start taking this seriously only when the intelligence explosion is already happening, which is quite late? Or do they
start taking this seriously two years earlier? That matters a lot for how things play out.
At some point they will and they will realize that this will be utterly decisive for not
just some proxy war but for major questions. Can liberal democracy continue to thrive? Can
the CCP continue existing? That will activate forces that we haven't seen in a long time.
The great power conflict definitely seems compelling. All kinds of different things
seem much more likely when you think from a historical perspective. You
zoom out beyond the liberal democracy that we’ve had the pleasure of living in, in America, for say the last 80 years. That includes things like dictatorships, war, famine, etc.
I was reading The Gulag Archipelago and one of the chapters begins with Solzhenitsyn saying: imagine you had told a Russian citizen under the tsars that all these new technologies would bring not some great Russian revival, with Russia becoming a great power and its citizens made wealthy, but tens of millions of Soviet citizens tortured by millions of beasts in
the worst possible ways. If you’d told them that that would be the result of the 20th century,
they wouldn’t have believed you. They’d have called you a slanderer.
The possibilities for dictatorship with superintelligence are even crazier as well.
Imagine you have a perfectly loyal military and security force. No more rebellions. No
more popular uprisings. You have perfect lie detection. You have surveillance of everybody.
You can perfectly figure out who's the dissenter and weed them out. No Gorbachev who had some
doubts about the system would have ever risen to power. No military coup would have ever happened.
There's a real way in which part of why things have worked out is that ideas can evolve. There's
some sense in which time heals a lot of wounds and solves a lot of debates. Throughout time, a lot of
people had really strong convictions, but a lot of those have been overturned over time because
there's been continued pluralism and evolution. Imagine applying a CCP-like approach to
truth where truth is what the party says. When you supercharge that with superintelligence, that
could just be locked in and enshrined for a long time. The possibilities are pretty terrifying.
To your point about history and living in America for the past 80 years, this is one of the
things I took away from growing up in Germany. A lot of this stuff feels more visceral. My mother
grew up in the former East, my father in the former West. They met shortly after the Wall fell.
The end of the Cold War was this extremely pivotal moment for me because it's the reason I exist.
I grew up in Berlin with the former Wall. My great-grandmother, who is still alive,
is very important in my life. She was born in 1934 and grew up during the Nazi era. In World War II,
she saw the firebombing of Dresden from this country cottage where they were as
kids. Then she spent most of her life in the East German communist dictatorship.
She'd tell me about how Soviet tanks came when there was the popular uprising in 1953.
Her husband was telling her to get home really quickly and get off the streets.
She had a son who tried to ride a motorcycle across the Iron Curtain and then was put in
a Stasi prison for a while. Finally, when she was almost 60, for the first time she
lived in a free country, and a wealthy country. When I was a kid, the thing she always really
didn't want me to do was get involved in politics. Joining a political party had
very bad connotations for her. She raised me when I was young. So it doesn't feel
that long ago. It feels very close. There’s one thing I wonder about when
we're talking today about the CCP. The people in China who will be doing their version of this
project will be AI researchers who are somewhat Westernized. They’ll either have gotten educated
in the West or have colleagues in the West. Are they going to sign up for the CCP
project that's going to hand over control to Xi Jinping? What's your sense of that? Fundamentally,
they're just people, right? Can't you convince them about the dangers of superintelligence?
Will they be in charge though? In some sense, this is also the case in the
US. This is like the rapidly depreciating influence of the lab employees. Right now,
the AI lab employees have so much power. You saw this in the November board events. It’s so much power.
Both are going to get automated and they're going to lose all their power. It'll just be
a few people in charge with their armies of automated AIs. It’s also the politicians and
the generals and the national security state. There are some of these classic
scenes from the Oppenheimer movie. The scientists built it and then the bomb was
shipped away and it was out of their hands. It's good for lab employees to be aware of
this. You have a lot of power now, but maybe not for that long. Use it wisely.
I do think they would benefit from some more organs of representative democracy.
What do you mean by that? In the OpenAI board events,
employee power was exercised in a very direct-democracy way. How some of that went about
really highlighted the benefits of representative democracy and having some deliberative organs.
Interesting. Let's go back to the $100 billion revenue question. The companies are trying to
build clusters that are this big. Where are they building it? Say it's the amount
of energy that would be required for a small or medium-sized US state. Does Colorado then get no
power because it's happening in the United States? Is it happening somewhere else?
This is the thing that I always find funny, when you talk about Colorado getting no power.
The easy way to get the power would be to displace less economically useful stuff.
Buy up the aluminum smelting plant that has a gigawatt. We're going to replace it with the
data center because that's important. That's not actually happening because a lot of these power
contracts are really locked in long-term. Also, people don't like things like this.
In practice what it requires, at least right now, is building new power. That
might change. That's when things get really interesting, when it's like, “no, we're just
dedicating all of the power to the AGI.” So right now it's building new power. 10
GW is quite doable. It's like a few percent of US natural gas production. When you have the 10 GW
training cluster, you have a lot more inference. 100 gigawatts is where it starts getting pretty
wild. That's over 20% of US electricity production. It's pretty doable, especially
if you're willing to go for natural gas. It is incredibly important that these
clusters are in the United States. Why does it matter that it's in the US?
There are some people who are trying to build clusters elsewhere. There's a lot of
free-flowing Middle Eastern money that's trying to build clusters elsewhere. This comes back to
the national security question we talked about. Would you do the Manhattan Project in the UAE?
You can put the clusters in the US and you can put them in allied democracies. Once you put them
in authoritarian dictatorships, you create this irreversible security risk. Once the cluster is
there, it's much easier for them to exfiltrate the weights. They can literally steal the AGI,
the superintelligence. It’s like they got a direct copy of the atomic bomb. It makes it much easier
for them. They have weird ties to China. They can ship that to China. That's a huge risk.
Another thing is they can just seize the compute. The issue here is people right now are thinking
of this as ChatGPT, Big Tech product clusters. The clusters being planned now,
three to five years out, may well be the AGI, superintelligence clusters. When things get hot,
they might just seize the compute. Suppose we put 25% of the compute
capacity in these Middle Eastern dictatorships. Say they seize that. Now it's a ratio of compute
of 3:1. We still have more, but even with only 25% of compute there it starts getting
pretty hairy. 3:1 is not that great of a ratio. You can do a lot with that amount of compute.
Say they don't actually do this. Even if they don't actually seize the compute,
even if they actually don't steal the weights, there's just a lot of implicit leverage you
get. They get seats at the AGI table. I don't know why we're giving authoritarian
dictatorships the seat at the AGI table. There's going to be a lot of compute
in the Middle East if these deals go through. First of all, who is it? Is it just every
single Big Tech company trying to figure it out over there?
It’s not everybody, but some. There are reports about, I think,
Microsoft. We'll get into it. So say the UAE gets a bunch of
compute because we're building the clusters there. Let's say they have 25% of the compute. Why does a
compute ratio matter? If it's about them being able to kick off the intelligence explosion,
isn't it just some threshold where you have 100 million AI researchers or you don't?
You can do a lot with 33 million extremely smart scientists. That might be enough
to build crazy bioweapons. Then you're in a situation where they stole
the weights and they seized the compute. Now they can make these crazy new WMDs that
will be possible with superintelligence. Now you've just proliferated the stuff
that’ll be really powerful. Also, 3x on compute isn't actually that much.
The riskiest situation is if we're in some sort of really neck and neck, feverish international
struggle. Say we're really close with the CCP and we're months apart. The situation we want
to be in — and could be in if we play our cards right — is a little bit more like the US building
the atomic bomb versus the German project years behind. If we have that, we just have so much
more wiggle room to get safety right. We're going to be building these crazy
new WMDs that completely undermine nuclear deterrence. That's so much easier to deal
with if you don't have somebody right on your tails and you have to go at maximum speed.
You have no wiggle room. You're worried that at any time they can overtake you.
They can also just try to outbuild you. They might literally win. China might literally win
if they can steal the weights, because they can outbuild you. They may have less caution,
both the good kind and the bad kind, like whatever unreasonable regulations we have.
If you're in this really tight race, this sort of feverish struggle, that's when
there's the greatest peril of self-destruction. Presumably the companies that are trying to build
clusters in the Middle East realize this. Is it just that it’s impossible to do this
in America? If you want American companies to do this at all, do you have to do it in
the Middle East or not at all? Then you just have China build a Three Gorges Dam cluster.
There’s a few reasons. People aren’t thinking about this as the
AGI superintelligence cluster. They’re just like, “ah, cool clusters for my ChatGPT.”
If you’re doing ones for inference, presumably you could spread them out across the country or
something. The ones they’re building, they’re going to do one training run
in a single thing they’re building. It’s just hard to distinguish between
inference and training compute. People can claim it’s inference compute,
but they might realize that actually this is going to be useful for training compute too.
Because of synthetic data and things like that? RL looks a lot like inference, for example. Or
you just end up connecting them in time. It's a lot like raw materials. It's like placing your
uranium refinement facilities there. So there are a few reasons. One,
they don't think about this as the AGI cluster. Another is just that there’s
easy money coming from the Middle East. Another one is that some people think
that you can't do it in the US. We actually face a real system competition here. Some
people think that only autocracies can do this, with top-down mobilization of industrial
capacity and the power to get stuff done fast. Again, this is the sort of thing we haven't faced
in a while. But during the Cold War, there was this intense system competition. East vs. West
Germany was this. It was West Germany as liberal democratic capitalism vs. state-planned communism.
Now it's obvious that the free world would win. But even as late as 1961,
Paul Samuelson was predicting that the Soviet Union would outgrow the United States because
they were able to mobilize industry better. So there are some people who shitpost about
loving America, but then in private they're betting against America. They're betting against
the liberal order. Basically, it's just a bad bet. This stuff is really possible in the US.
To make it possible in the US, to some degree we have to get our act together. There are basically
two paths to doing it in the US. One is you just have to be willing to do natural gas. There's
ample natural gas. You put your cluster in West Texas. You put it in southwest Pennsylvania by
the Marcellus Shale. The 10 GW cluster is super easy. The 100 GW cluster is also pretty doable.
I think natural gas production in the United States has almost doubled in a decade. You do
that one more time over the next seven years, you could power multiple trillion-dollar data centers.
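For a sense of scale on the natural gas path, a rough feasibility check. The heat rate, gas energy content, and production figures below are ballpark public numbers, not precise values:

```python
# Rough check: gas needed to run a 100 GW cluster on combined-cycle plants.
# All figures are ballpark approximations.
cluster_gw = 100
heat_rate_mmbtu_per_mwh = 7       # modern combined-cycle plant, approx.
mmbtu_per_bcf = 1.04e6            # energy content of 1 bcf of gas, approx.
us_production_bcf_per_day = 105   # recent US dry gas production, approx.

mwh_per_day = cluster_gw * 1_000 * 24
bcf_per_day = mwh_per_day * heat_rate_mmbtu_per_mwh / mmbtu_per_bcf
print(f"~{bcf_per_day:.0f} bcf/day, "
      f"~{100 * bcf_per_day / us_production_bcf_per_day:.0f}% of US production")
# → ~16 bcf/day, roughly 15% of current US production: large but feasible,
#   especially if production keeps growing as described above.
```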
The issue there is that a lot of people made these climate commitments, not just the government. It's
actually the private companies themselves, Microsoft, Amazon, etc., that have made
these climate commitments. So they won't do natural gas. I admire the climate commitments,
but at some point the national interest and national security is more important.
The other path is doing green energy megaprojects. You do solar and batteries
and SMRs and geothermal. If we want to do that, there needs to be a broad deregulatory push.
You can't have permitting take a decade. You have to reform FERC. You have to have blanket
NEPA exemptions for this stuff. There are inane state-level regulations. You can
build the solar panels and batteries next to your data center, but it'll still take years because
you actually have to hook it up to the state electrical grid. You have to use governmental
powers to create rights of way to have multiple clusters and connect them and have the cables.
Ideally we do both. Ideally we do natural gas and the broader deregulatory green agenda.
We have to do at least one. Then this stuff is possible in the United States.
Before the conversation I was reading a good book about World War II industrial mobilization in the
United States called Freedom's Forge. I’m thinking back on that period, especially in the context of
reading Patrick Collison’s Fast and the progress studies stuff. There’s this narrative out there that
we had state capacity back then and people just got shit done but that now it's a clusterfuck.
It wasn’t at all the case! It was really interesting. You had
people from the Detroit auto industry side, like William Knudsen, who were running mobilization for
the United States. They were extremely competent. At the same time you had labor organization and
agitation, which is very analogous to the climate change pledges and concerns we have today.
They would literally have these strikes, into 1941, costing millions of man-hours
worth of time when we're trying to make tens of thousands of planes a month. They would just
debilitate factories for trivial concessions from capital that were pennies on the dollar.
There were concerns that the auto companies were trying to use the pretext of a potential
war to avoid paying labor the money it deserved. Just as with climate change today, you might think, "ah, America's fucked. We're not going to be able to build this shit if you look at NEPA or something." I didn't realize how debilitating labor was in World War II.
It wasn’t just that. Before 1939, the American military was in total shambles. You read about
it and it reads a little bit like the German military today. Military expenditures were I think
less than 2% of GDP. All the European countries had gone, even in peacetime, above 10% of GDP.
It was rapid mobilization starting from nothing. We were making no planes.
There were no military contracts. Everything had been starved during the Great Depression.
But there was this latent capacity. At some point the United States got its act together.
This applies the other way around too with China. Sometimes people count them out a little bit with
the export controls and so on. They're able to make 7-nanometer chips now. There's a question
of how many they could make. There's at least a possibility that they're going to mature that
ability and make a lot of 7-nanometer chips. There's a lot of latent industrial capacity
in China. They are able to build a lot of power fast. Maybe that isn't activated for AI yet. At
some point, the same way the United States and a lot of people in the US government are going
to wake up, the CCP is going to wake up. Companies realize that scaling is a thing.
Obviously their whole plans are contingent on scaling. So they understand that in 2028 we're
going to be building 10 GW data centers. At that point, the people who can keep up
are Big Tech, potentially at the edge of their capabilities, sovereign wealth fund-funded things,
and also major countries like America and China. What's their plan? With the AI labs, what's their
plan given this landscape? Do they not want the leverage of being in the United States?
The Middle East does offer capital, but America has plenty of capital. We have trillion-dollar
companies. What are these Middle Eastern states? They're kind of like trillion-dollar
oil companies. We have trillion-dollar companies and very deep financial markets. Microsoft could
issue hundreds of billions of dollars of bonds and they can pay for these clusters.
Another argument being made, which is worth taking seriously, is that if we don't work
with the UAE or with these Middle Eastern countries, they're just going to go to China.
They're going to build data centers and pour money into AI regardless. If we don't
work with them, they'll just support China. There's some merit to the argument in the
sense that we should be doing benefit-sharing with them. On the road to AGI, there should be
two tiers of coalitions. There should be a narrow coalition of democracies that's developing AGI.
Then there should be a broader coalition of other countries, including dictatorships, and we should
offer them some of the benefits of AI. If the UAE wants to use AI products,
run Meta recommendation engines, or run the last-generation models,
that's fine. By default, they just wouldn't have had this seat at the AGI table. So they
have some money, but a lot of people have money. The only reason they're getting this seat at the
AGI table and giving these dictators this leverage over this extremely important national security
technology is because we're getting them excited and offering it to them.
Who specifically is doing this? Who are the companies who are going there to fundraise?
It’s been reported that Sam Altman is trying to raise $7 trillion or whatever for a chip
project. It's unclear how many of the clusters will be there, but definitely stuff is happening.
There’s another reason I'm a little suspicious of this argument that if the US doesn't work
with them, they'll go to China. I've heard from multiple people — not from my time at OpenAI,
and I haven't seen the memo — that at some point several years ago, OpenAI leadership
had laid out a plan to fund and sell AGI by starting a bidding war between the governments
of the United States, China, and Russia. It's surprising to me that they're willing to
sell AGI to the Chinese and Russian governments. There's also something that feels eerily familiar
about starting this bidding war and then playing them off each other, saying, "well,
if you don't do this, China will do it." Interesting. That's pretty fucked up.
Suppose you're right. We ended up in this place because, as one of our friends put it, the Middle
East has billions or trillions of dollars up for persuasion like no other place in the world.
With little accountability. There’s no Microsoft board. It's only the dictator.
Let's say you're right, that you shouldn't have gotten them excited about AGI in the
first place. Now we're in a place where they are excited about AGI and they're like, "fuck,
we want to have GPT-5 while you're going to be off building superintelligence. This Atoms for Peace
thing doesn't work for us." If you're in this place, don't they already have the leverage?
The UAE on its own is not competitive. They're already export-controlled. You're not supposed
to ship Nvidia chips over there. It's not like they have any of the leading
AI labs. They have money, but it's hard to just translate money into progress.
But I want to go back to other things you've been saying in laying out your vision. There's
this almost industrial process of putting in the compute and algorithms, adding that up,
and getting AGI on the other end. If it's something more like that, then the case for
somebody being able to catch up rapidly seems more compelling than if it's some bespoke...
Well, if they can steal the algorithms and if they can steal the weights, that’s really important.
How easy would it be for an actor to steal the things that are not the trivial released
things, like Scarlett Johansson's voice, but the RL things we're talking about, the unhobblings?
It’s all extremely easy. They don’t make the claim that it’s hard. DeepMind put out their Frontier
Safety Framework and they lay out security levels, zero to four. Four is resistant to state activity.
They say, we're at level zero. Just recently, there was an indictment of a guy who stole a bunch
of really important AI code and went to China with it. All he had to do to steal the code was
copy it, put it into Apple Notes, and export it as a PDF. That got past their monitoring.
Google has the best security of any of the AI labs probably, because they have the Google
infrastructure. I would think of the security of a startup. What does security of a startup look
like? It's not that good. It's easy to steal. Even if that's the case, a lot of your
post is making the argument for why we are going to get the intelligence explosion. If we have
somebody with the intuition of an Alec Radford to come up with all these ideas, that intuition is
extremely valuable and you can scale that up. If it's just intuition, then that's not going
to be just in the code, right? Also because of export controls, these countries are going
to have slightly different hardware. You're going to have to make different trade-offs and probably
rewrite things to be compatible with that. Is it just a matter of getting the right pen
drive and plugging it into the gigawatt data center next to the Three Gorges Dam
and then you're off to the races? There are a few different things,
right? One threat model is just them stealing the weights themselves. The weights one is
particularly insane because they can just steal the literal end product — just make a replica of
the atomic bomb — and then they're ready to go. That one is extremely important around the time
we have AGI and superintelligence because China can build a big cluster by default. We'd have a
big lead because we have the better scientists, but if we make the superintelligence and they
just steal it, they're off to the races. Weights are a little bit less important right now
because who cares if they steal the GPT-4 weights. We still have to get started on weight security
now because if we think there’s AGI by 2027, this stuff is going to take a while. It's not just
going to be like, "oh, we do some access control." If you actually want to be resistant to Chinese
espionage, it needs to be much more intense. The thing that people aren't paying enough
attention to is the secrets. The compute stuff is sexy, but people underrate the secrets. The half an order of magnitude a year of algorithmic progress just happens by default. That's huge. If we have a few years of lead, that's the equivalent of a 10-30x, even 100x bigger cluster, if we protect the secrets.
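The arithmetic behind that 10-30x-to-100x figure, assuming the ~0.5 OOMs per year of algorithmic progress holds and the secrets stay protected:

```python
# Effective-compute value of a lead, at ~0.5 OOMs/year of algorithmic progress.
OOMS_PER_YEAR = 0.5
for lead_years in (1, 2, 3, 4):
    multiplier = 10 ** (OOMS_PER_YEAR * lead_years)
    print(f"{lead_years}-year lead ≈ {multiplier:,.0f}x effective compute")
# 1-year lead ≈ 3x, 2-year ≈ 10x, 3-year ≈ 32x, 4-year ≈ 100x.
# A few protected years of secrets are worth a 10-100x bigger cluster.
```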
There's this additional layer of the data wall. We have to get through the data wall. That means
we actually have to figure out some sort of basic new paradigm. So it’s the “AlphaGo step
two.” “AlphaGo step one” learns from human imitation. “AlphaGo step two” is the kind
of self-play RL thing that everyone's working on right now. Maybe we're going to crack it.
If China can't steal that, then they're stuck. If they can steal it, they're off to the races.
Whatever that thing is, can I literally write it down on the back of a napkin? If it's that easy,
then why is it so hard for them to figure it out? If it's more about the intuitions,
then don't you just have to hire Alec Radford? What are you copying down?
There are a few layers to this. At the top is the fundamental approach. On pre-training it might be
unsupervised learning, next token prediction, training on the entire Internet. You actually
get a lot of juice out of that already. That one's very quick to communicate.
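As a rough illustration of how quick that top layer is to communicate, here is a minimal sketch of the next-token prediction objective in PyTorch. This is a generic textbook formulation, not any lab's actual code; `model` is a hypothetical stand-in for any network that maps token ids to vocabulary logits.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: (batch, seq_len) integer token ids drawn from the training corpus
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: each position predicts the next token
    logits = model(inputs)                           # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),         # flatten batch and time dimensions
        targets.reshape(-1),
    )
```

Minimize this loss over internet-scale text and you have the whole top-level recipe; the hard part is in the details that follow.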
Then there's a lot of details that matter, and you were talking about this earlier.
It's probably going to be somewhat obvious in retrospect, or there's going to be some not too
complicated thing that'll work, but there's going to be a lot of details to get right.
If that's true, then again, why do we think that getting state-level security
in these startups will prevent China from catching up? It’s just like, "oh,
we know some sort of self-play RL will be required to get past the data wall."
It's going to be solved by 2027, right? It's not that hard.
The US, and the leading labs in the United States, have this huge lead. By default, China actually
has some good LLMs because they're just using open source code, like Llama. People really underrate
both the divergence on algorithmic progress and the lead the US would have by default because
all this stuff was published until recently. Look at the Chinchilla scaling laws, the MoE papers,
transformers. All that stuff was published. That's why open source is good and why China
can make some good models. Now, they're not publishing it anymore. If we actually
kept it secret, it would be a huge edge. To your point about tacit knowledge and
Alec Radford, there's another layer at the bottom that is something about large-scale
engineering work to make these big training runs work. That is a little bit more like
tacit knowledge, but China will be able to figure that out. It's engineering schlep, and they're
going to figure out how to do it. Why can they figure that out, but not how to get the RL thing working? I don't know. Germany during World War II
went down the wrong path with heavy water. There's an amazing anecdote
in The Making of the Atomic Bomb about this. Secrecy was one of the most contentious issues
early on. Leo Szilard really thought a nuclear chain reaction and an atomic bomb were possible.
He went around saying, "this is going to be of enormous strategic and military importance."
A lot of people didn't believe it or thought, "maybe this is possible, but I'm going to act
as though it's not, and science should be open." In the early days, there had been some incorrect
measurements made on graphite as a moderator. Germany thought graphite wasn't going to work,
so they had to do heavy water. But then Enrico Fermi made new measurements indicating that
graphite would work. This was really important. Szilard assailed Fermi with another secrecy
appeal and Fermi was pissed off, throwing a temper tantrum. He thought it was absurd, saying,
"come on, this is crazy." But Szilard persisted, and they roped in another guy, George Pegram.
In the end, Fermi didn't publish it. That was just in time. Fermi not publishing
meant that the Nazis didn't figure out graphite would work. They went down the path of heavy
water, which was the wrong path. This is a key reason why the German
project didn't work out. They were way behind. We face a similar situation now. Are we just
going to instantly leak how to get past the data wall and what the next paradigm is? Or are we not?
The reason this would matter is if being one year ahead would be a huge advantage.
In the world where you deploy AI over time, they're just going to catch up anyway.
I interviewed Richard Rhodes, the guy who wrote The
Making of the Atomic Bomb. One of the anecdotes he had was when the Soviets realized America had
the bomb. Obviously, we dropped it in Japan. Lavrentiy Beria — the guy who ran the NKVD,
a famously ruthless and evil guy — goes to the Soviet scientist who was running their version
of the Manhattan Project. He says, "comrade, you will get us the American bomb." The guy says,
"well, listen, their implosion device actually is not optimal. We should make it a different way."
Beria says, "no, you will get us the American bomb, or your family will be camp dust."
The thing that's relevant about that anecdote is that the Soviets would have had a better bomb if
they hadn't copied the American design, at least initially. That suggests something about history,
not just for the Manhattan Project. There's often this pattern of parallel invention
because the tech tree implies that a certain thing is next — in this case, self-play
RL — and people work on that and are going to figure it out around the same time. There's not
going to be that much gap in who gets it first. Famously, a bunch of people invented the light
bulb around the same time. Even if parallel invention is the norm, does the one year or six months still make the difference? Two years makes all the difference.
I don't know if it'll be two years though. If we lock down the labs, we have much better
scientists. We're way ahead. It would be two years. Even six months, a year,
would make a huge difference. This gets back to the intelligence explosion dynamics. A year
might be the difference between a system that's sort of human-level and a system that is vastly
superhuman. It might be like five OOMs. Look at the current pace. Three years ago,
on the MATH benchmark — these are really difficult high school competition math
problems — we were at a few percent, we couldn't solve anything. Now it's solved. That was at
the normal pace of AI progress. You didn't have a billion superintelligent researchers.
A year is a huge difference, particularly after superintelligence. Once this is applied to many
elements of R&D, you get an industrial explosion with robots and other advanced
technologies. A couple of years might yield decades worth of progress. Again,
it’s like the technological lead the U.S. had in the first Gulf War,
when a 20-30 year technological lead proved totally decisive. It really matters.
Here’s another reason it really matters. Suppose they steal the weights, suppose they steal the
algorithms, and they're close on our tails. Suppose we still pull out ahead. We're a
little bit faster and we're three months ahead. The world in which we're really neck and neck,
we only have a three-month lead, is incredibly dangerous. We're in this feverish struggle where
if they get ahead, they get to dominate, maybe they get a decisive advantage. They're building
clusters like crazy. They're willing to throw all caution to the wind. We have to keep up.
There are crazy new WMDs popping up. Then we're in a situation where crazy new military technology and WMDs keep appearing, and deterrence and mutually assured destruction keep changing every few weeks. It's a completely unstable, volatile situation that is incredibly dangerous.
These technologies are also dangerous from the alignment point of view. It might be really important during the intelligence explosion
to have a six-month wiggle room to be like, “look, we're going to dedicate more compute to alignment
during this period because we have to get it right. We're feeling uneasy about how it's going.”
One of the most important inputs to whether we will destroy ourselves or
whether we will get through this incredibly crazy period is whether we have that buffer.
Before we go further, it's very much worth noting that almost nobody I talk to thinks about the
geopolitical implications of AI. I have some object-level disagreements that we'll get into,
things I want to iron out. I may not disagree in the end.
The basic premise is that if you keep scaling, if people realize that this is where intelligence is
headed, it's not just going to be the same old world. It won't just be about what model we're
deploying tomorrow or what the latest thing is. People on Twitter are like, "oh, GPT-4 is
going to shake your expectations" or whatever. COVID is really interesting because when March 2020 hit, it became clear to everyone — presidents, CEOs, the media, the average person — that other things were still happening in the world, but the main thing we as a world were dealing with was COVID. Soon it will be AGI. This is the quiet period.
Maybe you want to go on vacation. Maybe now is the last time you can have some kids. My girlfriend
sometimes complains when I’m off doing work that I don’t spend enough time with her. She threatens
to replace me with GPT-6 or whatever. I'm like, “GPT-6 will also be too busy doing AI research.”
Why aren't other people talking about national security?
I made this mistake with COVID. In February of 2020, I thought it was going to sweep
the world and all the hospitals would collapse. It would be crazy, and then
it'd be over. A lot of people thought this kind of thing at the beginning of COVID. They shut
down their office for a month or whatever. The thing I just really didn't price in was the societal reaction. Within weeks, Congress spent over 10% of GDP on COVID measures. The entire country was shut down. It was crazy.
Why do people underrate it? Being in the trenches actually gives you a less clear
picture of the trend lines. You don’t have to zoom out that much, only a few years.
When you're in the trenches, you're trying to get the next model to work. There's always
something that's hard. You might underrate algorithmic progress because you're like,
"ah, things are hard right now," or "data wall"