Leopold Aschenbrenner — 2027 AGI, China/US super-intelligence race, & the return of history
By Dwarkesh Patel
Summary
## Key takeaways

- **Trillion-Dollar Clusters by 2030**: AI training clusters are scaling rapidly, projected to cost hundreds of billions of dollars, consume over 20% of US electricity production by 2030, and require on the order of 100 million H100-equivalent GPUs. [04:02:00], [04:18:00]
- **AGI by 2027-2028 via Unhobbling**: By 2027-2028, AI systems are expected to reach expert-level intelligence and function as "drop-in remote workers," moving beyond simple chatbots thanks to advances in "unhobbling." [08:01:00], [08:46:00]
- **Geopolitical Stakes: The Superintelligence Race**: The race to superintelligence is a critical geopolitical factor; a lead of even a few years could be decisive in military competition, akin to the technological advantage in the first Gulf War. [25:06:00], [27:11:00]
- **US vs. China: AI Cluster Location Matters**: Building AI clusters in authoritarian regimes like the UAE creates security risks, including potential theft of AGI weights or seizure of compute, making it crucial to keep these facilities in the US or allied democracies. [42:37:00], [43:07:00]
- **Historical Parallels: WWII Mobilization**: The rapid industrial mobilization of WWII, despite obstacles like labor strikes, offers a historical precedent for how a nation can quickly scale production for critical technology. [50:36:00], [51:01:00]
Topics Covered
- AI Compute Growth: A Decade of Exponential Scaling
- AGI by 2027: Beyond Chatbots to Remote Workers
- Scaling AI from preschooler to remote worker by 2027
- OpenAI's alleged plan: AGI bidding war between US, China, Russia
- Stealing AI weights: The atomic bomb analogy
Full Transcript
What will be at stake will not just be cool products, but whether liberal democracy survives, whether the CCP survives, what the world order for the next century is going to be. The CCP is going to have an all-out effort to infiltrate American AI labs: billions of dollars, thousands of people. The CCP is going to try to out-build us. People don't realize how intense state-level espionage can be. When we have literal superintelligence, they can Stuxnet the Chinese data centers. You really think that will be a private company, and the government wouldn't be like, "oh my god, what is going on?" I do think it is incredibly important that these clusters are in the United States. I mean, would you do the Manhattan Project in the UAE?
2023 was the moment for me when it went from AGI as this sort of theoretical, abstract thing, and you'd make the models, to: I see it, I feel it. I can see the cluster it's trained on, the rough combination of algorithms, the people, how it's happening. And I think most of the world is not there; most of the people who feel it are right here.
Today I’m chatting with my friend Leopold
Aschenbrenner. He grew up in Germany and graduated as valedictorian of Columbia when he was 19. After
that, he had a very interesting gap year which we’ll talk about. Then, he was on the OpenAI
superalignment team, may it rest in peace. Now, with some anchor investments — from
Patrick and John Collison, Daniel Gross, and Nat Friedman — he is launching an investment firm.
Leopold, you’re off to a slow start but life is long. I wouldn’t worry about it
too much. You’ll make up for it in due time. Thanks for coming on the podcast.
Thank you. I first discovered your podcast when your best episode had
a couple of hundred views. It’s been amazing to follow your trajectory. It’s a delight to be on.
In the Sholto and Trenton episode, I mentioned that a lot of the things I’ve learned about AI
I’ve learned from talking with them. The third, and probably most significant, part
of this triumvirate has been you. We’ll get all the stuff on the record now.
Here’s the first thing I want to get on the record. Tell me about the trillion-dollar cluster.
I should mention this for the context of the podcast. Today you’re releasing a series
called Situational Awareness. We’re going to get into it. First question about that is,
tell me about the trillion-dollar cluster. Unlike most things that have recently come
out of Silicon Valley, AI is an industrial process. The next model doesn’t just require
some code. It’s building a giant new cluster. It’s building giant new power plants. Pretty soon,
it’s going to involve building giant new fabs. Since ChatGPT, this extraordinary techno-capital
acceleration has been set into motion. Exactly a year ago today,
Nvidia had their first blockbuster earnings call. It went up 25% after hours and everyone was like,
"oh my God, AI is a thing." Within a year, Nvidia data center revenue has gone from a few billion a
quarter to $25 billion a quarter and continues to go up. Big Tech capex is skyrocketing.
It’s funny. There’s this crazy scramble going on, but in some sense it’s just the continuation of
straight lines on a graph. There’s this long-run trend of almost a decade of training compute
for the largest AI systems growing by about half an order of magnitude, 0.5 OOMs a year.
Just play that forward. GPT-4 was reported to have finished pre-training in 2022.
On SemiAnalysis, it was rumored to have a cluster size of about 25,000 A100s. That’s roughly a $500
million cluster. Very roughly, it’s 10 megawatts. Now play that forward at half an OOM a year. By 2024,
that’s a cluster that’s 100 MW and 100,000 H100 equivalents with costs in the billions.
Play it forward two more years. By 2026, that’s a gigawatt, the size of a large nuclear reactor.
That’s like the power of the Hoover Dam. That costs tens of billions of dollars
and requires a million H100 equivalents. By 2028, that’s a cluster that’s 10 GW.
That’s more power than most US states. That’s 10 million H100 equivalents,
costing hundreds of billions of dollars. By 2030, you get the trillion-dollar cluster
using 100 gigawatts, over 20% of US electricity production. That’s 100 million H100 equivalents.
That’s just the training cluster. There are more inference GPUs as well. Once there are products,
most of them will be inference GPUs. US power production has barely grown for decades.
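To make the arithmetic behind those numbers explicit, here is a minimal back-of-envelope sketch in Python. It assumes the rough figures quoted above (a 0.5-OOM-per-year compute trend and a ~10 MW, ~10,000-H100-equivalent GPT-4-era baseline); these are rumored estimates from the conversation, not official figures.

```python
# Back-of-envelope sketch of the cluster scaling trend described above.
# Assumptions (from the conversation, not official figures):
#   - training compute grows ~0.5 OOMs (10^0.5 ≈ 3.16x) per year
#   - GPT-4-era baseline (2022): ~10 MW, ~10,000 H100-equivalents
#   - power and GPU count scale together (ignoring efficiency gains)

BASELINE_YEAR = 2022
BASELINE_MW = 10          # rough GPT-4 cluster power
BASELINE_H100E = 10_000   # ~25,000 A100s, very roughly 10k H100-equivalents
OOMS_PER_YEAR = 0.5

def cluster_in(year: int) -> tuple[float, float]:
    """Project (power in MW, H100-equivalents) for a given year."""
    growth = 10 ** (OOMS_PER_YEAR * (year - BASELINE_YEAR))
    return BASELINE_MW * growth, BASELINE_H100E * growth

for year in (2024, 2026, 2028, 2030):
    mw, gpus = cluster_in(year)
    print(f"{year}: ~{mw:,.0f} MW, ~{gpus:,.0f} H100-equivalents")

# Output:
# 2024: ~100 MW, ~100,000 H100-equivalents
# 2026: ~1,000 MW, ~1,000,000 H100-equivalents      (a gigawatt)
# 2028: ~10,000 MW, ~10,000,000 H100-equivalents    (10 GW)
# 2030: ~100,000 MW, ~100,000,000 H100-equivalents  (100 GW, vs
#        ~480 GW average US electricity generation, i.e. over 20%)
```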
Now we’re really in for a ride. When I had Zuck on the podcast,
he was claiming not a plateau per se, but that AI progress would be bottlenecked by this constraint
on energy. Specifically, he was like, "oh, gigawatt data centers, are we going to build
another Three Gorges Dam or something?" According to public reports, there
are companies planning things on the scale of a 1 GW data center. With a 10 GW data center,
who’s going to be able to build that? A 100 GW center is like a state project. Are you going
to pump that into one physical data center? How is it going to be possible? What is Zuck missing?
Six months ago, 10 GW was the talk of the town. Now, people have moved on. 10 GW is happening.
There’s The Information report on OpenAI and Microsoft planning a $100 billion cluster.
Is that 1 GW? Or is that 10 GW? I don’t know but if you try to map
out how expensive the 10 GW cluster would be, that’s a couple of hundred billion. It’s sort
of on that scale and they’re planning it. It’s not just my crazy take. AMD forecasted
a $400 billion AI accelerator market by 2027. AI accelerators are only part of the expenditures.
We’re very much on track for $1 trillion of total AI investment by 2027. The $1 trillion
cluster will take a bit more acceleration. We saw how much ChatGPT unleashed. Every generation,
the models are going to be crazy and shift the Overton window.
Then the revenue comes in. These are forward-looking investments. The question is,
do they pay off? Let’s estimate the GPT-4 cluster at around $500 million. There’s a common mistake
people make, saying it was $100 million for GPT-4. That’s just the rental price. If you’re building
the biggest cluster, you have to build and pay for the whole cluster. You can’t just rent it
for three months. Can’t you?
Once you’re trying to get into the hundreds of billions, you have to get
to like $100 billion a year in revenue. This is where it gets really interesting for the
big tech companies because their revenues are on the order of hundreds of billions.
$10 billion is fine. It’ll pay off the 2024 size training cluster.
It’ll really be gangbusters with Big Tech when it costs $100 billion a year. The question is
how feasible is $100 billion a year from AI revenue? It’s a lot more than right
now. If you believe in the trajectory of AI systems as I do, it’s not that crazy.
There are like 300 million Microsoft Office subscribers. They have Copilot now. I don’t
know what they’re selling it for. Suppose you sold some AI add-on for $100/month to
a third of Microsoft Office subscribers. That’d be $100 billion right there. $100/month is a lot.
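As a quick sanity check on that figure, the multiplication (round numbers from the conversation; illustrative, not a forecast):

```python
# Rough revenue math from the conversation (round numbers, not a forecast).
office_subscribers = 300_000_000   # ~300M Microsoft Office subscribers
adoption = 1 / 3                   # suppose a third buy an AI add-on
price_per_month = 100              # $100/month add-on

annual_revenue = office_subscribers * adoption * price_per_month * 12
print(f"${annual_revenue / 1e9:,.0f}B per year")
# → $120B per year, i.e. the ~$100 billion scale quoted above
```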
That’s a lot for a third of Office subscribers. For the average knowledge worker, it’s a few
hours of productivity a month. You have to be expecting pretty lame AI progress to
not hit a few hours of productivity a month. Sure, let’s assume all this. What happens in
the next few years? What can the AI trained on the 1 GW data center do? What about the
one on the 10 GW data center? Just map out the next few years of AI progress for me.
The 10 GW range is my best guess for when you get true AGI. Compute
is actually overrated. We’ll talk about that. By 2025-2026, we’re going to get models that
are basically smarter than most college graduates. A lot of the economic usefulness
depends on unhobbling. The models are smart but limited. There are chatbots and then there
are things like being able to use a computer and doing agentic long-horizon tasks.
By 2027-2028, it’ll get as smart as the smartest experts. The unhobbling trajectory points to it
becoming much more like an agent than a chatbot. It’ll almost be like a drop-in remote worker.
This is the question around the economic returns. Intermediate AI systems could be really useful,
but it takes a lot of schlep to integrate them. There’s a lot you
could do with GPT-4 or GPT-4.5 in a business use case, but you really have to change your
workflows to make them useful. It’s a very Tyler Cowen-esque take. It just takes a long time to
diffuse. We’re in SF and so we miss that. But in some sense, the way these systems
want to be integrated is where you get this kind of sonic boom. Intermediate systems could have
done it, but it would have taken schlep. Before you do the schlep to integrate them, you’ll get
much more powerful systems that are unhobbled. They’re agents, drop-in remote workers. You’re
interacting with them like coworkers. You can do Zoom calls and Slack with them. You
can ask them to do a project and they go off and write a first draft, get feedback, run tests on
their code, and come back. Then you can tell them more things. That’ll be much easier to integrate.
You might need a bit of overkill to make the transition easy and harvest the gains.
What do you mean by overkill? Overkill on model capabilities?
Yeah, the intermediate models could do it but it would take a lot of schlep. The drop-in
remote worker AGI can automate cognitive tasks. The intermediate models would have
made the software engineer more productive. But will the software engineer adopt it?
With the 2027 model, you just don’t need the software engineer. You can interact with it
like a software engineer, and it’ll do the work of a software engineer.
The last episode I did was with John Schulman. I was asking about this. We have these models
that have come out in the last year and none seem to have significantly surpassed GPT-4, certainly
not in an agentic way where they interact with you as a coworker. They’ll brag about
a few extra points on MMLU. Even with GPT-4o, it’s cool they can talk like Scarlett Johansson
(I guess not anymore) but it’s not like a coworker.
It makes sense why they’d be good at answering questions. They have data on how to complete
Wikipedia text. Where is the equivalent training data to understand a Zoom call? Referring back to
your point about a Slack conversation, how can it use context to figure out the
cohesive project you’re working on? Where is that training data coming from?
A key question for AI progress in the next few years is how hard it is to unlock the test time
compute overhang. Right now, GPT-4 can do a few hundred tokens with chain-of-thought. That’s
already a huge improvement. Before, answering a math question was just shotgun. If you tried to
answer a math question by saying the first thing that comes to mind, you wouldn’t be very good.
GPT-4 thinks for a few hundred tokens. If I think at 100 tokens a minute, that’s like
what GPT-4 does. It’s equivalent to me thinking for three minutes. Suppose GPT-4 could think
for millions of tokens. That’s +4 OOMs on test time compute on one problem. It can’t do it now.
It gets stuck. It writes some code. It can do a little bit of iterative debugging, but eventually
gets stuck and can’t correct its errors. There’s a big overhang. In other areas of ML,
there’s a great paper on AlphaGo, where you can trade off train time and test time compute. If
you can use 4 OOMs more test time compute, that’s almost like a 3.5-OOM-bigger model.
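A rough conversion of those token budgets into human working time, taking the stylized 100-tokens-per-minute figure above at face value:

```python
# Token budgets → human-equivalent thinking time, at ~100 tokens/minute
# (the stylized figure from the conversation, not a measurement).
TOKENS_PER_MINUTE = 100
WORK_HOURS_PER_MONTH = 160  # assumed full-time working month

def thinking_time(tokens: int) -> str:
    minutes = tokens / TOKENS_PER_MINUTE
    hours = minutes / 60
    months = hours / WORK_HOURS_PER_MONTH
    if months >= 1:
        return f"~{months:.1f} working months"
    if hours >= 1:
        return f"~{hours:.1f} hours"
    return f"~{minutes:.0f} minutes"

print(thinking_time(300))        # today's chain-of-thought: ~3 minutes
print(thinking_time(3_000_000))  # +4 OOMs: ~3.1 working months
```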
Again, if it’s 100 tokens a minute, a few million tokens is a few months of working time. There’s a
lot more you can do in a few months of working time than just getting an answer right now. The
question is how hard is it to unlock that? In the short timelines AI world,
it’s not that hard. The reason it might not be that hard is that there are only a few extra
tokens to learn. You need to learn things like error correction tokens where you’re like “ah,
I made a mistake, let me think about that again.” You need to learn planning tokens where it’s like
“I’m going to start by making a plan. Here’s my plan of attack. I’m going to write a draft and
now I’m going to critique my draft and think about it.” These aren’t things that models can do now,
but the question is how hard it is. There are two paths to agents. When
Sholto was on your podcast, he talked about scaling leading to more nines of reliability.
That’s one path. The other path is the unhobbling path. It needs to learn this
System 2 process. If it can learn that, it can use millions of tokens and think coherently.
Here’s an analogy. When you drive, you’re on autopilot most of the time. Sometimes you
hit a weird construction zone or intersection. Sometimes my girlfriend is in the passenger seat
and I’m like “ah, be quiet for a moment, I need to figure out what’s going on.”
You go from autopilot to System 2 and you’re thinking about how to do it. Scaling
improves that System 1 autopilot. The brute force way to get to agents is improving that system. If
you can get System 2 working, you can quickly jump to something more agentified
and test time compute overhang is unlocked. What’s the reason to think this is an easy
win? Is there some loss function that easily enables System 2 thinking? There aren’t many
animals with System 2 thinking. It took a long time for evolution to give us System 2 thinking.
Pre-training has trillions of tokens of Internet text, I get that. You match that and get
all of these free training capabilities. What’s the reason to think this is an easy unhobbling?
First of all, pre-training is magical. It gave us a huge advantage for models of
general intelligence because you can predict the next token. But there’s a common misconception that next-token prediction just picks up surface statistics. In fact, predicting the next token forces the model to learn incredibly rich representations. Representation learning is the magic of deep learning. Rather than just learning
statistical artifacts, the models learn models of the world. That’s why they can generalize,
because it learned the right representations. When you train a model, you have this raw bundle
of capabilities that’s useful. The unhobbling from GPT-2 to GPT-4 took this raw mass and RLHF’d
it into a good chatbot. That was a huge win. In the original InstructGPT paper, comparing
RLHF vs. non-RLHF models, it’s like a 100x model-size win on human preference rating. It started
to be able to do simple chain-of-thought and so on. But you still have this advantage of all
these raw capabilities, and there’s still a huge amount you’re not doing with them.
This pre-training advantage is also the difference to robotics. People used to say it was a hardware
problem. The hardware is getting solved, but you don’t have this huge advantage
of bootstrapping with pre-training. You don’t have all this unsupervised learning you can do.
You have to start right away with RL self-play. The question is why RL and unhobbling might work.
Bootstrapping is an advantage. Your Twitter bio says “being pre-trained.” You’re not being
pre-trained anymore. You were pre-trained in grade school and high school. At some point,
you transition to being able to learn by yourself. You weren’t able to do it in elementary school.
High school is probably where it started and by college, if you’re smart, you can teach yourself.
Models are just starting to enter that regime. It’s a little bit more scaling and then you figure
out what goes on top. It won’t be trivial. A lot of deep learning seems obvious in retrospect.
There’s some obvious cluster of ideas. There are some ideas that seem a little dumb but
work. There are a lot of details you have to get right. We’re not going to get this next
month. It’ll take a while to figure out. A while for you is like half a year.
I don’t know, between six months and three years. But it's possible. It’s also very
related to the issue of the data wall. Here’s one intuition on learning by yourself. Pre-training
is kind of like the teacher lecturing to you and the words are flying by. You’re
just getting a little bit from it. That's not what you do when you learn
by yourself. When you learn by yourself, say you're reading a dense math textbook,
you're not just skimming through it once. Some wordcels just skim through and reread
and reread the math textbook and they memorize. What you do is you read a page, think about it,
have some internal monologue going on, and have a conversation with a study buddy. You try a
practice problem and fail a bunch of times. At some point it clicks, and you're like,
"this made sense." Then you read a few more pages. We've kind of bootstrapped our way to
just starting to be able to do that now with models. The question is,
can you use all this sort of self-play, synthetic data, and RL to make that thing work? Right now,
there's in-context learning, which is super sample efficient. In the Gemini paper, it just
learns a language in-context. Pre-training, on the other hand, is not at all sample efficient.
What humans do is a kind of in-context learning. You read a book, think about it,
until eventually it clicks. Then you somehow distill that back into the weights. In some sense,
that's what RL is trying to do. RL is super finicky, but when it works it's kind of magical.
It's the best possible data for the model. It’s when you try a practice problem, fail,
and at some point figure it out in a way that makes sense to you. That's the best
possible data for you because it's the way you would have solved the problem,
rather than just reading how somebody else solved the problem, which doesn't initially click.
By the way, if that take sounds familiar it's because it was part of the question I asked
John Schulman. It goes to illustrate the thing I said in the intro. A bunch of the things I've
learned about AI comes from these dinners we do before the interviews with me, you, Sholto,
and a couple of others. We’re like, “what should I ask John Schulman, what should I ask Dario?”
Suppose this is the way things go and we get these unhobblings—
And the scaling. You have this baseline of this enormous force of scaling. GPT-2 was amazing. It
could string together plausible sentences, but it could barely do anything. It was kind of like a
preschooler. GPT-4, on the other hand, could write code and do hard math, like a smart high schooler.
This big jump in capability is explored in the essay series. I count the orders of magnitude
of compute and scale-up of algorithmic progress. Scaling alone by 2027-2028 is going to do another
preschool to high school jump on top of GPT-4. At a per token level, the models will be incredibly
smart. They'll gain more reliability, and with the addition of unhobblings, they'll look less like
chatbots and more like agents or drop-in remote workers. That's when things really get going.
I want to ask more questions about this but let's zoom out. Suppose you're right about this. This is
because of the 2027 cluster which is at 10 GW? 2028 is 10 GW. Maybe it'll be pulled forward.
Something like a GPT-5.5 level by 2027, whatever that's called. What does the world look like at
that point? You have these remote workers who can replace people. What is the reaction to that in
terms of the economy, politics, and geopolitics? 2023 was a really interesting year to experience
as somebody who was really following the AI stuff. What were you doing in 2023?
OpenAI. Before 2023 at OpenAI, it was a weird thing. You almost didn't want to talk about
AI or AGI. It was kind of a dirty word. Then in 2023, people saw ChatGPT for the first time,
they saw GPT-4, and it just exploded. It triggered huge capital expenditures
from all these firms and an explosion in revenue from Nvidia and so on. Things have been quiet
since then, but the next thing has been in the oven. I expect every generation these g-forces
to intensify. People will see the models. They won’t have counted the OOMs so they're
going to be surprised. It'll be kind of crazy. Revenue is going to accelerate. Suppose you
do hit $10 billion by the end of this year. Suppose it just continues on the trajectory
of revenue doubling every six months. It's not actually that far from $100 billion, maybe by
2026. At some point, what happened to Nvidia is going to happen to Big Tech. It's going to
explode. A lot more people are going to feel it. 2023 was the moment for me where AGI went from
being this theoretical, abstract thing. I see it, I feel it, and I see the path. I see where
it's going. I can see the cluster it's trained on, the rough combination of algorithms, the people,
how it's happening. Most of the world is not there yet. Most of the people who feel it are right
here. A lot more of the world is going to start feeling it. That's going to start being intense.
Right now, who feels it? You can go on Twitter and there are these GPT wrapper companies, like,
"whoa, GPT-4 is going to change our business." I'm so bearish on the wrapper companies because
they're betting on stagnation. They're betting that you have these intermediate
models and it takes so much schlep to integrate them. I'm really bearish because we're just
going to sonic boom you. We're going to get the unhobblings. We're going to get the drop-in remote
worker. Your stuff is not going to matter. So that's done. SF, this crowd, is paying
attention now. Who is going to be paying attention in 2026 and 2027? Presumably,
these are years in which hundreds of billions of capex is being spent on AI.
The national security state is going to start paying a lot of
attention. I hope we get to talk about that. Let’s talk about it now. What happens? What
is the immediate political reaction? Looking internationally, I don't know if Xi Jinping
sees the GPT-4 news and goes, "oh, my God, look at the MMLU score on that. What are
we doing about this, comrade?" So what happens when he sees a
remote worker replacement and it has $100 billion in revenue? There’s a lot of businesses
that have $100 billion in revenue, and people aren't staying up all night talking about it.
The question is, when does the CCP and when does the American national security establishment
realize that superintelligence is going to be absolutely decisive for national power? This
is where the intelligence explosion stuff comes in, which we should talk about later.
You have AGI. You have this drop-in remote worker that can replace you or me,
at least for remote jobs. Fairly quickly, you turn the crank one or two more times and you
get a thing that's smarter than humans. Even more than just turning the crank a
few more times, one of the first jobs to be automated is going to be that of an
AI researcher or engineer. If you can automate AI research, things can start going very fast.
Right now, there's already this trend of 0.5 OOMs a year of algorithmic progress.
At some point, you're going to have GPU fleets in the tens of millions for inference or more.
You’re going to be able to run 100 million human equivalents of these automated AI researchers.
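As an illustration of how that headcount might be arrived at, here is the back-of-envelope version. Every input is an assumption chosen to make the arithmetic land in the stated range, not a known quantity:

```python
# Illustrative only: how "tens of millions of GPUs" might translate into
# ~100 million human-equivalent automated researchers.
# Every input here is an assumption for the sake of the arithmetic.
inference_gpus = 20_000_000      # assumed fleet, "tens of millions"
tokens_per_gpu_per_sec = 10      # assumed throughput for a very large,
                                 # expensive-to-serve future model
human_tokens_per_sec = 100 / 60  # the ~100 tokens/minute figure from earlier

fleet_tokens_per_sec = inference_gpus * tokens_per_gpu_per_sec
human_equivalents = fleet_tokens_per_sec / human_tokens_per_sec
print(f"~{human_equivalents / 1e6:.0f} million human-equivalent researchers")
# → ~120 million: the right order of magnitude for the claim above
```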
If you can do that, you can maybe do a decade's worth of ML research progress in a year. You
get some sort of 10x speed up. You can make the jump to AI that is vastly smarter than
humans within a year, a couple of years. That broadens from there. You have this
initial acceleration of AI research. You apply R&D to a bunch of other fields of technology. At
this point, you have a billion super intelligent researchers, engineers, technicians, everything.
They’re superbly competent at all things. They're going to figure out robotics. We
talked about that being a software problem. Well, you have a billion super smart — smarter than the
smartest human researchers — AI researchers in your cluster. At some point during the
intelligence explosion, they're going to be able to figure out robotics. Again, that’ll expand.
If you play this picture forward, it is fairly unlike any other technology. A couple years of
lead could be utterly decisive in say, military competition. If you look at the first Gulf War,
Western coalition forces had a 100:1 kill ratio. They had better sensors on their tanks. They had
better precision missiles, GPS, and stealth. They had maybe 20-30 years of technological
lead. They just completely crushed them. Superintelligence applied to broad fields
of R&D — and the industrial explosion that comes from it, robots making a lot of material — could
compress a century’s worth of technological progress into less than a decade. That means
that a couple of years could mean a Gulf War 1-style advantage in military affairs, possibly including a decisive advantage that even preempts nukes. How do you find nuclear stealth submarines?
Right now, you have sensors and software to detect where they are. You can do that. You
can find them. You have millions or billions of mosquito-sized drones, and they take out the
nuclear submarines. They take out the mobile launchers. They take out the other nukes.
It’s potentially enormously destabilizing and enormously important for national power.
At some point people are going to realize that. Not yet, but they will. When they do,
it won’t just be the AI researchers in charge. The CCP is going to have an all-out effort to
infiltrate American AI labs. It’ll involve billions of dollars, thousands of people,
and the full force of the Ministry of State Security. The CCP is going to try to outbuild us.
They added as much power in the last decade as an entire US electric grid. So the 100 GW cluster,
at least the 100 GW part of it, is going to be a lot easier for them to get. By this point,
it's going to be an extremely intense international competition.
One thing I'm uncertain about in this picture is if it’s like what you say, where it's more
of an explosion. You’ve developed an AGI. You make it into an AI researcher. For a while,
you're only using this ability to make hundreds of millions of other AI researchers. The thing
that comes out of this really frenetic process is a superintelligence. Then that goes out in the
world and is developing robotics and helping you take over other countries and whatever.
It's a little bit more gradual. It's an explosion that starts narrowly. It can do
cognitive jobs. The highest ROI use for cognitive jobs is to make the AI better
and solve robotics. As you solve robotics, now you can do R&D in biology and other technology.
Initially, you start with the factory workers. They're wearing the glasses and AirPods, and the
AI is instructing them because you can make any worker into a skilled technician. Then you have
the robots come in. So this process expands. Meta's Ray-Bans are a complement to Llama.
With the fabs in the US, their constraint is skilled workers. Even if you don't have robots,
you have the cognitive superintelligence and can kind of make them all into skilled
workers immediately. That's a very brief period. Robots will come soon.
Suppose this is actually how the tech progresses in the United States, maybe
because these companies are already generating hundreds of billions of dollars of AI revenue.
At this point, companies are borrowing hundreds of billions or more in the corporate debt markets.
Why is a CCP bureaucrat, some 60-year-old guy, looking at this and going,
"oh, Copilot has gotten better now" and now— This is much more than Copilot has gotten
better now. It’d require
shifting the production of an entire country, dislocating energy that is otherwise being used
for consumer goods or something, and feeding all that into the data centers. Part of this whole
story is that you realize superintelligence is coming soon. You realize it and maybe I realize
it. I'm not sure how much I realize it. Will the national security apparatus in
the United States and the CCP realize it? This is a really key question. We have a few
more years of mid-game. We have a few more 2023s. That just starts updating more and
more people. The trend lines will become clear. You will see some amount of the COVID dynamic.
COVID in February of 2020 honestly feels a lot like today. It feels like this utterly crazy thing
is coming. You see the exponential and yet most of the world just doesn't realize it. The mayor of
New York is like, "go out to the shows," and "worrying about it is just anti-Asian racism." At some point, people saw
it and then crazy, radical reactions came. By the way, what were you doing during
COVID? Was it your freshman or sophomore year? Junior.
Still, you were like a 17-year-old junior or something, right? Did you short the market or
something? Did you sell at the right time? Yeah.
So there will be a March 2020 moment. You can make the analogy you make in the
series that this will cause a reaction like, “we have to do the Manhattan Project again for America
here.” I wonder what the politics of this will be like. The difference here is that it’s not
just like, “we need the bomb to beat the Nazis.” We'll be building this thing that makes all our
energy prices go up a bunch and it's automating a lot of our jobs. The climate change stuff people
are going to be like, "oh, my God, it's making climate change worse and it's helping Big Tech."
Politically, this doesn't seem like a dynamic where the national security apparatus or the
president is like, "we have to step on the gas here and make sure America wins."
Again, a lot of this really depends on how much people are feeling it and how much people
are seeing it. Our generation is so used to peace, American hegemony, and nothing mattering.
The historical norm is very much one of extremely intense and extraordinary things happening in the
world with intense international competition. We've just lived through a very unique 20-year period. In World
War II, something like 50% of GDP went to war production. The US borrowed over
60% of GDP. With Germany and Japan, I think it was over 100%. In World War I, the UK, France, and Germany all borrowed over 100% of GDP. Much more was on the line. People talk about World War II being so destructive, with 20 million Soviet soldiers dying and 20% of Poland's population killed. But that kind of thing happened all the time. During the Seven Years' War, something like 20-30%
of Prussia died. In the Thirty Years' War, up to 50% of a large swath of Germany died.
Will people see that the stakes here are really high and that history is actually
back? The American national security state thinks very seriously about stuff like this.
They think very seriously about competition with China. China very much thinks of itself on this
historical mission of the rejuvenation of the Chinese nation. They think a lot about national
power. They think a lot about the world order. There's a real question on timing. Do they
start taking this seriously only when the intelligence explosion is already happening, which is quite late? Or do they
start taking this seriously two years earlier? That matters a lot for how things play out.
At some point they will and they will realize that this will be utterly decisive for not
just some proxy war but for major questions. Can liberal democracy continue to thrive? Can
the CCP continue existing? That will activate forces that we haven't seen in a long time.
The great power conflict definitely seems compelling. All kinds of different things
seem much more likely when you think from a historical perspective. You
zoom out beyond the liberal democracy that we’ve had the pleasure of living in, in America, for say the last 80 years. That includes things like dictatorships, war, famine, etc.
I was reading The Gulag Archipelago and one of the chapters begins with Solzhenitsyn saying: imagine you had told a Russian citizen under the tsars that all these new technologies would bring not some great Russian revival, with Russia becoming a great power and its citizens made wealthy, but tens of millions of Soviet citizens tortured by millions of beasts in
the worst possible ways. If you’d told them that that would be the result of the 20th century,
they wouldn’t have believed you. They’d have called you a slanderer.
The possibilities for dictatorship with superintelligence are even crazier as well.
Imagine you have a perfectly loyal military and security force. No more rebellions. No
more popular uprisings. You have perfect lie detection. You have surveillance of everybody.
You can perfectly figure out who's the dissenter and weed them out. No Gorbachev who had some
doubts about the system would have ever risen to power. No military coup would have ever happened.
There's a real way in which part of why things have worked out is that ideas can evolve. There's
some sense in which time heals a lot of wounds and solves a lot of debates. Throughout time, a lot of
people had really strong convictions, but a lot of those have been overturned over time because
there's been continued pluralism and evolution. Imagine applying a CCP-like approach to
truth where truth is what the party says. When you supercharge that with superintelligence, that
could just be locked in and enshrined for a long time. The possibilities are pretty terrifying.
To your point about history and living in America for the past 80 years, this is one of the
things I took away from growing up in Germany. A lot of this stuff feels more visceral. My mother
grew up in the former East, my father in the former West. They met shortly after the Wall fell.
The end of the Cold War was this extremely pivotal moment for me because it's the reason I exist.
I grew up in Berlin with the former Wall. My great-grandmother, who is still alive,
is very important in my life. She was born in 1934 and grew up during the Nazi era. In World War II,
she saw the firebombing of Dresden from this country cottage where they were as
kids. Then she spent most of her life in the East German communist dictatorship.
She'd tell me about how Soviet tanks came when there was the popular uprising in 1953.
Her husband was telling her to get home really quickly and get off the streets.
She had a son who tried to ride a motorcycle across the Iron Curtain and then was put in
a Stasi prison for a while. Finally, when she was almost 60, for the first time she
lived in a free country, and a wealthy country. When I was a kid, the thing she always really
didn't want me to do was get involved in politics. Joining a political party had
very bad connotations for her. She raised me when I was young. So it doesn't feel
that long ago. It feels very close. There’s one thing I wonder about when
we're talking today about the CCP. The people in China who will be doing their version of this
project will be AI researchers who are somewhat Westernized. They’ll either have gotten educated
in the West or have colleagues in the West. Are they going to sign up for the CCP
project that's going to hand over control to Xi Jinping? What's your sense of that? Fundamentally,
they're just people, right? Can't you convince them about the dangers of superintelligence?
Will they be in charge though? In some sense, this is also the case in the
US. This is like the rapidly depreciating influence of the lab employees. Right now,
the AI lab employees have so much power. You saw this in the November board events. It’s so much power.
Both are going to get automated and they're going to lose all their power. It'll just be
a few people in charge with their armies of automated AIs. It’s also the politicians and
the generals and the national security state. There are some of these classic
scenes from the Oppenheimer movie. The scientists built it and then the bomb was
shipped away and it was out of their hands. It's good for lab employees to be aware of
this. You have a lot of power now, but maybe not for that long. Use it wisely.
I do think they would benefit from some more organs of representative democracy.
What do you mean by that? In the OpenAI board events,
employee power was exercised in a very direct-democracy way. How some of that went about
really highlighted the benefits of representative democracy and having some deliberative organs.
Interesting. Let's go back to the $100 billion revenue question. The companies are trying to
build clusters that are this big. Where are they building it? Say it's the amount
of energy that would be required for a small or medium-sized US state. Does Colorado then get no
power because it's happening in the United States? Is it happening somewhere else?
This is the thing that I always find funny, when you talk about Colorado getting no power.
The easy way to get the power would be to displace less economically useful stuff.
Buy up the aluminum smelting plant that has a gigawatt. We're going to replace it with the
data center because that's important. That's not actually happening because a lot of these power
contracts are really locked in long-term. Also, people don't like things like this.
In practice what it requires, at least right now, is building new power. That
might change. That's when things get really interesting, when it's like, “no, we're just
dedicating all of the power to the AGI.” So right now it's building new power. 10
GW is quite doable. It's like a few percent of US natural gas production. When you have the 10 GW
training cluster, you have a lot more inference. 100 gigawatts is where it starts getting pretty
wild. That's over 20% of US electricity production. It's pretty doable, especially
if you're willing to go for natural gas. It is incredibly important that these
clusters are in the United States. Why does it matter that it's in the US?
There are some people who are trying to build clusters elsewhere. There's a lot of
free-flowing Middle Eastern money that's trying to build clusters elsewhere. This comes back to
the national security question we talked about. Would you do the Manhattan Project in the UAE?
You can put the clusters in the US and you can put them in allied democracies. Once you put them
in authoritarian dictatorships, you create this irreversible security risk. Once the cluster is
there, it's much easier for them to exfiltrate the weights. They can literally steal the AGI,
the superintelligence. It’s like they got a direct copy of the atomic bomb. It makes it much easier
for them. They have weird ties to China. They can ship that to China. That's a huge risk.
Another thing is they can just seize the compute. The issue here is people right now are thinking
of this as ChatGPT, Big Tech product clusters. The clusters being planned now,
three to five years out, may well be the AGI, superintelligence clusters. When things get hot,
they might just seize the compute. Suppose we put 25% of the compute
capacity in these Middle Eastern dictatorships. Say they seize that. Now it's a ratio of compute
of 3:1. We still have more, but even with only 25% of compute there it starts getting
pretty hairy. 3:1 is not that great of a ratio. You can do a lot with that amount of compute.
Say they don't actually do this. Even if they don't actually seize the compute,
even if they actually don't steal the weights, there's just a lot of implicit leverage you
get. They get seats at the AGI table. I don't know why we're giving authoritarian
dictatorships the seat at the AGI table. There's going to be a lot of compute
in the Middle East if these deals go through. First of all, who is it? Is it just every
single Big Tech company trying to figure it out over there?
It’s not everybody, but some. There are reports about, I think,
Microsoft. We'll get into it. So say the UAE gets a bunch of
compute because we're building the clusters there. Let's say they have 25% of the compute. Why does a
compute ratio matter? If it's about them being able to kick off the intelligence explosion,
isn't it just some threshold where you have 100 million AI researchers or you don't?
You can do a lot with 33 million extremely smart scientists. That might be enough
to build crazy bioweapons. Then you're in a situation where they stole
the weights and they seized the compute. Now they can make these crazy new WMDs that
will be possible with superintelligence. Now you've just proliferated the stuff
that’ll be really powerful. Also, 3x on compute isn't actually that much.
The riskiest situation is if we're in some sort of really neck and neck, feverish international
struggle. Say we're really close with the CCP and we're months apart. The situation we want
to be in — and could be in if we play our cards right — is a little bit more like the US building
the atomic bomb versus the German project years behind. If we have that, we just have so much
more wiggle room to get safety right. We're going to be building these crazy
new WMDs that completely undermine nuclear deterrence. That's so much easier to deal
with if you don't have somebody right on your tails and you have to go at maximum speed.
You have no wiggle room. You're worried that at any time they can overtake you.
They can also just try to outbuild you. They might literally win. China might literally win
if they can steal the weights, because they can outbuild you. They may have less caution,
both the good kind and the bad kind, like whatever unreasonable regulations we have.
If you're in this really tight race, this sort of feverish struggle, that's when
there's the greatest peril of self-destruction. Presumably the companies that are trying to build
clusters in the Middle East realize this. Is it just that it’s impossible to do this
in America? If you want American companies to do this at all, do you have to do it in
the Middle East or not at all? Then you just have China build a Three Gorges Dam cluster.
There’s a few reasons. People aren’t thinking about this as the
AGI superintelligence cluster. They’re just like, “ah, cool clusters for my ChatGPT.”
If you’re doing ones for inference, presumably you could spread them out across the country or
something. The ones they’re building, they’re going to do one training run
in a single thing they’re building. It’s just hard to distinguish between
inference and training compute. People can claim it’s inference compute,
but they might realize that actually this is going to be useful for training compute too.
Because of synthetic data and things like that? RL looks a lot like inference, for example. Or
you just end up connecting them in time. It's a lot like raw materials. It's like placing your
uranium refinement facilities there. So there are a few reasons. One,
they don't think about this as the AGI cluster. Another is just that there’s
easy money coming from the Middle East. Another one is that some people think
that you can't do it in the US. We actually face a real system competition here. Some
people think that only autocracies can do this, with top-down mobilization of industrial
capacity and the power to get stuff done fast. Again, this is the sort of thing we haven't faced
in a while. But during the Cold War, there was this intense system competition. East vs. West
Germany was this. It was West Germany as liberal democratic capitalism vs. state-planned communism.
Now it's obvious that the free world would win. But even as late as 1961,
Paul Samuelson was predicting that the Soviet Union would outgrow the United States because
they were able to mobilize industry better. So there are some people who shitpost about
loving America, but then in private they're betting against America. They're betting against
the liberal order. Basically, it's just a bad bet. This stuff is really possible in the US.
To make it possible in the US, to some degree we have to get our act together. There are basically
two paths to doing it in the US. One is you just have to be willing to do natural gas. There's
ample natural gas. You put your cluster in West Texas. You put it in southwest Pennsylvania by
the Marcellus Shale. The 10 GW cluster is super easy. The 100 GW cluster is also pretty doable.
I think natural gas production in the United States has almost doubled in a decade. You do
that one more time over the next seven years, you could power multiple trillion-dollar data centers.
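For a sense of scale on the natural gas path, a rough feasibility check. The heat rate, gas energy content, and production figures below are ballpark public numbers, not precise values:

```python
# Rough check: gas needed to run a 100 GW cluster on combined-cycle plants.
# All figures are ballpark approximations.
cluster_gw = 100
heat_rate_mmbtu_per_mwh = 7       # modern combined-cycle plant, approx.
mmbtu_per_bcf = 1.04e6            # energy content of 1 bcf of gas, approx.
us_production_bcf_per_day = 105   # recent US dry gas production, approx.

mwh_per_day = cluster_gw * 1_000 * 24
bcf_per_day = mwh_per_day * heat_rate_mmbtu_per_mwh / mmbtu_per_bcf
print(f"~{bcf_per_day:.0f} bcf/day, "
      f"~{100 * bcf_per_day / us_production_bcf_per_day:.0f}% of US production")
# → ~16 bcf/day, roughly 15% of current US production: large but feasible,
#   especially if production keeps growing as described above.
```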
The issue there is that a lot of people made these climate commitments, not just the government. It's
actually the private companies themselves, Microsoft, Amazon, etc., that have made
these climate commitments. So they won't do natural gas. I admire the climate commitments,
but at some point the national interest and national security is more important.
The other path is doing green energy megaprojects. You do solar and batteries
and SMRs and geothermal. If we want to do that, there needs to be a broad deregulatory push.
You can't have permitting take a decade. You have to reform FERC. You have to have blanket
NEPA exemptions for this stuff. There are inane state-level regulations. You can
build the solar panels and batteries next to your data center, but it'll still take years because
you actually have to hook it up to the state electrical grid. You have to use governmental
powers to create rights of way to have multiple clusters and connect them and have the cables.
Ideally we do both. Ideally we do natural gas and the broader deregulatory green agenda.
We have to do at least one. Then this stuff is possible in the United States.
Before the conversation I was reading a good book about World War II industrial mobilization in the
United States called Freedom's Forge. I’m thinking back on that period, especially in the context of
reading Patrick Collison’s Fast and the progress studies stuff. There’s this narrative out there that
we had state capacity back then and people just got shit done but that now it's a clusterfuck.
It wasn’t at all the case! It was really interesting. You had
people from the Detroit auto industry side, like William Knudsen, who were running mobilization for
the United States. They were extremely competent. At the same time you had labor organization and
agitation, which is very analogous to the climate change pledges and concerns we have today.
They would literally have these strikes, into 1941, costing millions of man-hours
worth of time when we're trying to make tens of thousands of planes a month. They would just
debilitate factories for trivial concessions from capital that were pennies on the dollar.
There were concerns that the auto companies were trying to use the pretext of a potential
war to avoid paying labor the money it deserved. Just as with climate change today, you might think, "ah, America's fucked. We're not going to be able to build this shit if you look at NEPA or something." I didn't realize how debilitating labor was in World War II.
It wasn’t just that. Before 1939, the American military was in total shambles. You read about
it and it reads a little bit like the German military today. Military expenditures were I think
less than 2% of GDP. All the European countries had gone, even in peacetime, above 10% of GDP.
It was rapid mobilization starting from nothing. We were making no planes.
There were no military contracts. Everything had been starved during the Great Depression.
But there was this latent capacity. At some point the United States got its act together.
This applies the other way around too with China. Sometimes people count them out a little bit with
the export controls and so on. They're able to make 7-nanometer chips now. There's a question
of how many they could make. There's at least a possibility that they're going to mature that
ability and make a lot of 7-nanometer chips. There's a lot of latent industrial capacity
in China. They are able to build a lot of power fast. Maybe that isn't activated for AI yet. At
some point, the same way the United States and a lot of people in the US government are going
to wake up, the CCP is going to wake up. Companies realize that scaling is a thing.
Obviously their whole plans are contingent on scaling. So they understand that in 2028 we're
going to be building 10 GW data centers. At that point, the people who can keep up
are Big Tech, potentially at the edge of their capabilities, sovereign wealth fund-funded things,
and also major countries like America and China. What's their plan? With the AI labs, what's their
plan given this landscape? Do they not want the leverage of being in the United States?
The Middle East does offer capital, but America has plenty of capital. We have trillion-dollar
companies. What are these Middle Eastern states? They're kind of like trillion-dollar
oil companies. We have trillion-dollar companies and very deep financial markets. Microsoft could
issue hundreds of billions of dollars of bonds and they can pay for these clusters.
Another argument being made, which is worth taking seriously, is that if we don't work
with the UAE or with these Middle Eastern countries, they're just going to go to China.
They're going to build data centers and pour money into AI regardless. If we don't
work with them, they'll just support China. There's some merit to the argument in the
sense that we should be doing benefit-sharing with them. On the road to AGI, there should be
two tiers of coalitions. There should be a narrow coalition of democracies that's developing AGI.
Then there should be a broader coalition of other countries, including dictatorships, and we should
offer them some of the benefits of AI. If the UAE wants to use AI products,
run Meta recommendation engines, or run the last-generation models,
that's fine. By default, they just wouldn't have had this seat at the AGI table. So they
have some money, but a lot of people have money. The only reason they're getting this seat at the
AGI table and giving these dictators this leverage over this extremely important national security
technology is because we're getting them excited and offering it to them.
Who specifically is doing this? Who are the companies who are going there to fundraise?
It’s been reported that Sam Altman is trying to raise $7 trillion or whatever for a chip
project. It's unclear how many of the clusters will be there, but definitely stuff is happening.
There’s another reason I'm a little suspicious of this argument that if the US doesn't work
with them, they'll go to China. I've heard from multiple people — not from my time at OpenAI,
and I haven't seen the memo — that at some point several years ago, OpenAI leadership
had laid out a plan to fund and sell AGI by starting a bidding war between the governments
of the United States, China, and Russia. It's surprising to me that they're willing to
sell AGI to the Chinese and Russian governments. There's also something that feels eerily familiar
about starting this bidding war and then playing them off each other, saying, "well,
if you don't do this, China will do it." Interesting. That's pretty fucked up.
Suppose you're right. We ended up in this place because, as one of our friends put it, the Middle
East has billions or trillions of dollars up for persuasion like no other place in the world.
With little accountability. There’s no Microsoft board. It's only the dictator.
Let's say you're right, that you shouldn't have gotten them excited about AGI in the
first place. Now we're in a place where they are excited about AGI and they're like, "fuck,
we want to have GPT-5 while you're going to be off building superintelligence. This Atoms for Peace
thing doesn't work for us." If you're in this place, don't they already have the leverage?
The UAE on its own is not competitive. They're already export-controlled. You're not supposed
to ship Nvidia chips over there. It's not like they have any of the leading
AI labs. They have money, but it's hard to just translate money into progress.
But I want to go back to other things you've been saying in laying out your vision. There's
this almost industrial process of putting in the compute and algorithms, adding that up,
and getting AGI on the other end. If it's something more like that, then the case for
somebody being able to catch up rapidly seems more compelling than if it's some bespoke...
Well, if they can steal the algorithms and if they can steal the weights, that’s really important.
How easy would it be for an actor to steal the things that are not the trivial released
things, like Scarlett Johansson's voice, but the RL things we're talking about, the unhobblings?
It’s all extremely easy. They don’t make the claim that it’s hard. DeepMind put out their Frontier
Safety Framework and they lay out security levels, zero to four. Four is resistant to state activity.
They say, we're at level zero. Just recently, there was an indictment of a guy who stole a bunch
of really important AI code and went to China with it. All he had to do to steal the code was
copy it, put it into Apple Notes, and export it as a PDF. That got past their monitoring.
Google has the best security of any of the AI labs probably, because they have the Google
infrastructure. I would think of the security of a startup. What does security of a startup look
like? It's not that good. It's easy to steal. Even if that's the case, a lot of your
post is making the argument for why we are going to get the intelligence explosion. If we have
somebody with the intuition of an Alec Radford to come up with all these ideas, that intuition is
extremely valuable and you can scale that up. If it's just intuition, then that's not going
to be just in the code, right? Also because of export controls, these countries are going
to have slightly different hardware. You're going to have to make different trade-offs and probably
rewrite things to be compatible with that. Is it just a matter of getting the right pen
drive and plugging it into the gigawatt data center next to the Three Gorges Dam
and then you're off to the races? There are a few different things,
right? One threat model is just them stealing the weights themselves. The weights one is
particularly insane because they can just steal the literal end product — just make a replica of
the atomic bomb — and then they're ready to go. That one is extremely important around the time
we have AGI and superintelligence because China can build a big cluster by default. We'd have a
big lead because we have the better scientists, but if we make the superintelligence and they
just steal it, they're off to the races. Weights are a little bit less important right now
because who cares if they steal the GPT-4 weights. We still have to get started on weight security
now because if we think there’s AGI by 2027, this stuff is going to take a while. It's not just
going to be like, "oh, we do some access control." If you actually want to be resistant to Chinese
espionage, it needs to be much more intense. The thing that people aren't paying enough
attention to is the secrets. The compute stuff is sexy, but people underrate the secrets. The half an order of magnitude a year of algorithmic progress just happens by default. That's huge. If we have a few years of lead, that's the equivalent of a 10-30x, even 100x bigger cluster, if we protect the secrets.
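The arithmetic behind that 10-30x-to-100x figure, assuming the ~0.5 OOMs per year of algorithmic progress holds and the secrets stay protected:

```python
# Effective-compute value of a lead, at ~0.5 OOMs/year of algorithmic progress.
OOMS_PER_YEAR = 0.5
for lead_years in (1, 2, 3, 4):
    multiplier = 10 ** (OOMS_PER_YEAR * lead_years)
    print(f"{lead_years}-year lead ≈ {multiplier:,.0f}x effective compute")
# 1-year lead ≈ 3x, 2-year ≈ 10x, 3-year ≈ 32x, 4-year ≈ 100x.
# A few protected years of secrets are worth a 10-100x bigger cluster.
```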
There's this additional layer of the data wall. We have to get through the data wall. That means
we actually have to figure out some sort of basic new paradigm. So it’s the “AlphaGo step
two.” “AlphaGo step one” learns from human imitation. “AlphaGo step two” is the kind
of self-play RL thing that everyone's working on right now. Maybe we're going to crack it.
If China can't steal that, then they're stuck. If they can steal it, they're off to the races.
Whatever that thing is, can I literally write it down on the back of a napkin? If it's that easy,
then why is it so hard for them to figure it out? If it's more about the intuitions,
then don't you just have to hire Alec Radford? What are you copying down?
There are a few layers to this. At the top is the fundamental approach. On pre-training it might be
unsupervised learning, next token prediction, training on the entire Internet. You actually
get a lot of juice out of that already. That one's very quick to communicate.
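As a rough illustration of how quick that top layer is to communicate, here is a minimal sketch of the next-token prediction objective in PyTorch. This is a generic textbook formulation, not any lab's actual code; `model` is a hypothetical stand-in for any network that maps token ids to vocabulary logits.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: (batch, seq_len) integer token ids drawn from the training corpus
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: each position predicts the next token
    logits = model(inputs)                           # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),         # flatten batch and time dimensions
        targets.reshape(-1),
    )
```

Minimize this loss over internet-scale text and you have the whole top-level recipe; the hard part is in the details that follow.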
Then there's a lot of details that matter, and you were talking about this earlier.
It's probably going to be somewhat obvious in retrospect, or there's going to be some not too
complicated thing that'll work, but there's going to be a lot of details to get right.
If that's true, then again, why do we think that getting state-level security
in these startups will prevent China from catching up? It’s just like, "oh,
we know some sort of self-play RL will be required to get past the data wall."
It's going to be solved by 2027, right? It's not that hard.
The US, and the leading labs in the United States, have this huge lead. By default, China actually
has some good LLMs because they're just using open source code, like Llama. People really underrate
both the divergence on algorithmic progress and the lead the US would have by default because
all this stuff was published until recently. Look at the Chinchilla scaling laws, the MoE papers,
transformers. All that stuff was published. That's why open source is good and why China
can make some good models. Now, they're not publishing it anymore. If we actually
kept it secret, it would be a huge edge. To your point about tacit knowledge and
Alec Radford, there's another layer at the bottom that is something about large-scale
engineering work to make these big training runs work. That is a little bit more like
tacit knowledge, but China will be able to figure that out. It's engineering schlep, and they're
going to figure out how to do it. Why can they figure that out, but not how to get the RL thing working? I don't know. Germany during World War II
went down the wrong path with heavy water. There's an amazing anecdote
in The Making of the Atomic Bomb about this. Secrecy was one of the most contentious issues
early on. Leo Szilard really thought a nuclear chain reaction and an atomic bomb were possible.
He went around saying, "this is going to be of enormous strategic and military importance."
A lot of people didn't believe it or thought, "maybe this is possible, but I'm going to act
as though it's not, and science should be open." In the early days, there had been some incorrect
measurements made on graphite as a moderator. Germany thought graphite wasn't going to work,
so they had to do heavy water. But then Enrico Fermi made new measurements indicating that
graphite would work. This was really important. Szilard assailed Fermi with another secrecy
appeal and Fermi was pissed off, throwing a temper tantrum. He thought it was absurd, saying,
"come on, this is crazy." But Szilard persisted, and they roped in another guy, George Pegram.
In the end, Fermi didn't publish it. That was just in time. Fermi not publishing
meant that the Nazis didn't figure out graphite would work. They went down the path of heavy
water, which was the wrong path. This is a key reason why the German
project didn't work out. They were way behind. We face a similar situation now. Are we just
going to instantly leak how to get past the data wall and what the next paradigm is? Or are we not?
The reason this would matter is if being one year ahead would be a huge advantage.
In the world where you deploy AI over time, they're just going to catch up anyway.
I interviewed Richard Rhodes, the guy who wrote The
Making of the Atomic Bomb. One of the anecdotes he had was when the Soviets realized America had
the bomb. Obviously, we dropped it in Japan. Lavrentiy Beria — the guy who ran the NKVD,
a famously ruthless and evil guy — goes to the Soviet scientist who was running their version
of the Manhattan Project. He says, "comrade, you will get us the American bomb." The guy says,
"well, listen, their implosion device actually is not optimal. We should make it a different way."
Beria says, "no, you will get us the American bomb, or your family will be camp dust."
The thing that's relevant about that anecdote is that the Soviets would have had a better bomb if
they hadn't copied the American design, at least initially. That suggests something about history,
not just for the Manhattan Project. There's often this pattern of parallel invention
because the tech tree implies that a certain thing is next — in this case, self-play
RL — and people work on that and are going to figure it out around the same time. There's not
going to be that much gap in who gets it first. Famously, a bunch of people invented the light
bulb around the same time. Even if parallel invention is the norm, does the one year or six months still make the difference? Two years makes all the difference.
I don't know if it'll be two years though. If we lock down the labs, we have much better
scientists. We're way ahead. It would be two years. Even six months, a year,
would make a huge difference. This gets back to the intelligence explosion dynamics. A year
might be the difference between a system that's sort of human-level and a system that is vastly
superhuman. It might be like five OOMs. Look at the current pace. Three years ago,
on the MATH benchmark — these are really difficult high school competition math
problems — we were at a few percent, we couldn't solve anything. Now it's solved. That was at
the normal pace of AI progress. You didn't have a billion superintelligent researchers.
A year is a huge difference, particularly after superintelligence. Once this is applied to many
elements of R&D, you get an industrial explosion with robots and other advanced
technologies. A couple of years might yield decades worth of progress. Again,
it’s like the technological lead the U.S. had in the first Gulf War,
when a 20-30 year technological lead proved totally decisive. It really matters.
Here’s another reason it really matters. Suppose they steal the weights, suppose they steal the
algorithms, and they're close on our tails. Suppose we still pull out ahead. We're a
little bit faster and we're three months ahead. The world in which we're really neck and neck,
we only have a three-month lead, is incredibly dangerous. We're in this feverish struggle where
if they get ahead, they get to dominate, maybe they get a decisive advantage. They're building
clusters like crazy. They're willing to throw all caution to the wind. We have to keep up.
There are crazy new WMDs popping up. Then we're in a situation where crazy new military technology and WMDs keep appearing, and deterrence and mutually assured destruction keep changing every few weeks. It's a completely unstable, volatile situation that is incredibly dangerous.
These technologies are also dangerous from the alignment point of view. It might be really important during the intelligence explosion
to have a six-month wiggle room to be like, “look, we're going to dedicate more compute to alignment
during this period because we have to get it right. We're feeling uneasy about how it's going.”
One of the most important inputs to whether we will destroy ourselves or
whether we will get through this incredibly crazy period is whether we have that buffer.
Before we go further, it's very much worth noting that almost nobody I talk to thinks about the
geopolitical implications of AI. I have some object-level disagreements that we'll get into,
things I want to iron out. I may not disagree in the end.
The basic premise is that if you keep scaling, if people realize that this is where intelligence is
headed, it's not just going to be the same old world. It won't just be about what model we're
deploying tomorrow or what the latest thing is. People on Twitter are like, "oh, GPT-4 is
going to shake your expectations" or whatever. COVID is really interesting because when March 2020 hit, it became clear to everyone — presidents, CEOs, the media, the average person — that other things were still happening in the world, but the main thing we as a world were dealing with was COVID. Soon it will be AGI. This is the quiet period.
Maybe you want to go on vacation. Maybe now is the last time you can have some kids. My girlfriend
sometimes complains when I’m off doing work that I don’t spend enough time with her. She threatens
to replace me with GPT-6 or whatever. I'm like, “GPT-6 will also be too busy doing AI research.”
Why aren't other people talking about national security?
I made this mistake with COVID. In February of 2020, I thought it was going to sweep
the world and all the hospitals would collapse. It would be crazy, and then
it'd be over. A lot of people thought this kind of thing at the beginning of COVID. They shut
down their office for a month or whatever. The thing I just really didn't price in was the societal reaction. Within weeks, Congress spent over 10% of GDP on COVID measures. The entire country was shut down. It was crazy.
Why do people underrate it? Being in the trenches actually gives you a less clear
picture of the trend lines. You don’t have to zoom out that much, only a few years.
When you're in the trenches, you're trying to get the next model to work. There's always
something that's hard. You might underrate algorithmic progress because you're like,
"ah, things are hard right now," or "data wall"