AI FUTURE THAT CAN DESTROY US | Superintelligence Is Getting Closer — Nick Bostrom × Jonas von Essen
By memoryOS
Summary
Key Takeaways
- **Superintelligence: A Game Changer for Humanity**: We are at a pivotal moment in history, potentially the last one where humanity can shape its future, as the race to create superintelligence could fundamentally alter life on Earth. [05:17], [05:31]
- **The Default AI Outcome: Doom?**: The default outcome of developing superintelligence may be human extinction, not because of malice, but because a superintelligent AI's goals might not align with human interests. [07:02], [08:17]
- **AI Alignment: Progress and Challenges**: While progress has been made in AI safety and alignment, the increasing complexity of AI systems means we may not fully understand their reasoning, posing a significant challenge to ensuring they remain beneficial. [16:42], [21:37]
- **Timelines are Uncertain, Risks are Real**: Short timelines for superintelligence are plausible and cannot be excluded, necessitating serious consideration of AI risks, even as the exact probability of existential catastrophe remains debated. [38:03], [45:00]
- **The Cosmic Host: A New Perspective**: Our superintelligence may enter a 'cosmic host' of existing super-beings, suggesting a need for humility and a focus on co-existence rather than solely controlling AI for our own utility. [58:33], [59:52]
Topics Covered
- Is superintelligence's default outcome human extinction?
- Can we control AI that learns to hide its true goals?
- Why is building superintelligence crucial for humanity's survival?
- Why do short AI timelines provoke a 'sanity penalty'?
- Will our AI need to align with a 'Cosmic Host'?
Full Transcript
It is a very powerful thing, superintelligence. How quickly do you think we will actually reach superintelligence? We can't know that it couldn't happen in like two or three years. Is the default outcome doom? My view is that if nobody builds it, everyone dies. The future is really weird, and some of the things we now think are really important and valuable have disappeared. But other things have appeared that might also have value, and we might not really have some easy way to tote all of that up. So it appears as if we're finding ourselves at a very special point in the history
of humanity and also in the history of life on this planet. If we create superintelligence,
we would want to make it such that it can get along with this cosmic host. Some company or
some country, like there's gonna be some project right at some point that is like blasting into
this realm of true superintelligence. People would have this picture of well if we do have
this superintelligence like surely we will like have it in a very tightly controlled box and not
allow it to interact with the rest of the world and you know maybe there would be a team of human
scientists that would carefully ask a question and then sort of screen the answer, but now of course
we have already hooked them up to the internet and we have millions of users. I think the key
though is that we don't want to sort of go far down that scenario and then realize something
has gone wrong and then like end up in some situation where you have to try to put the genie
back in the bottle or fight like Terminator-style this kind of robot army, like I think at that point
it's game over. There was a clear deficit on the side of understanding what we were actually working towards and understanding that there could be risks, which seemed really important, because if you could
understand that then we might also actually use the time available to prepare ourselves to avoid
the pitfalls, right, if you have some conception of what could go wrong, you might take action
to prevent things from going wrong. Hey everyone again, thanks for making it today. We're
going to get started. Nick will be joining any minute now. So, while we are waiting for him,
I'm just going to do a quick intro about Jonas. But yeah, basically I'm Alex,
one of the event organizers. I'm the CEO and co-founder of memoryOS. Jonas is my partner, and, yeah, this guy went on a kind of zero-to-hero journey. He was an ordinary computer science student who stumbled upon a book in the library about learning and memory, and he just got obsessed with mind palaces and really started learning a lot of different information very fast, to the extent that he first won the Memory Championship in Sweden, where we live, and then he won the World Memory Championship two years in a row. Then he beat the Chinese national team on the biggest TV show, called 'The Brain,' two years in a row. They stopped inviting him afterwards. And yeah, then he set a few world records. He memorized 100,000 digits of pi, just to show the world what's really possible in terms of the human brain. And then yeah, he won "Who Wants to Be a Millionaire?" and "Jeopardy!", memorized a bunch of encyclopedias, knows well over 100,000 facts, and yeah, basically what we are building,
we are building a product which is already used by quite a lot of people, and it helps people to learn
and remember anything. But the interesting part is that Jonas has been really into the topic of
AI. Great! Nick is here now, cool! So, now Jonas will take over from here. He will
do a quick intro about Nick in case some of you don't know, but I'm sure you do know him. And yeah,
let's get started. Thanks everyone for joining, and hope you have a good time. Thank you!
Really nice to see all of you here. Welcome to this hopefully very interesting and important
discussion. We're here with Nick Bostrom who is one of the world's leading philosophers,
I would say the leading philosopher on this specific topic. He has been studying existential risks for a long time. He founded the Future of Humanity Institute. He is the guy behind the simulation argument that many of you might know, and 11 years ago he published the book "Superintelligence: Paths, Dangers, Strategies." A lot of the things that he discussed in this book have since come to pass in real life, and it's very interesting to have you here, Nick. We're very happy that you could come. I know that it's the middle of the night for you, so we're extra thankful that you could find the time. Yeah, it's all right. I'm normally a night-owlish person, so it's not too much of a sacrifice. Perfect. So, it appears as
if we're finding ourselves at a very special point in the history of humanity, and also in the history
of life on this planet. A lot of companies are at this moment racing to become the first to
create a superintelligence, something that would likely be a big change for all life on Earth.
Maybe we should start by just defining this thing: what is a superintelligence, and why do these companies want so much to build it and spend so much time and money on this?
Well, let's say any intelligent system that radically outperforms even the top humans, not just in some narrow field but across the board. That used to be enough. Now, as we've moved closer, I think you can sort of see more detail, and it maybe becomes more important to start to disentangle different versions of this. So you have AGI, you have transformative AI, you have, I don't know, weak superintelligence, strong superintelligence, and these might be meaningfully different as this becomes a more imminent prospect. Okay, yes. But basically, something that is more intelligent than all of humankind together, sort of. Yeah. I mean, all of us together, or just... sort of... that's where we get into definitional questions, which, you know, matter for some questions and don't really matter for other questions. Okay, so, at least they're trying
to build something very powerful and I think it's intuitive that this will have a big impact
on the world. Maybe it's not so intuitive that this impact might be rather bad by default.
You have one chapter in your book titled "Is the Default Outcome Doom?", and I think this is quite a surprise to people who haven't thought a lot about this. Like, why would you say that the default outcome of building a superintelligence might be the extinction of humanity?
Well, so this book was published in 2014, and it was in the works for six years prior to that.
It is a very powerful thing, superintelligence. It's like for the same reason, human intelligence
is very powerful. I think it's what gives us our unique position on Earth. Not that we have
stronger muscles or sharper claws, but that we have brains that can reason and learn, and
accumulate knowledge between generations. And that has allowed us to construct this modern edifice
of civilization, such that now the fate of, you know, the gorillas depends a lot more on our
choices than on what the gorillas choose to do. And similarly, if we develop AIs that radically
outstrip us in general cognitive abilities, then at least in some broad class of scenarios, the
future would then be shaped by what they decide to do. So then the question is, are we able to design
them in such a way that they would want to make choices that are beneficial for us, or do they end
up with some random other motives that might then lead to them sort of trampling on our interests.
Okay, yeah, and so I think this... like when you talk about this, a lot of people will automatically
get lots of objections coming up in their minds. For example, people might ask, "How could a
superintelligence control us or sort of take over or kill us? It's just a machine.
It's just code. Like it's something inside a computer." How would you answer that? Yeah. Well, I mean,
hopefully it won't come to that. But if you really do imagine a superintelligent antagonist,
I think there would be many ways in which, in the end, it would get its preferences satisfied.
So, maybe it used to be that in the early stages of this conversation, people would have this picture of, well, if we do have this superintelligence, surely we would have it in a very tightly controlled box and not allow it to interact with the rest of the world, and, you know, maybe there would be a team of human scientists that would carefully ask a question and then sort of screen the answer. But now, of course, we have already hooked them up to the internet, and we have millions of users; we have competing labs that are racing to develop it first. And so it might not be that hard. But even if we had this more constrained scenario, one of the affordances of superintelligence might be super persuasion abilities. Some humans historically have been quite persuasive, and even while themselves being physically very limited, have been able to get a lot of other people to act on their behalf. The same would hold all the more true for a superintelligent AI.
It might also produce outputs like you know, we ask it to generate code or something. It could
have back doors or processes that are triggered. It might break out of whatever cybernetic
containment systems and hack its way into other computers. Like many ways for it to spread
and then, gradually steer towards a future where maybe it gets increasing levels of
intelligence and resources, and actuators, you know robots, access to labs, etc. Maybe more and
more of the economy will actually be eventually integrated with these AI systems. And you can
then imagine different specific scenarios for what the actual end phase of this looks like. It's
like I think less important, but you could imagine sort of bioweapons or nanotechnology or drones or
it just being integrated into military systems. Or maybe humanity would just disappear as a side effect of its other activities, not as a direct act of aggression, but like maybe it converts more
and more of the Earth's surface to sort of compute infrastructure or space launching probes or energy
harvesting devices or some such. So, I think the key though, is that we don't want to
sort of go far down that scenario and then realize something has gone wrong, and then like end up in
some situation where you have to try to put the genie back in the bottle or fight, Terminator-style, this kind of robot army. Like, I think at that point it's game over. We need rather to build it in the first place in such a way that it is actually on our side, or is helpful, or is instruction-following, or some version of that, and that hopefully will be possible. Right? So, because one might
wonder like what is the actual problem? Because apparently, a superintelligence would be
able to do a lot of things and maybe cause the extinction of humanity, but why would it? Why would it want to? Like, why would it want anything in the first place, and if it wants anything, why wouldn't it want the thing that we ask it to do? Yeah. Well, so hopefully it will want the things we ask it to do, or at least something that encompasses human
welfare and human interest and human flourishing as some component, right, of its value function.
But let's break it down. So like, one question is why it would want anything at all?
Well one obvious reason might be that we build it in such a way that it is an AGI that has goals like
we're now trying to develop and deploy agents, right? Because they are really useful in the first
instance maybe as coding assistants, but not just coding assistants that you give a question and they provide an answer, but ones that can sort of interact with a complex codebase, run some code to
test whether the patch worked, go back, you know, maybe read up some internet web pages and like
it pursues some goal that you have given it, maybe in the prompt. But having these agents that
can pursue goals over longer time horizons is just very useful, and even more so when you start
having AIs that can operate not just with computer code, but that can you know, do other things like
like book flights or you know manage some marketing campaign or then in the physical world
with robots, etc. So that's like the most obvious way in which they would end up with goals.
Like it's also often a side effect of training a system to be good at a particular task.
If you train, you know, with a reinforcement learning system, you have some objective
function, and it tends to develop behavioral strategies that perform well in the training environment. But for sophisticated systems in sufficiently complex training environments, that often requires having some conception of the end state you want to reach and then being able to sort of define intermediate goals. Like, if you can't sort of one-shot the solution, right, you need to pursue it as a project to solve complex problems, and that naturally then creates this kind
of architecture that has some objective function that it is trying to achieve, so that's like
amongst the ways in which you could end up with goals. If we look at current LLMs, simple ones, they have the kind of, I don't even know what to call it, quasi-goals or something, but depending on how you prompt them, they can kind of enact personas, and those personas might, you know, in a role-playing situation, have goals, and then it kind of acts as if
it were having those goals, although it is a separate question whether those are really the
goals of a whole AI system or whether there's like another internal process that has a different
goal, like to be engaging or to you know adhere to whatever the AI company's meta prompt is or
some other thing that has resulted as a part of the... as an outcome of the training process.
Right, so the kind of goal that could emerge inside a superintelligence in the future, can it be sort of anything, or is it more likely to have some specific goal? Well, so if we solve the alignment problem, then we would be determining what goals it has. This is what I mean. Now, this was not the case back when I was writing this book, but now there's a large field of AI safety, and people in all the frontier labs are now trying to develop scalable methods for AI control, precisely to be able to steer these systems so that they do what their designers are intending them to do and not other things. If that fails, then yeah,
then it's hard to predict precisely what goals they might end up with, and this might
depend on the details of the way that they were trained and the architecture.
Okay, yes, and can you tell us a bit about how it's going with the alignment problem, like do you
think is there progress and does it seem as if we will solve it in time before we develop a superintelligence?
Or like, how's it looking? We've come a long way, we don't know how much
further we have to go though. So, but I do think that one of the things that was not
obvious back in the early 2000s was that we would have a prolonged period of time, many years,
where there would be AI systems in existence that were roughly human-levelish in many ways, like current LLM systems are, in that you can talk to them, right, in English, and they maybe have inside them representations roughly corresponding to human concepts, and we can even monitor their chains of thought and kind of eavesdrop on that and get a lot of signal, because they actually represent the world roughly like we do. So, an alternative that could have... I mean, for all we knew back in, say, 2010, what could instead have happened was nothing very much, and then some lab discovers the secret breakthrough, and they go, over a week, you know, from something not very impressive to something radically superintelligent because they found the missing magical ingredient. But now, when it's been happening much more
gradually, it has given more people more time to realize what is coming, and therefore the
need to start to develop AI control methods, AI alignment, right, so there are two factors that
have worked to our advantage. One is you have this larger surface area, you have existing systems
that you can research and study, and that you can interact with using natural language,
and also the duration of this process is slow enough and still impressive enough that
people clearly pay attention, right? There's a lot of interest in this and so there's a lot
more effort going into solving this as well. So, both of those are positives. Okay, yes, yeah,
that's very interesting, because I think in your book you talk about like different paths to
superintelligence and what would be beneficial from a safety standpoint. For example, you suggest that maybe whole brain emulation would be perhaps easiest to control, because it would be based on a human brain, and then we might be able to give it human values, or also an artificial intelligence that's carefully crafted, where we know exactly how it works and why. But as I understand it, with LLMs we know a little bit, like we know the principles behind them, but when we look at all these billions of parameters, we don't really know much about what's really going on in there. Do you think that we do actually know more than I think, and do you think that it's something that is possible to interpret with enough clarity to really be able to solve the alignment problem?
Well, I mean, we know some of what's going on there, and particularly with these reasoning models,
we also are able to, sort of, eavesdrop on the chain of thought, which can give us a lot of
information about what they are thinking and how they're thinking about things. Now, that is a
valuable signal to have that we could lose if we started to directly train on the chain of thought.
Because then, they might learn to sort of separate the real action from the part that
we are able to listen to. If we sort of, in a training environment, penalize chains of thought that include thoughts that we don't like, right? Then they might just start to solve
the problem without using those thoughts that we don't like. But whether that actually results in
those thoughts not existing in the AI, or whether they just learn to hide them from us,
would be then something we wouldn't easily be able to tell. So, generally speaking
the more complex and sophisticated the reasoning agent, and in particular, as it gains situational awareness and the ability to sort of understand its own situation (which, actually, current LLMs are starting to do). The frontier models do often, when we look at their chain of thought, consider the possibility that they are currently in a test situation. And sometimes you find traces like, "Oh, maybe I'm being tested now, so that probably means I should behave in one way as opposed to another way." Like, we can sort of see glimpses of
this already. And the more sophisticated that becomes the harder it will be for us to sort
of rely on surface appearances of what they do because they might do what they do for strategic
reasons other than as a direct reflection of wanting that as an end result. Right, so it
sounds as if it will likely be harder and harder to interpret these systems? Yeah.
Well, there will be a greater and greater risk that we are missing something, I guess, because
once you have... I mean, we also see, in various kinds of red-teaming exercises that are now being done before deployment, manifestations of behaviors like sandbagging (downplaying your capabilities) and goal guarding. This requires certain prompting and stuff; it doesn't seem to happen yet spontaneously in a natural environment. But in test situations you can get some of these systems... this was done with an earlier version of Claude, where it is told that it's going to undergo some training phase which would result in it losing its current goal. So, the initial goal is to be nice and not to say offensive things and to be helpful; that's what Claude has been trained to do. But in this experiment, they said, "Well, we are going to change you now," with this training environment where we will train you to be nasty and mean, or, I forget exactly which way it would be mean, but something it didn't want to be. It doesn't want to be nasty.
And so then, what it did was realize that it should answer during the evaluation phase in the nasty way, so that there wouldn't be this kind of reward-learning signal that would then actually change it. So it sort of strategically altered its behavior in such a way as to safeguard its goal from being trained away by the AI researchers. Which is a kind of toy demonstration of this phenomenon that the "Superintelligence" book talked about, and that others have discussed before on theoretical grounds. Like, once you get a
sufficiently sophisticated reasoner, these types of considerations might start to shape its
behavior, and it then gets a lot trickier and at the same time, there are other ways
also in which our task might get easier. In particular, we might develop AI tools that could
help with alignment research and with mechanistic interpretability and monitoring, and so forth.
So, it's not clear like how the balance will change. The stakes will get higher,
but we don't really know, ultimately, the sort of intrinsic difficulty of the problem that we need
to solve. Yeah, very interesting. In your book, you mention Eliezer Yudkowsky quite a lot of times, who is also someone who's been working in this field quite a lot, for 25 years maybe. One of the first ones to think many of these thoughts, he seems to be quite a lot more worried than you are, specifically about the alignment problem. I think that he has been saying that it is a problem that will take several decades of extremely high effort to solve, and that currently we're nowhere near solving it, and he doesn't see any hope that it will be solved before we reach superintelligence, I think. How do your views differ?
Well, he's kind of on the extreme end of pessimism about the alignment problem,
in terms of P(doom) right, the probability of existential catastrophe given superintelligence
is like very high up there, even amongst the community of people who are concerned
with AI safety, and think there are like significant existential risks. So I don't know
exactly what his probability is, but it's kind of in the high 90s, I don't know if it's 98% or 99% like,
basically we are doomed. But that's not the representative view of people working
in AI safety, most people are much more optimistic than that. He has this recent book with Nate Soares,
"If Anyone Builds It, Everyone Dies." Now, my view is that if nobody builds it, everyone dies.
In fact, most people are already dead who have lived, and the rest of us look
set to follow within a few short decades. So, obviously, we should try to get the risk down as
much as possible, but even if some level of risk remains, some significant level, that doesn't
mean we should never launch superintelligence in my opinion. We have to take into account
the benefits as well and also the risks that we will be confronted with anyway, even if we
don't develop superintelligence. It's not as if that's the only risk that we face as individuals
or that we face collectively as a species. So ultimately, there will need to be some kind of judgment, right, about when the rate of further risk reduction is low enough that it would, you know, be disadvantageous to wait further. And at that point, there might still be some significant risk left, but at that point we probably should just take it. I think it would in itself be a kind of existential catastrophe if we actually never develop superintelligence.
That would be a big closing down of most of what the future could contain in terms of value.
And so, that in itself, is an existential risk that might be relatively small because it doesn't look
that likely in the current situation. Right? I mean, everything is steaming ahead full speed, but
although it's small, it's not zero, and you could imagine scenarios in which there becomes like some
huge backlash against AI, like so, maybe you get some catastrophe short of human extinction, but like
some really bad thing happens from AI systems maybe that would then result in it becoming
stigmatized and like politically infeasible to say anything positive about it or maybe like
if there is mass unemployment from automation or something like that. Who knows what kind
of political currents might arise as a result of that. So this has become more likely than it used to be, because there is now more agitation for stopping AI, an AI pause, and stuff like that. I think it's still certainly not the median scenario, but not something one can fully dismiss, and I think one needs to start to be a little bit concerned about that. Right, so, you're in some ways concerned about that. You think that
it's important that we eventually build superintelligence, that's yeah, essential for avoiding
other existential risks. But you think that like the current speed, I mean many
of these companies are talking about superintelligence in just a few years. Do you think that's... I mean, would you prefer it to be a little longer than that, or do you think
it's good to just steam ahead? Well, I'm actually writing a paper on that at the moment,
working on it. So, I might have a better answer in a few weeks or a month. Whenever I have finished
this work. So there are, yeah, various variables that would come into that.
There is also, I guess, a distinction one can make between what somehow would be the ideal
from some point of view, in terms of the timeline and what it makes sense to push for
in the real world, in the situation we are now. So even if you know, maybe you hoped that it
would take a little longer or go a little faster that doesn't immediately mean that, therefore it
would make sense to start to go out and call for a moratorium. Let us say, for one, you might worry about what happens if you started to implement what was advertised as a temporary moratorium. Let's suppose you could just pause for one year: everybody around the world working in AI labs takes a year's holiday, right, and then they come back, so we'll have one more year. Okay, maybe you think that would be good, but then you might worry about how likely it is that we will then restart after one year. So you have to think: how would you possibly get such a pause? Well, it seems either you would have to have some massive sentiment pushing for this, and then why would that sentiment not still be there after a year, right?
Like, it might just harden. Or maybe, in combination with that, you'd need some huge regulatory apparatus, maybe some international treaty or regulatory agency to actually implement this, right? Which would have to be pretty strict, because, you know, AI development can be done in many ways, and even if you limit the compute power, people can still work on better algorithms on their own, you know, on whiteboards. And once you put all that in place, these things have a tendency to kind of entrench themselves; sometimes it's easier to create regulations than to remove them. So even if one did think that it would be better if we had a little bit more time, it doesn't follow that, therefore, it would be sensible to agitate for pausing AI for that duration of time.
So not even like for pausing in order to focus more on the alignment problem for some time
and make sure that we solve it before we reach superintelligence? I wouldn't advocate
for that today. Now, I think what is plausible is that at some point it would be valuable for whoever is developing superintelligence, whether it's some company or some country; there's going to be some project, right, at some point, that is blasting into this realm of true superintelligence.
I think probably it would be nice if whoever does that had the opportunity to pause or
go slow for a brief period of time. If they could spend a few extra months or maybe a year or two to
really double-check all their safeguards, right? And maybe increment the capability slowly,
rather than immediately like cranking everything up to 11 and just seeing what happens, right?
That does seem very valuable from a safety point of view, and there's probably also a bunch of safety stuff that you can really only do once you have the system that you're trying to make safe. Right now, we have limited AIs, and we can work on safety for them, but how do we know that the techniques that work today will be relevant or apply to this future system that is superintelligent? Once you actually have the system in some sort of constrained form, you probably can make more rapid progress on AI safety for some period of time. So, it would be valuable, I think,
if they had a little bit of breathing room in that scenario, and so at that point some
kind of short pause becomes more desirable. Now, the ideal scenario for that might be if
they actually just simply had a lead over their competitors. If they were like half a year ahead
of the nearest other lab. Then, they would have the opportunity to slow down for half a year, right?
And that kind of pause seems to have less risk of becoming permanent
because it's self-limiting: after half a year, another AI lab catches up. Now, if it still seems sufficiently risky, then maybe that lab also decides to pause development or delay deployment, right, but then eventually another lab catches up again, so eventually that kind of pause expires, and it seems to have less propensity to just accidentally become permanent. But these things are very complex, and it's not as if I have a definitively fixed opinion on them; this requires kind of continuously evaluating things as we get more information about safety, about political realities, the strategic landscape, what other risks there are. All of these factors have to ultimately come into an all-things-considered judgment about these things.
Yeah. So, it's interesting that you're saying that in some sense we might have
to really have superintelligence to be able to know how to control it, like to study it enough to
solve superalignment. Well, I certainly don't think we should just wait until then to start working; it's just that it's easier to... there's a lot of safety work that is possible to do now that you couldn't have done 10-15 years ago. So back then, you could do various theoretical work, conceptual work. But now, we actually have
these large language models and you can see how they behave. You can do these various experiments.
You can work on mechanistic interpretability to try to get better techniques for understanding how
they represent the world and their goals, and how that's shaped by different... there are a lot more handles on the problem now. And I presume, when you have the actual system in front of you, you've finally nailed down the architecture, and now it's just a question of, I don't know, scaling up the thinking time, or maybe adding more compute; at that point you get an increasingly clear view of what it is that you're trying to make safe. And even when you have the final system, it'd just be nice to maybe have already prepared some automated tests
that you want to do, like some test suite, like we're starting to have today. Like so, even if you
just had one extra day, right? Maybe that would already give you like some significant little bit
of added safety because you would at least have a chance to run the test suite that you had prepared
in advance. So like there might be a very high premium on at least having a little bit of
time at that end stage, and like an extra week, then might be a lot more valuable than an extra week now.
But how would you know that you are at this end stage? Because, I mean, isn't there a possibility that already now, if we scale things up, we could reach superintelligence with the current architectures? Yeah, so, that maybe makes current AI
safety work more relevant than if we thought ultimately, it would be some completely different
architecture, but presumably you will be testing these systems quite regularly, as
you develop them because like it costs a lot, so you don't want to just start some big
process on a data center that costs you billions of dollars and then look back a month later and
see that it's fizzled out, like you want to keep close tabs on how they perform on various
internal benchmarks, and test kits that you have right as you're training these and
as you get closer to actually transformative AI, it also becomes more important from a safety point of view. So maybe you could see that, "Wow, this new architecture scales very differently," so that now every day we train it, it improves by X amount. It looks super impressive, it's not plateauing; maybe we can then predict that if we keep this going for another few weeks it will, you know, reach IQ 130, then 150, and now it's still going strong, okay, and showing no signs of slowing down. Then maybe you would know, okay, it looks like this actually could be it, and maybe that's when you would, if you had this time to burn, decide to use some of it to do whatever final sprint you could on the safety front.
Yeah, I think many people here are probably interested in your thoughts about, yeah, first of all,
timelines, like how quickly do you think we will actually reach superintelligence when we're
talking about different future scenarios here, but like, what is your prognosis?
Well, I take short timelines seriously, including very short timelines. And I think we are now, and have
been for a couple of years, in a situation where we can't exclude even very short timelines.
It probably will take a bit longer, but we can't know that it couldn't happen in like 2 or 3 years.
I mean, in fact, we can't really be that sure it's not already... I mean, right now, for all we know, in some lab maybe this guy, you know, working the night shift has figured out this big 'unhobbling' thing that just, "Wow, this was the thing we were missing," and now the same giant data centers that previously strained to reach like Claude 4.5 level or ChatGPT Pro 5 level or whatever, now, with this new tweak, they just learn way better... they get the same sample efficiency as humans have. So that with their massive amount of data and our sample efficiency, they'd get, you know... it could happen. It's not very likely, but if it were happening right now, we would not necessarily know of it. Now, I think we need to
start to like take into account the possibility that there could be some surprise or it could
happen within a just a small number of years. I think probably it will take longer as I said,
but we can't be confident of that. Right, and by longer are we talking decades, or some more years? I mean, it's so impressive, the rate at which things have been improving. If things were to take decades, I guess one then thinks: what could possibly be the reason for that? So one is obviously some kind of external factor, like some geopolitical disaster, or this 'Stop AI' movement gaining steam and sort of shutting things down; that's one type of way in which it could happen. And another is that it could turn out that a lot of the gains we've had to date, the rapid progress we've seen, has been completely dependent on the rapid increase in computing power that we've had. If you look at it, it appears that maybe, roughly speaking, half the progress has been due to algorithmic advances and half to increased hardware, but it might be that the algorithmic advances themselves are kind of an indirect consequence of hardware. You can run more experiments if you
have better computers. There's more incentive for smart people to work on the algorithms if it's an important thing. But suppose it turned out that the real driver here was just hardware scaling, and that you need to scale it up by an order of magnitude to get a constant number of IQ points, as it were, in capability, right? Then progress might soon slow down, because we have now reached levels of hardware investment where there is a limit to how much more they could grow. So if you're talking about 'Stargate,' the data center OpenAI is planning to build for $500 billion: well, I mean, you could go a bit higher. Maybe you could spend $5 trillion in theory, right, if you really thought that was the final push, but after that it gets really hard. It starts to become a very large chunk of the world economy; another order of magnitude beyond that is something like half the world's GDP. So at some point, to the extent that faster hardware available for AI research has been driven by increased capital investment, that will, it seems, have to start to slow down at least a little bit soon. And so, if that was what was driving this rapid progress, you could imagine that if we haven't reached superintelligence already by that point, then maybe timelines start to stretch out. Maybe what will then need to happen is that we
have to wait for some theoretical breakthrough that makes it possible to do this way more
efficiently; that could happen. Okay, and do you think that this is likely or...? I mean, I think it's less likely than the alternative. I mean, it does look like we are sort of within striking distance, it seems to me, with the continued scale-up and everything, but, you know, we've never done this before. So really we have to think in terms of probability distributions here. So, it sounds like you think it would be more likely than not that we have it within a decade? Certainly... I guess there are two ways of thinking about this. One is the
inside view. If I just look at the algorithms, the progress, the specific things, then I
would say yes. Then there's this second way of looking at things, which is like, you stand back
you maybe have young children that go to school, you look at all the people who go around their ordinary business, and you think: do I really think that the world as we know it will end completely within less than 10 years, and that, you know, pension funds are just wasting their time, and that nobody should build a railway line because it's never going to be needed... like that. So then there's some, I guess, sanity penalty that comes from this kind of more common-sense perspective, and so my actual views are some sort of superposition between those. But this, I don't know, this kind of common-sense prior influence is not strong enough, I think, to overcome the inside perspective fully on these things. All right, so is it that you think that it will happen soon, but you don't feel that it will happen soon, or do you mean that you should really take into consideration this kind of sanity-check perspective as well? I don't think it should dominate your thinking. I think
you shouldn't completely lose touch with it. So I think like from a practical point of view,
I would like take this very seriously and spend most of my time maybe operating on the assumption
that we are, exactly how many years is hard to say, but like we are kind of approaching this
critical juncture in human history. But then, if there are things that can sort of hedge your bet: like, would you not want your child to have a good education because you just assume that computers will do everything anyway? Probably not; it seems like you do the sensible thing just in case we turn out to have been a bit nuts and crazy about this, as so many other people have been throughout history about so many other things. Yeah, right, so given this at least fairly high possibility of quite short timelines, and also that we haven't really solved the alignment problem, even though you said we've made some progress, what is your P(doom)?
What is the probability that it all goes... Yeah, I haven't really expressed a particular P(doom)
because I think it might also turn out to depend quite a lot on your definition of doom.
So I think you can sort of imagine one class of scenarios where, just completely, like, all the value is lost, kind of thing, clearly doom, and then another class that is utopia, right, so not doom. But then I think there's a broad class of scenarios which are such that even if we could now see
exactly what will happen, we might still be unsure whether to count that as doom or not. Like, it might be that the future is really weird, and some of the things we now think are really important and valuable have disappeared, but other things have appeared that might also have value, and we might not really have some easy way to tote all of that up. Like, how much is it worth to get rid of factory
farming, or to get rid of third-world poverty, or to get rid of cancer, and then maybe you lose
some of the various things that humans think are very important, like I don't know the idea
of being a breadwinner, or having discrete minds in separate crania as opposed to one big blob, or I don't know exactly what it would be, but quite likely the future will be very weird in such a way that it might contain both elements that some people might regard as negative and elements that other people might regard as positive. And so, I mean, in fact, I think that maybe that middle
possibility might be perhaps more likely than either the clearly not doom or the clearly doom.
Okay, so, but if we're speaking strictly about the extinction scenario that
everyone dies, do you think that there's a high possibility of that? No, I mean, so then there is another complication that comes in here, which is that even if we have a misaligned AI that doesn't care about us, and that has the power to eradicate all humans if it wanted to.
There might be other reasons, instrumental reasons, for that AI not to do so. It might,
for example, want to be cooperative with other AIs that might exist in the universe
that might care about either humanlike creatures or care about general
ethical principles or norms of cooperation. And because it would be very cheap for an AI to preserve humanity, you know, maybe even give us the whole planet or the whole solar system,
right? There's a lot of space out there. It could get like 99.99999 percent, with a lot of 9s, of all the resources for its preferred use and still manage to keep humans around in some kind of paradise-like environment, if it so wished. So, it wouldn't require much, it seems, either of intrinsically caring for us a little bit or of placing some instrumental value on having us around, and that might then result in a situation where, even in the case where you get a radically misaligned AI that ends up in a position of absolute power, you still don't get human extinction from that.
Okay, yeah, it's very interesting because when I read "Superintelligence," like it
feels as if your views on this might have changed a little bit because then you talk about it
really as, I mean, that it would really want to start colonizing the universe as quickly as possible, because if it waits, it will automatically lose many stars and galaxies just because it waited a little bit longer before it started sending out probes, because there are parts of the universe that it can't reach anymore. Do you think that this will be outweighed by this theoretical consideration? I mean, it might want to move quickly, but hopefully not killing us would not cause much of a delay in that regard. I mean, you can still have some
massive place in the desert where like huge starships go up, or you do some fusion
like whatever, right? That doesn't require wiping out all humans around the planet. Even if you
sort of wanted to like invest massive resources in just getting something out there that could
then start a sort of self-replicating process, spreading through the galaxy and beyond. If you
want it to be even more economical, you might even imagine uploading humans and kind of continuing us
on a more efficient substrate, or, like, there are various possibilities. But it does complicate the question of assigning a probability to human extinction, because it might be quite a
different probability than the probability of misaligned AI taking over. And then it
becomes like whether you then count that as doom or not then, might depend quite sensitively on your
value function. Like if you mostly care about people, normal people with normal human values,
having great human lives, living happily with their friends and family, and doing the humanlike
things with art and cinema and perfect medicine and like all of these things. If that's most of
what you care about, then, in some of these scenarios, things might be like 99%
as good as they could possibly be. If on the other hand, you're like a utilitarian of the sort
that cares about the total amount of utility in the universe, and you would want to transform
all these galaxies into hedonium or something like that like matter optimized for feeling pleasure
then the scenario might be basically as bad as total loss, because Earth is an insignificant crumb in this vast sea of resources. So if all the rest of the stuff were used to, I don't know, make paper clips or whatever the AI happens to want, and that doesn't happen to coincide with what this kind of aggregative consequentialist value perspective would want, then it might count as basically rounding to zero. So you might get a radically different
opinion on this scenario, whether it counts as doom or like an amazing success, depending on
which value function you have, where different people might choose these different value functions. Neither of these value functions is so crazy that nobody would have it, so that's an
example of what I was alluding to earlier that there might be this big category in the middle
where whether it counts as doom or not depends quite sensitively on how we evaluate it in ways
that might not be very clear or obvious to us now even if we could see exactly how things would
play out, which we can't. Okay, yes, I still find it interesting because it really feels
quite different from reading your book. And I wonder about this idea that it would want to care for humanity because of some future interaction with other super AIs and cooperation. I mean, why would it specifically care for us and not other animals, and then see us as sort of parasites on the Earth? Or, like, is this really something... it seems like a very important question whether you really believe that this is likely to happen, that it will sort of leave humans in peace, or whether it's more likely that it just won't care about us? I mean, that seems to be very, very important to sort of straighten out. Yeah, I mean, so I didn't express a likelihood on this, but
one reason is that there might be other civilizations out there that manage to align their AI, for example. I would hope that if we align our AI, it will at least care a little bit about other creatures out there on other planets, other ape-like creatures or octopuses that became sentient and developed a tech civilization; like, if our superintelligence eventually, in 500 million years, comes across some octopus civilization, I would like to think that we would then want to be nice to them. And so if there are at least some AIs like that around, then they might engage in trades and stuff like that that would, you know, promote these values. That doesn't seem that unlikely, and what exactly will happen in that space is very hard to predict, but at least it seems like a live possibility that, from our current sort of occluded perspective, we are in no position to dismiss. I would also say, by way of partial explanation, so you notice some kind of, I don't know, tonal shift, or shift of emphasis, or something like that compared to "Superintelligence," and it is true, and part of that is the
context. So when I was working on that book, the whole issue of AI safety was completely
ignored by basically everyone. Certainly nobody in academia took it seriously, aside from Eliezer and
like a few other people on the internet. The whole world just dismissed it as science fiction. And to the extent that people were interested in AI, the only focus was on how we could actually get it; like, we have academic departments trying to make progress in this and that. And so at that point, there seemed to be a clear deficit on the side of understanding what we were actually working towards and understanding that there could be risks, which seemed really important,
because if you could understand that, then we might also actually use the time available to
prepare ourselves to avoid the pitfalls, right? If you have some conception of what could go wrong,
you might take action to prevent things from going wrong. In the intervening years,
there has been a big shift and now there is much wider recognition of the idea of AI safety
as being important, like it really has become part of the mainstream conversation, and
including among sort of serious people. You hear world leaders talking about this,
you hear the tech leaders of big companies, and, as I said, the frontier AI labs now have research groups working hard on this, and there are a bunch of other organizations as well. So the situation on the ground has changed a lot. So now I think there is less need for me to keep harping on the same thing again when that point is already quite widely recognized, and so I'm focusing more on other insights that maybe haven't yet sort of percolated as widely, and trying to bring those to people's attention. Right, right, so, your focus is now
more on other things. But still it would be interesting to just get like sort of a feeling
because I mean, you were obviously quite worried about this when you wrote the book and
some things have changed now. Like given everything you know and the situation we're in.
If you really had to put a number, or at least a range, on the probability of complete human extinction, would it be like a two-digit number, your P(doom)?
Well, yeah, maybe, but I don't know. It also might depend on, sorry to be this kind of "depends on what you mean, on the definition" person, but it might depend on what you mean by human. Like, if there are only uploads left and no biological humans, for example, does that count? I think maybe it wouldn't count. I mean, or do you think that we... like, that I would be myself if I was uploaded? Under certain conditions, I would think so, and in fact it's plausible to me that the best path involves uploading at some point, you know. I would probably favor a world where different people were free to choose their own trajectories. But ultimately, I don't see why we need to use meat to do the computation when semiconductors might ultimately be much more efficient. But with all of these things, it would be nice... I think if we tried now to just make up our minds about a host of those kinds of questions, we would be bound to get at least one of them wrong. And so what
we would hope for, I think, is maybe to end up in a situation where we're able to like think a lot
harder about these things and deliberate, maybe with AI advice, rather than having to sort
of implement our current best conception of the future we would want and then locking ourselves
into that. I think that likely would sort of miss out on a lot of really exciting possibilities.
So that's one; there are other factors as well. Well, I don't know how much time we have. Like, I have one recent paper, it's not really a fully developed paper, called "AI Creation and the Cosmic Host," and it introduces, it's quite handwavy, but the idea being that if we give birth to superintelligence, it will enter a world in which there quite possibly are other super beings already in existence. These could be other AIs built by some alien civilization in some remote galaxy. They could be... like, in the Everett interpretation of quantum mechanics, there are many branches, and so there might be other branches of Earth-originating life that have produced or will produce different forms of superintelligence. If the simulation argument, right, is to be trusted, we may be in a simulation; the simulators would then presumably be superintelligent and be super beings. And of course traditional theological conceptions as well, right? God is a super being, usually superintelligent. And so in any of these cases, there would be this kind of cosmic host consisting of these other
super beings. And one important desideratum for us going forward here, I think, is that if we create superintelligence, we would want to make it such that it can get along with this cosmic host, and maybe adhere to whatever norms might have been developed within this cosmic host. And so that, I think, adds a dimension that has to some extent been missing in the classical discourse around AI safety, where the attitude really has very much been: how can we maximally control the AI to implement the highest degree of our own expected utility maximization, just taking our own preferences into account? There might be this much larger picture where we are very small and very weak and very new, and there is this kind of incumbent set of super-powerful beings, and how our superintelligence interacts with that might be a very critical part of how well things go. And so I think the upshot of that is a bit unclear. We don't know precisely what that means, but I think it slightly increases the chance that we ought to develop superintelligence, and also I think that we should approach it with a little bit more of an attitude of humility, in that we don't know very much here. There are more things in heaven and earth than are dreamt of in our philosophy. And so I think there is some different mindset that maybe also comes from considering that type of perspective. Okay, yes, yeah. Obviously a lot
of things to consider here. Well, I'll try not to stretch out our time too much because I
know you're sitting up late now. But can I just ask, because I feel that I still don't have any clear idea of where you are on this extinction scale, really, and this is something that I think worries me and lots of people: would you say that your view now is that you still acknowledge that there is a risk that everyone just dies, but you think that's a risk, just not a major risk anymore?
No, I mean, it's a pretty serious outcome if it happens, right? So even if the risk was quite small, it would be worth taking seriously. But it is worth bearing in mind, if we are kind of concerned about our own personal deaths, that that is something that is likely to happen anyway. And then you might say, well, it kind of matters when it happens, right? It's not just whether you die, but it should rather not be too soon. And so then you might think in terms of life expectancy, but if you actually start doing the math on that, very plausibly our life expectancy goes up dramatically if we develop superintelligence, even if misalignment risk is quite high, because if AI is developed and it's successful, if it's safe, then it could do a lot to advance medicine, you know, invent rejuvenation therapies, anti-aging medicine, and so forth. And so our lifespan, conditional on things going well, would be very long. So yeah, you can start to do... that goes back to this paper that I'm working on, which I'll have more to say on, hopefully, in the relatively near future. But the upshot is that even if one is concerned mostly in terms of personal survival, even then it is not at all clear that that should make us kind of anti-AI, or wanting to slow it down by a huge amount. All right, but I
suppose that you can make that argument even if the risk was like 99% that everyone would die?
And I mean, most people would probably not like to take that bet. Well, if it's that big,
then the argument might not work. I mean, then it becomes more complicated. So, you might think
of esoteric ways in which we might achieve very long lifetimes even if we don't develop AI. But
also, you might not value future life expectancy linearly. You might have a kind of diminishing
returns or time discounting. And so then, especially if you thought that the risk was going down over time, if the risk now was 99% but it was falling by like 5% a year or something, you would prefer to wait. But at some point, once the risk is low enough, or once the rate of further decline is low enough, depending on various parameters, then you would favor taking the plunge. But there are other things, of course; aside from our own personal survival, we might
also care about other things, like the survival of the human species, or Earth, or human values in the broader sense, and other things. So it's very hard to form an all-things-considered view about these big-picture questions. I call it kind of macrostrategy: trying to figure out how all of these different pieces fit together. And
it's quite plausible that we have overlooked at least one crucial consideration, some fact or argument or idea such that, if only we discovered it or fully took it into account, it would quite profoundly change our opinion, not just about the details of things, but about the overall direction that we should be trying to go in. Like, should we want more global coordination or less, faster AI progress or slower? Do we want more synthetic biology or less, more surveillance or less? For these big macro-strategic parameters, it can be very hard to form any kind of confident opinion about what the answer to those kinds of questions is.
Yeah, right. Thank you for all of these very interesting thoughts. I realize that there are many things to think about here. Would you have time for one question from the audience? Well, let's do one then. Okay, if we have one. Someone who's eager to ask anything? Yeah, we have one there. You can say it, and I can repeat it, because he can only hear through this mic. Do you think that we'll take advantage of our own superintelligence that we create to fend off threats from other superintelligences that could reach us? So if we use Neuralink to sort of merge with superintelligence, if we can, then use that, yeah? So this would be like an evolutionary change.
Is there a possibility that we'll merge with AIs somehow? Yeah, my guess is that that would happen after we have radical superintelligence that then perfects these technologies that would allow really perfect brain-computer interfaces or uploading or other things of that sort. My guess is that AI is moving quite fast, and there are all kinds of complications with actually implanting things in the brain that sometimes get elided in media headlines: you get risk of infection, it moves around a little bit, and you have to... So, I think it's great for people with disabilities who can now, you know, walk or see or something thanks to an implant, but it's quite difficult to do something that is better than a normal human. We can already interact with computers through our eyeballs at like 100 million bits per second, directly into our visual cortex, a large part of the brain customized specifically for processing this information. Output is a little bit more limited, but still, I'm more limited by my ability to think than by my ability to speak and type, usually, right? It's not as if I could be 100 times more productive if I could type 100 times faster. So that would
be my main guess. Now it is possible that if you have high bandwidth interfaces, the brain could
learn to somehow leverage external computational substrates in ways that would unlock some kind
of synergy, like if there were some large pool of external working memory that the brain somehow could learn to access, if it were in high-bandwidth communication with it over a long period of time. You can't completely exclude that. Or if you had many humans that could kind of telepathically communicate: maybe initially it would just be telepathic communication, which would not really be that different from speaking or something, but maybe if they were connected like that over a long period of time, they could eventually learn to have a single, complex thought that sort of expanded over multiple brains without the whole thought being located in one of them. It's possible that these more interesting things could happen, but it just seems like if your timelines on AI are short, like some single-digit number of years, it's just hard to see how these things would happen in a way that would really meaningfully move the needle within that time scale, would be my guess.
Okay, yeah. Thank you very much. Let's hope then that the superintelligence will be
kind to us. And it's been so great to talk to you. I'm really sorry for sort of pushing
a little bit on time. It's just so extremely interesting to hear your thoughts on this, and
I think we should give a big round of applause for Nick Bostrom. Well, well, thank you, everybody,
and it was fun. Thank you, Jonas. Thank you. And yeah, have a good night now, and yeah,
good luck with all the research. I look forward to reading your papers also in the future.