
AI FUTURE THAT CAN DESTROY US | Superintelligence Is Getting Closer — Nick Bostrom × Jonas von Essen

By memoryOS

Summary

Key takeaways

  • Superintelligence: a game changer for humanity. We are at a pivotal moment in history, potentially the last one where humanity can shape its future, as the race to create superintelligence could fundamentally alter life on Earth. [05:17], [05:31]
  • The default AI outcome: doom? The default outcome of developing superintelligence may be human extinction, not because of malice, but because a superintelligent AI's goals might not align with human interests. [07:02], [08:17]
  • AI alignment: progress and challenges. While progress has been made in AI safety and alignment, the increasing complexity of AI systems means we may not fully understand their reasoning, posing a significant challenge to ensuring they remain beneficial. [16:42], [21:37]
  • Timelines are uncertain, risks are real. Short timelines for superintelligence are plausible and cannot be excluded, necessitating serious consideration of AI risks, even as the exact probability of existential catastrophe remains debated. [38:03], [45:00]
  • The cosmic host: a new perspective. Our superintelligence may enter a "cosmic host" of existing super-beings, suggesting a need for humility and a focus on co-existence rather than solely controlling AI for our own utility. [58:33], [59:52]

Topics Covered

  • Is superintelligence's default outcome human extinction?
  • Can we control AI that learns to hide its true goals?
  • Why is building superintelligence crucial for humanity's survival?
  • Why do short AI timelines provoke a 'sanity penalty'?
  • Will our AI need to align with a 'Cosmic Host'?

Full Transcript

It is a very powerful thing superintelligence.  How quickly do you think we will actually reach  

superintelligence? We can't know that it couldn't  happen in like two or three years. Is the default  

outcome doom? My view is that if nobody builds it, everyone dies. The future is really weird and some of

the things we now thought were really important  and valuable have disappeared. But other things  

have appeared that might also have value and  we might not really have some easy way to tote  

all of that up. So it appears as if we're finding  ourselves at a very special point in the history  

of humanity and also in the history of life on  this planet. If we create superintelligence,  

we would want to make it such that it can get  along with this cosmic host. Some company or  

some country, like there's gonna be some project  right at some point that is like blasting into  

this realm of true superintelligence. People  would have this picture of well if we do have  

this superintelligence like surely we will like  have it in a very tightly controlled box and not  

allow it to interact with the rest of the world  and you know maybe there would be a team of human  

scientists that would carefully ask a question and then sort of screen the answer, but now of course

we have already hooked them up to the internet  and we have millions of users. I think the key  

though is that we don't want to sort of go far  down that scenario and then realize something  

has gone wrong and then like end up in some  situation where you have to try to put the genie  

back in the bottle or fight like Terminator-style  this kind of robot army, like I think at that point  

it's game over. A clear deficit on the side of  understanding what we were actually working  

towards and understanding that there could be risks, which seemed really important because if you could

understand that then we might also actually use  the time available to prepare ourselves to avoid  

the pitfalls, right, if you have some conception  of what could go wrong, you might take action  

to prevent things from going wrong. Hey everyone  again, thanks for making it today. We're  

going to get started. Nick will be joining any  minute now. So, while we are waiting for him,  

I'm just going to do a quick intro about  Jonas. But yeah, basically I'm Alex,  

one of the events organizers. I'm the CEO and  co-founder of memoryOS. Jonas is my partner and,

yeah, this guy went from zero to hero  kind of journey. He was an ordinary guy,  

computer science student who stumbled upon a book in the library about learning and memory, and he just, like, got obsessed with mind palaces and he really, really started learning a lot of different information very fast, to the extent that he first won the Memory Championship in Sweden, where we live, and then he won the World Memory Championship two years in

a row. Then he beat the National Chinese team,  on the biggest TV show called 'The Brain'  

two years in a row. They stopped inviting him  afterwards. And yeah, then he like set a few world  

records. He memorized 100,000 digits of pi, just to show the world what's really possible in terms of the human brain. And then, yeah, he won "Who Wants to Be a Millionaire?" and "Jeopardy!", memorized a bunch of

encyclopedias, knows like well over 100,000 facts  and yeah, basically what we are building,  

we are building a product which is already used by  quite a lot of people, and it helps people to learn  

and remember anything. But the interesting part  is that Jonas has been really into the topic of  

AI. Great! Nick is here now, cool! So, now Jonas will take over from here. He will  

do a quick intro about Nick in case some of you  don't know, but I'm sure you do know him. And yeah,  

let's get started. Thanks everyone for joining, and hope you have a good time. Thank you!

Really nice to see all of you here. Welcome  to this hopefully very interesting and important  

discussion. We're here with Nick Bostrom  who is one of the world's leading philosophers,  

I would say the leading philosopher on this specific topic. He has been studying

existential risks for a long time. He founded the Future of Humanity Institute. He is the

guy behind the simulation argument that many  of you might know, and 11 years ago he published  

the book "Superintelligence: Paths, Dangers, and  Strategies." And a lot of the things that  

he discussed in this book has since then come to  pass in real life, and it's very interesting to  

to have you here, Nick. We're very happy that  you could come. I know that it's in the middle of  

the night for you, as we're extra thankful that you  could find the time. Yeah, it's all right. I'm not  

normally a night owlish person, so it's not too  much of a sacrifice. Perfect. So, it appears as  

if we're finding ourselves at a very special point  in the history of humanity, and also in the history  

of life on this planet. A lot of companies  are at this moment racing to become the first to  

create a superintelligence, something that  would likely be a big change for all life on Earth.

Maybe we should start by just defining this  thing like what is a superintelligence and  

why do these companies so much want to build  it and spend so much time and money on this? 

Well, let's say any intelligent system that radically outperforms even the top humans, not just in some narrow field but across the board. That used to be enough. Now, as we've moved closer, I think

you can sort of see more detail, and it maybe  becomes more important to start to disentangle  

different versions of this. So you have AGI, you have transformative AI, you have, I don't know, weak superintelligence, strong superintelligence, and these might be meaningfully different

as we kind of...this becomes a more imminent  prospect. Okay, yes. But basically, something that  

is more intelligent than all of  humankind together, sort of. Yeah. I mean all of  

us together or just sort of that's where we get  into definitional questions which you know matter  

for some questions, and don't really matter for  other questions. Okay, so, at least they're trying  

to build something very powerful and I think  it's intuitive that this will have a big impact  

on the world. Maybe it's not so intuitive  that this impact might be rather bad by default.

You have in your book, one chapter, titled "Is  The Default Outcome Doom?" and I think this is quite  

a surprise like to people who didn't think a lot  of this, like, why would you say that the  

default outcome of building a superintelligence might be the extinction of humanity?

Well, so this book was published in 2014, and it was in the works for six years prior to that. It is a very powerful thing, superintelligence, for the same reason that human intelligence is very powerful. I think it's what gives us our unique position on Earth. Not that we have

stronger muscles or sharper claws, but that we  have brains that can reason and learn, and  

accumulate knowledge between generations. And that  has allowed us to construct this modern edifice 

of civilization, such that now the fate of, you know, the gorillas depends a lot more on our

choices than on what the gorillas choose to do.  And similarly, if we develop AIs that radically  

outstrip us in general cognitive abilities, then  at least in some broad class of scenarios, the  

future would then be shaped by what they decide to  do. So then the question is, are we able to design  

them in such a way that they would want to make  choices that are beneficial for us, or do they end  

up with some random other motives that might then lead to them sort of trampling on our interests?

Okay, yeah, and so I think this... like when you  talk about this, a lot of people will automatically  

get lots of objections coming up in their minds.  For example, people might ask, "How could a  

superintelligence control us or sort of take  over or kill us? It's just a machine.

It's just code. Like it's something inside a computer."  How would you answer that? Yeah. Well, I mean,  

hopefully it won't come to that. But if you  really do imagine a superintelligent antagonist,

I think there would be many ways in which,  in the end, it would get its preferences satisfied.

So, maybe it used to be in the  early stages of this conversation, people would  

have this picture of well, if we do have this superintelligence like surely we would like have it in

a very tightly controlled box and not allow it to  interact with the rest of the world, and you know  

maybe there would be a team of human scientists  that would carefully ask a question and then  

sort of screen the answer, but now of course we have already hooked them up to the internet, and we have

millions of users, we have competing labs that  are racing to develop it first. And so it might

not be that hard, but even if we had this more constrained scenario, like one of the affordances

of superintelligence might be super persuasion  abilities like some humans historically have been  

quite persuasive and even themselves physically  being very limited have been able to get a lot

of other people to act on their behalf. The same could hold true for a superintelligent AI.

It might also produce outputs like you know,  we ask it to generate code or something. It could  

have back doors or processes that are triggered. It might break out of whatever cyber containment systems we have and hack its way into other computers. There are many ways for it to spread

and then, gradually steer towards a future  where maybe it gets increasing levels of  

intelligence and resources, and actuators, you  know robots, access to labs, etc. Maybe more and  

more of the economy will actually be eventually  integrated with these AI systems. And you can  

then imagine different specific scenarios for what  what the actual end phase of this looks like. It's  

like I think less important, but you could imagine  sort of bioweapons or nanotechnology or drones or  

it just being integrated into military systems. Or maybe we would just disappear as a side effect of its other activities, not as a direct act of aggression, but like maybe it converts more

and more of the Earth's surface to sort of compute  infrastructure or space launching probes or energy

harvesting devices or some such. So, I  think the key though, is that we don't want to  

sort of go far down that scenario and then realize  something has gone wrong, and then like end up in  

some situation where you have to try to put the genie back in the bottle or fight, like, Terminator-style, this kind of robot army. I think at that point it's game over. We need to rather build it in the

first place in such a way that it is actually on  our side or is helpful or instruction following  

or some version of that, and that hopefully  will be possible. Right? So, because one might  

wonder, like, what is the actual problem? Because apparently a superintelligence would be able to do a lot of things and maybe cause the extinction of humanity, but why would it?

Why would it want to, and why wouldn't it want...  like...why would it want anything in the first  

place, and if it wants anything, why wouldn't it  want the thing that we ask it to do? Yeah. Well,  

so hopefully it will want the things we ask it to do, or at least something that encompasses human

welfare and human interest and human flourishing  as some component, right, of its value function.  

But let's break it down. So like, one question  is why it would want anything at all?

Well, one obvious reason might be that we build it in such a way that it is an AGI that has goals. Like,

we're now trying to develop and deploy agents,  right? Because they are really useful in the first  

instance maybe as coding assistants, but not just  coding assistants that you give a question, and  

they provide an answer. But like that can sort of  interact with a complex codebase, run some code to  

test whether the patch worked, go back, you know,  maybe read up some internet web pages and like  

it pursues some goal that you have given it, maybe  in the prompt. But having these agents that  

can pursue goals over longer time horizons is just  very useful, and even more so when you start  

having AIs that can operate not just with computer  code, but that can you know, do other things like  

like book flights or you know manage some  marketing campaign or then in the physical world  

with robots, etc. So that's like the most  obvious way in which they would end up with goals.
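To make the kind of agent loop being described here concrete, a minimal sketch follows (an editor's illustration only; `call_llm`, `run_tool`, and the stubbed behavior are hypothetical placeholders, not any specific product's API):

```python
# Minimal sketch of a goal-pursuing agent loop: a goal arrives in the prompt,
# and the model repeatedly chooses actions (run code, read a page, ...) and
# sees the results, until it judges the goal achieved.
# `call_llm` and `run_tool` are hypothetical stubs, not a real API.

def call_llm(prompt: str) -> str:
    """Stand-in for a language-model call that proposes the next action."""
    return "DONE: patch applied and tests pass"   # stubbed for illustration

def run_tool(action: str) -> str:
    """Stand-in for executing a tool action (run tests, fetch a web page, ...)."""
    return f"observation for: {action}"

def agent(goal: str, max_steps: int = 10) -> str:
    history = [f"GOAL: {goal}"]                    # the goal given in the prompt
    for _ in range(max_steps):
        action = call_llm("\n".join(history))      # model picks the next step
        if action.startswith("DONE"):              # model decides the goal is met
            return action
        history.append(f"ACTION: {action}")
        history.append(f"RESULT: {run_tool(action)}")  # feed the observation back
    return "gave up after max_steps"

print(agent("fix the failing unit test in module X"))
```

The point is simply that the goal lives in the prompt and persists across many act-and-observe steps, which is what makes such agents useful over longer time horizons.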

Like it's also often a side effect of training  a system to be good at a particular task.

If you train, you know, with a reinforcement learning system, you have some objective function, and it develops behavioral strategies that perform well in the training environment; but for sophisticated systems in sufficiently complex training environments, that often requires you to have some conception of the end state you want to reach and then be able to sort of define intermediate goals. Like, if you can't sort of one-shot the solution, right, you

need to pursue it as a project, to solve complex  problems, and that naturally then creates this kind

of architecture that has some objective function  that it is trying to achieve, so that's like  

amongst the ways in which you could end up with goals. If we look at current LLMs, simple ones, they have a kind of, I don't even know what to call it, quasi-goals or something, but

depending on how you prompt them, they can kind  of enact personas and those personas might be,

you know, in a role-playing situation, like have  goals, like, and then it kind of acts as if  

it were having those goals, although it is a  separate question whether those are really the

goals of the whole AI system or whether there's, like, another internal process that has a different

goal, like to be engaging or to you know adhere  to whatever the AI company's meta prompt is or  

some other thing that has resulted as a part of  the... as an outcome of the training process.  

Right, so the kind of goal that could emerge, like  inside or in a superintelligence in the future,  

it can be sort of anything, or is it more likely to have a specific goal? Well, so if

we solve the alignment problem, then we would be determining what goals it has. This is what I mean. Now, this was not the case back when I was writing this book, but now there's a large field of, like, AI safety, and people in all the frontier labs now are trying to

develop scalable methods for AI control. Precisely  to be able to steer these systems so that they do  

what their designers are intending them to do and  not other things. If that fails, then yeah, then  

it's hard to predict precisely what goals they might end up with, and this might

depend on the details of the way that  they were trained and the architecture.

Okay, yes, and can you tell us a bit about how it's  going with the alignment problem, like do you  

think is there progress and does it seem as if  we will solve it in time before we develop a superintelligence?

Or like, how's it looking? We've come a long way, we don't know how much  

further we have to go though. So, but I do think that one of the things that was not  

obvious back in the early 2000s was that we would  have a prolonged period of time, many years,

where there would be AI systems in existence that were roughly human-level-ish in many ways, like current LLM systems are, in that you could talk to them, right, in English, and they can have, maybe, inside them representations roughly corresponding to human concepts, and we can even monitor their chains of thought and kind of eavesdrop on that and get a lot of signal, because they actually represent the world roughly like we do. So, an alternative... I mean, for all we knew back in, say, 2010, what could instead have happened was, like, nothing very much, and then some lab discovers the secret breakthrough, and they go, over a week, you know, from something not very impressive to something radically superintelligent because you

found like the missing magical ingredient.  But now, when it's been happening much more  

gradually, it has given more people more time to realize what is coming, and therefore the

need to start to develop AI control methods, AI alignment, right, so there are two factors that  

have worked to our advantage. One is you have this  larger surface area, you have existing systems  

that you can research and study, and that you  can interact with using natural language, 

and also the duration of this process is  slow enough and still impressive enough that  

people clearly pay attention, right? There's  a lot of interest in this and so there's a lot  

more effort going into solving this as well. So,  both of those are positives. Okay, yes, yeah,  

that's very interesting, because I think in  your book you talk about like different paths to  

superintelligence and what would be beneficial  from a safety standpoint. For example, you 

talk about how maybe whole brain emulation would perhaps be easiest to control, because it would be based on a human brain and then we might be able to give it, like, human values, or also artificial intelligence that's carefully crafted, where we know exactly how it works and why. But as I understand it, with LLMs we know, like, a little bit, like we know the principles behind it, but when we look at all these billions of parameters we don't really know much about what's really going on in there. Do you think that we actually know more than I think, and do you think that it's something that is possible to interpret with enough clarity to really be able to solve the alignment problem?

Well, I mean, we know some of what's going on there, and particularly with these reasoning models,  

we also are able to, sort of, eavesdrop on the  chain of thought, which can give us a lot of  

information about what they are thinking and how they're thinking about things. Now, that is a  

valuable signal to have that we could lose if we  started to directly train on the chain of thought.  

Because then, they might learn to sort of  separate the real action from the part that

we are able to listen to. If we sort of, in a training environment, penalize chains of thought that include thoughts that we don't like, right, then they might just start to solve the problem without using those thoughts that we don't like. But whether that actually results in those thoughts not existing in the AI, or whether they just learn to hide them from us, would then be something we couldn't easily tell.
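A minimal toy sketch of the dynamic being described, assuming a reward that only sees the monitored chain of thought (an editor's illustration, not any lab's actual training code; the `DISALLOWED` set and both "policies" are made up):

```python
# Toy illustration: if the training reward penalizes disliked content only in
# the *visible* chain of thought, a policy that hides the thought scores just
# as well as one that truly lacks it, so the signal cannot tell them apart.

DISALLOWED = {"deceive the user"}   # hypothetical "thoughts we don't like"

def reward(task_solved: bool, visible_cot: list[str], penalty: float = 1.0) -> float:
    """Task reward minus a penalty per disliked thought in the monitored trace."""
    base = 1.0 if task_solved else 0.0
    flagged = sum(any(bad in step for bad in DISALLOWED) for step in visible_cot)
    return base - penalty * flagged

# Policy A genuinely drops the disliked reasoning step.
honest = {"task_solved": True, "visible_cot": ["plan the answer", "draft the answer"]}

# Policy B still relies on the disliked step, but keeps it out of the monitored
# trace (the part "we are able to listen to"), so the reward never sees it.
hidden = {"task_solved": True, "visible_cot": ["plan the answer", "draft the answer"],
          "latent_step": "deceive the user"}       # invisible to the reward

print(reward(honest["task_solved"], honest["visible_cot"]))  # 1.0
print(reward(hidden["task_solved"], hidden["visible_cot"]))  # 1.0
# Identical rewards: training on the visible chain of thought cannot distinguish
# "thought eliminated" from "thought hidden".
```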

So, generally speaking, the more complex and sophisticated the reasoning agent, and in particular as it gains situational awareness and the ability to sort of understand its own situation, which actually current LLMs are starting to do. The frontier models do often, when we look at their chain of thought, consider the possibility that they are currently in a test situation. And sometimes you find traces like, "Oh, maybe I'm being tested now, so that probably means I should behave in one way as opposed to another." We can sort of see glimpses of

this already. And the more sophisticated that  becomes the harder it will be for us to sort  

of rely on surface appearances of what they do  because they might do what they do for strategic

reasons other than as a direct reflection of  wanting that as an end result. Right, so it  

sounds as if it will likely be harder and harder to interpret these systems? Yeah.

Well, there will be a greater and greater risk that we are missing something, I guess, because

once you have, I mean... And we also see, in various kinds of red-teaming exercises that are now being done before deployment, manifestations of behavior like sandbagging, that is, downplaying your capabilities, and goal guarding. This requires kind of certain prompting and stuff; it doesn't seem to happen spontaneously yet in a natural environment, but

in test situations you can get some of these systems to do this. This was done with an earlier version of Claude, where it is told that it's going to undergo some training phase which would result in it losing its current goal. So the initial goal is to be nice and not to say offensive things and to be helpful; that's what Claude has been trained to do. But in this experiment they said, "Well, we are gonna change you now," with this training environment where we will train you to be nasty and mean, or, I forget exactly which way it would be mean, but something it didn't want to be. It doesn't want to be nasty.

And so then, what it did was to like realize that it should answer during the

evaluation phase in the nasty way so  that there wouldn't be this kind of

reward-learning signal that would then actually  change it. So it sort of strategically altered its  

behavior in such a way as to safeguard its goal  from being trained away by the AI researchers.

Which is a kind of toy demonstration of this phenomenon that the "Superintelligence" book talked about, and that others have discussed before on theoretical grounds. Like, once you get a

sufficiently sophisticated reasoner, these types  of considerations might start to shape its

behavior, and it then gets a lot trickier  and at the same time, there are other ways

also in which our task might get easier. In particular, we might develop AI tools that could

help with alignment research and with mechanistic  interpretability and monitoring, and so forth.  

So, it's not clear like how the balance will change.  The stakes will get higher,

but we don't really know, ultimately, the sort of  intrinsic difficulty of the problem that we need  

to solve. Yeah, very interesting. In your book, you mentioned quite a lot of times, Eliezer Yudkowsky  

who is also someone who's been working in this field quite a lot like for 25 years maybe.

One of the first ones to think many of these  thoughts, he seems to be quite a lot more

worried than you are about specifically like the alignment problem. I think that he has been saying  

that it is a problem that will take several decades of extremely high effort to solve, and that currently we're nowhere near solving it, and he doesn't see any hope

that it will be solved before we reach superintelligence, I think. How do your views differ?

Well, he's kind of on the extreme end of pessimism about the alignment problem, 

in terms of P(doom) right, the probability of existential catastrophe given superintelligence

is like very high up there, even amongst the community of people who are concerned  

with AI safety, and think there are like significant existential risks. So I don't know  

exactly what his probability is, but it's kind of in the high 90s, I don't know if it's 98% or 99% like,

basically we are doomed. But that's not the representative view of people working

in AI safety, most people are much more optimistic than that. He has this recent book with Nate Soares,  

"If Anyone Builds It, Everyone Dies." Now, my view is that if nobody builds it, everyone dies.

In fact, most people who have ever lived are already dead, and the rest of us look

set to follow within a few short decades. So, obviously, we should try to get the risk down as

much as possible, but even if some level of risk remains, some significant level, that doesn't  

mean we should never launch superintelligence  in my opinion. We have to take into account  

the benefits as well and also the risks that we will be confronted with anyway, even if we  

don't develop superintelligence. It's not as if that's the only risk that we face as individuals

or that we face collectively as a species. So ultimately, there will need to be some kind of

judgment right, when the rate of further risk reduction is low enough that  

it would you know be disadvantageous to wait  further. And at that point, there might still be  

some significant risk left, but we probably at  that point, should just take it. I think in itself

it would be a kind of existential catastrophe if we actually never developed superintelligence.

That would be a big closing down of most of what  the future could contain in terms of value. 

And so, that in itself, is an existential risk that might be relatively small because it doesn't look

that likely in the current situation. Right? I mean, everything is steaming ahead full speed, but

although it's small, it's not zero, and you could imagine scenarios in which there becomes like some  

huge backlash against AI, like so, maybe you get some catastrophe short of human extinction, but like

some really bad thing happens from AI systems  maybe that would then result in it becoming

stigmatized and like politically infeasible to say anything positive about it or maybe like

if there is mass unemployment from automation  or something like that. Who knows what kind  

of political currents might arise as a result of that. So this used to be less likely; it has become more likely than it was because there is now more agitation for, like, stopping AI and an AI pause and stuff like that. I think it's still certainly not the median scenario, but

not something one can fully dismiss, and I  think one needs to start to be a little bit  

concerned about that. Right, so, you're in some ways concerned about that. You think that 

it's important that we eventually build superintelligence, that's yeah, essential for avoiding  

other existential risks. But you think that like the current speed, I mean many

of these companies are talking about superintelligence in just a few years. Do you think... I mean, would you prefer it to be a little longer than that, or do you think it's good to just steam ahead? Well, I'm actually writing a paper on that at the moment,

working on it. So, I might have a better answer in a few weeks or a month. Whenever I have finished  

this work. So there are, yeah, various  variables that would come into that.

There is also, I guess, a distinction one can make between what somehow would be the ideal  

from some point of view, in terms of the timeline and what it makes sense to push for  

in the real world, in the situation we are now.  So even if you know, maybe you hoped that it

would take a little longer or go a little faster that doesn't immediately mean that, therefore it

would make sense to start to go out and call for a moratorium. Let us say, for one, you might worry  

that if you started to implement what was like  advertised as a temporary moratorium, let's like  

just all suppose you could just like pause for one year. Everybody around the world working  

in AI labs take a year holiday, right? And then they  come back so we'll have one more year okay, maybe  

you think that would be good, but then you might  worry about how likely is it that we will then

restart after one year. Like so, you have to think, how would you possibly get such a pause? Well,  

it seems either you would have to have like some massive sentiment pushing for this like,

and then why would that sentiment  not still be there after a year,  right?

Like it might just harden. Or maybe, perhaps in combination with that, you would need some huge regulatory

apparatus, maybe like some international treaty  or regulatory agency to actually implement this, right?

Which would have to be pretty strict because you know AI development can be done in  

many ways and even if you limit the compute power  people can still work on better algorithms you  

know, on their own whiteboards. And so once you put all that in place, these things have a tendency to kind of entrench themselves; sometimes it's easier to create regulations than to remove them. So even if one did think that it would be better if we had a little bit more time, it doesn't follow that, therefore, it would be sensible to agitate for pausing AI for that duration of time.

So not even like for pausing in order to focus more on the alignment problem for some time  

and make sure that we solve it before we reach  superintelligence? I wouldn't advocate  

for that today. Now, I think what is plausible is that at some point it would be valuable for

whoever is developing superintelligence, whether it's like  some company or some country, there's

going to be some project, right, at some point, that is blasting into this realm of true superintelligence.

I think probably it would be nice if whoever does that had the opportunity to pause or  

go slow for a brief period of time. If they could spend a few extra months or maybe a year or two to  

really double-check all their safeguards, right? And maybe increment the capability slowly,

rather than immediately like cranking everything up to 11 and just seeing what happens, right?

That does seem very valuable from a safety point of view, and there's probably also

a bunch of safety stuff that you can really only do once you have the system that you're trying to  

make safe. Right now, we have limited AIs, and we can work on the safety for them, but how do 

we know that the techniques that work today will  be relevant or apply to this future system that is  

superintelligent? Once you actually have the system in some sort of constrained form, you probably can

make more rapid progress on AI safety for some  period of time. So, it would be valuable, I think  

if they had a little bit of breathing room in that scenario, and so at that point some  

kind of short pause becomes more desirable.  Now, the ideal scenario for that might be if  

they actually just simply had a lead over their competitors. If they were like half a year ahead  

of the nearest other lab. Then, they would have the  opportunity to slow down for half a year, right?  

And that kind of pause seems to have less risk of becoming permanent  

because it's self-limiting, like after half a year, another AI lab catches up. Now if it  

still seems sufficiently risky, then maybe that lab also decides to pause development or

delay deployment, right, but then eventually that kind of pause expires

and it seems to have less propensity to sort of just accidentally become permanent, but these

things are very complex and it's not as if I have  like a definitively fixed opinion on these,  

this requires kind of continuously evaluating things as we get more information about safety,  

about political realities, like the strategic  landscape, what other risks are there,  

like all of these factors have to ultimately come into an all-things-considered judgment about these things.

Yeah. So, it's interesting that you're saying that in some sense we might have

to really have superintelligence to be able to know  how to control it, like to study it enough to  

solve superalignment. Well, I certainly don't think we should just wait until then to start

working, it's just that it's easier to make... there's a lot of safety work that is  

possible to do now that you couldn't have done  10-15 years ago. So back then, you could do

various theoretical work, conceptual work. But now, we actually have

these large language models and you can see how  they behave. You can do these various experiments.  

You can work on mechanistic interpretability to try to get better techniques for understanding how  

they represent the world and their goals, and how it's shaped by different... there's a lot more  

handles on the problem now. And I presume, like, when you have the actual system in

front of you to finally nail down the architecture  and now it's just a question of, I don't know,

scaling up the thinking time, or maybe adding more compute, at that point you get an

increasingly like clear view of what it is that you're trying to make safe. And even when you have

the final system, it'd just be nice to maybe have already prepared some automated tests

that you want to do, like some test suite, like we're starting to have today. Like so, even if you

just had one extra day, right? Maybe that would  already give you like some significant little bit  

of added safety because you would at least have a  chance to run the test suite that you had prepared  

in advance. So like there might be a very high premium on at least having a little bit of

time at that end stage, and like an extra week, then might be a lot more valuable than an extra week now.

But how would you know that you are at this end stage? Because, I mean, isn't there a possibility that already now, if we scale up things, we could reach superintelligence with the current architectures? Yeah, so, that maybe makes current AI safety work more relevant than if we thought that, ultimately, it would be some completely different

architecture, but presumably you will be testing these systems quite regularly, as  

you develop them because like it costs a lot, so you don't want to just start some big  

process on a data center that cost you billions  of dollars and then look back a month later and  

see that it's fizzled out, like you want to keep close tabs on how they perform on various

internal benchmarks, and test kits that you have right as you're training these and

as you get closer to actually transformative AI, it becomes also more important from a safety point of  

view, like to see, so maybe you could see that, "Wow  this new architecture scales very differently," so  

that now every day that we train it, it improves by X amount. It looks super impressive, it's not plateauing; maybe we can then predict that probably, if we keep this going for another few weeks, it will sort of, you know, reach IQ 130, then 150, and now it's still going strong, okay, and there are no signs of slowing down. Then maybe you would know, okay, looks like

this actually could be it, and maybe that's when you would, if you had this time to burn, maybe  

decide to use some of that to do whatever final sprint you could on the safety front.

Yeah, I think many people here are probably interested  in your thoughts about, yeah, first of all,

timelines, like how quickly do you think we will actually reach superintelligence when we're  

talking about different future scenarios  here, but like, what is your prognosis? 

Well, I take short timelines seriously, including very  short timelines. And I think we are now, and have  

been for a couple of years, in a situation where  we can't exclude even very short timelines. 

It probably will take a bit longer, but we can't know that it couldn't happen in like 2 or 3 years.

I mean, in fact, we can't really be that sure it's not already... I mean, right now  

for all we know, right, in some lab maybe this guy, you know, working the night shift has figured out this big "unhobbling" thing that just, "Wow, this was the thing we were missing," and now the

same giant data centers that previously strained  to reach like Claude 4.5 level or ChatGPT Pro 5

like whatever, right, that like now with this new tweak, they just learn way... they get the  

same sample efficiency as humans have. So that  with their massive amount of data and our sample  

efficiency, they'd get like, you know... it  could happen. It's not very likely, but if it  

were happening right now, we would not necessarily  know of it. Now, I think so we need to  

start to like take into account the possibility  that there could be some surprise or it could  

happen within a just a small number of years.  I think probably it will take longer as I said,

but we can't be confident of that. Right, and by longer are we talking decades, or some more years? I mean, it's so impressive, the rate at which things have been improving.

If things were to take decades, like I guess one then thinks what could possibly be the  

reason for that? So one is obviously some kind of  external factor like some geopolitical disaster  

or this "Stop AI" movement gaining steam and sort of shutting things down, like that's

one type of way in which it could happen. And  another is that it could turn out that a lot of  

the gains we've had to date, the rapid progress  we've seen has been completely dependent on the  

rapid increase in computing power that we've had. And if you look at it, it appears that maybe, roughly speaking, half the progress has been due to algorithmic advances and half to increased hardware, but it might be that the algorithmic advances

themselves are kind of an indirect consequence  of hardware. You can run more experiments if you  

have better computers. There's more incentive  to actually work like for smart people to work  

on the algorithms like if it's an important  thing. So, but suppose it turned out that the  

real driver here was just hardware scaling and  that you need to sort of scale it up by an order  

of magnitude to get like a constant number  of IQ points as it were in capability, right?  

Then progress might soon slow down because we have now reached the levels of hardware investment  

where there is a limit to how much more it could grow. So if you're talking 'Stargate,' the data

center OpenAI is planning to build for $500 billion. Well, I mean, you could go a bit higher. I mean, maybe you could spend $5 trillion in theory, right, if you really thought that was the final push. But after that it gets really hard, right? It starts to become a very large chunk of the world economy; another order of magnitude beyond that is something like half the world's GDP.
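A quick back-of-envelope check of that order-of-magnitude claim (an editor's illustration; the $500 billion figure is from the conversation, and world GDP of roughly $100 trillion is an assumed round number):

```python
# Back-of-envelope: how far can capital scaling of AI hardware plausibly go?
stargate = 0.5e12       # ~$500 billion data-center build-out (figure from the talk)
world_gdp = 100e12      # ~$100 trillion, assumed round number for world GDP

one_order_up = 10 * stargate        # ~$5 trillion: "maybe you could spend in theory"
two_orders_up = 10 * one_order_up   # ~$50 trillion

print(one_order_up / world_gdp)     # 0.05 -> a few percent of world GDP
print(two_orders_up / world_gdp)    # 0.5  -> roughly half of world GDP
# So only about two more orders of magnitude of investment are even conceivable
# before the spend rivals a large fraction of the world economy.
```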

So at some point, to the extent that faster hardware available for AI research has been driven by increased capital investments, that will have to, it seems, start to slow down at least a little bit soon. And so at that point, if that was what was driving this

rapid progress. You could imagine if we haven't  reached superintelligence already by that point,  

then maybe timelines start to stretch out. Maybe what will then need to happen is that we

have to wait for some theoretical breakthrough  that makes it possible to do this way more  

efficiently, that could happen. Okay, and do  you think that this is likely or? I mean, I  

think less likely than the alternative. I  mean, it does look like we are sort of  

within striking distance it seems to me  with the continued scale-up and everything  

but you know we've never done this before. So  really we have to think in terms of probability  

distributions here. So, it sounds like you think it would be more likely than not that we have it within a decade? Certainly, I guess there are two ways of thinking about this. One is like the

inside view. If I just look at the algorithms,  the progress, the specific things, then I  

would say yes. Then there's this second way of  looking at things, which is like, you stand back  

you maybe have young children that go to school, you look at all the people who go around their ordinary business, and you think, do I really think that the world as we know it will end completely within less than 10 years, and that, you know, pension funds are just wasting their time, and that nobody should build a railway line because it's never going to be needed? So then there's, like, some, I guess, sanity penalty that comes from this kind of more common-sense perspective, and so my actual views are some sort of superposition between those. But, I don't know, this kind of common-sense prior influence is not, I think, strong enough to fully overcome the inside perspective on these things. All right, so is it

like, you think that it will happen soon, but you don't feel that it will happen soon, or you mean that you should really take into consideration this, like, sanity-check perspective as well? I don't think it should dominate your thinking. I think

you shouldn't completely lose touch with it.  So I think like from a practical point of view,

I would like take this very seriously and spend  most of my time maybe operating on the assumption  

that we are, exactly how many years is hard to  say, but like we are kind of approaching this  

critical juncture in human history. But then if  there are things that can sort of hedge your bet  

like would you not want your child to have a good  education because you just assume that computers  

will anyway do everything? Probably not; it seems like you should do some sensible things just in case we turn out to have been a bit nuts and crazy about this thing, as so many other people have been throughout history about so many other things. Yeah, right, so given at

least a high possibility of quite short timelines,  and also, that we haven't really solved the  

alignment problem, even though you said we've made  some progress, like what is your P(doom)?

What is the probability that it all goes... Yeah, I  haven't really expressed a particular P(doom)

because I think it might also turn out to depend quite a lot on your definition of doom.  

So I think you can sort of imagine one class of scenarios where things just completely, like, all the value is lost, kind of thing, clearly doom, and then another that's utopia, right, so not doom; but then I think there's a broad class of scenarios which are such that even if we could now see exactly what will happen, we might still be unsure whether to count that as doom or not. Like it might be that the future is really weird, and some of the things we now thought were really important and valuable

have disappeared. But other things have appeared  that might also have value and we might not really  

have some easy way to tote all of that up. Like, how much is it worth to get rid of factory farming, or to get rid of third-world poverty, or to get rid of cancer? And then maybe you lose

some of the various things that humans think  are very important, like I don't know the idea  

of being a breadwinner or having discrete minds  in separate crania as opposed to one big blob,  

or I don't know exactly what it would be, but quite likely the future will be very weird in

such a way that it might contain both elements  that some people might regard as negative and  

and other people might regard as positive. And so,  I mean in fact, I think that maybe, that middle  

possibility might be perhaps more likely than  either the clearly not doom or the clearly doom.  

Okay, so, but if we're speaking strictly about the extinction scenario that 

everyone dies, like do you think that there's a  high possibility of that? No, I mean, so then

there is another complication that comes in here,  which is that even if we have a misaligned AI that  

doesn't care about us, and that has the power to eradicate all humans if it wanted to.

There might be other reasons, instrumental  reasons, for that AI not to do so. It might,

for example, want to be cooperative with  other AIs that might exist in the universe  

that might care about either humanlike creatures or care about general

ethical principles or norms of cooperation. And because it would be very cheap for an AI to preserve humanity, you know, maybe even give us the whole planet or the whole solar system, right? There's a lot of space out there. It could get, like, 99.99999, with a lot of 9s, percent of

all the resources for its preferred use and still manage to keep humans around in like some  

kind of paradise-like environment for us, if it so wished. So it wouldn't require much, it seems, either of intrinsically caring for us a little bit or placing some instrumental value on having us around, for it to be that even in the case where you get a radically misaligned AI that ends up in a position of absolute power, you still don't get human extinction from that.

Okay, yeah, it's very interesting because  when I read "Superintelligence," like it  

feels as if your views on this might have changed  a little bit because then you talk about it  

really as, I mean, that it would really want to, as quickly as possible, start to colonize the universe, because if it waits, it will automatically lose many stars and galaxies just because it waited a little bit longer until it started sending out probes, because there are parts of the universe that it then can't reach anymore. Do you think that this will be outweighed by

this theoretical consideration? I mean, it might want to reach out quickly, but hopefully not killing us would not cause much of a delay in that regard. I mean, you can still have some

massive place in the desert where like huge  starships go up, or you do some fusion

like whatever, right? That doesn't require wiping  out all humans around the planet. Even if you  

sort of wanted to like invest massive resources  in just getting something out there that could  

then start a sort of self-replicating process, spreading through the galaxy and beyond. If you

want it to be even more economical, you might even  imagine uploading humans and kind of continuing us 

on a more efficient substrate, or there are various possibilities. But it does complicate the question of assigning a probability to human extinction, because it might be quite a different probability than the probability of misaligned AI taking over. And then whether you count that as doom or not might depend quite sensitively on your

value function. Like if you mostly care about  people, normal people with normal human values,  

having great human lives, living happily with  their friends and family, and doing the humanlike  

things with art and cinema and perfect medicine  and like all of these things. If that's most of

what you care about, then, in some of these scenarios, things might be like 99% as good as they could possibly be. If, on the other hand, you're like a utilitarian of the sort

that cares about the total amount of utility in  the universe, and you would want to transform  

all these galaxies into hedonium or something like that, like matter optimized for feeling pleasure, then the scenario might be basically as bad as total loss, because Earth is an insignificant crumb in this vast sea of resources. So if all the rest of the stuff were used to,

I don't know, make paper clips or whatever the AI happens to want, and that doesn't happen

to coincide with what this kind of aggregative  consequentialist value perspective would want,  

then it might count as basically rounding to zero. So you might get a radically different

opinion on this scenario, whether it counts  as doom or like an amazing success, depending on  

which value function you have where different  people might like choose these different value  

functions. Neither of these value functions is so crazy that nobody would have it. So that's an

example of what I was alluding to earlier that  there might be this big category in the middle 

where whether it counts as doom or not depends  quite sensitively on how we evaluate it in ways  

that might not be very clear or obvious to us now  even if we could see exactly how things would  

play out, which we can't. Okay, yes, I still find it interesting because it really feels

quite different from reading your book.  And I wonder if like this idea that it

would want to care for humanity because of some future interaction with other super AIs and cooperation. I mean, why would it specifically care for us and not other animals

and then see us as sort of parasites on  the Earth, or like, is this really something...

it seems like a very important question if you  really believe that this is likely  

to happen, that it will sort of leave humans in peace, or if it's more likely that it will do something else and just won't care about us? I mean, that seems to be very, very important to sort out. Yeah, I mean, so I didn't express, like, a likelihood on this, but one reason is that there might be other civilizations out there that manage to align their

AI for example. I would hope that if we align our  AI, it will at least care a little bit about  

other creatures out there on other planets, other ape-like creatures or octopuses that became sentient and developed a tech civilization. And if our superintelligence, eventually, in 500 million years, comes across some octopus civilization, I would like to think that we

would then want to be nice to them. And so  if there are at least some AIs like that around  

then they might engage in trades and stuff like that, that, you know, would promote these values. That doesn't seem that unlikely, and what exactly will happen in that space is very hard to predict, but at least it seems like a live possibility that, from our current sort of occluded perspective, we are in no position to dismiss. I would also say, by way of partial explanation, so you notice some kind of, I don't know, tonal shift or shift of emphasis or something like that compared to "Superintelligence," and it is true, and part of that is the

context. So when I was working on that book, um  the whole issue of AI safety was completely

ignored by basically everyone. Certainly nobody in  academia took it seriously, aside from Eliezer and  

like a few other people on the internet. The whole world just dismissed it as science fiction, and to the extent that people were interested in AI, the only focus was how can we actually get it, like, so we have academic departments trying to make progress in this and that. And so at that point, there seemed to be a clear deficit on the side of understanding what we were actually working towards and understanding that there could be risks, which seemed really important

because if you could understand that, then we  might also actually use the time available to  

prepare ourselves to avoid the pitfalls, right? If  you have some conception of what could go wrong,  

you might take action to prevent things from  going wrong. In the intervening years,  

there has been a big shift and now there is  much wider recognition of the idea of AI safety  

as being important, like it really has become  part of the mainstream conversation, and  

including among sort of serious people.  You hear world leaders talking about this,  

you hear the tech leaders of big companies, and as I said, the frontier AI labs now have

research groups working hard on this, and there  are a bunch of other organizations as well. So,  

so the situation on the ground has changed a lot.  So now, I think there is less need for 

me to keep harping on the same thing again when  that point is already quite widely recognized and  

so I'm focusing more on other insights that maybe haven't yet sort of percolated as widely, and trying to bring those to people's attention. Right, right, so your focus is now

more on other things. But still it would be  interesting to just get like sort of a feeling  

because I mean, you were obviously quite  worried about this when you wrote the book and  

some things have changed now. Like given  everything you know and the situation we're in. 

If you really had to put a number, or at least a range, on the probability of complete human extinction, would it be like a two-digit number, your P(doom)?

Well, yeah maybe, but I don't know. It also  might depend on like, sorry to be this kind of  

"Depends on what you mean, the definition, like depends  on what you mean by human?" But it might depend on  

what you mean by human like if there are like only  uploads left then no biological humans for example  

Does that count? I think maybe it  wouldn't count. I mean, or do you think  

that we or like that I would be myself if I was  uploaded? Under certain conditions that  

I would think so, and in fact plausible to me, that the best path involves uploading at some point 

you know. I would favor a world probably where  different people were free to choose their own  

trajectories. But ultimately, I don't see why we need to use meat to do the computation, when semiconductors might ultimately be much more efficient. But with all of these things, it would be nice... I think if we tried now just to make up our minds about a host of those

kinds of questions, we would kind of be bound to  get at least one of them wrong. And so what  

we would hope for, I think, is maybe to end up in  a situation where we're able to like think a lot  

harder about these things and deliberate, maybe  with AI advice, rather than having to sort  

of implement our current best conception of the  future we would want and then locking ourselves  

into that. I think that likely would sort of miss  out on a lot of really exciting possibilities.

So that's one, there are other factors as well. Well, I don't know how much time we have.

Like, I have one recent paper, not really a fully developed paper, called "AI Creation and the Cosmic Host," which introduces, quite handwavily, the idea

that if we give birth to superintelligence, it  will enter a world in which there quite possibly  

are other super beings already in existence.  So these could be other AIs built by some alien

civilization in some remote galaxy. Or, in the Everettian interpretation of quantum mechanics, there are many branches, and so there might be other branches of Earth-originating

life that have produced or will produce different  forms of superintelligence. If the simulation  

argument, right, is to be trusted, we may be in a simulation; the simulators then would presumably be superintelligent and be super beings. And of course there are traditional theological conceptions as well, right? God is a super being, usually superintelligent. And so in any of these cases there would be this kind of cosmic host consisting of these other super beings. And one important desideratum for us going forward here, I think, is that if we create

superintelligence, we would want to make it such  that it can get along with this cosmic host  

and maybe adhere to whatever norms might have  been developed within this cosmic host. And so  

that I think adds a dimension that has to some  extent been missing in the classical discourse  

around AI safety, where the attitude has very much been: how can we maximally control the AI so that it maximizes our own expected utility, just taking our own

preferences into account. There might be this much  larger picture where we are very small and very  

weak and very new, and there is this kind of incumbent set of super-powerful beings, and how our superintelligence interacts with that might be a very critical part of how well things go. So I think the upshot of that is a bit unclear. We don't know precisely what it means, but I think it slightly increases the chance that we ought to develop superintelligence, and I also think that we should approach it with a bit of an attitude of humility, that we don't know very much here. There are more things in heaven and earth than are dreamt of in our philosophy. And so I think there is some different mindset that maybe also comes from considering that type of perspective. Okay, yes, yeah. Obviously a lot

of things to consider here. Well, I'll try not to stretch out our time too much, because I know you're sitting up late now. But can I just ask, because I feel that I still don't have a clear idea of where you are on this extinction scale, and this is something that worries me and lots of people: would you say that your view now is that you still acknowledge there is a risk that everyone just dies, but you don't think it's a major risk anymore?

No, I mean, it's a pretty serious outcome if it happens, right? So even if the risk were quite small, it would be worth taking seriously. But it is worth bearing in mind, if we are concerned about our own personal deaths, that death is something that is likely to happen anyway. And then you might say, well, it matters when it happens, right? It's not just whether you die, but that it not happen too soon. So then you might think in terms of life expectancy. But if you actually start doing the math on that, very plausibly our life expectancy goes up dramatically if we develop superintelligence, even if misalignment risk is quite high, because if AI is developed and it's successful, if it's safe, then it could do a lot to advance medicine, you know, invent rejuvenation therapies, anti-aging medicine, and so forth. And so our lifespan, conditional on things going well, would be very long. That goes back to this paper that I'm working on; I hope to have more to say on that in the relatively near future. But the upshot is that even if one is concerned mostly with personal survival, it is not at all clear that this should make us anti-AI or make us want to slow it down by a huge amount. All right, but I suppose that you can make that argument even

suppose that you can make that argument even  if the risk was like 99% that everyone would die?

And I mean, most people would probably not  like to take that bet. Well, if it's that big,  

then the argument might not work. I mean, then  it becomes more complicated. So, you might think  

of esoteric ways in which we might achieve very  long lifetimes even if we don't develop AI. But  

also, you might not value future life expectancy  linearly. You might have a kind of diminishing  

returns or time discounting. And then, especially if you thought that the risk was going down over time, if the risk now was 99% but it was falling by like 5% a year or something, you would prefer to wait. But at some point, once the risk is low enough or once the rate of further decline is low enough, depending on various parameters, you would favor taking the plunge. But there are other things, of course; aside from our own personal survival, we might

also care about other things, like the survival of the human species, or Earth, or human values in the broader sense, among other things. So it's very hard to form an all-things-considered view about these big-picture questions. I call it macrostrategy: trying to figure out how all of these different pieces fit together. And it's quite plausible that we have overlooked at least one crucial consideration, some fact or argument or idea such that, if only we discovered it or fully took it into account, it would quite profoundly change our opinion, not just about the details but about the overall direction we should be trying to go in. Should we want more global coordination or less? Faster AI progress or slower? Do we want more synthetic biology or less, more surveillance or less? For these big macrostrategic parameters, it can be very hard to form any kind of confident opinion about what the answer to those kinds of questions is.
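The waiting-versus-plunging tradeoff sketched above can be made concrete with a small toy calculation. The sketch below is not from the talk; every parameter is an illustrative assumption (doom risk starting at 99% and falling five percentage points a year, a 3% annual discount rate, 2% ordinary mortality while waiting, and a 10,000-year lifespan if deployment goes well).

```python
# Illustrative toy model only; none of these numbers come from the talk.
def expected_discounted_life_years(wait, p_doom_now=0.99, annual_drop=0.05,
                                   ordinary_mortality=0.02, discount=0.03,
                                   lifespan_if_safe=10_000):
    """Expected discounted life-years if superintelligence is deployed after
    `wait` years, with the doom probability falling by `annual_drop` per year
    and ordinary mortality and time discounting applied while we wait."""
    d = 1 / (1 + discount)             # per-year discount factor
    s = d * (1 - ordinary_mortality)   # discount times survival while waiting
    p = max(p_doom_now - annual_drop * wait, 0.0)

    # Discounted, survival-weighted years lived during the waiting period.
    years_waiting = sum(s ** t for t in range(1, wait + 1))

    # If we survive the wait and deployment goes well, we get a very long
    # (rejuvenated) lifespan; if it goes badly, zero further life-years.
    post_deploy = sum(d ** t for t in range(1, lifespan_if_safe + 1))
    return years_waiting + (s ** wait) * (1 - p) * post_deploy

if __name__ == "__main__":
    for wait in (0, 5, 10, 15, 20, 25, 30):
        print(f"deploy after {wait:2d} years: "
              f"{expected_discounted_life_years(wait):5.1f} expected life-years")
```

Under these assumed numbers the expected value peaks at roughly a 20-year wait, which matches the qualitative shape of the argument: wait while the risk is still falling quickly, then take the plunge once further decline no longer compensates for discounting and ordinary mortality in the meantime.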

Yeah, right. Thank you for all of these very interesting thoughts. I realize that there are many things to think about here. Would you have time for one question from the audience? Well, let's do one then. Okay, if we have one, someone who's eager to ask anything. Yeah, we have one there. You can say it, and I can repeat it, because he hears through this mic. Do you think that we'll

take advantage of our own superintelligence that we create to fend off threats from other superintelligences that could reach us? So if we use Neuralink to sort of merge with superintelligence, could we then use that? Yeah? So this would be like an evolutionary change. Is there a possibility that we'll merge with AIs somehow? Yeah, my guess is that this

would happen after we have radical superintelligence, which could then perfect the technologies that would allow really good brain-computer interfaces, or uploading, or other things of that sort. My guess is that AI is moving quite fast, and there are all kinds of complications with actually implanting things in the brain that sometimes get elided in media headlines: you get a risk of infection, the implant moves around a little bit, and so on. So I think it's great for people with disabilities who can now, you know, walk or see thanks to such an implant, but it's quite difficult to do something that is better than

a normal human. We can already interact with computers through our eyeballs at something like 100 million bits per second, directly into the visual cortex, a large part of the brain customized specifically for processing this information. Output is a little bit more limited, but still, I'm usually more limited by my ability to think than by my ability to speak and type, right? It's not as if I could be a hundred times more productive if I could type a hundred times faster. So that would

be my main guess. Now it is possible that if you  have high bandwidth interfaces, the brain could  

learn to somehow leverage external computational  substrates in ways that would unlock some kind  

of synergy. Like, if there were some large pool of external working memory that the brain could somehow learn to access, if it were in high-bandwidth communication with it over a long period of time, you can't completely exclude that. Or if you had many humans who could telepathically communicate: maybe initially it would just be telepathic communication, which would not really be that different from speaking, but maybe if they were connected like that over a long period of time, they could eventually learn to have thoughts, single complex thoughts, that extended over multiple brains without the whole thought being located in any one of them. It's possible that these more interesting things could happen, but if your timelines for AI are short, like some single-digit number of years, it's just hard to see how they would happen in a way that really meaningfully moves the needle within that time scale. That would be my guess.

Okay, yeah. Thank you very much. Let's  hope then that the superintelligence will be  

kind to us. And it's been so great to talk  to you. I'm really sorry for sort of pushing  

a little bit on time. It's just so extremely  interesting to hear your thoughts on this, and  

I think we should give a big round of applause  for Nick Bostrom. Well, well, thank you, everybody,  

and it was fun. Thank you, Jonas. Thank you.  And yeah, have a good night now, and yeah,  

good luck with all the research. I look forward  to reading your papers also in the future.
