
10 years of AlphaGo: The turning point for AI | Thore Graepel & Pushmeet Kohli

By Google DeepMind

Summary

Topics Covered

  • Fast Intuition Meets Slow Calculation
  • Move 37 Redefined AI's Creative Potential
  • When Machines Discarded Human Knowledge and Improved
  • From Impossible to Solved in a Single Match
  • Verification Separates Insight from Hallucination

Full Transcript

Welcome back to Google DeepMind: The Podcast. I'm Professor Hannah Fry.

Picture this scene. It's March 2016.

Inside a hotel suite in Seoul, South Korea, two players are playing the ancient game of Go, a game of unimaginable complexity, long thought impossible for a machine to master. On one side is Lee Sedol, a legendary 18-time Go world champion. On the other, AlphaGo, a neural network-based AI system built on a powerful technique called reinforcement learning.

Welcome to the DeepMind Challenge Match, live in Seoul, Korea.

That's a very surprising move. Not a

single human player would have chosen move 37.

After hours of intense gameplay spread over 7 days.

Yeah, that's an exciting move.

Lee Sedol placed two stones on the board to signal his final resignation.

And in the blink of an eye, the world changed.

Final result of 4-1. Congratulations to AlphaGo and to the entire team.

That was exactly one decade ago. And the field of AI has changed unimaginably since then. We have seen the rise of large language models, the growing sophistication of AI agents, and the solving of scientific grand challenges like protein folding. But in many ways, the modern AI revolution arguably began right there on that wooden board in South Korea. So in this episode, we wanted to look backwards and forwards at how a bold experiment in teaching machines to play games became the foundation stone for the AI breakthroughs of today. And with me are the perfect guests to tell that story.

Thore Graepel is a distinguished research scientist at Google DeepMind who was right there in Seoul as a key architect of the AlphaGo project, and Pushmeet Kohli, who leads Google DeepMind's science work, is the person to tell us how those early techniques pioneered in Go can tackle crucial problems today. Welcome to the podcast, both of you. Thore, I know you're an accomplished Go player yourself. Just explain to us why Go was seen as a good challenge for AI.

Yes, the game of Go seemed like the perfect challenge for AI, because the game has such simple rules, yet it leads to such complex gameplay, with tactics and strategies and complex patterns.

And once the game of chess had been solved, as it were, or at least Deep Blue had won against the world champion, Go was this open challenge. It's much more complex than chess by many orders of magnitude, and nobody was expecting it to be solved anytime soon. Yet it looks so elegant and simple for computer scientists. And so it was the perfect game to tackle at the time.

I mean, the idea of nobody thinking it would be solved anytime soon sort of hits the nail on the head, right? Pushmeet, I know you were working at Microsoft at the time, but just how complex was this problem considered to be?

I think it was considered extremely complex, and that is not only because of the breadth of the search space, the number of moves you can make, but also the depth: how long you have to reason and how long the games are. In a game of chess, you might think about reasoning over 60 to 70 moves. In a game of Go, it's much, much longer. And that leads to the challenge of the problem.

Thore, I know when you first started at DeepMind, being a Go player, didn't you play against AlphaGo on your first day?

Yeah, exactly. So imagine: I come in on my first day at work at DeepMind. I know a couple of people, including David Silver, and he asks me, "Thore, you're a Go player, right? Couldn't you do us a favor and test our baby version of something that wasn't even called AlphaGo at the time?" Of course, it was an internship project, and they had just taken a few hundred thousand games from the internet and trained a system, and I had the opportunity to be one of the first people to play against it. But you can imagine, I was excited, but I was also nervous. It was my first day at work, and there I was, being dragged to a centrally located table. On the other side, I think, was Aja Huang, who would later be known as the hand of AlphaGo, with his poker face. And I got to play against this baby version of AlphaGo with a lot of people watching all around me. You know, there was no escape.

Later, Demis showed up and of course David was there the whole time.

Yeah.

And so what does one do? Um, play

conservatively, right? So I just thought, just don't make a mistake.

Surely this can't be so hard. But of

course, that was exactly what that version of the program was good at. It

was trained on human professional games, so it knew exactly what to do against conventional play. And so as this little test match proceeded, my position became worse and worse, and I ended up losing by a small margin. But I took the crown of the first person who officially lost against AlphaGo. It was quite the experience, and of course afterwards everyone knew me. It was a wonderful way of introducing myself.

A humbling way.

A humbling way. Exactly.

Absolutely. Pushmeet, just remind us. Okay, so I know that the algorithm advanced quite substantially from that early point where it was an internship project, but just broadly explain to us how it worked, and this idea about cracking these kinds of combinatorial spaces in particular.

Yeah. So I think if you look at the game of Go, the number of moves that you can make at any given time is finite, but if you look at and reason about the overall game state, it's exponential. And that exponential growth in the number of states that you have to reason about is what makes the game extremely complicated.
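That exponential growth can be made concrete with a back-of-the-envelope calculation. The branching factors and game lengths below are rough conventional figures (around 35 legal moves over roughly 80 plies for chess, around 250 moves over roughly 150 plies for Go), not precise counts:

```python
import math

# Rough game-tree size: (branching factor) ** (game length in plies).
# These are conventional ballpark figures, not exact measurements.
games = {
    "chess": (35, 80),    # ~35 legal moves, ~80 plies per game
    "go":    (250, 150),  # ~250 legal moves, ~150 plies per game
}

for name, (branching, plies) in games.items():
    # Work in log space: the raw numbers are far too large to enumerate.
    digits = plies * math.log10(branching)
    print(f"{name}: ~10^{digits:.0f} possible game continuations")
```

By this crude estimate, chess comes out around 10^124 continuations and Go around 10^360, hundreds of orders of magnitude larger, which is the exponential blow-up being described.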

So how did they crack it then? Just remind us of the solution that they discovered.

The beauty of AlphaGo was that there is this element of thinking fast and thinking slow. And AlphaGo, in some sense, was the perfect combination of those thinking-fast and thinking-slow processes coming together to take on this extremely large search space.

And it matches quite well to how humans play the game. If you imagine how a human would play a game of chess or a game of Go, we also have the capacity to look at a position and pretty quickly appreciate whether it's good for Black or good for White. And we can also look at a position and already see moves that seem promising. We never look at all the possible moves, which would be maybe 20 or 30 in chess, or 200 or 300 in Go. We are immediately drawn to certain, maybe even aesthetically pleasing, moves that seem like just the right ones, guided by our intuition. And that element is complemented by planning, where we explicitly reason through the possibilities: if I make this move, my opponent might make that move, and then I have to counter with this move. And these two different ways of thinking come together in how humans play these games, and they also come together in how AlphaGo plays: the intuition and the calculation, as it were.

Exactly.

So was that the inspiration then? Did you sort of think about how you were playing the game, how other Go players were playing the game, and draw that direct inspiration from neuroscience, effectively, as it were?

Yeah, I think that is definitely one direction, because a lot of team members were actually game players who were able to introspect and see how we tackled the game. And then, of course, that comes together with deep learning, which at the time, since 2012, had grown as a direction and now for the first time gave us the tools to learn these approximate functions. For example, the value function, which takes a board and tells us how good it is for either Black or White, or the policy network, which takes a board and effectively ranks the available moves according to how likely it would be that a professional player would take them. And so deep learning was just ripe at the time to tackle this problem, and gave us the opportunity to implement the fast thinking. The slow thinking is not unlike what happened in Deep Blue. You know, it's the search of the game tree that was already known and that we might now call good old-fashioned AI.

Okay. Well, I mean, you lost to this thing quite early, but once it had gone through a lot of the people on the team, let's say, I know that you tested it with a professional Go player, because you had Fan Hui come into the office.

Yeah, exactly.

How confident were you at that point that it was going to beat him?

Yeah, we had different levels of confidence, which was really interesting. We had been really lucky to find him. You know, he was the European Go champion at the time. He lived in Bordeaux and came over; we lured him into playing this game with us. And the setup was that he would play 10 test games against the version of AlphaGo at that point. And I personally thought that AlphaGo could not possibly already be at the point where it beats the European champion, a professional player. And so I had a bet with David Silver. David was confident. He said, "I think AlphaGo is going to nail it, 10-0." And I said, "No, I think AlphaGo will lose at least one game."

And the bet was that whoever lost would have to show up at the office dressed as an ancient Japanese Go master and be in the office for one day like that.

Well, who showed up like that?

It was me, because it was in fact 10-nil. But it did give us confidence, and gave Demis confidence, that we would be able to tackle even harder opponents in the near future.

Which you did, of course. You got on a plane in 2016 to Seoul in Korea to play against Lee Sedol. I mean, just give us a sense of how phenomenal a player he actually is.

Yeah. So Lee Sedol was really one of the best players, or maybe the best player, at the time, with an incredible track record of winning tournaments. He was compared to Roger Federer at the time for his success and intellectual brilliance. And so for us, it was a tremendous honor that he accepted our challenge to play against him. And

it was a tremendous challenge, because we had to set a date, right? You can't just say, you know, we'll tell you when we're ready. A date was set, and we had to work towards that date to actually make AlphaGo strong enough. And what added tension and excitement to it was that Lee Sedol was convinced that he would win. He thought it highly unlikely at the time that AlphaGo would win. And of course he was basing his assessment on the game records he had seen against Fan Hui, and he assessed that he was better. But what he wasn't so aware of is that AlphaGo was constantly improving through the training and the algorithmic refinements and so on that we made. And so the entire team basically went to South Korea, and you wouldn't believe the excitement of people there. The truth is, in England, Go is a bit of a niche activity, right? Very few people would be able to play it or even know about it. But in South Korea, people were so excited. The best Go players are celebrities. And, you know, we came there and there were hordes of photographers taking pictures. We had a documentary film crew with us. And so imagine typical computer geeks, as it were, suddenly in the limelight of the world for this match. That was quite the adventure.

Yeah. I mean, were you nervous about the performance of AlphaGo?

Yes, we were definitely nervous. Of course, we had a very sophisticated evaluation pipeline. You can test against players that you have access to, like Fan Hui. That was super helpful. You can also test against previous versions of the program. And you can calculate what we call the Elo rating of the system, which basically takes the outcomes of all the games that you play against other versions, maybe earlier versions of your program, and calculates what the rating of the new version is. And you can calibrate these things quite well. But of course, we didn't know where on that scale Lee Sedol would be.
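The Elo mechanics Thore mentions are simple to state. The sketch below is the textbook Elo model; the podcast doesn't specify DeepMind's exact rating pipeline (variants such as BayesElo exist), so treat this as an illustration of the idea rather than their implementation:

```python
# Textbook Elo rating: an expected score from the rating gap, and an
# update that shifts the rating toward the observed result.

def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating, expected, actual, k=32):
    """Move the rating toward the observed result with step size k."""
    return rating + k * (actual - expected)

# A new version (provisionally rated 2800) beats an equally rated older
# version: with an expected score of 0.5 it gains half of k.
e = expected_score(2800, 2800)      # 0.5
print(update(2800, e, 1.0))         # 2816.0
```

Playing many games against several earlier versions and repeating this update is what lets you place each new version on a common scale, which is the calibration Thore describes.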

And of course we wanted a cushion as well. It would have been nice to be quite a bit better, to have some certainty, because this is the world stage, right? If you lose this, that's a bit of a hit to the reputation.

And so, yeah, we were nervous. We worked up to the last minute. We also needed to make sure that the system was really stable. You know, you don't want to make last-minute changes to make it that little bit better, but risk that it becomes unstable. But in the end, we were quite happy with it. And so we entered that now kind of famous hotel floor where all the action happened, where all the press was waiting, and embarked on the match. And people were watching from around the world, including Pushmeet, including you.

Yeah. So, I mean, where were you at this point? You were watching on...

Yeah, I was in Seattle. I really started getting into it in the middle of the first game. It became so clear that AlphaGo had reached that specific milestone, and you could even see the reaction from the press and the commentators and Lee Sedol himself.

Well, it's interesting that you said in the middle of that game because in the early stages of that game, was it clear who had the upper hand?

I think, as a person who was just watching it, I felt that in the early stages everyone was quite confident that Lee Sedol would win. In fact, it was only as the game progressed and it came closer to the final outcome that they realized that, as you count the territory, AlphaGo had an advantage. It came as a surprise to people. What did you think?

Yeah. So I had this interesting interaction on site with an American professional Go player who was sitting next to me while we were watching. There was some sequence unfolding in a corner, and he kind of approached me and said, you know, "I always tell my students not to play that stupid move that AlphaGo just played. So I mean, it's pretty hopeless." And I was like, "I'm not as much of an expert. Let's just wait and see," was my reaction. And

then, after that first game, this gentleman came to me and said, "This is the most phenomenal thing I've ever experienced. I'm so grateful that I'm allowed to be here to witness that a machine can play Go at this level, and there's going to be so much we can learn from it." He was already embracing this. I mean, you have to imagine: these people dedicate their lives to the study of this game, and they've often trained from being young children to their current age just to master it. And so, of course, it comes as a shock to them that a machine might match or even exceed a human Go player.

Because if that was the first game, when AlphaGo won, in the second game AlphaGo did something that really surprised everybody.

That's a very surprising move.

Professional commentators almost unanimously said that not a single human player would have chosen move 37.

AlphaGo said there was a 1 in 10,000 probability that move 37 would have been played by a human player.

Just explain to us what happened with the now famous move 37.

Yeah, so this was a remarkable scene. I was sitting in the international English-speaking commentary room with Michael Redmond, our American commentator. He had this big demo board on the wall, and he would put all the stones up there on the board to show people what was being played and comment on different variations. And so he took this stone corresponding to move 37 on the board. And then he stepped back and said, "Ah, this must be wrong," and he took it back. And then he looked at the screen again and said, "No, no, that is actually what AlphaGo played," and he put it back. He was puzzled. You could see that it was such a counterintuitive move for a human player. It was a shoulder move on the fifth line, and this is typically something that human Go players avoid.

So often in Go there is some kind of pushing going on along the edges, and one of the players builds territory along the wall of the board while the other side builds influence towards the center of the board. And if that happens on the third and fourth line, this is considered to be roughly equitable; both sides get something out of it. But what AlphaGo was effectively suggesting is that it's still profitable if you do it on the fifth line and give that much more territory to the other party. And that's what was so surprising to people: that there would be situations in which that would be correct. And so not only was it a very special move, but in a way it represented a new way of weighing these two factors, immediate territory versus influence towards the center of the board, against each other. Something that went beyond what a human Go player would normally do.

Yeah, absolutely. I mean, there are moments like this where you see the true potential of an AI system: the expanding of human knowledge. People have regarded, in this particular case, the game of Go as a thing to be studied for many, many years, and there comes this particular point where that knowledge is expanded, and people were at first skeptical, which was the case in the game as well. When the move was played, it was considered a hallucination or a mistake, right? For quite a bit of time, before its implications became clear later on in the game.

Exactly. Because it proved to be pivotal to the second win.

Yeah. It was not just a moment in that game, but also a moment, I think, in the whole history of AI. That particular moment showed us that there will be times when these systems produce insights, and we might not even be able to discern whether they are mistakes or amazing breakthroughs, yet they will have a lot of influence on how we look at whole areas of study in a completely new light.

Well, I also want to talk about move 78.

This is a move that was played by Lee Sedol that confused AlphaGo, causing it to resign the game.

What is Lee Sedol up to here? He's just

burned like seven or eight minutes just on this move already.

Oh, look at that move. That's an

exciting move. Ooh,

you know, I'm not actually sure what AlphaGo is trying to do here.

So, he's found his weakness. That wedge move.

World champion Lee Sedol went looking for AlphaGo's weakness in game four, and he found it.

So by this point, AlphaGo has won three games in a row, and now Lee Sedol plays a move that confuses the system. Is that fair to say?

Yeah, that's absolutely fair to say. Move 78 was an unusual wedge move that Lee Sedol played. There had been a very interesting battle, as it were, at the center of the board, and Lee Sedol found this move. It was also surprising to people, similar to move 37, and from then on we observed that AlphaGo didn't have a good grasp of the position anymore. We saw that the moves it made didn't really make sense to us, in a bad way. You know, move 37 also didn't make sense to us, maybe, but these moves seemed strange even to amateurs like us, and so it had been confused by the move. And just to

zoom out to give you a sense of why this still mattered so much: you might say, okay, it's a match of five games and AlphaGo has won the first three. What more is there to prove? But then we were thinking, well, if Lee Sedol were now to win the last two, what would you conclude? He's got it figured out, right?

He's found the fragility.

Exactly. It would have been the human triumph. And so that's why that game and the last one were still very exciting to us. But it wasn't entirely the case that we were disappointed. We were certainly disappointed, but we also had so much admiration for Lee Sedol, you know, as a human, to be able to find this move. You just have to imagine this master, who has dedicated his life to playing this game, in this battle that must have been so hard on him, right? To see this machine play so perfectly and him struggling to find a way, and then in game four he finds a way. And as he put it in the press conference later, he said that he was so happy and proud that he was able, maybe for the last time on behalf of humanity, to find a way to overcome the machine.

Some people called it the divine move, didn't they?

Yeah. And I think, given the tension at that point in time, and him really outgrowing himself at that moment and finding that move, I think it's a good name for it.

Well, the final score was 4-1 to AlphaGo in total. What was the reaction from the Go community?

Yeah. So the Go community followed the match very closely, and of course the outcome was dramatic and, for many people, unexpected, so people showed very different reactions. Some people were absolutely amazed and surprised by the outcome. Some people couldn't believe it. Others thought that an era had come to an end, because now maybe the strongest Go player was no longer a human but a machine. But overall, what we found amazing is that there was an uptick in interest in the game of Go. I think more people play Go now than did before. And the Go community really embraced the learning from AlphaGo. There are now many programs that work essentially the same way that AlphaGo does, and people use them for teaching purposes. They analyze their games through them, and overall I think it has provided a lift to the whole Go community.

Let me ask you about the reaction from the AI world to this match. What was the buzz? What was the conversation like?

The AlphaGo-Lee Sedol match was a key pivot point, where a lot of people, especially in the machine learning community, who had been working on these models and techniques as a mathematical and applied project, started to see evidence that these systems can self-learn and go beyond human knowledge. And that is a very important point, because in machine learning you train with training data that has been collected, and your natural expectation is that the model is just going to be consistent with that distribution. To show that you can go beyond that distribution, and that the resulting insight can then be utilized by the world, is an amazing realization that comes out of this whole experience. And it really points to what is possible with artificial intelligence, not just in the game of Go but in the understanding of our world: in chemistry, in biology, in mathematics, in computer science. What are the amazing analogues of move 37 that these systems will be able to discover and reveal to us?

I think that point you made there about going beyond human intelligence is just so fascinating.

But one of the things that I find most intriguing about the AlphaGo story, even after the victory of 4-1, is that you then built AlphaZero, where you took away all of the human data, all of the games of Go that it had been trained on, and discovered that once you take out the human intelligence, the thing actually improved, which is astonishing to me.

Yeah. From a scientific perspective, one could argue that that is an even bigger step than the original AlphaGo, because, as you were saying, the AlphaZero system doesn't have access to any human game records or to prior knowledge about how the game is played. It really only has access to the rules of the game and the means of representing and learning the functions that we talked about, the policy network and the value function.

So basically it starts playing entirely randomly at the beginning, because it has no notion of what good or bad moves are. But it gathers experience from playing these games, and it learns which moves are more likely to lead to a win, which moves are more likely to lead to a loss, which positions look promising and which do not. And eventually it starts playing better and better moves.
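That loop, random play, gathered experience, improved play, can be shown at cartoon scale. The sketch below learns a value table for one-pile Nim (take 1 to 3 stones; taking the last stone wins) purely from self-play, with no knowledge beyond the rules. Everything here, the toy game, the lookup table, the learning rate, is an illustrative stand-in: AlphaZero used deep networks and Monte Carlo tree search, not a table.

```python
import random

# Minimal self-play value learning on a toy game. Start with zero
# knowledge, play against yourself (mostly greedily, sometimes randomly),
# and nudge each visited state's value toward the observed win or loss.

def self_play_train(max_pile=12, episodes=20000, lr=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    # value[s]: estimated outcome for the player to move at pile size s.
    value = {s: 0.0 for s in range(max_pile + 1)}
    for _ in range(episodes):
        pile, history = rng.randint(1, max_pile), []
        while pile > 0:
            moves = [m for m in (1, 2, 3) if m <= pile]
            if rng.random() < eps:            # explore a random move
                move = rng.choice(moves)
            else:                             # exploit: leave the opponent
                move = min(moves, key=lambda m: value[pile - m])  # worst state
            history.append(pile)
            pile -= move
        # The player who just moved took the last stone and won; walk
        # backwards, alternating win/loss between the two players.
        outcome = 1.0
        for state in reversed(history):
            value[state] += lr * (outcome - value[state])
            outcome = -outcome
    return value

values = self_play_train()
# Known theory: multiples of 4 are lost for the player to move.
print(values[4] < 0 < values[5])
```

With enough episodes the table rediscovers the classic result that piles of 4, 8, 12 are losing for the player to move, knowledge it was never given, which is the "beyond the training data" point at miniature scale.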

And now, of course, it's not limited by human knowledge, and what it discovered was amazing. First of all, it rediscovered ways that humans play, and that was totally reassuring. You know, there are certain patterns in the corner in Go that we call joseki, or in chess there are certain opening moves. The system was now more general: it could play chess, Go, and shogi, and could have played any number of other board games if we had trained it that way. And so at first it rediscovers human knowledge, and we think, "Wow, this is so cool. It finds the same openings and so on." And then we look at some of these openings and it stops playing them. We think, "What's going on?" It has found a refutation. So it rediscovered human knowledge, and then it discards it, because it has now gone beyond it and has found that there are actually better ways of playing: I'm not going to continue playing in this human way.

Stuff that humans hadn't found yet effectively.

Exactly. For AlphaZero, when it played Go, the way it played looked alien to me in the end. This wasn't the kind of Go that I had learned from my Go teacher, which is structured, maybe, in a way that enables humans to understand it. These moves looked very free and didn't make much sense at the time, but 30 moves later everything would fall into place, and you would see: oh, wow, that makes sense now. As if it had the foresight, in a way, which it did, right? So that discovery, from nothing to that level of play, was very impressive.

Okay. So there's something I want to show you that happened actually when you guys were in Seoul. As you mentioned before, you were being filmed for the AlphaGo documentary, and there's some footage that didn't make it into the film. It was captured by the cameras as they were sort of packing up, but the microphones were still running. I don't know if you've heard this little clip. Let me play it for you. Hold on. This is Demis and David having a sort of private conversation.

Well, it's just amazing seeing how quickly a problem that is seen as being impossible can change to being done.

We can solve protein folding. That's, like, I mean, it's just huge.

I'm sure we can do that now. I thought we could do that before.

Yeah.

But now we definitely can do it.

Good job. Thank you.

Okay. No problem.

Beautiful.

Isn't that great?

Yeah. Thore, do you think that captured the mood at the time?

Yeah, that was the kind of door that AlphaGo opened at the time, right? If we can do this, then what else could we do? Because this is a game with 10 to the power of 170 different positions. This is super complex, and if we have principled ways of navigating that kind of combinatorial search space, then it seemed plausible that we would also be able to handle other large combinatorial search spaces. And at the time, one of the favorites was protein folding.

Absolutely. And this is now the point, really, where you come on board with the DeepMind team, Pushmeet, because when it came to AlphaFold, I mean, you're an integral part of that story. Did the AlphaGo project directly influence what you guys went on to do, or was it sort of the confidence of a victory that made Demis say things like that?

No, I think Demis, from very early on, has had a very strong notion of what AI is being developed for. He really sees AI as a tool that will help us understand the world better. In fact, at the time when the AlphaGo matches were happening, I was at Microsoft working on AI for programming.

now AI for coding is everywhere but at that time not many people were working on program synthesis and AI for coding and Demis wanted me to join deep mind

and my question to him was: I am really interested in having AI systems, machine learning systems, for solving the most challenging problems in the world and to make sense of what's happening. And I think his reaction was: if you want to understand the world and if you want to solve the most important problems in the world, then you have to join DeepMind, because we will need AI to really understand the world deeply and to tackle these problems. So if you are interested in learning to program, if you are interested in cybersecurity, if you are interested in dealing with climate change, if you are interested in understanding how to deal with impossible-to-treat diseases, you have to come and really lead the charge on how AI can be used for these applications.

I want to ask

you about some of the innovations that you guys had made in AlphaGo and how they ended up finding their way into the science projects that you were doing. One of the big things that AlphaGo did was to make that gigantic search space more tractable.

So, how have search algorithms changed since then and how are they being used in science?

I mean, search is such an integral part of many problems that you encounter in the real world. We just spoke about protein folding, which could be considered as the search over the space of all possible structures. But just to give a simpler example, you can think of search as also the search for algorithms for solving a particular problem.

So everything around us that computers do has some form of matrix multiplication underlying it. Even the fact that we have these machine learning systems and neural networks that are changing the world today: these neural networks are based on matrix multiplication, essentially taking large matrices of numbers and multiplying them together. And even the very simplest operation of matrix multiplication, which is just taking two matrices and multiplying them, is the simplest thing that you learn in school and college. And yet we don't know, as a whole research community, what is the fastest way of multiplying two matrices. So if you think about that problem, you can reason about it as a search problem. You can say there's a space of possible algorithms; now search over that space of algorithms and try to find me the best algorithm. The issue is that the search space for that problem is even larger than the search space for Go. So one of the first things that we needed to do is we came up with this agent called AlphaTensor, which made matrix multiplication into a search problem, into a game.

So instead of "did you win or lose the game of Go," you're saying "did you multiply these two matrices together quickly or not?"

Yeah. Did you multiply these matrices completely accurately in the smallest number of moves? Right.

And that was the game. And there was an algorithm that Strassen had come up with in 1969, and since then, for 50 years, there was no progress. Then AlphaTensor found a better way of multiplying these matrices, and that was a key proof point of what is possible with the same sort of techniques.

In case there's anyone watching who's maybe not that familiar with the things you're talking about,

matrix multiplication for example: I mean, we need to be really clear on the potential of this thing. Every single large language model in the world is essentially, at its heart, just a massive matrix multiplication problem.

Right.

Yes.

All of the fuss about different chips that are being made is because some of them can multiply matrices faster than others.

Yeah.

And what you're describing here is like turning that into a game. And even small gains that you might make on how quickly you can do something, once you scale it up to the size of how much everybody in the world is using AI, we're talking about gigantic differences.
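To make the "small gains at scale" point concrete, the classic example in this space is Strassen's 1969 scheme, which multiplies two 2×2 matrices with 7 scalar multiplications instead of the naive 8. A minimal sketch for illustration (AlphaTensor searches for schemes of exactly this kind automatically, rather than using this hand-derived one):

```python
# Naive 2x2 matrix multiplication: 8 scalar multiplications.
def naive_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return [[a * e + b * g, a * f + b * h],
            [c * e + d * g, c * f + d * h]]

# Strassen (1969): the same product with only 7 multiplications.
# Applied recursively to large matrices, saving one multiplication per
# 2x2 block lowers the asymptotic cost below O(n^3).
def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]
```

Counting additions is cheap; it is the number of multiplications that drives the recursive cost, which is why "fewest multiplications" is the score of the game.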

Yeah, absolutely. And since then, what we have done is we have said: let's not just tackle matrix multiplication, let's tackle all the possible algorithms that you can think of. So our new agents like AlphaEvolve search in the space of all possible programs, trying to find the best algorithm that can solve these important problems, whether it's how you schedule jobs in a data center, which is an extremely important problem and has implications in terms of energy and compute utilization, or how you tackle these logistics problems where you are trying to move packets around in a network. So the same basic methodology of tackling these search problems has now expanded in terms of what you can do with it.

Okay. But I'm thinking here about the policy network, the intuition as you described it, where a Go player might look at the board and say, I think this is a fruitful direction in which to search. If, instead of a board, instead of a game of Go, you've got all possible algorithms of everything in the entire world and beyond, let's say, how on earth do you create intuition in that sort of a situation? How do you know how to narrow down the search space?

Yeah. So I think this is a very interesting research topic that we are now starting to think about when we apply agents like AlphaEvolve to discover these new algorithms. Sometimes those algorithms are not very intuitive to us; in fact, they could be counterintuitive. Sometimes you can see the patterns. You can see that there are certain symmetries in the problem that we did not understand. Mathematicians did not understand, computer scientists did not understand, but somehow there were those symmetries. The agent somehow discovered those symmetries, and then it exploited and utilized those symmetries to make the solution much more efficient. In some cases, we just don't understand how it made things faster, but they are faster. And then our challenge is that when you think about collaboration, where humans and these AI agents are working together, how do we make sure that the systems that are produced and the algorithms that are produced are interpretable by the human computer scientists and engineers?

It reminds me a little bit of this situation in AlphaGo where people observing AlphaGo in the endgame found that it didn't quite play optimally. They were really surprised, saying, look, this is a better move than what AlphaGo played; is it not playing well, is it making mistakes? And the resolution was that AlphaGo was optimizing the objective we had given it, which is to maximize the probability of winning the game. Humans tend to use a heuristic: they want to have more territory than the opponent by some margin, and they think the larger the margin, the better it is for them, which is often true. But AlphaGo doesn't care about the margin. For AlphaGo, it was enough to win by half a point, and so often in the endgame it seemed to be almost toying with the opponent, giving up points just up until the point where it was sure it could win by half a point. So sometimes you get these counterintuitive behaviors, but if you then drill deeper, you can see why they come about, because the algorithm and the humans are ultimately optimizing for slightly different things.

Exactly.

Yeah. Yeah.
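The endgame behaviour described above comes down to which quantity you take the argmax over. A toy illustration with two hypothetical moves (the probabilities and margins are invented for the example, not taken from any real position):

```python
# Two candidate endgame moves with made-up value estimates.
moves = {
    "aggressive": {"win_prob": 0.88, "expected_margin": 15.0},
    "safe":       {"win_prob": 0.99, "expected_margin": 0.5},
}

# Human heuristic: prefer the biggest expected territory margin.
human_pick = max(moves, key=lambda m: moves[m]["expected_margin"])

# AlphaGo-style objective: prefer the highest probability of winning,
# even if the expected win is only half a point.
alphago_pick = max(moves, key=lambda m: moves[m]["win_prob"])
```

Both objectives are reasonable, and they usually agree, but when they disagree, as here, the win-probability maximizer happily "gives up points" that a margin maximizer would fight for.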

Okay. But then that does make me wonder. Take Move 37 as an example of where it went beyond what humans are able to do. At the same time, when Move 37 first came through, people thought it was a mistake, right? So how can you tell the difference? I mean, if the algorithm comes up with something that is original, can you be sure it's not a hallucination?

Yeah, and I think this is a very important point, right? With the large language models, especially when they were being developed initially, in the first versions of them, they would hallucinate. They would come up with solutions which were not correct, or come up with responses which were completely invalid. And this is where the importance of the agent harness comes into play, where you couple the large language model with a verifier which is able to prune out what is being hallucinated versus what is actually something remarkable that we need to investigate further.

But then, if those large language models are based on human data, is there a danger of you limiting yourselves to what humans have already discovered? I'm thinking of what's already in the textbook, as it were.

When we build these agents, we deliberately increase the amount of things that they have to explore. So we tell the models: you have to go beyond the distribution that you were trained on, and you should feel free to explore more. In fact, you might produce new things which might not be appropriate or not be correct, but we have that verifier and evaluation function to prune out those insights.
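The generate-and-verify pattern being described can be sketched in a few lines. Here the generator is deliberately unreliable (random guessing stands in for a language model's proposals) and the verifier is exact arithmetic, so hallucinated candidates are simply discarded. The factoring task and every name below are invented for illustration:

```python
import random

# Untrusted generator: proposes candidate factors of n, most of them wrong.
def propose(n, rng):
    return rng.randrange(2, n)

# Cheap, exact verifier: a candidate is accepted only if it truly divides n.
def verify(n, p):
    return n % p == 0

# The "harness": keep proposing until the verifier accepts something.
def factor_with_verifier(n, tries=10000, seed=0):
    rng = random.Random(seed)
    for _ in range(tries):
        p = propose(n, rng)
        if verify(n, p):        # hallucinated candidates are pruned here
            return p
    return None                 # verifier never accepted anything
```

The asymmetry is the whole point: proposing is allowed to be wildly unreliable, because verifying is cheap and exact, which is the same reason verifiable domains like code and formal proofs are where these systems work best today.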

I think this is really how Karl Popper would also characterize the whole scientific process. Conjectures and Refutations is the famous essay. And, you know, conjecture is maybe hallucination: it's this generative capability of producing plausible hypotheses. And then refutation is the step by which you filter out the things that are wrong, that don't work. And I think it also makes clear why the current AI capability landscape looks like it does. Namely, it is very good in verifiable domains.

Code is a verifiable domain. You define the objective. You can write down tests for the code. The first test is that it compiles; you know, that's already a good sign, right? Then you run it on those tests. So you have hard criteria to reject failure, which is super important for these kinds of tasks. If you don't have that, things become much trickier.

For example, if you work on open scientific problems, you may not have a verifier who can tell you that this is right or this is wrong. Ultimately, often a physical experiment will be the verification that you need.

Right. But that's quite a long way down the road, isn't it? The experimental part of it, I mean. Because I'm just wondering here about interpretability. Coming back to the point that you made earlier, does it matter that you might end up with results that are not easily interpretable, given that the stakes are so much higher than they are on the board of a Go game?

Yeah, I think it does matter. Science is also about communication, right? If you can come up with this new insight, but you are not able to communicate it and people are not able to build on top of it, then there are limits to the impact that will be achieved. So interpretability plays a very important role, but it's not the only thing. Take the example of AlphaFold. AlphaFold is able to solve this amazing problem of protein structure prediction. Do we understand completely the conceptual operations that it does, at the mechanistic level? Yes. But we don't completely know the underlying theory that could be used to recreate a human-level reasoning process to make the same predictions, and we will somehow need to convert them into a human-digestible form that the bounded rational human mind will be able to comprehend.

I think there's a really interesting point there, which is that an explanation not only needs to account for the phenomenon that you're explaining, it also needs to account for the intellectual level of the recipient of the explanation. So sometimes on YouTube you can see these things explained at the level of a six-year-old, an eight-year-old, a ten-year-old, a twelve-year-old. I quite like the explanations for twelve-year-olds, I have to say. And that reflects this fact, right? An explanation really is a bridge between the phenomenon and our capacity to understand it. So it may very well be the case that future AI systems come up with explanations that might seem simplistic to them, but that are just about right for us to keep up with the AI system.

Right.

Yeah, exactly. I mean, if you look at our agents like AlphaProof, what they are able to do is: you give them open maths problems and they will give you a proof, and that proof is verifiable. You can tell whether it's correct or not.

Exactly.

Even if you don't understand it.

Yeah. You might not understand it but you know it's correct.

Right. The uncertainty about whether the original theorem was correct or not is now resolved. But do we completely understand it? In fact, until now, for the results that we have had, we have spent the effort and converted those results into a form that mathematicians have been able to see and say, yes, it makes sense; I can actually translate it into English and it all works. But there are two key phenomena that come out of it. One is that the importance of framing the problem now rises, because one of the challenges when we are giving the agent these very hard maths problems is to specify the problem accurately, so that the agent can understand what the reward function is that it needs to optimize for. And then, once it finds the solution, there's the challenge of actually converting the solution back into a human-readable form.

If we do get to a point, though, where an algorithm can just come up with its own proof, where's the role for mathematicians in all of this? Speaking selfishly.

No, I think mathematicians are even more important today, because what these agents are able to do is solve these incredible problems. But what are the problems that need solving? How do you specify that problem? That's where mathematicians and scientists come in.

I do like the idea, though, that one day there might be, I don't know, the Riemann hypothesis, and it comes back and says, yes, there's a proof; unfortunately, it's beyond any human's ability to understand it. So, you know, sorry about that. But actually, I'm joking slightly. If we are talking here about advancing scientific knowledge and understanding beyond what humans have done, do you think you've seen examples of Move 37 in science already?

Yeah, absolutely. Just the example of the matrix multiplication algorithms: it is something that people had studied for many, many years, and yet we have been able to come up with a new algorithm. So that is genuinely a Move 37 moment in algorithmic discovery. And I think we are now seeing the same thing in many other areas of science: in mathematics, in materials science, coming up with new structures that we think are stable. So there are a number of these things, but the original Move 37 moment is still very relevant, because it was in some sense the first, and it brought about that concept of going beyond human understanding.

I am thinking here about AlphaZero again, and how that really moved away from human data and showed these profound results. Large language models, on the other hand, ended up being almost a shortcut to intelligence, I guess, that was based very much on human data. Was that a sort of surprising turn of events for you?

Yes, I think that is an interesting thing that we observed. DeepMind was based on this idea that we use games as a microcosm of the real world, and the philosophy of DeepMind had been to place agents within these environments and let them learn how to master them and thereby grow their intelligence. And then what happened with large language models was really this discovery that there's a shortcut: that somehow there's this huge amount of crystallized intelligence, if you like, stored in the form of data on the internet, first text data, maybe images, maybe videos, and so on. And the shortcut is really to first mine all of that data and train systems based on it. That's basically the first and second generation of large language models. But then, of course, you come to the point where, first of all, that doesn't lead you to novelty. You're now within this corpus of existing human knowledge, and we know how competent these models are within that, but it's very difficult to get out of it. How do we go beyond what we already know? And that's, I think, where the community for the past few years has been exploring again the methods that DeepMind pioneered early on, among others reinforcement learning in environments. Part of post-training now is routinely forms of reinforcement learning, either on human-generated data or on problems, on environments like coding environments and so on. So now we're in a period where we're going again beyond human knowledge.

Pushmeet, do you think that we would be here at this moment in the AI revolution if it hadn't been for AlphaGo?

I think AlphaGo was that transition point where it became very, very clear that the moment of transition where we go beyond human-level intelligence in particular areas is not science fiction or many decades away. It is happening now. And if it could happen in the game of Go, there was no reason why it couldn't happen in protein structure prediction, in fusion, in materials science. And the legacy of that match, and Move 37, and that experience, is what we are all living in now.

I think that's a great point to end the episode, actually, to be honest with you. Thore, Pushmeet, thank you so much for joining me.

Amazing. Yeah, pleasure.

Yeah, these big paradigm-shifting moments in the story of humans and machines have happened before. But the thing about chess is that it was always just a question of calculation: can a machine brute-force its way to a victory? AlphaGo was different. It was the first time that a machine had demonstrated something deeper, a genuine intelligence that combined intuition with calculation and took us beyond human capability. Now, 10 years on from the AlphaGo match, the field has moved at an incredible pace. But many of the questions that preoccupied researchers then are more relevant now than ever. How do you create AI systems that go beyond human knowledge and are capable of new insights? And how do you separate the genuinely new insights from hallucinations?

You have been listening to Google DeepMind: The Podcast, with me, Hannah Fry. We have got plenty more episodes to come this year, so please make sure you subscribe to our YouTube channel. We'll see you soon.
