Koray Kavukcuoglu: “This Is How We Are Going to Build AGI”
By Google for Developers
Summary
## Key takeaways

- **Co-building AGI with customers**: This is how Google is going to build AGI: co-building it with customers through products to get real-world user signals. It's a joint effort with the world, not a purely research effort off in a lab. [00:07], [00:24]
- **Benchmarks evolve with frontiers**: Benchmarks guide model development but become saturated as technology progresses, so new ones are defined as the frontier approaches. The most important measure of progress is models being used in the real world by scientists, students, lawyers, and engineers. [03:32], [05:35]
- **Prioritize instruction following, tools, internationalization**: Key areas for Gemini improvement include instruction following to understand user requests, internationalization to reach everyone, and tool calls plus code as multipliers of intelligence. These enable users to build anything in the digital world. [06:33], [07:40]
- **Products like Antigravity drive model gains**: Products like Antigravity provide critical user signals from software engineers to improve models in tool calling and coding. Their feedback in the last weeks of the launch process was instrumental. [09:50], [10:47]
- **Shift from research to engineering mindset**: Gemini marks a journey from research papers to an engineering mindset: building models every six months with monthly updates, connected to products. Safety and security are engineered from the ground up as first principles. [19:35], [14:28]
- **Innovation balances scaling and exploration**: The biggest risk for Gemini is running out of innovation; it's not just scaling but exploring new architectures and ideas, even outside Gemini via DeepMind and Google Research, to achieve intelligence. [38:17], [39:05]
Topics Covered
- Benchmarks Lag True Progress
- Code Enables Universal Building
- Co-Building AGI with Products
- Engineering Mindset Secures Safety
- Innovation Prevents Stagnation Risk
Full Transcript
Gemini 3, we're sitting here. Reception
seems super positive. The vibes of the model are good.
>> I'm very excited about the progress. I'm
excited about the research.
>> We had actually pushed the frontier on a bunch of dimensions. This is how we are going to build AGI. We want to do it the right way and that's where we are putting all our minds, all our innovation.
>> It's not like this is a purely research effort that's off in a lab somewhere; it's a joint effort with us and the world.
>> This is a new world, right? There's a
new technology that is defining a lot of what users expect. We are in some sense like co-building AGI with our customers.
>> So all of a sudden you enable a lot more people to be builders.
>> Bring anything to life.
>> Bring anything to life. Right.
>> Yeah. I feel like the next 6 months are going to be probably just as exciting as the last 6 months, and the previous 6 months before that.
>> We are lucky to be living in this age. It's happening right now. It's very exciting.
Hey everyone, welcome back to Release Notes. My name is Logan Kilpatrick. I'm on the DeepMind team. Today it's an honor to be joined by Koray Kavukcuoglu, who is the CTO of DeepMind and the new chief AI architect of Google. Koray, thanks for being here. I'm excited to chat.
>> Me too. Yeah, very excited. Thanks for inviting me.
>> Of course. Gemini 3, we're sitting here. We've launched the model. Reception seems super positive. We went out and we obviously had a hunch about how good the model was going to be. Leaderboards looked awesome, but putting it in the hands of users and actually getting it out...
>> That's always the test, right? Benchmarking is the first step, and then we have been doing tests with trusted testers, with pre-release and everything. So you get a feeling that yes, it's a good model. It's capable. It's not perfect, right? But I'm quite pleased with the reception, really. People seem to like the model, and the kinds of things that we found interesting, they also found interesting. So that's good so far. This is good.
>> Yeah, we were talking yesterday, and the thread of the conversation was just around appreciating this moment, that the progress isn't slowing down, which resonates with me. As I was reflecting back to the last time I sat next to you, we were at I/O as we launched 2.5, and we were listening to Demis and Sergey talk about AI and all that. I feel like the progress has not slowed down, which is really interesting. When we launched 2.5, it felt like a state-of-the-art model, and it felt like we had actually pushed the frontier on a bunch of dimensions, and I feel like 3.0 delivers that again.
>> Yeah.
>> And I'm curious where the scaling conversation, the "can it continue?" conversation, continues to go. What's your sense right now?
>> Yeah. I mean, look, I'm very excited about the progress. I'm excited about the research. When you are actually in the research, there is a lot of excitement in all areas of this, right? From data, pre-training, post-training, everywhere we see a lot of excitement, a lot of progress, a lot of new ideas. At the end of the day, this whole thing is really running on innovation, running on ideas. The more we do something that is impactful, that is in the real world, that people use, the more ideas you actually get, because your surface area increases and the kinds of signals that you get increase. I think the problems will get harder, the problems will get more varied, and with that we will be challenged, and these kinds of challenges are good.
>> Yeah.
>> And I think that is the driver for going towards building intelligence as well, right? That's how it's going to happen. Sometimes if you look at one or two benchmarks you can see a squeeze, but I think that's normal, because benchmarks are defined at a time when something was a challenge. You define that benchmark, and then of course, as the technology progresses, that benchmark no longer defines the frontier, and then what happens is you define a new benchmark. It's very normal in machine learning, right? Benchmarks and model development always go hand in hand. You need the benchmarks to guide the model development, but you only know what the next frontier is when you get close to it, so that you can define the new benchmark.
>> Yeah, I feel this way. There were a couple of benchmarks, like HLE, that originally all the models were horrible on, doing like 1 or 2%, and I think now the newest, with Deep Think, is at 40-something percent, which is crazy.
>> Yeah.
>> ARC-AGI-2, originally all the models could barely do any of it; it's now at 40-plus. So it is interesting, and then it's also interesting to see, and I don't have the context on why, the benchmarks that are static, that do stand the test of time, if you will. I think they are probably close to saturating, but GPQA Diamond, as an example, continues to stick around even though we're eking out 1% or whatever.
>> There are really hard questions there, and those hard things we are still not able to do.
>> Yeah.
>> Right. And they still test something. But if you think about where we are with GPQA, it's not like you're at the 20s and you need to go to the 90s; you're getting close there. So the number of things that it defines as unsolved is, of course, decreasing. At some point it's good to find new frontiers, new benchmarks, and defining benchmarks is really, really important, because if we think about benchmarks as the definition of progress, which does not necessarily always align...
>> Right, there's progress and then there's the benchmarks. In an ideal case they're 100% aligned, but they're never 100% aligned.
>> To me, the most important measure of progress is that we have our models in the real world, and scientists use them, students use them, lawyers use them, engineers use them, and people use them to do all sorts of things: writing, creative writing, emails, easy or hard, right? That spectrum is important, and different topics, different domains. If you can actually continue delivering larger value there, I think that's progress, and these benchmarks help you quantify it.
>> Yeah. How do you think about, and maybe there's a particular example from 2.5 to 3, or whichever model version change you want: where are we hill climbing? In a world where there are a zillion benchmarks now and you could choose where you want to hill climb, how are you thinking about that, for Gemini broadly but also maybe for the Pro model specifically?
>> I think there are several important areas, right? One of them is instruction following.
Instruction following is where the model needs to be able to understand the request of the user and to be able to follow it, right? You don't want the model just answering whatever it thinks it should answer. So that instruction-following capability is important, and that's something we always work on. And then for us, internationalization is important. Google is very international, and we want to reach everyone in the world. So that part is important.
>> And I feel like with 3.0 Pro, at least, I was talking to Tulsi this morning and she was remarking about how incredible the model is for languages that historically we haven't been really good at, which is awesome to see.
>> So you continuously have to put the focus on some of these areas. They might not look like the frontier of knowledge, but they are really, really important, because you want to be able to interact with the users there, and as I said, it's all about getting that signal from the users. And then if you come to the slightly more technical domains: function calls, tool calls, agentic actions, and code. These are really important. Function calls and tool calls are important because I think a whole different multiplier of intelligence comes from there.
Both from the point of view of the models being able to just naturally use all the tools and functions that we have created ourselves, and use them in their own reasoning, but also the model writing its own tools, right? You can think of the models as tools in themselves as well. So that one is a big thing. And obviously code, not just because we are all software engineers, but also because we know that with code you can actually build anything that happens on your laptop, and on your laptop it's not just software engineering that happens.
>> Bring anything to life.
>> Bring anything to life, right. A lot of the things that we do right now happen in the digital world, and code is the basis for integrating with pretty much anything that happens in your life. Not everything, but a lot of things. That's why these two things together make up for a lot of reach for users as well. I give this example of vibe coding. I like it. Why? Because a lot of people are creative, they have ideas, and all of a sudden you make them productive. Going from creative to productive in a way that you can just write it down and then you see the application in front of you. Most of the time it works, and when it works it's great, so cool, and that loop I think is great. So all of a sudden you enable a lot more people to be builders, building something. It's great.
>> I love it. Yeah, thank you, this is the AI Studio pitch; I appreciate it. We'll clip this part out and put it out online. One of the interesting threads you mentioned: as part of this Gemini 3 moment, we launched Google Antigravity, a new agentic coding platform. How much do you think about the importance of having this product scaffolding to hill climb on quality, from a model perspective? Obviously tool calling and coding.
>> Yeah, to me it's very, very important. Antigravity as a product itself, yes, it's exciting, but from a model perspective, if you think about it, it's double-sided. Let's talk first about the model perspective. From the model perspective, being able to have this integration with the end users, in this case software engineers, and learning from them directly to understand where the model needs to improve is really critical for us. It is important in areas like the Gemini app for the same reason, right? Understanding users directly is very, very important. Antigravity is the same way. AI Studio is the same way. So having these products that we work really closely with, and understanding and learning, getting those user signals, I think is really massive, and Antigravity has been a very critical launch partner. It hasn't been long since they joined, but in the last two, three weeks of our launch process their feedback has been really, really instrumental. The same thing with Search AI Mode, right? AI Overviews have been the same; we get a lot of feedback from there. So to me, this integration with the products and getting that signal is the main driver of our understanding. Of course we have the benchmarks, so we know how to push STEM, the sciences, the math, that kind of intelligence, but it's really important that we actually understand the real-world use cases, because this has to be useful in the real world.
>> Yeah. In your new chief AI architect role, you're now responsible for making sure that we don't just have good models, but that the products actually take the models and build great product experiences across Google. Obviously I think this is the right thing for users, and getting Gemini 3 into all the product surfaces on day one is an awesome accomplishment for Google, and hopefully even more product surfaces in the future. But how much additional complexity, from the DeepMind perspective, do you think it adds to try to do this? In some sense, life was simpler a year and a half ago.
>> True. But we are building intelligence, right? A lot of people ask me about having these two roles. I have these two titles, in a way, but they're very much the same thing.
>> If we are going to build intelligence, we have to do it with the products, through the products, connecting with the users. With the chief AI architect role, what I'm trying to do is make sure that the products in Google have the best technology that is available to them. We are not trying to do the products; we are not product people. We are technology developers, right? We develop the technology, we do the models, and of course, just like everyone is opinionated on anything, people are opinionated, but the most important thing for me is making the models, the technology, available in the best way possible, and then working with the product teams to enable them to build the best products in this AI world. Because this is a new world, right? There's a new technology that is defining a lot of what users expect, how the products should behave, the information that they should carry over, and all the new things that you can do with this new technology. So to me it's about enabling that across Google, working with all the products. I think that's exciting, both from the product perspective, from what users are getting, but also because, as I said, that's our main driver: it's really important for us to be able to feel that user need, to get that user signal. That's critical for us. That's why I wanted to do it this way. This is how we are going to build AGI, this is how we are going to build intelligence: with the products. That's how I think it's going to happen.
>> This is a great tweet to put out at some point, because I do think it's interesting. I share this perspective that we are in some sense co-building AGI with our customers, with the other product areas. It's not like this is a purely research effort that's off in a lab somewhere; it's a joint effort with us and the world.
>> And I think it is actually a very tried and tested system as well. It's a very engineering
mindset that I think we are adopting more and more, and I think it's important to have an engineering mindset on this one, because when something is nicely engineered, you know that it is robust, that it is safe to use. We are doing something in the real world, and we are adopting all the tried and tested ideas of how to build things. I think that's reflected in how we think about safety and how we think about security. We try to think about it, again, from that engineering mindset: from the ground up, from the beginning, not as something that comes at the end. When we are post-training models, when we are pre-training, when we are looking at our data, everyone needs to think about this. Do we have a safety team? Obviously we have a safety team, and they are bringing in all the related technology. We have a security team; they're bringing in all the technology. But enabling everyone in Gemini to also be heavily part of that development process, that is taking this as a first principle, and those teams are themselves part of our post-training teams. So when we are developing these release candidates, just like we look at GPQA, HLE, those kinds of benchmarks, we look at safety and security measures as well. That engineering mindset is important.
>> Yeah, I completely agree with you. I think it also feels natural to Google, which is also helpful, because of how collaborative and how big the effort is now to ship Gemini models out the door. I mean, with Gemini 3, we were just reflecting on this: to me, one of the important things is that this model has been a very team-Google model.
>> We should look into the data. Maybe the Apollo NASA programs had a lot of people, but this has been a massive, global Google effort across all of our teams, which is crazy. Every Gemini release takes people from this continent, from Europe, from Asia, all around the world. We have teams all around the world and they contribute, and not just GDM teams, right? All teams across Google.
>> Yeah, it's a huge collaborative effort, and we sim-shipped with AI Mode, we sim-shipped with the Gemini app, right? These are not easy to do, because they were together with us during our development. That's the only way that on day one we can all go out together at the same time the model is ready. And we have been doing that. When we say "across Google," it's not just the people actively building the model; all the product teams are doing their parts as well.
>> Yeah. I have a question, and maybe this isn't a controversial one, but Gemini 3 is sort of SOTA on many benchmarks, a lot of benchmarks. We're sim-shipping across the Google product surfaces and our partner ecosystem surfaces. The reception is very positive; the vibes of the model are good. If you fast forward, knock on wood, to the next major Google model launch, are there things still on your list that you wish we were doing? How does it get better, or should we just enjoy the moment of Gemini 3?
>> I think we should do both, right? We should enjoy the moment, because one day of enjoying the moment is a good thing. This is the launch day, and I think people are appreciating the model, so I'd like the team to enjoy this moment as well.
>> Right, but at the same time, in every area we look at, we also see gaps. Is it perfect in writing? No, it's not perfect in writing. Is it perfect in coding? It's not perfect in coding. Especially in the area of agentic actions and coding, I think there's a lot more room. That's one of the most exciting growth areas, and we need to identify where we can do more, and we'll do more. I think we have come a long way. For maybe 90-95% of the people who will engage with coding in some way, whether they're software engineers or creative people who want to build something...
>> Yeah.
>> I'd like to think that this model is the best thing that they can use, right? But there are probably some cases where we still need to do better.
>> Yeah, I have another sort of pointed question about coding and tool use. If you look at the history of Gemini, we had a very multimodal focus for 1.0, and for 2.0 we started to make some of the agentic infrastructure work. Do you have a sense of why, and I'll make the caveat that I think the rate of progress looks really strong, but why has it just been a focus thing? Why haven't we been state-of-the-art in agentic tool use from the get-go, when in multimodal, for example, Gemini 1 was literally state-of-the-art and we've held that for a long time?
>> I don't think it was a deliberate thing. Honestly, if anything, when I reflect back, I tie it to the models' development environment being closely tied to the real world. The more tightly we are tied, the better we understand the real requirements. And I think in our journey with Gemini, we started from a point where, of course, AI research in Google has a huge history, right? The amount of amazing researchers that we have and the amazing history of AI research that has been done in Google, I think it's great. But Gemini is also a journey of moving from that research environment into, as we talked about, this engineering mindset, and getting into a space where we are really connected with the products. When I look at the team, I have to say I feel really proud, because this team is still majority formed by people, including me, who four or five years ago were writing papers, researching AI, and here we are actually at the frontier of that technology, developing it via products with the users. It's a completely different mindset, building models every 6 months and then doing updates every month, month and a half. It's an amazing shift, and I think we walked through that shift.
>> Yeah, I love that. Gemini 3 progress has been awesome. Another thread that was top of mind is how we're thinking about the gen media models, which historically, not that they haven't been a focus, they've always been interesting, but with Veo 3 and Veo 3.1, and with the Nano Banana model, we've had so much success from a product externalization standpoint. I'm curious how you think about this in the pursuit of building AGI. Sometimes I convince myself that a video model is not part of that story. I don't think that's true; in general, you should understand the world and physics and all this other stuff. So I'm curious how you see all these things intertwining together.
>> If you actually go back 10, 15 years, generative models were mostly on images, right? Because we could much better inspect what was going on, and also because this idea of understanding the world, understanding the physics, was the main driver of doing generative models with images and so on. Some of the exciting things that we have done with generative models date back 10 years, way back.
>> Feels like 20.
>> Right, 20 years ago we were still doing image models; that's why I was hesitating a little bit. During my PhD we were doing generative image models; everyone was doing those at the time. We worked through that; we had things called PixelCNNs, which were image generative models. In a way, what happened was a big realization that text was actually the better domain for very fast progress. But I think it is very natural that the image models are coming back, and at GDM we have had really strong image, video, and audio models for a long time. That's what I'm trying to explain: bringing those together, I think, is natural. So where we are going right now, we have always talked about this multimodality, input-output multimodality, and that's where we are going. When you look at it, as the technology progresses, the architectures and the ideas between those two different domains have been merging with each other. It used to be that these architectures were very different, but they are coming together quite a lot. So it's not like we are forcing something; what is happening is that the technology is naturally converging. It is converging because everyone understands where to get more efficiency, where the ideas are evolving, and we see a common path, and that common path is coming together well. So Nano Banana is one of those first moments, right? You can iterate over images, you can talk to the model, because text models have a lot of world understanding from the text, and the image model has world understanding from a different perspective. When you merge those two, you get exciting things, because people feel that this model understands the nuances that they want to get across.
>> I have another question about Nano Banana: do you think we should just have goofy names for all of our models? Do you think that would help?
>> Not really. Look, I think we didn't do it on purpose.
>> Gemini 3. If we didn't name it Gemini 3, what would we have called it? Something ridiculous.
>> No, I don't know. I'm not good at names. It was Riftrunner, right? Those are code names, and we actually used Gemini models to come up with those code names too. And Nano Banana was not one of those; we didn't use Gemini. There's a story about it; I think it's published somewhere. As long as these things are natural and organic, I'm happy, because for the teams who are building the models, it's good for them to have that connection.
>> Yeah.
>> And when we released it, that happened because we were testing the model with the code name on LMArena, and people loved it. I'd like to think it was so organic that it just caught on. I'm not sure you can create a process to generate that.
>> I agree with you. That's my feeling.
>> If we have it, we should use it. If we don't have it, it's good to have standard names.
>> Yeah, we should talk about Nano Banana Pro, which is our new state-of-the-art image generation model built on top of Gemini 3 Pro. I think even as the team was finishing Nano Banana, there was early signal that doing this in a Pro capacity could get a lot more performance on a bunch of more nuanced use cases, like text rendering and world understanding. Anything top of mind? I know there's a lot of stuff going on.
>> I think this is probably where we see this alignment of different technologies coming into play, because with Gemini models we have always said that every model version is a family of models: we have Pro, Flash, this family of models, because at different sizes you have different compromises in terms of speed, accuracy, cost, those kinds of things. As these things come together, of course we have the same experience on the image side as well. So I think it's natural that the teams thought, okay, there's the 3.0 Pro architecture: can we actually tune this model more for image generation, using everything that we learned in the first version and increasing the size? Where we end up is something a lot more capable that understands really complex things. Some of the most exciting use cases: you have a large set of really complex documents, you can feed those in, we rely on these models to ask questions, and you can ask it to generate an infographic about all of it as well, and it works. So this is where this natural input-output multimodality just comes into play, and it's great.
>> Yeah, it feels like magic. Hopefully folks will have seen the examples by the time this video comes out, but it's just so cool seeing a bunch of the internal examples being shared around. It's crazy.
>> Yes, I agree. It's exciting when you see that all of a sudden: oh my god, a huge amount of text and concepts and complicated things explained in one picture, in such a nice way. When you see those things, it's nice, right? You realize the model is capable.
>> Yeah. And there's so much nuance to it too, which is really interesting. I have a parallel question to this, which is, probably December of last year, December 2024...
>> All right.
>> Tulsee was promising how we were going to have these unified Gemini model checkpoints. And I think what you're describing is that we've gotten really close to that now, where historically the architecture was done...
>> Unified in terms of image generation? Oh, I see. I see.
>> Yeah. And I'm curious: I assume the goal is that we want these things actually mainlined into the model, and there are natural things that stop that from happening. Any context or high-level thoughts?
>> Look, I think, as I said, the technology, the architectures, they are aligning.
>> Right.
>> So we see that happening at regular intervals. People are trying, but it's a hypothesis, and you can't be ideology-based in this. The scientific method is the scientific method: we try things, we have a hypothesis, and you see the results. Sometimes it works, sometimes it doesn't, but that's the progression we go through. It's getting closer. I'm pretty sure in the near future we are going to see something come together, and I think gradually it's going to be more and more like one single model. But it will require a lot of innovation, right?
It is hard. If you think about it, the output space is very critical for the models, because that's where your learning signal comes from. Right now our learning signal comes from code and text; that's most of the driver of that output space, and that's why you are getting good there. Now, being able to generate images: we are so tuned for quality in images, and it is a hard thing to do. The quality of the images, the pixel-perfectness, is hard, and images also have to be conceptually very coherent. Every pixel matters for quality, but also for how it fits with the general concept of the picture. It is harder to train something that does both. The way I look at this is, to me it's definitely possible; it's just about finding the right innovations in the model to make it happen.
>> Yeah, I love it. I'm excited. Hopefully it'll make our serving situation easier too, if we have a...
>> Yeah, that I don't know.
>> ...a single model.
>> It's impossible to say.
>> It's impossible. I agree with you. The interesting thread as we sit here: DeepMind has a bunch of the world's best AI products, hopefully, vibe coding and AI Studio, the Gemini app, Anti-Gravity, and that's happening across Google. Now we have a great state-of-the-art model with Gemini 3, we have Nano Banana, we have Veo; all these models are at the frontier. The world looked very different 10 years ago, or even 15 years ago, and I'm curious about your personal journey to this point. When we were talking yesterday, you mentioned something I had no idea about, and I mentioned it to someone else and they had no idea either: you were the first deep learning researcher at DeepMind. Taking that thread to the place we're at now feels like a crazy jump, from the fact that people weren't excited about this technology. I don't know how long ago you started at DeepMind, like 10 years?
>> Uh 2012.
>> 13 years.
>> Yeah.
>> That's crazy. 13 years ago, people weren't excited about this technology, or I guess DeepMind was excited about this technology, and now it is literally powering all these products and is the main thing. I'm curious, as you reflect on that, what comes to mind? Is it surprising, or was it obvious?
>> Well, I mean, I think this is the hopeful, positive-outcome scenario, right? The way I say it is: when I was doing my PhD, and I think it's the same for everyone doing their PhD, you believe that what you do is important or is going to be important. You are really interested in that topic and you think it's going to make a big impact. I was in the same mindset. That's why I was really excited about DeepMind when Demis and Shane reached out and we talked. I was really excited to learn that there was a place that was really focused on building intelligence, and deep learning was at the core of it. Actually, my friend Karol Gregor and I were both in Yann's lab at NYU, and we joined DeepMind at the same time, just to be very specific. And at the time, it was very unnatural that you would even have a deep-learning-focused, AI-focused startup.
I think that was very visionary and an amazing place to be. It was really exciting, and then I started the deep learning team and it grew. My approach to deep learning has always been a mentality of how you approach problems, and the first principle is always learning-based. That's what DeepMind was about: everything is a bet on learning. It was an exciting journey to start from where we were in those days, and then RL and agents and everything we have done along the way. You go into these things, at least this is how I go into these things, hoping that a positive outcome happens. But I reflect and I say that we are lucky, right? We are lucky to be living in this age, because a lot of people have worked on AI, or on topics they are really passionate about, thinking this is their age and this is when it's going to pan out. But it's happening right now. And we have to realize that AI is happening right now not just because of machine learning and deep learning, but also because the hardware evolution has come to a certain state, and the internet and data have come to a certain state, right? So a lot of things aligned together, and I feel lucky to actually be doing AI and working up to this moment. When I reflect, that's how I feel: yes, they were all choices, we worked on AI and I made specific choices to work on AI, but at the same time I also feel very lucky that at this time we are in this position. It's very exciting.
>> Yeah, I agree too. I love that. I'm curious, and I was watching The Thinking Game video and learning more; I wasn't around for AlphaFold, so the only context I have is reading about it and seeing people talk about it. As you reflect, having lived through a bunch of that, how are things different today versus before? I'll tee you up with one example, which you kind of alluded to off camera right before this, and this is not exactly your words: we've kind of figured out how to make these models and bring them to the world, which was the essence of what you were getting at, and I agree. I'm curious if that felt similar, yeah, how that...
>> ...is similar or not to how things were for some of the previous iterations. How to
organize, or the cultural traits of what it takes to turn hard scientific and technical problems into successful outcomes: I think we learned to do that a lot through many of the projects we have done. Starting from DQN, AlphaGo, AlphaZero, AlphaFold, all of these have been quite impactful in their ways, and we learned a lot about how to organize around a particular goal, a particular mission, as a team, right? I remember in the early days of DeepMind we would work on a project with 25 people, and we would write papers with 25 people, and everyone would say to us, surely 25 people didn't work on this. And I would say, yes, they did, because in the sciences and in research that wasn't common. I think that knowledge, that mentality, is key, and we evolved through it; that is really important. At the same time, in the last two or three years, as we talked about,
what we have merged this with is the idea that this is now more like an engineering mindset, where we have a mainline of models that we are developing, and we learned how to do exploration on this mainline, how to do exploration with these models. A good example, and every time I see it or think about it I feel quite happy, is our Deep Think models. Those are the models that we go to the IMO competition with, to the ICPC competition with, and I think that's a really cool and good example, because we do the exploration and you pick these big targets. The IMO competition is really important; these are really hard problems, and kudos to every student out there competing in those competitions. Amazing stuff, really. Being able to put a model there, of course you have the urge to do something custom for that. What we try to do is use it as an opportunity to evolve what we have, or to come up with new ideas that are compatible with the models we have, because we believe in the generality of the technology. That's how things like Deep Think happen: we come up with something and then we make it available for everyone, so everyone can use a model that is actually the one that was used in the IMO
competition.
>> Yeah, just to draw a corollary to what you said about the 25 people on the paper: I think the today version of that is, I'm sure there's a Gemini 3 contributors list that will come out or is already out...
>> 2,500.
>> ...and there are like 2,500 people, and I'm sure people are thinking there's no way 2,500 people contributed. But they actually did, which is crazy, and it's fascinating to see how large-scale some of these problems are now.
>> Yeah, we did. And
I think it is important for us, and that's one of the great things about Google: there are so many people who are amazing experts in their areas, and we benefit from that. Google has this full-stack approach, right? You have experts at every layer, from data centers to chips to networking to how to run these things at scale. It has all come to a state, again going in on this engineering mindset, where these things are not separable. When we design a model, we design it knowing what hardware it's going to run on, and we design the next hardware knowing where the models will probably go. And this is beautiful, right? But coordinating this, yes, of course, means thousands of people working together and contributing. I think we need to recognize that, and it's a beautiful thing. That's great.
>> Yeah, it's not easy to pull off. One of the interesting threads, back to this DeepMind legacy of doing all these different scientific approaches and trying to solve these really interesting problems: today we actually know that this technology works in a bunch of capacities, and we truly just need to keep scaling it up, though obviously there's innovation required to keep doing that. I'm curious how you think about DeepMind in today's era, balancing purely scientific exploration versus just trying to scale up Gemini. Maybe we can use my favorite example for you, which is Gemini Diffusion, as an example of that decision-making come to life in some capacity.
>> That is the most critical thing, right? Finding that balance is
really important. Even now, when people ask me what the biggest risk for Gemini is, and of course I think about this a lot, the biggest risk for Gemini is running out of innovation. Because I really don't believe that we have figured out the recipe and we're just going to execute from here. I don't believe in that. Our goal is to build intelligence, and we're going to do that, of course, with the users, with the products, but the problems out there are very challenging. Our goal is still very challenging, and it's out there, and I don't feel like we have the recipe figured out such that it's just scaling up or executing. It is innovation that is going to enable that, and you can think about innovation at different scales, or in different tangential directions to what you have right now. Of course we have Gemini models, and inside the Gemini project we explore a lot: we explore new architectures, new ideas, different ways of doing things. We have to do that, we continue to do that, and that's where all the innovation comes from. But at the same time, I think DeepMind, or
Google DeepMind as a whole, is doing a lot more exploration. That is very critical for us; we have to do those things, because there might be some things for which the Gemini project itself is too constraining. So the best thing we can do, both in Google DeepMind and also in Google Research, is explore all sorts of ideas, and we will bring those ideas in. Because at the end of the day, Gemini is not the architecture, right? Gemini is the goal you want to achieve, and that goal is intelligence, and you want to do it with your products, enabling the goal of Google to really run on this AI engine. In a way, it doesn't matter what particular architecture it is. We have something currently, we have ways of evolving through that, and we will evolve through that, and the engine of that will be innovation. It will always be innovation. So finding that balance, or finding opportunities to do that in different ways, I think is very critical.
>> Yeah. I have a parallel question to that. At I/O I sat down with Sergey, and I made the comment to him, and I personally felt this at I/O, that you bring all these people together to launch these models and have this innovation, and you sort of feel the warmth of humanity as you do it, which is really interesting. I was referencing this because I was sitting next to you, also listening to them, and I was feeling your warmth. I mean this very personally, because I think it translates into how DeepMind as a whole operates. I think Demis has this as well: these deep scientific roots, but also people who are nice and friendly and kind. There's something interesting there; I don't know how much people appreciate how much that culture matters and how it manifests. I'm curious, as you think about helping shape and run this, how that manifests for you.
>> First
of all, thank you very much, you're embarrassing me. But I believe in the team that we have, and I believe in trusting people and giving people the opportunity. That team aspect is important, and it's something that, at least for my part, I can say I learned through working at DeepMind as well, because we were a small team. You build that trust there, and then the question is how you maintain it as you grow. I think it is important to have an environment where people feel, okay, we really care about solving the challenging technical, scientific problem that makes an impact, that matters for the real world. That is still what we are doing. Gemini, as I said, is about that: building intelligence is a highly technical, challenging, scientific problem. We have to approach it that way, and with humility as well; we have to always question ourselves. Hopefully the team feels that too. That's why I always keep saying I'm really proud of the team, that they work together amazingly well.
We were just talking upstairs at the micro kitchen today, and I said to them, "Yes, it's tiring. Yes, it's hard. Yes, we are all exhausted." But this is what it is. We don't have a perfect structure for this. Everyone is coming together, working together, and supporting each other. It is hard, but what makes it fun and enjoyable, and what makes you tackle really hard problems, is to a big extent having the right team working together. The way I see it, the burden is more about being clear about the potential of the technology that we have. I can't definitively say that 20 years from now it's the exact same LLM architecture; I'm sure it won't be. So I think pushing for new exploration is the right thing to do. As we talked about, GDM as a whole, together with Google Research and the academic research community, we have to push many different directions, and I think that's perfectly fine. What is right, what is wrong, I don't think that's the important conversation. The capabilities, and the demonstrations of those capabilities in the real world, are the real thing that should speak for itself.
>> Yeah. I have one last question, and I'm curious to hear your reflection on this as well. For me personally, my first year and a half plus at Google felt like, and I really liked this actually, a Google underdog story to a certain extent, despite all the infrastructure advantage and all that.
>> When did you join?
>> April 2024.
>> 2024. Okay.
>> Yeah. And also the AI Studio context: we were building this product and...
>> Oh yes, now I remember.
>> ...we had no users, or we had 30,000 users. We had no revenue. We were very early in the Gemini model life cycle. Fast forward to today, and it's obviously not that. I was getting a bunch of pings over the last couple of days as this model has been rolling out, from folks across the ecosystem; I'm sure you got a bunch of these as well. People are, I think, finally realizing that this is happening. But I'm curious, from your perspective, what did you feel? Again, I had belief, that's why I joined Google, that we were going to get to this point. But did you feel that underdogness too? And how do you think that will manifest for the team as we turn the corner?
>> I definitely did, even before that. When it became apparent that LLMs are really powerful, very honestly, I felt like we were the frontier AI lab at DeepMind. But at the same time I felt, okay, there's something we haven't invested in as much as we should have as researchers, and that was a big learning for me as well. That's why I'm always very careful that we need to cast a wide net; that exploration is really important. It's not about this architecture or that architecture. I've been very open with the team, when we started taking LLMs a lot more seriously, starting with the Gemini program two and a half years ago, and I've been very honest with the team, that we were nowhere near state-of-the-art here. We didn't know how to do a lot of things; there were a lot of things we knew how to do, but we were not at that level yet. It was a catch-up, and it has been a catch-up for a long time. I feel like nowadays we are in that leadership group.
>> I feel really good and positive about the pace we are operating at. We're in a good rhythm. We have a good dynamic.
>> But yeah, we have been catching up. You have to be honest with yourself, right? When you are catching up, you are catching up. You have to see what others are doing and learn what you can learn, but you have to innovate for yourself. And that's what we did. That's what I feel makes it a good underdog story, in a sense: we innovated for ourselves and found our own solutions, technology-wise, model-wise, process-wise, and in how we run. And it's unique to us, because we run together with all of Google. Look at what we are doing; it's a very different scale. I never saw it the way some people do when they say, oh, Google is big and it is hard. I see that as something we can turn into our advantage, because we have unique things we can do. So I'm quite pleased with where we are, but we had to learn through it and innovate through it. That's a good way to achieve what we have achieved right now, and there's a lot more to do. There are always comparisons, but our goal is to build intelligence. We want to do it the right way, and that's where we are putting all our minds, all our innovation.
>> Yeah, I feel like the next six months are going to be probably just as exciting as the last six months, and the previous six months before that. Thank you for taking the time to sit down; this was a ton of fun. I hope we get to sit down again before I/O next year, which feels like forever, but it's going to sneak up. I'm sure there are going to be meetings next week that are like I/O 2026 planning to make everything happen. So, thank you for taking the time. Congrats again to you, the DeepMind team, and everyone on the model research team for making Gemini 3, Nano Banana Pro, and everything else happen.
>> Yeah, thank you very much. It's been amazing having this conversation. It's been an amazing journey as well, and I'm glad to have all the team, but also to be sharing it with you. It's great. Thank you very much for inviting me.
>> We got a special little gift, to thank you and congratulate you and the team for making this happen.
>> Oh, nice. Thank you very much. Very much on point.
>> The 1500-point club. First model, right? 1501, for the first model.
>> Very kind. Thank you very much.