CTO of Google DeepMind on why world models matter
By Allie K Miller
Summary
Topics Covered
- Speed Makes AI Part of Daily Life
- Models Now Decide How to Present Content to You
- The Key to AGI Is Generality
- AI Learns Both Inside and Outside Its Weights
- Ask Your AI Agent Before Every Decision
Full Transcript
Hey friends, I am here at Google I/O. We are in South Bay, California. It is burning up outside, not just because of the amazing Google keynote, but also because it's something like 90°. I am here with Koray Kavukcuoglu, who is the chief AI architect at Google and the CTO of Google DeepMind. We are going to be chatting about what the heck just happened, and maybe a little bit about his predictions looking forward. Thank you so much for being here. I appreciate it.
Thank you very much for having me.
Okay, so here is what I felt, at least, walking out of the keynote. Sundar is on stage, he's talking about agents, he's talking about new models, and I feel like the way that I'm supposed to spend my day-to-day has changed. So could you give a little bit of a sense of how you imagine the average business professional's day-to-day in the AI age, in 2026?
Thank you very much. I mean, first of all, everything we are announcing is exciting. On that path towards building intelligence, what you are seeing is where we got to, right? These are the latest things. We talked about agents a lot; we talked about world models a lot. Day-to-day, I think the first thing that happens, at least for me, is that I start my day by getting a prep from Gemini Spark: okay, how is my day looking, give me a little bit of a prep. And it does that. It goes through everything that you have (emails, chats, messages, calendar) and it gives me a detailed prep: these are the meetings that you have, these are the things that you need to answer for or make decisions about, and all those kinds of things. It is great to be able to start the day like that.
And then during the day, when you are researching something or you need to know about something, the first thing I do is always go to Spark and say, okay, what is happening with this, what is the latest? It's always there helping me, sort of like a companion that knows about everything, pulling things together and bringing the information. Day-to-day, when you do your research, I think it is a whole new world where we work with agents. When you iterate over something, it's always about iterating over ideas, and agents are there to help with that iteration, whether you are doing research or coding (obviously you're doing that very heavily with agents) or even something creative. With something like Omni, you have a video and you want to iterate over it. That's magical.
And so Omni was just announced today. It is maybe the early teaser of world models. We also saw Project Genie. Can you talk to me a little bit about the world model space? Right now the ability to generate and edit videos sounds really exciting. Where does this go from here? Demis even said we would be able to input anything and output anything. That seems like a big, big sentence.
So look, world models are really important on that path towards AGI, because what we want is to have this technology, these models, that can understand our world but can also simulate it. That is a very critical aspect when you think about it. You want an intelligence that can simulate the real world, not just from visuals, understanding the dynamics and physics, but also from textual information. That's what Omni brings together. Now, with that power, you want that simulation because when the model is going to make a decision, you want it to think ahead, considering both the physical world and the textual world, in that high-level reasoning space. That is what it's enabling.
And so for the average viewer, what does that open up for us in terms of capability? If you have models that can generate and understand text, obviously we can write amazing novels about dinosaurs living on Mars.
Yes.
And then we can also start to talk to our data, start to create a thousand versions of different reports for all of our different clients. But once we have models that can also think about fluid dynamics and physics, what does that open up for business professionals?
What it brings is that we now have models that understand fluid dynamics and physics, which are critical elements of engineering life, right? When you think about models that can actually be in that loop, that can understand the problems we are trying to solve in real engineering life, then engineering design can go much faster. For business professionals, think about the financial world: what they would like to be able to do is really tap into the high-level reasoning and agentic coding capabilities of the model. Something like 3.5 Flash brings in all these capabilities so that you can go through ideas and prototypes in a much faster way.
So in different ways, both of these models, 3.5 Flash and Omni, are bringing together different capabilities.
So it sounds like there's a movement in terms of speed, because 3.5 Flash is the same sort of state of the art but a heck of a lot faster. We get a little bit of this spatial awareness, and we've also seen updates come through with Google Maps as well. And so I'm here thinking about simulations at scale. Do you find yourself trying to run simulations? Are you building out games? How are you using this stuff?
For me it's really mostly about coding, and going through the day with discussions with people and decisions, and getting the help there. But I can really imagine that we are getting to a world where, with something like 3.5 Flash, that speed enables much higher-frequency iteration in getting things out of the model. That is really important. It is always important to define the Pareto frontier of AI, right? When we release models, starting from the Gemini 1 days, we always had Flash and Pro, and sometimes Ultra as well, because you have the highest capability, where you're trying to get the most out of the model. And then 80% of the time, you want to have that frontier capability, but you also want it to be efficient, right? You want it to be fast, because then it can become a part of your daily life, your daily iteration, quick and immediate. You want to be able to use these things on your phone immediately and see the effects, see the results.
And that's what Flash brings.
Not to mention cost-effective. I'm fine with things that are very fast, and especially if they're cheap, that's a great bonus too. So you had talked a little bit about 3.5 Flash coming out. We saw that there's new agentic coding in Search. What I got the sense of is that the interface of how we are working is meaningfully shifting, and we were seeing that in the releases around Gemini Spark, this idea that you can have dynamic interfaces. So we're not just in text world; we can be in video and video editing, or in dynamic sliders. What is the interface of how we're interacting with AI in the next year or so?
I think this is the exciting thing that is changing. This is one of those capabilities where you wouldn't imagine that the interface would be the first thing you get from increased capability in coding, right? But what is happening is that with increased coding capability, in Search and now in the Gemini app, you can have interactive visual interfaces; you can have models defining what the interface should look like. I think it's an exciting capability and a new way of thinking about the interface between AI and people. Because when you're asking a question, the model is preparing some content for you, and it should think about the best way to present this content to you, personal to you as well, right? So that you can consume it and benefit from it in the best way.
I'm reminded of my bosses who are like, never show me a PowerPoint again, I only want to see it in this certain format. And now Google is kind of trying to do that for you as well.
Yes. And the model is guiding that design.
Yeah.
And as I said, I think this is all enabled by coding capabilities and, of course, the speed.
Yeah. And even just as you're saying this, I'm hearing different trade-offs in speed, different trade-offs in cost, different tra... So can you talk to me a little bit about compute? Everyone's probably seen the headlines of compute shortages, chip shortages. Clearly there's a world where small models are helping in this space. Are you worried at all about this compute shortage? Is it real?
I think AI adoption is growing amazingly, right? When we have products and people use them, that's the great thing; we want to see that positive impact in the world. People are using them, people are adopting them, and what that brings, of course, is more requests for serving all of this. Everything that we are talking about, visual interfaces, dynamic interfaces, all of that relies on models, and efficient models are important there. But one thing to remember is that, as Google, we have this long-term investment in TPUs, and that's critical, right? Google was really visionary a decade ago when we started the research on hardware for AI, and now what you see is TPU8, which we just released at Cloud Next, right? We have the inference chip and the training chip, because we see these two different kinds of worlds appearing. For training you need different capabilities, but as AI adoption increases in the world, you really need that inference chip to make things more efficient. So it's the models, it's the hardware, and the co-design that go together.
What I'm also hearing is that you're less worried than maybe other people I've talked to, maybe because of TPUs. But no, you guys have been in the chip space for so long, I'm not surprised to hear that. So we talked a little bit about AGI. You mentioned it in the beginning. I'm not going to predict, or even ask you to predict, when we are going to see robots folding our laundry, although I'm interested. But can you tell me a little bit about what we should be looking out for? So maybe not a specific timeline, but what are some key benchmarks, key indicators? You are a CTO, but for the average person out there who is brilliant in their space, what little indicator should they be looking for to know AGI is coming, or AGI is coming soon?
Yeah, one thing to say maybe is that it's always hard to predict what path technological development will take, but I think your question is great in the sense of what kinds of capabilities we should see. There are some things that we know are key to intelligence, right? That high-level reasoning capability is important. Being able to recollect some memory, reflect over it, and continue to learn from it is important. Adaptation to an ongoing conversation or ongoing work is really important, I think. And the most important is probably generality. We don't want to be developing technologies where we solve all the problems as we see them right now, and then when the world develops and there's a new thing, the models are useless, right? The main thing is that generality, that generalization to new problems. That's the key. And for the models to be able to think for themselves, learn by themselves, and innovate by themselves, I think that is where we are trying to go. So I think people will see this in coding; those capabilities will definitely increase. Agent capabilities will increase all across the board. In everything that you do, you'll see more and more agents being helpful to you, being a companion or a helper when you are getting through your day, doing your work, all sorts of things, right? I think the first thing we should see is that surface area increasing: more and more agents, because we're going to be able to rely on them, because they're making better decisions, and because we are figuring out the right user flow, so that as a user you are in control at the right times.
Yeah.
Right. The models and the products should be able to do that. So getting that balance: I think in the near future, those are the kinds of things we'll see, and then we'll move towards this exciting world of models thinking for themselves.
And in terms of surface area coverage, are you saying that there's a little dot in the corner and it hasn't spread out yet? Or are you saying there's almost a rug taking over the whole living room? How much surface area coverage do you see now?
I think this year is the year that we see that expansion happening.
Yeah.
With 3.5 Flash, we feel we can bring this to developers in a very capable way, and we can also bring it to users, consumers, and businesses with Gemini Spark, right? And that increases the surface area. Again, getting it right is important, and that's what we are trying to do, but the capabilities are there for us to actually start increasing that surface area, making agents a broader part of people's lives. And of course, we are excited to see how people are going to use them, and we are going to learn from that. That is a big signal for us in deciding what the next problems we should be working on are.
Yeah. With any of the new releases, I feel like people should be testing them, seeing what they're capable of, tweeting it out, posting on LinkedIn, because you're probably reading all of them. The last big topic before one final question: you had kind of teed up this idea of learning, whether models could learn, and you had also talked about memory. I feel like memory has been a really big topic in the world of AI and agents for the last year, just the idea that Allie has her own model, Koray has his own model, and we want those, or at least our agents, to behave differently. So I am a big fan of AlphaEvolve. It is one of my favorite papers of the last five years, and it feels like a lot of the labs right now are coming out with similar-looking loops. I'm putting that in a really rough phrase, but it's this idea that we are constantly iterating and slightly shifting some of the outcomes to eventually get to a much larger option pool.
Yeah.
To then be able to figure out the best solutions for things. Can you talk to me a little bit about where we are with self-learning in AI today, knowing that you have had literally one of the front-row seats this whole time?
That's an exciting area. That's really an exciting area of research when you think about it: being able to create this loop where the model can figure out what there is to learn, what the next step is for it. That's the most important thing, right? That one step is a step you can repeat many times, and for the model to be able to make that decision, I think that is really critical. But if you take a step back, there are multiple ways the models can improve or personalize or specialize, depending on the conditions or on how they are used. When we talk about the technology and the models, they always come with their own environment, right? I think that is one of the biggest changes we are seeing by going agentic: now you have the model, but it also has an environment, and in that environment it has the basic tools. It can code there, and sometimes it can browse there. That's what we are doing with Spark, right? You have a browser there that is your browser, and that virtual machine is your virtual machine. So when you do tasks and you want the model to repeat those tasks, the model can take notes, and of course those notes are only for you. Then the next time you have that conversation, it is localizing on those notes and making use of them to answer you in the best way, or to do things the way you like them.
So when I get my personal daily prep, I have a specific way I like it, right? And I say that...
Do you want to share what way that is?
Koray only likes 50 bullets, nothing else. Only emojis.
All emojis. No, there are always some documents attached, and I always like the links to the documents to be there as well. So these kinds of little things are important. Once you go through that, you want that repeatability, right? These are little things, but they actually make a big difference in the day-to-day adoption of the models, and these are all things the models are capable of, being in that environment that is personal to you and learning from it. So there is learning in the weight space, and then there is learning outside of the weight space, in the environment, because there's persistent memory, and right now the model can selectively modify its environment and be useful to you. The models getting to a state where they can update their own weights, I think that is an exciting future. Not a next-five-minutes future; more like a next-five-years future. I think right now the first step toward that is us relying on models more when we are doing our own research, so that the models, in a way, also learn to be useful in a scientific research environment. We see that big jump happening right now. We literally use Antigravity all the time internally when we are doing our research, when researchers are running their experiments, and of course, slowly, this is going to lead to models doing more and more for researchers.
All right, I've got one final question for you. This is a short one, but imagine that there is a 40- or 45-year-old business professional who has just started dabbling with AI agents. What is your "hey, here's your homework for the next week"? What is that one thing that you think is going to really bring them into 2026 and show them the highest value?
I would say that before doing any communication and making any decision, try asking the agent for what you want, and see what the agent actually gives you. That will get them into a daily habit: for everything that you do, how do you use AI in your daily life? If you force yourself to do it, then you will actually explore, for your own environment and your own work, how the agents can be useful. It's a habit, so you need to build it.
I love that. We talked so much about habits and iteration, and you almost teed up this idea. I have this visualization now where I'm in a sandbox and I have an AI in its own sandbox, and we're all just playing and experimenting and trying to figure out what is right around the corner. Koray, thank you so much. I really appreciate it, and I hope every single person learned something new.
Thank you very much for having me. Thank you.