Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning
By Unsupervised Learning: With Jacob Effron
Summary
Topics Covered
- We Haven't Seen the GPT Moment for Video
- Scaffolding Gets Swallowed by Scale
- Memory Is a File System Problem
- Narrow RL Creates Unexpected Generalization
- AGI May Already Be Here—We Just Moved the Goalpost
Full Transcript
Oriol Vinyals is the co-lead of Gemini alongside Noam Shazeer and Jeff Dean. He's had an incredible career in AI, pioneering many of the breakthroughs in deep learning over the last decade, and it was a ton of fun to sit down with him after Google I/O. If you've been following Google I/O, they shipped a bunch of products across a ton of interesting surface areas in AI, and Oriol and I hit all of them. We talked about what's required for further advances in multimodal models and what's going to make these world models actually usable. We talked about the growing importance of memory, how the advances there over the next few years will look like what we saw with reasoning, and what Oriol thinks the path forward is. And we hit on the state of scaffolding today: what folks are building and what Oriol thinks persists. It's a ton of fun to take all the top questions that founders and investors are thinking through and just pose them to Oriol, so I think folks will really enjoy this conversation. Without further ado, here he is.
Oriol, thanks so much for coming on the podcast.
Yeah, it's great to be here. Thanks,
Jacob.
Yeah, very exciting to have you a day after I/O. I know things have been busy, but I've been really excited for this because you're one of the people most directly shaping the frontier of models today through your work at Google. The releases at I/O yesterday hit on pretty much all the themes people are thinking about in the space: where these products and models are going. So our goal today is to talk through the research behind those announcements, where this is all headed, the future path of RL and post-training, and get your read on the space as a whole. I figured I'd start with world models, because that was a really impressive part of yesterday, and also an area where Google is pretty distinct from a lot of the rest of the field. You shipped this incredibly impressive world model in Omni yesterday, and Demis has talked a lot about seeing world models as a path to AGI. It's interesting, because other labs seem more focused on code and getting to recursive self-improvement. So I'm wondering whether that's a fair characterization, and why you think you and the team at Google have been somewhat uniquely focused on this world model space.
First of all, I guess the coding or self-improvement angle is at a bit of a different layer, right? You can certainly bet and believe that these models can reprogram and improve themselves, and it's something I've actually been quite actively working on at the moment. But then there's the object that they improve, the model, whether it's multimodal or closer to a world model, as we call it, and even how to define that is a bit abstract. Since day one, and actually way before the Gemini program started, we were working on not just language but understanding the visual world, jointly modeling words in the context of vision, video, et cetera. So that part has been at the core of Gemini, and of our research before it. Maybe one way to characterize it is that with language, there's clearly a lot of information that we have collectively written about the world, and that has clearly paid off big time. We've distilled, in a way, all the knowledge that has been written, and that is being written at the moment, into these weights.
It was definitely convenient that we put it all on the internet too.
Yes, exactly. And now with users there's obviously also a flywheel effect. But at the same time, there is lots of knowledge in videos and images. I would say it has kind of happened, but softly, and I think there might still be a big moment: how would you extract all the knowledge that you would acquire if you were to look at all the videos and images? We certainly use them in our training mixtures, but could that knowledge somehow add value and efficiency to the language component? I think we've seen constructive transfer learning from one to the other, and we see generalization. But what I would characterize as the GPT moment of video and images, I'm not sure we have quite seen that.
Do you have any thoughts on what that GPT moment might be for video and images, given you have this intuitive feeling that it hasn't yet been reached?
Yeah. So at the moment we train on all the modalities, we mix them, and we keep enhancing the recipe. Omni is a good way to see that progress. We not only input videos and images, where we've seen amazing capabilities with long-context understanding and so on, but we are now also able to output video and interact with it in a very natural way through language: editing it, combining the modalities in a way that feels almost magical. So that progress is absolutely there. But one of the deep learning dreams, and it might be an original dream from way before large language models, would be: can I train on all the image data, perhaps without text as a hard challenge, and still somehow extract all the meaning and nuance from that modality, or set of modalities, and those vast amounts of data? Could we train on all the videos and images ever produced and get to the same level of understanding that language models clearly get to using language, although probably slightly superficially, and with some missing links around cause and effect that Demis, for instance, often talks about? So that is the moment. Have I seen that? Probably not. Most likely we have the most advanced, or one of the most advanced, multimodal recipes that mixes everything, but that pure transfer has been one of the core quests of machine learning for the last decade plus.
To the extent you can talk about it, could you give our listeners some context on the key problems that still need to be solved here, the types of problems you're trying to work on to advance this further?
It's hard to describe the solution space, but the idea is that you could imagine observing or learning from all the video data and then somehow deriving, say, the rules of gravity, which is an example that's often used. How could you precisely describe how the world works based only on images? The issue is that linking language, or concepts as we sometimes call them, to what you see in the image without an explicit language linkage is fairly tricky. So what you end up doing is explicitly creating datasets where there is some correlation or connection between the images and video and some language, maybe labels or descriptions. But of course the amount of data at your disposal is now much smaller, because we clearly haven't described and transcribed every single piece of media out there. So extracting those concepts in their purest form, not just as some language that we associate with what we see, would be very, very powerful. There's lots of early research on discrete representations and representation learning, but I would say it's still fairly research-stage, so it's not something we can scale up yet. I'm not sure it's needed, and whether we agree on that is another question, but if it were unlocked it would be massive.
You mentioned the term world model and how it gets thrown around a bunch. Omni was positioned as a world model, and I'm curious how you thought about that categorization, given you've had really good video models for a while. What makes Omni a world model, and how is it different from the generation of video models you've been working on?
I guess a pure aspect of a world model would be representation learning. You could imagine taking these modalities, like videos, which are sequences of images, or even just images, and compressing them into a set of concepts: the movements, the objects, and so on within them. That's called representation learning, and it models the world in a very compact way that compresses away what's probably not relevant. That's the more classical definition, but it's probably not exactly what we mean, or what we see or feel, when we interact with Omni. What you see there is more about being able to really change how the video behaves, or the kinds of videos you get out of, say, an initial image that you ask it to animate. You explicitly ask for all the movements, or even actions like "move forward", and you can see them being precisely simulated. So the world model itself is acting as a renderer of the world that you can change just through language. Besides being a cool product to play with (of course we love to generate all sorts of different movements and situations very rigidly), that object could also meaningfully add a dimension of simulation that lets us do things like prediction before acting in the world. And the obvious applications for these kinds of 3D or video world models would clearly be self-driving cars
or robotics.
It seems so relevant to robotics. Everyone is still trying to figure out the right data mix of simulation data versus teleop data and egocentric video data, but as these simulations keep getting better, simulation becomes a more and more compelling thing to put into the mix. Does this work directly intersect with the broader robotics work you're all doing, and how do you think about what's actually required to append robotic actions onto these types of models?
There's also a bit of a beautiful connection, because if we acquire more data captured from robots, which we are certainly investing in, even if it's obviously a bit more expensive and time-consuming, that data could make it into the model and enhance the world model capabilities themselves. The other direction, which is perhaps what you're asking about, is that we can now simulate, and we could create lots of different scenarios in which these robots, or whatever 1D, 3D groups, et cetera, could train without the cost and time latency of the physical world. For the latter to work better, it's still a very open problem, and there are all sorts of issues with transfer. But the more powerful these models get, there's clearly an inflection point where things start to be worth doing, and we might indeed see an acceleration in robotics. We're definitely seeing lots of investment in the hardware space, so things are accelerating and picking up there. But for the world models to be useful, at least from my limited knowledge, though of course I've been able to interact with these systems and see them, the precision has to be very high. Even grasping a model, which we take for granted as humans, involves the visuals, exactly how this would feel to your hand, which is a modality we currently don't even have data for, and the exact forces and how things would move. It needs to be very, very accurate. So that's where there's a gap, and some creativity and research are still required, along with lots of investment in robotics over the years. But it's promising, and at some level, maybe not at precise motor control but at the level of planning and gross motion, we are going to start seeing these models accelerate our progress in the quest of robotics.
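The idea of using a world model for prediction before acting can be illustrated with a toy planner. This is a minimal sketch, not anything the conversation describes as Google's method: `world_model` is a hypothetical hand-written stand-in for a learned simulator, and the random-shooting search, the damping constant, and all names are assumptions chosen for illustration.

```python
import random

def world_model(state, action):
    """Toy stand-in for a learned simulator: a 1-D position nudged by the action."""
    position, velocity = state
    velocity = 0.9 * velocity + action      # simple damped dynamics (assumed)
    return (position + velocity, velocity)

def rollout_cost(state, actions, goal):
    """Simulate a candidate action sequence and score the final distance to the goal."""
    for action in actions:
        state = world_model(state, action)
    return abs(state[0] - goal)

def plan(state, goal, horizon=5, candidates=200, seed=0):
    """Random-shooting planner: imagine many futures, keep the best first action."""
    rng = random.Random(seed)
    best_cost, best_actions = float("inf"), None
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = rollout_cost(state, actions, goal)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions[0], best_cost

if __name__ == "__main__":
    first_action, predicted_miss = plan(state=(0.0, 0.0), goal=3.0)
    print(f"chosen first action {first_action:+.2f}, imagined miss {predicted_miss:.2f}")
```

Nothing is executed in the "real" environment until the imagined rollouts have been scored, which is why the accuracy of the simulator matters so much: any error in `world_model` is baked into every plan.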
A huge part of these models is implicitly learning physics through consuming lots of video data, and you mentioned gravity as the canonical example of what people look for. Being so close to these models, do you have a gut sense of when that will just be a solved problem within world models?
Yeah, it's a good question, actually. You made me think about evaluation. How would you evaluate it if you train a very good video model?
Yeah. How do you evaluate physics in a model?
Yeah, it is a good question. The problem is that as soon as you add language, all of a sudden that knowledge is there in the weights. If you ask basic questions about gravity, of course the model would answer them just from having read explanations of it online. So you would need to somehow connect the concept of gravity, which could be present or not in a world model, and then decode that into an explanation that would satisfy you. Initially that might be some basic explanation; later on it could even derive the equations. That's how you could build an eval. To my knowledge, we haven't been thinking about it from this point of view. There's definitely lots of early work on unsupervised machine translation, where you would try to translate to a language that you never see during training, and you could align the representations. So there are probably some ideas there: you get a language model that can speak, that you can decode from, you get these world models that create this conceptual-level understanding, and you align both. There are some papers, and these are old papers; the one I remember, I think, was Stephan Gouws et al. from 2014. Then you could try to start decoding that, and converting it into an eval seems like a trivial step. But again, these then need to be meaningful from an application point of view.
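The "align the representations, then decode" idea can be made concrete with a deliberately tiny example. This is a sketch under strong assumptions, not the method from the papers mentioned: the embeddings are made-up 2-D vectors, the two spaces differ only by a rotation, and the alignment uses two known anchor pairs (a supervised simplification of the unsupervised setting described above). All names and values are hypothetical.

```python
import math

def fit_rotation(source, target):
    """Closed-form 2-D Procrustes: the angle that best rotates source onto target."""
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)

def rotate(point, angle):
    """Rotate a 2-D point counter-clockwise by the given angle in radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def decode(point, vocabulary):
    """Return the word whose embedding is nearest to the given point."""
    return min(vocabulary, key=lambda word: math.dist(point, vocabulary[word]))

# Hypothetical "world model" concept embeddings (2-D for readability).
concepts = {"falling": (1.0, 0.2), "rolling": (-0.3, 1.1),
            "bouncing": (0.8, -0.9), "floating": (-1.2, -0.4)}

# Hypothetical language-side embeddings: same geometry, rotated by a hidden angle.
hidden_angle = 0.7
words = {name: rotate(vec, hidden_angle) for name, vec in concepts.items()}

# Fit the alignment from the two pairs we pretend to know.
anchors = ["falling", "rolling"]
angle = fit_rotation([concepts[a] for a in anchors], [words[a] for a in anchors])

# Decode a concept that was NOT used as an anchor.
decoded = decode(rotate(concepts["bouncing"], angle), words)
print(decoded)
```

An eval in this spirit would check whether concepts the world model learned without text land near the right words after alignment; real representation spaces are high-dimensional and only approximately related, so this shows the shape of the idea rather than a workable benchmark.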
So ultimately you could also say: look, we have a world model, can we decode, or, I don't know, induce movement in a complex system from its representation, for example? That would be another indirect eval. So, many ideas, but yeah, evals are always so important.
Shifting gears to some of the other stuff you all shipped yesterday, I definitely want to talk agents. You shipped some really interesting consumer agents in Spark as part of I/O. From the outside, at least, it seems like a really improved version of some of what you had explored in Project Mariner in 2024 and some of the other computer-use work, so it does feel like there's been a real step change in capabilities. I'd love to hear you riff on the research breakthroughs that enabled that, and how people should think about what these agents can and can't do today.
We knew that was going to be a very important modality: actions, acting and changing the state of, let's say, a digital computer. As you evolve and make the model better, you get the model really good, then you focus on building a system around the model, then you optimize the system and the model jointly as much as you can, and so on. So in terms of what creates the delta, the increase in capability, it's mostly about focus and about sequencing releases. And in some sense the model capability needs to reach a certain level before you can dream about the next stage of capability, what the model might do next.
Yeah. One thing that's so interesting about the consumer footprint is that there's such a broad array of things people want to do with it. So I wonder, to date and as you see it evolving over time, how bespoke that work of model plus system is to subcategories of the problems people want to solve, versus incredibly general: you're just optimizing a system and model combination that works across pretty much everything you might want to do in Spark.
There's always a sequencing of specializing to something that feels controllable and already very useful. If you look at Spark, it has access to the information it needs to assist you in scheduling and organizing your day, and even in thinking about how you should tackle different problems, because it has this very rich context. So it's useful to build the system slightly more narrowly around something you care deeply about. But if you look at the history of machine learning and deep learning, the components we're building are general. And there's a big hypothesis, which goes back to the world model point, that training on everything jointly must be better than focusing narrowly on just one domain. From the modeling perspective that is very clear. But the same holds from the systems perspective: you build a system that is fairly generic, and then, based on how you instruct it or interact with it, it can figure out, "this user wants to do this, I have all these capabilities, let me figure out which ones to use." You're not necessarily building it for that at train time; you build something generic, and the specialization happens through a layer of intelligence, through the model and the generality of the system. I think that's fairly clearly already here. In practice, limiting the system, or making it more efficient, sometimes still makes it sensible to specialize. But the move from specialist to general just keeps happening, even in architectures. The transformer was a machine translation neural net, and now it does everything from Omni to controlling your computer. So that's a step that I expect.
You've been vocal about the bitter lesson over the years. As you look out at the field, are there places where you think it's not currently being followed, where you see structure or clever scaffolding that scale is eventually going to wash out?
Yeah, I think so. One area that I find exciting, and there's already some research published on this, is that in the limit, the system we build now by coding a sometimes complex scaffold around the model (multi-agent, sub-agents, delegation, very long-running) is itself a piece of code that the model could eventually write on the fly. So you could imagine not just having a system that is very general, but maybe no system at all, with the model writing those scaffolds depending on what it's being asked to do.
Like almost the most token-efficient, highest-quality set of sub-agents, or whatever it is, around a set of problems.
Yeah, exactly. One of the paradigm shifts we've seen in the last year and a half or so is of course the reasoning models, which can reason for a long time in token space. But eventually what becomes more important is whether you should reason, and for how long, and adding that level of intelligence based on the complexity of what a user is asking will make it more efficient. So for what you build around these systems, I'm not sure exactly whether the model will write it from scratch or whether it will be some automation, but there is going to be a level of smartness about creating the right scaffold for the right task.
On the agent side, everyone is experimenting with building these long-running agents, and they obviously run into all sorts of issues trying to get them stable across hundreds of steps. How do you think about what's required to get to further agentic reliability?
Yeah, I think the most obvious answer to these questions is improving both the scaffold around the model and the model itself. If you think of how you train a neural network, it trains on some distribution of tasks or modalities, how to connect different words to video or whatnot. These are all about how you train, pre-train or post-train, these weights. So if there's a new type of work or modality that requires these very long-running systems that need to somehow learn from very long context, which we have always innovated on and pushed (1.5 was our long-context breakthrough), then it becomes obvious that the model will also catch up to meet the users and the futuristic use cases. That's a bit of the researcher's challenge: predicting what can be possible, and then focusing not only on building a system that is robust to that, but also on how the weights get more or less happy when you push in all the context and all these crazy things that you do, rather than just hoping for generalization from the prompt that induces that behavior, so to speak.
A pattern everyone's trying to figure out is memory, and how to solve it across these agents. Do you have any gut instincts on where you think that ultimately gets solved?
Yeah, memory is fascinating, and has been since very early days. I think initially we characterized this, probably through biases from Demis having done a PhD on memory systems in the brain, in a few ways. The simpler framing that I like is that there's working memory, things that are very present because of what we're doing or talking about, and then what's called episodic memory, which is a kind of retrieval system that you can access. It's probably less precise, but it covers a longer context, potentially all the context that you or I care to remember from all our accumulated experience. There aren't only two levels of memory, but it's useful to think in terms of these levels. Computers have the same thing with caches, L1, L2, and so on. When it comes to models, for working memory we have, because of transformers, a very powerful mechanism to use that memory: hundreds, thousands, millions of tokens at our disposal to modify that memory and then do amazing things with it, like proving complex theorems, gold-medal-level maths, and so on. Where I'm seeing a lot of momentum is in how we then consolidate things that happened either previously, in different interactions, or throughout an interaction that is longer than you could possibly hold in this working memory. How do we store that knowledge? Through different experiments, and I think the standard name now is what we call skills, though it's more general than that, we do have access, because it's an agent, to a memory system, which is the computer itself. So you can start thinking about writing your thoughts into files, structuring them into directories or folders, and doing that as you interact with the same user over multiple episodes or over one very, very long episode. So the mechanism that is fairly good at the moment, though again I don't think the weights of the model have caught up to it, is adding this kind of knowledge base into a file system, or any storage format that you can modify and read from with some basic retrieval mechanism. That's already very powerful, yet I think there's still a lot untapped there. Many of us call this a form of continual learning. The mechanism that I think is clearly going to work better and better is this file-system-style, non-parametric one. It's more convenient than integrating memories back into the weights, because from a practical point of view we try to serve one model at scale, and it would be really painful to have to serve one model with different memories to different users. So I think we'll see better evaluations and better ways for these models to accumulate knowledge as they interact, and that's probably paradigm-shifting as well, similar to what we saw with reasoning a year and a half or so ago.
Does that look like everyone having models where the file systems themselves are distinct, or do you think over time people have models whose weights look different based on what they've done?
Well, as I said, having different weights would be a challenge.
Yeah. Hard to serve.
Yeah, it would be. I mean, if it's the best way, then we'll find a way, right? We have lots of investment in hardware design as well, which could allow you to have more personal weights, so to speak.
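The file-system style of memory described above, where notes are written into files, organized into folders, and read back with basic retrieval while the weights stay untouched, can be sketched in a few lines. This is a minimal illustration under assumptions, not Gemini's implementation: the `FileMemory` class, the `.md` layout, the word-overlap scoring, and `build_prompt` are all hypothetical names and choices.

```python
import tempfile
from pathlib import Path

class FileMemory:
    """Notes stored as plain text files, grouped into topic directories."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, topic, title, text):
        """Append a note so later episodes can build on earlier ones."""
        folder = self.root / topic
        folder.mkdir(exist_ok=True)
        path = folder / f"{title}.md"
        with path.open("a", encoding="utf-8") as handle:
            handle.write(text.rstrip() + "\n")
        return path

    def search(self, query, limit=3):
        """Basic retrieval: rank notes by how many query words they contain."""
        terms = set(query.lower().split())
        scored = []
        for path in sorted(self.root.rglob("*.md")):
            content = path.read_text(encoding="utf-8")
            hits = len(terms & set(content.lower().split()))
            if hits:
                scored.append((hits, str(path.relative_to(self.root)), content))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [(name, content) for _, name, content in scored[:limit]]

def build_prompt(memory, user_message):
    """Copy retrieved notes into the context window (the 'working memory')."""
    found = memory.search(user_message)
    notes = "\n".join(f"[{name}] {content.strip()}" for name, content in found)
    return f"Relevant notes:\n{notes}\n\nUser: {user_message}"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = FileMemory(tmp)
        memory.write("preferences", "meetings", "user prefers meetings after 10am")
        memory.write("projects", "launch", "launch review is scheduled for friday")
        print(build_prompt(memory, "when should I schedule meetings"))
```

Because the store is just a directory, one served model can be paired with a different directory per user, which is the practical advantage over per-user weights raised in the conversation; a real system would presumably use stronger retrieval than word overlap.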
But at the very least, of course, you will have your own knowledge base that is personal to you. You're already seeing many examples of this realized over the last, maybe even, years in the LLM space. And then perhaps there's another layer of knowledge, more common to all the users of a given model, that you could imagine having access to, enriching or enhancing the model's capabilities without touching the weights. So that's very interesting, and getting to that would be awesome.
I feel like continual learning has been the topic du jour, and everyone's talking about it. You've seen a few high-profile examples now of folks spinning out of OpenAI or other places and saying: sure, you can keep scaling what we're doing now, and no one's denying that those scaling laws are there, but it feels like you need almost a new research bet to achieve real continual learning, and maybe it makes sense to pursue that outside of the path of continually improving these core LLMs. I'm curious what you make of that whole dynamic, and your reflections on it.
I was in Google Brain in the very early days and then moved to DeepMind in 2016. And at the moment, I think
there is a challenge and an opportunity. You obviously want to investigate some research questions that might not be viable for "hey, in the next three months this makes it into the next training run." But at the same time, this cannot be very disconnected from the head, where the LLMs are moving. We're improving Gemini; it's fascinating to see Flash outperforming the Pro of only a few months ago, and that keeps happening. So staying at the head of capability, which might enable or disable certain research, whilst having protection for research (and of course that's not multi-year anymore, things are moving fast), combining these two is the magic of building these organizations. All of us have different angles and can figure out how to bridge this and identify the opportunity. That's a bit of what it takes: not to have full visibility, since this is too large an organization, but to have some intuitions and then be able to pull in these ideas, sometimes eagerly, because it feels like the right thing to do. That's really what defines organizations at that level from a research perspective. So I can see everything from investment in robotics, to of course the peak of the LLMs, to research that either has made it or will make it through. But it's challenging. Resources are constrained, so it is an interesting trade-off, and not one you always get right. I think it's a fascinating, different angle of research: not just what idea will make it into the next paper, or now into the model, but how to even organize this whole organization.
It's fascinating, and this feels like one of the most interesting questions for someone in a role like yours, where it's hard not to feel excited about so many things you can advance with these models today, and there's obviously so much going on. Even take an organization like OpenAI: they've oscillated between "there are so many low-hanging fruits to go do on the AI side" and now this more focusing moment where it's like, "we've just got to really nail code and catch up to Claude Code." I'm wondering how you think about the trade-offs of focusing on the one thing and having the org all rowing toward that, versus a broader surface area, all of which is super interesting.
You know, Google is in a unique place for a couple of reasons. First, we indeed have a lot of surface area in Gemini at the moment, right? It's literally powering everything. But we have the advantage that the other parts of the organization are completely bought into the LLM era. So in a way they take the model and then they might do something. But if you feel like that's not the next way to advance frontier capabilities, then you can just rely on there being a very good group that will take the model to where it needs to go, right? At the same time, we have stability from hardware procurement, and obviously also investment of capital, given we're very end-to-end in terms of revenue streams and so on. So you can probably push the risk-taking a little further for certain research areas, which needs to be done with taste as well. So you have kind of this, it's not focused, but it's scalable because of how Google is organized, and then you can still invest in innovation, which is at the very core of what we've always done, right? I mean, if I look at Brain and DeepMind, the two organizations I've been part of, now called Google DeepMind, which I appreciate given I've obviously been in both over different periods of time, then I think it is in our DNA to keep innovating.
But at the same time, I think what Gemini created is a focused and unifying force, which was fascinating to do. It was very helpful that me and Jeff had known each other for many years and had gone on trips together just for fun, right? So I think that time, though, was very special. And I think the center being the Gemini core modeling effort, very focused on frontier capability, and then having these inputs and outputs, is a fairly reasonable way to go about being focused but also being able to leverage a bit of exploration, which might still be needed or not, right? I mean, do we need world models? If we make it work, it definitely will be needed. If we don't, maybe it's okay, you know, but it's good to have the bets placed as well.
On the model side, maybe switching gears to just Gemini models and the path forward: I think you called post-training before still kind of a total green field. And I feel like what we've seen, clearly, is incredible progress on post-training and RL in coding and math. I think there was just a new math problem solved hours before we came on this podcast. What everyone's trying to figure out, and I'm curious for your intuitions, is the characteristics of the next set of domains where we'll see RL really take off. It feels like we're on this crazy exponential path on the coding and math side, and I'm curious about your intuitions on what makes other domains good fits.
Yeah, I mean, it's a good question. One must be quite humble, in that the models are really good at many things, insanely good. So it's very hard to say, oh yeah, this doesn't work at all, right? I mean, with almost bare prompting and a bit of smart prompting, maybe building the right system, lots of amazing things, at least in the digital world, as I call it, digital AGI if you will, are very impressive, right? So when I said that post-training is green field, I think that's less about a capability that I feel is very far from being at the level that is acceptable, from a "hey, this is fairly intelligent and fairly advanced" view, and more about just mechanistically looking at how some other efforts have leveraged imitation learning, or pre-training plus post-training, right? And how much investment there has been compute-wise in post-training there, versus the relatively smaller amount that today's models use.
And the reason there is kind of clear, and I'm not sure it's easy to fix. Even if you take a very narrow domain like Go: as you play the game of Go in reinforcement learning, you have a system that can play it, it places a few moves, and a few moves into the game, that game is now unique. You've never seen that particular configuration. So the environment's complexity as you play makes training data kind of infinite for free, right? I mean, you play a few moves, now you're in a new situation, and so you can learn from it, and the more you play, the more hours you put into your RL algorithm, the more knowledge you gain, right? That is what we've seen in the game reinforcement learning era. In LLMs we are data limited, and what the source of infinite complexity is, is not so clear. There are some ideas, but I think cracking that recipe could be big, at least in terms of the beauty of the algorithm. It would be much more satisfying, knowing how this has worked in the past, to see it work now in LLMs. Now, is it needed? Are the capabilities not there? That would be hard to say.
But since you ask about which capabilities, I think the capabilities that are most fascinating to me are what I call meta capabilities. They're not math or coding. They're kind of: what are the traits or attributes of intelligence, and can these models do it? So actually the ability to continually learn, or learn from experience very efficiently, that would be one. You know, in-context learning, we used to call it meta-learning, whatever, right? This is a capability that I can sort of measure or feel, and probably it's not super good yet, right, for example.
Of course, instruction following is a capability that you could argue is the ultimate capability, because if I ask a model to "be AGI," it either follows that instruction or it doesn't,
right? But I mean trying to look at these capabilities that are less about one particular domain or vertical and more like, okay, that is intelligent behavior. And so the ability to learn and adapt, rather than the ability to be a professional player or an IMO gold medalist or whatnot, is what fascinates me the most when I look at new releases and models that we are getting our hands on every time we train a new model, etc.
Do you have a go-to way to test that?
I like games. So I usually might define kind of a new game in context, right? This is a fairly classic way to do it. Of course, you need to be careful, because if the game is in the weights, if anyone else has put that game on the internet, you're in trouble.
Yes, but I remember, I think there was an eval. This is not exactly how I do it.
Yeah. Actually, I realize I'm being rude by asking you to talk about it, because then this podcast will be out there and then the next models will know how to do it.
No problem. Yeah, maybe. Hopefully, unless we need to crack models, right, unless it's fully transcribed, which I'm sure it will be. So maybe we don't even need that. But I really like an eval. I think this eval is actually very old, I mean way older than LLMs. It must be, let's say, 2015, probably before 2015, and the eval was simple. You give the instruction manual for, I think it was Civilization, the game, and then you're meant to be able to play it, right? So I like that style of eval. You can kind of create this differently. But that's one test that I like to use to test the models, and they're not that good, especially as the games become something I just invented or whatnot.
And the ability there is dual, right? You could imagine, first, can you understand the instructions and from there follow the instructions to play the game. But there's another aspect, which is that as you play the game, you learn to play better. So do you see that happening in practice? And it's impressive, but again, if you go very out of distribution, to a game that could be real but is still not in the training, this one in particular is not an easy test for the models to pass, for example. There are many others, but this one I really like, and it brings games in in a way that is useful, yet you will not train on the game at all. It's not about Go, where you only train on Go. It's like the opposite. But I like this kind of thinking from a capabilities point of view.
I mean, obviously it feels like there's been a lot of effort. Games were kind of the canonical first example of a verifiable domain, and you've had this with coding now and math. And I feel like a big outstanding question in the field is the extent to which we'll see generalization across RL, right? It feels like sometimes these models hill climb incredibly well on the domain that we're RL-ing on, and you'd have better insight than me into whether you see that then flow through to other aspects of the model. In some ways it's almost interesting: we talked about the most general bitter-lesson-type moments, and this is a moment of, like, find data in a particular domain, RL against that data, and improve the model on that one thing. I'm curious, does that feel like a fair characterization of what's happening today, and have you seen signs of that generalization?
Yeah, you look hard for sources of hard problems that will induce deep reasoning, which we indeed see generalization from. So reasoning models reason mostly through, let's say, coding and math, but then you see how they reason about a question about whatever. You know, I just recently moved back to the US, so I asked lots of questions about moving and taxes and whatnot, and you can see the reasoning is pretty good, and it's very hard to believe that it's been trained on that kind of question. So we're definitely seeing generalization, and you're creatively trying to get more data that induces deep reasoning and also, indeed, deep agentic behavior, right? That's part of the recent improvements that we're seeing: just finding those sources.
But being limited to just verifiability is definitely unsatisfying, because most of the time, for the things I want the model to do, I would not even be able to write a verifier if I had all the time in the world, right? But it feels like there is a bit of an asymmetry between creating the solution and evaluating the solution. And if evaluating the solution is indeed simpler than creating the solution, which is arguable if you think of some arguments on, for example, NP-hard problems, which are very hard to create solutions for but trivial to verify, it gives me hope that the models themselves will be able to judge, even if there's no fully verifiable way to judge, whether a piece of code creates a beautiful game or an engaging game, all these kinds of things. And I think that's very interesting research, and also in practice we're seeing lots of impact there already from these kinds of ideas. So the more we do that, the more we can train on more domains. The question is, do you even need that, or is just focusing on certain math and coding problems enough to induce this meta capability of being intelligent at problem solving, right? I don't know. I mean, I think it could go either way.
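As a rough illustration of the solve-versus-verify asymmetry mentioned above, here is a minimal Python sketch using subset sum, a standard NP-complete problem. The problem choice, the function names, and the sample numbers are assumptions made for illustration; none of them come from the conversation. Checking a proposed answer takes one pass over it, while the naive solver may try up to 2^n subsets.

```python
from itertools import combinations


def verify_subset_sum(numbers, target, candidate_indices):
    """Cheap check: distinct, in-range indices whose values sum to target."""
    if len(set(candidate_indices)) != len(candidate_indices):
        return False  # an index was reused
    if any(i < 0 or i >= len(numbers) for i in candidate_indices):
        return False  # an index is out of range
    return sum(numbers[i] for i in candidate_indices) == target


def solve_subset_sum(numbers, target):
    """Brute-force search over subsets, smallest first. Returns indices or None."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if verify_subset_sum(numbers, target, indices):
                return list(indices)
    return None


if __name__ == "__main__":
    numbers = [3, 34, 4, 12, 5, 2]  # hypothetical sample data
    solution = solve_subset_sum(numbers, 9)
    print(solution, verify_subset_sum(numbers, 9, solution))
```

The open-ended cases raised in the conversation, such as judging whether a game is engaging, have no exact verifier like this one, which is why a learned judge is discussed there as a hope and not a given.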
Do you have a gut instinct one way or another?
I want to believe you need to train on a broad distribution, and that should help the model. But it is striking how much generalization you get, possibly through pre-training. So maybe it depends on the level of ambition, of superhuman, or what's the upper bound that these models can achieve. But ultimately I feel like training as much in distribution as possible seems desirable in machine learning. So that's one of the quests for researchers to crack in the next few months and years.
One thing a lot of our listeners who are founders or building companies are thinking through is the extent to which they should be doing work at the model layer versus purely building the application on top. And so, you know, I'm wondering, there's obviously been a trend of some companies doing their own RL on top of models and saying, hey, there's this specific class of problem we can go solve. Or, maybe most notably, Cursor in the coding space has been like, we need to go train our own base model. I'm curious about your intuition on when that does make sense, or when that might make sense, and when it doesn't.
What I would tell folks is the value, and we discussed this a
little bit, the value of evaluations and, as a consequence, of data, basically. Those two are very tied to each other, and there is a huge amount of value there. So even if you don't build your own model, because maybe it's very early stage, or you just can't get access to the talent, the resources, all the things, thinking very carefully about how to evaluate progress on whatever it is you're trying to do will actually be very valuable, right? It's something that might even become a standard eval that folks like ourselves might adopt or monitor. And of course the value of data is immense, given what we were discussing about post-training in particular and the scarcity of enough data to be able to run the kind of months of Go training that we happily did a few years ago. So I would say that's where the opportunity is. And I know there's a lot of energy in the space as well in terms of people that are building. At the same time, I think building on top of a model, even though the model capabilities will keep moving, and again, not being a professional investor or product person, actually, but even just focusing on something that you truly believe in might create some opportunity for you to have that space at your disposal, understand it, get the users, get critical mass. And if it's something that others, big players let's say, are not focused on, I feel like there's a lot of value to be created by specializing even just the product, even if you don't do any of the other things.
No, it seems almost like, certainly in the early days, you specialize the product, you build on top of the models, get to some level of scale, learn the evals, and I think a lot of these companies are starting to try and figure out, do we then use that to post-train a model or to do something? And obviously the trade-off of that is, as these models generalize and the capabilities improve, they're never going to be training across the broad swath of things that the largest labs do. And so you're probably on a treadmill: every two to three months, even if you get slightly ahead of state-of-the-art, you probably have to keep constantly redoing it.
Yeah. Perhaps the angle here, and again it's another topic we discussed, right, is that as these models become more capable of continually learning or using a knowledge base that is possibly very complex, building that knowledge base for a certain application is not like training weights. It's a bit more efficient. But there might be a lot of uniqueness that you can add to it that might just protect you, right, from, let's say, someone who hasn't spent a lot of time thinking carefully about how that interacts with current models. And that capability will only get better. So perhaps that angle is a bit more scalable as well for early players in the game.
I mean, I guess obviously it seems that there's such a compelling path forward on so many of the research directions we've talked about. What's the capability you're least sure how to get to from here? I guess, you know, where you maybe don't yet see the research path, but you think is pretty important.
I think I see the research path for quite a few capabilities. The one that's fascinated me the most over the years, especially when I joined DeepMind in 2016, is meta-learning, or the ability of the models to learn. That's such a beautiful capability when you work on machine learning. So that one is one where I feel like there's a path, and there's some baseline now, and it will keep improving. But perhaps another one, where there might be a path but I'm not sure how practical it is at the moment, is, I think people mention, hey, can these models truly innovate? And I think that part is important, because, for instance, when you work on, hey, can you come up with new ideas in machine learning, and then we implement them (coding is excellent), deploy them, and so on, right? We're experimenting with these, many folks are, quite a bit. Truly taking all the knowledge we have now and innovating with taste is something that is hard to come by. Even for humans it is fairly special, and to be honest sometimes random. It's not like, oh, this person is so smart. I mean, 10,000 people are trying, and you obviously pick the one that was right and then glorify it, right? So I think that ability is probably quite important for certain things like self-improvement. And yet it's obviously difficult to even evaluate, and when something is hard to evaluate, it probably means it's also hard to hill climb on. So the ability to innovate in any aspect, but specifically in science, for example, is a good one where I think more progress is required.
I mean, obviously I feel like Move 37 was a canonical example of this in the previous world. Is there anything you've seen recently that feels closest to this? I think even before we started recording, OpenAI talked about this combinatorial geometry problem they just solved.
If I look inward to machine learning, that's kind of the point. I don't think I've seen truly outstanding ideas that a model has generated yet, but I am sure I will very soon, because there are some insights and ways in which the models understand how, let's say, a model is being trained that feel superhuman, because mechanistically these models have access to a bandwidth of information we don't. So maybe that part has been impressive. But I would like to see that level of impressiveness at the idea level as well, and machine learning is the obvious thing I can more accurately evaluate, right? So yeah, more to do.
Yeah, how do you reason about when we get to this level of genuine insights into machine learning research and this world of recursive self-improvement? I'm curious how you even think about what that looks like over time, and even just basic questions like, does the bitter lesson still hold, or what happens when we get into that world? I'd love to hear you just riff on that.
There's a certain efficiency level that probably will be enhanced. I mean, there's a level at which you, as the researcher or engineer, use these tools to enhance your own productivity. We've seen that a lot.
No, it's always impressive to talk to someone at the cutting edge of their field, and the numbers always vary, but there's some pretty large percent improvement in productivity across the board.
Yeah. So I think that one is already happening and obviously very powerful, but there are going to be certain, you know, almost physical limitations to how much this process can keep going, right? Because models need to be trained, and there are energy and hardware limitations. So I'm definitely very keen to see what kinds of problems can be automated and enhanced, can be done more autonomously. But at the same time, there's probably going to be a natural limit to the speed at which things can happen, and also a natural upper bound, right? That was already more than a year ago: someone reflected something to me which I now feel very much, which is that at the point a model writes English better than you, maybe that's too good and it shouldn't be that good, right? And I'm like, okay, that's an interesting realization: even if you could improve that capability, and maybe there's no ceiling or the ceiling is still far away, it might not even be that we need to see that ceiling. So there's the performance of the whole system overall, which is very impressive already, and there might be obvious upper bounds in some cases. But yeah, I think about the physical limits on the models and how you train them: even if we knew exactly the recipe and could iterate very quickly and train the next-generation models, there is some acceleration, but there are some upper bounds and rate limits that are still fairly fundamental.
Well, I always like to end my interviews with a quickfire round where I basically just stuff in all the broad questions that I haven't had time to fit in elsewhere. And so maybe to kick things off, I'm curious, what's one thing you've changed your mind on in AI in the last year?
What I have changed my mind on. Um,
I think it's the fact that, even though I want to believe that training on a broad distribution is probably going to enhance the model, training on narrow points of great difficulty, like maths or coding, creates this generalization. I think that is not something I quite predicted to work as well as it did.
I think Demis said at IO that we're at the foothills of the singularity and AGI could come in the next few years. Do you feel similarly?
I feel similarly, and I'll say more, right? Even as someone in the field, close to these models and neural nets in general, if seven years ago, and I'm using a time that is clearly before all that happened in LLMs, if seven years ago I had gotten to experiment with a model that we have currently, would I have declared this is AGI? I would say probably yes. I mean, it's an ever-moving definition. The progress is very impressive. So I think just because now we're seeing it closer, and it is a good thing to be more ambitious about what it is that we're building, but again, based on different definitions, or perhaps even the expectations we might have had about what AGI meant even only a few years ago, I would say in some way AGI is here, right? All I'm saying is, I don't think it is here in the way I want to see it, but it is fairly close. And maybe this ability for the models to truly learn from experience is what is missing in my mind. But everyone will have their own kind of test or bias, I guess, as to where capability gaps still exist in the models, and we'll get there, and then we'll move the goalpost again and have some other reason.
I think one huge advantage that you all have is, you know, you're certainly incredibly bullish on the models you're building, and you have your own hardware. And I think a question a bunch of my listeners will have in the back of their heads, so I'll ask it: one thing that you've done that a bunch of people were curious to better understand was taking some of the compute you have and selling it to Anthropic, right? And I think there's been this narrative on Twitter of, well, if you were so bullish on models and the research, why not just keep all the compute yourself? And so I'm sure our listeners would love to hear your perspective on that.
Yeah. How to invest compute, even within ourselves, right? The compute is used to serve, we train small models, even smaller models, and then we're trying to train frontier models. I think this is all a fine equation to balance. And I think in general, one way to think about Alphabet is that there are things that create revenue and economic impact that you can then reinvest. So it's not just being greedy about, hey, what should we do now, and take all these things together, and that's it, right? I think the strategy, which again I think about often, is just multipronged. And I think on the timelines, although we are bullish, of course, on the technology advancing, you just think of revenue streams and so on. Hardware is a very important asset, and there's probably a trade-off in which you don't use it all but use it strategically to reinvest, basically, right? And I think that's what seems to make sense currently. Obviously the calculations behind these are complex, so I'm not going to get into exactly the rationale, but I think in general it's just a strategic choice to have different levels of investment and timelines in mind.
What's so interesting about your position is you are like the only frontier model provider with your own cutting-edge, or state-of-the-art, chip. What does that collaboration actually look like? Because it's such a unique motion, right? I mean, obviously Nvidia works closely with other labs, but they're not sitting under the same company. And so what does that look like when it works really well?
As I was explaining before, I reflected on several moments, right? And this was early days. I mean, even deep learning internally at Google still had to be proven out. I remember it must have been 2013, maybe 2014, when a bunch of us, I think it was me, Geoff Hinton, Jeff Dean and Ilya, were in a room trying to decide, hey, what should the servers have? At the time we had obviously some CPUs, some GPUs, and you're trying to make a guess based on what you know about the research and where the models are going. You can literally have that impact, but of course there are delayed rewards, because this is just an investment, and only a few months, if not years, later will this materialize in data centers. And I thought that was amazing, right? I mean, it's obviously a hard question to answer. You try to predict what's going to happen in research, and in the early days that was even harder. But I think it's a very privileged position to be able to really influence, and we certainly do that, especially with Jeff, who's obviously been thinking about infrastructure quite a bit for basically the existence of Google. It's very interesting to then think about, hey, these models are going this way, and then these investments, because they have a certain latency. Being under the same roof and seeing what we see just really helps. And again, I've seen it in the very scrappy early days, and it keeps happening and getting better. Of course, uncertainty in some ways reduces, which makes the job easier, but it's still a fascinating choice that has deep consequences for the fate of the company, etc.
Well, this has been a fascinating conversation. I feel like I could talk to you for a long time, but I'd be delaying our path toward AGI. And so maybe I just want to make sure to leave the last word to you. Anything you'd like to share with our listeners, or research you'd like to point them to, anything in IO? The floor is yours.
I think it's a fascinating time to be anything in AI. So if you're a user, use the models. If you're a builder, use the models to build. Anything you do, even if you think there's no remote connection to AI. So please, play with these models. They're amazing and they will only get better.
Awesome. Well, thank you so much. This has been an awesome conversation.
I'm Jacob Effron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses in the world. As I hope is clear, I have a ton of fun doing this. It's a nights and weekends project in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast, sharing it with friends. It's really what ultimately makes this whole thing work. And so, please consider doing that. And thank you so much for your support and listening. We'll see you next episode.