Noam Shazeer and Jack Rae: Scaling Test-time Compute, Reactions to Ilya & AGI
By Unsupervised Learning: Redpoint's AI Podcast
Summary
## Key takeaways

- **Test-time compute improves creative tasks**: Integrating test-time compute into models like Gemini has surprisingly improved performance on creative tasks, not just reasoning tasks like math and code. The models showed an ability to generate and revise essays, demonstrating more engaging and thoughtful output. [02:31]
- **Evals must adapt to avoid saturation**: Standard AI evaluation benchmarks quickly become saturated as models improve. Researchers must constantly develop new, more challenging evals, often keeping them private, to accurately measure progress beyond simple memorization or pattern matching. [04:44]
- **AI can accelerate scientific discovery**: AI's potential extends beyond mimicking existing knowledge; it can drive novel scientific discovery. By identifying and solving complex, previously unasked mathematical or scientific questions, AI could exponentially advance human understanding. [26:04]
- **Open source models are rapidly closing the gap**: The pace at which open-source AI models are catching up to and competing with proprietary models is impressive. Innovations like Gemma 3 and DeepSeek V3 demonstrate that open-source communities, with their creativity and compute resources, can continually innovate. [47:43]
- **AI is revolutionizing education**: AI is creating a new paradigm for education, offering personalized, on-demand encyclopedias. Children can now access highly specific information, like the Latin names of plants or detailed lizard classifications, fostering a deeper and more accurate understanding of the world. [50:07]
- **Test-time compute won't reach AGI alone**: While test-time compute is powerful, it's not sufficient for achieving Artificial General Intelligence (AGI). Significant advancements are still needed in areas like acting within complex environments and developing robust agentic capabilities. [23:30]
Topics Covered
- Why AI benchmarks are increasingly irrelevant.
- AI writing its next generation is the real milestone.
- AI research is still more alchemy than science.
- AI will enable unprecedented education for the next generation.
- AGI and LLMs are still massively underhyped.
Full Transcript
Are there a set of milestones that are meaningful to you? I'd say when Gemini 3.0 writes Gemini 4.0.

I was kind of showing this to my mum a couple of days ago. Mums are the ultimate test of whether something has broken the barrier from the Twitter sphere to the real world. The mum vibe check is a big deal.

What do you feel is overhyped in the AI world today, and what's underhyped? I have a few, but I think they're all too spicy. There's no such thing as too spicy on my podcast.

It's always going to be this kind of whack-a-mole of what is considered actually interpolating known ideas versus creating a completely novel idea — and for that one I'm going to go to Noam.

Noam Shazeer and Jack Rae really
need no introduction. The two are at the forefront of Google's Gemini LLM efforts and have been involved in some of the most important discoveries in AI in the last decade — Noam as one of the co-inventors of the Transformer and mixture of experts, Jack as a key part of many DeepMind breakthroughs. It's a real privilege of the job to get to sit with these two and ask them literally every top-of-mind question in AI today. We talked about how far test-time compute will get us, and the spaces where it will and won't work. We talked about how the infrastructure needs will be different for test-time compute versus the large pre-training paradigm. We also hit on the impressive pace at which open-source models have caught up with closed-source peers, and their reactions to DeepSeek. We talked about their reception of and reaction to Ilya saying that this test-time compute paradigm won't get us all the way to AGI, as well as Yann LeCun saying this current generation of models can't actually have any novel thoughts. We talked about what it actually looks like to do cutting-edge AI research today and what their day-to-days look like, as well as the future model milestones that actually matter to them. And then we also got Noam's reflections on Character.AI, as well as both of their responses to what AGI means for the role of humanity. I think folks are going to love this; it was really just a pleasure to speak with both Noam and Jack. Without further ado, here they
are.

Well, Noam, Jack, thanks so much for coming on the podcast. Oh, thank you. Are we sitting in the very office where the Transformer was invented? No, this is a new building — we were in 196 Charleston, I think. No? Probably about half a mile away. Half a mile, right, so it's in the air. Pretty close. Pretty close, yeah.
Well, many things to dive into today. Obviously I want to start with some of the latest Gemini 2.0 models and all the work you've been doing around test-time compute and Gemini 2.0 Flash Thinking. At the highest level, for our listeners: how do you characterize where these models work today and where they don't work as well? And as you were experimenting with them, what surprised you most about those results?

One surprising thing: when we started the particular concerted effort to build a lot of research into test-time compute into Gemini, and then to think about shipping it, we were really focused on starting out with reasoning tasks — math and code were big areas of focus. And it wasn't really clear, whilst we were sprinting in that domain (we obviously wanted to broaden it naturally over time), how that would work. Would there be any sense of generalization? Would thinking be useful beyond those reasoning tasks if we were just concentrating on those as researchers? It was pretty fun to see one of the early models that had been trained to try and match the style of Gemini Flash — so it had been trained with thinking, but it had also undergone some training to actually have a generally nice style, to be a nice model to talk with. It was very fun seeing thinking interact with and improve creative tasks as well. You could ask the model to compose an essay on a particular topic, and the thought content was very fun to read: it would go through various different ideas, and then revisions of the idea, or things that it should cut. That was kind of fun, and the output also felt really nice. So that was one thing that surprised me. Any surprises for you, Noam?
Well, yeah. In general I'm all for generality — let's train something that's great at everything. That is important. I was skeptical at first of this intense focus on things like math, but it is very important to have good benchmarks that encourage you to be able to reason about difficult tasks, because a lot of things will drop perplexity — like adding more parameters to the model and memorizing more. So it's nice to have the evals that can distinguish better on some of the more difficult problems. I
mean, what evals are even meaningful to you at this point? Obviously I feel like people are trying to hill-climb the same set of evals, which feel increasingly less relevant to day-to-day work. What do you do when you're testing these models — how do you vibe-check them?

It feels like we keep landing on an eval we like: oh, actually we overlooked this eval. Even within math, it's like, okay, we've done a bunch of math evals, but maybe, I don't know, answers-only AIME — they're still considered challenging. And then it's done: they're completely saturated, and we really don't care, and they're small, and we almost think, why did we ever even work on them? It's easy to forget how hard we thought something was: six months ago, or a couple of months ago, it was considered really hard — maybe too hard for the model — and then it instantly snaps to being trivial. So right now, there's always been a lot of concerted effort within DeepMind, and within Google as a whole, to develop useful evals, but it's very nice to see that this is a shared responsibility across many different AI labs — even Scale AI, I think, is really stepping up and developing them. Calling it Humanity's Last Exam was a dangerous title, you know, if every six months we decide it wasn't too hard. A dangerous title. The really, really last exam. Exactly — the 73rd last exam.

It's very, very challenging in general, because evals get leaked. Once people start talking about the evals, there's all this text out there about them, and they're no good anymore, because everyone knows the problems, and all the models will know the problems unless you're very, very careful. So I think there's still a lot of work that goes into having evals that are private. Are
there a set of milestones that are meaningful to you? Like, hey, when Gemini 3.0 can do X, that's a really exciting milestone — whether it's an eval or just something that you've tried with these models that they can't quite do yet.

I'd say when Gemini 3.0 writes Gemini 4.0 — or I should say, when Gemini X writes Gemini X+1. I think these reinforcement loops are probably the most important thing to pay attention to, and there are several reinforcement loops going on. The one I just mentioned is probably the most important: that we can actually use the AI we're building as a tool to make ourselves more productive at building AI. But then there are other reinforcement loops around data flywheels — you have people use these models and provide feedback, and make them better at the things that people care about. I think we'll see huge acceleration from that. And then there's just the global excitement and funding flywheel, which also seems to be kicking up in the last few years. Yes, certainly. So,
to your point about being armed with a thousand AI engineers by your side making you even more productive: where are we? Do you have the equivalent of 0.1 of an FTE today, using some of these Gemini models alongside your research?

Yeah, I think one benefit we have at Google is that we work in a very structured monorepo, and we already have a lot of amazing tooling around contributing to the code base. So there are lots of angles where AI is being pulled in as tooling for our own development. I think Jeff quoted the statistic — I don't remember the exact number — about the number of what we could call pull requests that are AI-generated: useful AI bug fixes, or code reviews attached. That just gets pulled in; one day I noticed it there and thought, oh, that's cool, I can now apply a lot of fixes that it already spots. But that's just one element where we're already pulling AI into our own coding development. We're incredibly excited about agentic coding — it's definitely very important, and trying to get the model to tackle more open-ended and difficult tasks is definitely something we're very excited about. In some ways it's a lot easier to orchestrate when we have a very defined way of defining libraries — we have these build rules and things, and everything gels together in terms of the whole code base very well. So I can imagine, as progress continues, there's going to be a very discrete moment where suddenly lots of libraries can be very quickly iterated on within our code base.

Yeah, and you've got your AI engineers proposing experiments to you — what's good to try. It seems like these
models work super well in easily verifiable domains — coding and math, and you've obviously been among them. How do you think about how, for some of these less easily verifiable domains, these models may end up scaling and being useful?

I mean, they're getting better at that stuff too, but it's definitely harder. In those domains we're going to need either better ways of verifying, or more human feedback loops.

Yeah, and what's good is that, even with the Gemini model series, they're able to follow much more abstract instructions. So we can try to provide a reward signal over a qualitative piece of work where, if a human were going to give a feedback signal, there would be quite a broad set of rubrics or grading criteria — or maybe even a more simplistic sense of what is good style, what is interesting. I think part of the problem is really training models to take on these very broad criteria — to take in a very broad set of criteria and then apply a reward signal. Once we have the reward signal, we can train with reinforcement learning against it. And I think we're already seeing that this makes sense — it's not an abstract thing anymore. It seemed like a very abstract thing maybe a year or two ago.

I'm curious: a year or two ago, did you expect that to work, or did you think this whole path of research was really more suited to the more easily verifiable domains? I feel like you expect it to work one day, and then it feels like there's a very complicated stack of things we need to solve to get there — and then, as is usually the case, it turns out there was a much simpler path. That's how I feel about it, anyway.
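Jack's idea of turning a broad rubric into a scalar reward signal — which can then drive reinforcement learning, or simply rerank samples — can be sketched minimally. Everything below is an illustrative stand-in: the rubric items, the keyword-matching judge, and `best_of_n` are hypothetical; in practice the judge would itself be a capable model prompted with the grading criteria.

```python
# Sketch: reduce a broad grading rubric to a scalar reward, then use it to
# select among sampled responses (best-of-n). The keyword judge is a toy
# stand-in for a model-based grader.

RUBRIC = [
    ("explains the reasoning", ["because", "therefore"]),
    ("gives a concrete example", ["for example", "e.g."]),
    ("acknowledges uncertainty", ["might", "may", "perhaps"]),
]

def rubric_reward(response: str) -> float:
    """Score a response in [0, 1]: fraction of rubric criteria satisfied."""
    text = response.lower()
    hits = sum(any(k in text for k in keywords) for _, keywords in RUBRIC)
    return hits / len(RUBRIC)

def best_of_n(candidates: list[str]) -> str:
    """Pick the candidate with the highest rubric reward."""
    return max(candidates, key=rubric_reward)

samples = [
    "It is good.",
    "It works because X; for example, Y. It might fail under Z.",
]
print(best_of_n(samples))  # picks the second: it satisfies all three criteria
```

The same scalar could instead weight an RL update; best-of-n is just the simplest way to "apply a reward signal" once you have one.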
Yeah, usually there are good surprises — more good surprises than bad surprises.

You've released these models out into the wild. What are some of your favorite ways you've seen people using them, and what would you like to see more people trying and building with these things?

Actually, there's an update happening today in the Gemini app where we're putting in a much stronger model, and it's being integrated with basically the full suite of Gemini app tools — all of the apps that the Gemini app supports, from things like Maps to search integration — and it now has very long context. It's kind of a fully featured Gemini Thinking release in the app, and it's a very enjoyable experience, I think, even for people's day-to-day stuff. It wasn't clear to me whether people would want to pay a bit more latency to get the model to think about stuff before responding. It does seem to be the case that if people are going to pull their phone out of their pocket and type something in, then what we thought was a very long time — maybe a couple of seconds — is actually a very small price to pay for something that feels like a better-quality answer. And they can also sometimes look at the thoughts and inspect them. I was showing this to my mum a couple of days ago — mums are the ultimate test of whether something has broken the barrier from the Twitter sphere to the real world. Yeah, the mum vibe check is a big deal. Yeah. And she asked a lot of what I would consider very generic questions to ask a model, like: what is the meaning of life? Oh, she went for what is the meaning of life. And she really sat and read it for a long time, and then she read the thoughts as well, and she contemplated it. She seemed to very much appreciate the presence of the thoughts — how to even go about such an open-ended question. So, more folks building philosophical conversations
with these models. Well, one thing I think is super impressive about them is the multimodal capabilities, but it still feels like those are very underexplored from an application perspective. I'm not sure totally why that is.

Yeah, I think right now, in some ways, we're quite modest about the multimodal capabilities of Gemini. The model has always been incredibly strong at image input, and image input plus thinking is actually remarkably good, I would say. I see a lot of people red-teaming the model on X, trying difficult or challenging images and visual reasoning problems, and I think that's working pretty well — though some of those are kind of toy evals. But pulling multimodal into agentic tasks is super interesting as well. We launched Mariner last December, which is an agent that uses a browser, and that has a lot of multimodal aspects built in. It was super important that we could get the model to be incredibly strong at not only scanning a screen but really understanding it, and knowing how to act on many, many different types of websites. So pairing that kind of capability — agentic, quite open-ended, where you really need visual understanding of potentially messy scenes — with thinking is something I'm feeling very excited about.

Yeah, that's fun. I mean, it made sense to start with text. A picture is worth a thousand words, but it's a million pixels, so text is still a thousand times more information-dense — just to do math on clichés. And then there's also a lot more in the main training data for text: we have so many examples of text, which actually represents the way humans receive and produce information through language. But we have a lot fewer examples for something like image generation, because you don't have examples of people generating images. So things are a little more challenging there, but, as Jack said, great stuff is happening.

Yeah. I mean, you
mentioned Project Mariner. I'm curious — I think you've both talked before about how, to get agents more widely used, you need to solve both complexity of reasoning and reliability. How would you characterize where we are today in applying Gemini models toward these problems, and what is the actual path to getting better at them?

I mean, there are a lot of answers. One is just: make the model smarter — that will always help. And most likely you need very general solutions for these control problems, just as you need general solutions for the intelligence problems, because people are going to use these things in so many unforeseen ways. You can't anticipate it; the users are smarter than the developers at figuring out what the use cases are. No one envisioned, when they invented the internet, what the internet was going to be for. No one envisioned, when they invented computers, what computers were going to be for. And AI is getting so general these days that it's even more true — we're building a product with billions and billions of unanticipated use cases. So we will need to build general solutions for all of it.
How far away does it feel like we are from that next level of complexity and reliability?

I don't want to give a precise number — is it 6 months, is it 18 months, is it 24 months — but I think a lot of the time, in my opinion, is not really about the core algorithmic AI development. Part of it is just that there are a lot of engineering challenges in really changing the whole way you train, to work in more complex agentic environments. That has an almost constant-time but non-trivial cost in switching how we do research. With agentic research, a lot of the upfront challenge is that it's no longer simple prompts and responses — we're now going to act in an environment, so how you define those environments matters. That angle, at the very least, has been something DeepMind has been bought in on even since I joined in 2014: part of building AGI is figuring out a really good agent, and part is figuring out a really good, general environment. I don't think anyone has solved the perfect environment yet. There are some obvious ones: there's a notion of using a web UI and being able to automate many web tasks, and there's a notion of having a code base and being able to work within it and do many useful things. But if you can pick out a really good environment, then we can accelerate a lot of agentic research in that environment and build really good algorithms. And I feel like that's as big a part of the challenge as any given breakthrough in attention, or in long context, or in reinforcement learning.

How high do you guys think the
ceiling is for continuing on this test-time compute vector? Obviously, Ilya has very publicly come out and said there's an entirely new direction that's needed to really advance AI to the next level. Do you agree with that?

Yes, I pretty much agree, because LLM inference is so cheap — individual operations cost under 10^-18 dollars these days. So if you can infer relatively efficiently, even on a very large model, you're getting over a million tokens per dollar. I guess you can just check the prices on Gemini — or anybody else. So you're getting millions of tokens per dollar.
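Noam's figures can be sanity-checked with quick arithmetic. The dollar amounts below are illustrative assumptions for the comparison, not actual Gemini pricing or real book prices:

```python
# Rough check of the "millions of tokens per dollar" comparison.
# All prices are illustrative assumptions.
import math

price_per_million_tokens = 1.0                   # assumed $ per 1M model tokens
model_tokens_per_dollar = 1_000_000 / price_per_million_tokens

# A paperback: ~100k words, ~1.3 tokens per word, ~$13 -- all rough guesses.
book_tokens_per_dollar = (100_000 * 1.3) / 13.0  # = 10,000 tokens per dollar

gap = math.log10(model_tokens_per_dollar / book_tokens_per_dollar)
print(f"model: {model_tokens_per_dollar:,.0f} tokens/$")
print(f"book:  {book_tokens_per_dollar:,.0f} tokens/$")
print(f"gap:   ~{gap:.0f} orders of magnitude")  # -> ~2, i.e. "a couple"
```

Under these assumptions the model is about two orders of magnitude cheaper per token than the paperback, matching the comparison that follows.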
That's orders of magnitude below the cost of most other things you can think of. If you think of a really cheap pastime — go buy a paperback book and read it — you're paying something like 10,000 tokens per dollar. So we're a couple of orders of magnitude cheaper than reading a book, probably four orders of magnitude cheaper than paying anybody to do anything, and, whatever, six to eight orders of magnitude cheaper than paying a software engineer. So there's a huge margin of difference there to apply more compute and make the thing smarter. And if the value is there — which I think it is — would you pay five cents an hour for the bad engineer, or ten cents an hour for the good engineer? So there's a huge amount of unexploited flops in there to use, if we can find ways to use them. One way to use them straightforwardly is: okay, just train a bigger, better model. We're already doing that, but model training costs tend to go up quadratically with the size of the models, and you still end up with relatively cheap inference if you do it right. So then, of course, what everyone is doing now is applying more compute at inference time, through this chain-of-thought thinking or any other brilliant algorithms we can come up with. And I think we're just going to start seeing a scaling curve there, as we're seeing in a lot of places.

Right, but does that scale us
all the way to the AGI future people envision, or is there some kind of completely adjacent thing that's required — where does it asymptote?

Yeah, I guess that's where we'll see whether it's the humans that invent the next breakthrough, or the AI. But I've given up on organizing my garage and stuff like that, because I'm just waiting for the robots. You think they're coming? I guess you guys had a big robotics release yesterday. Yeah, that was awesome. Is it ready to clean your garage, though? Not that I know of.
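Noam's aside that training costs tend to grow quadratically with model size follows from a common rule of thumb — training FLOPs ≈ 6·N·D for N parameters and D training tokens — combined with the compute-optimal practice of scaling tokens roughly in proportion to parameters. Both the 6·N·D estimate and the 20-tokens-per-parameter ratio below are standard heuristics, not Gemini-specific numbers:

```python
# If training tokens D scale in proportion to parameters N, training compute
# 6*N*D grows quadratically in N: doubling N quadruples the FLOPs.
def train_flops(n_params: float, tokens_per_param: float = 20.0) -> float:
    d_tokens = tokens_per_param * n_params  # compute-optimal-style token count
    return 6.0 * n_params * d_tokens

ratio = train_flops(20e9) / train_flops(10e9)  # 20B- vs 10B-parameter model
print(ratio)  # -> 4.0: twice the parameters, four times the training compute
```

Inference cost, by contrast, grows only linearly in N per token, which is why a bigger model can still be relatively cheap to serve.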
But yeah, I think on the question of how much more there is to give in this test-time compute paradigm — is it all the way to AGI? I don't think it's all the way to AGI. I think we already established there are other components: being able to act in a complex environment is very important, and research into acting agents is a definite investment, and there are many other aspects. But are we seeing test-time compute asymptoting? The Ramanujan example is always in my mind here. I don't want test-time compute to just think longer on any given problem and eventually arrive at a solution; we also want it to think very deeply and actually create maybe-useful knowledge that it's going to incorporate, in its thoughts, to then solve further tasks — and thus dramatically improve data efficiency. If you can have just one math textbook, and you spend most of your time really thinking and playing around with the ideas, and then you can become a world-class mathematician — that's the kind of thing we should strive to achieve with a very deep-thinking model. Are we there yet? No. Do we have a path there? I think there are many directions of gradient towards such a model. We've already been seeing — and this is one thing I think people don't really talk about much with test-time compute — amazing improvements in data efficiency, by training the model to think deeply with reinforcement learning while it's solving the task. So even if we have a fixed bunch of RL data and we're not going to add any more, just moving to the test-time compute paradigm is allowing us to learn a lot more from that data, and I think we could probably push that much further. So there are many particular research angles we're interested in for how this model could think — not just spitting out a couple of thousand tokens to solve the task, but thinking much more deeply, much more like a researcher might think about a hard, open-ended problem.

I think so much of
what you guys have touched on is these models increasingly acting like researchers. What are the early signs of that that you would look for?

I'll give one example, which is math. Right now math is treated a bit strangely: it's often used as benchmarks, kind of like exams, and maybe even math competitions. There's going to be a very important pivot from the math benchmarks to really starting to generate useful math — starting to solve actually important problems that we really care about. I think there are ways there. I thought the FrontierMath eval that was created was very cool — it's trying to provide a kind of gradient towards that. The harder category is basically almost unpublished math findings; the easier category is supposed to be just harder or trickier. I don't know whether that particular eval is the perfect way, but having some kind of ramp of evals that bridges from where we're at now to actually useful scientific contributions — this is something I'm especially excited about: bringing in professors and researchers and saying, okay, use this tool in a non-ironic way to actually accelerate research in this area. What would you do? What's missing? What do we need to advance? And this notion of incrementally harder benchmarks — it might sound like what AI researchers always say, but I think that's basically going to be my metric of progress.

I mean, math is a great
example here, because this is a field where you don't actually need more data. People invented all this math without external input — a lot of the time in a room, just thinking. Yeah, you just go into a room. And then there are examples where, okay, there's some data: Isaac Newton takes a bunch of astronomical observations, goes into quarantine in his house, and out comes physics. So that's an example where there's data, but nobody actually knows physics yet — and then you generate physics. And math is even crazier, because you start with roughly no data and invent something useful. So that could provide a counterproof to the assertion that, hey, this is just learning to mimic people — that the most we can do with AI is to relearn what everybody knows.

The learning-to-mimic-people critique has been around for a while — Yann has said a lot about this. Do you feel like that's entirely disproven at this point — basically the critique that novel discovery and thinking are impossible on these kinds of model architectures?

Well, there's definitely one class of scientific discovery that I think almost no one could argue against, which is that a lot of science is: if you knew about these two disjoint pieces of information and thought about the intersection for a while, you would realize a new property. In materials science it might be that just associating things way better means you actually know way more about, say, what new kind of material may be photovoltaic, but also may have other properties, etc. So interpolation alone would already completely accelerate science. And I'm guessing it's always going to be this kind of whack-a-mole of what is considered actually interpolating known ideas versus creating a completely novel idea — and for that one I'm going to go to Noam.

Yeah, okay — I don't know, but I guess I'd just throw it back at Yann LeCun to prove that he has generated a completely novel idea. We'll see. I don't actually care
about arguing with arguing this stuff uh
let's you know let's just um you know
build AI you know uh greatly increase
the level of technology in the world um
help people that seems good enough for
me even if we go back to the math
example like the thing I'd love to see
is like you know maybe the state of
mathematics right now is kind of like
the the state of kind of geographic
exploration in the 15th century or
something so it's like there's some
known things there's some like fuzzy
area of like we don't really know what's
like Beyond these boundaries and we have
a bit of a guess and then you send a
small number of people to go off in a
boat it's very expensive to try and like
push the boundary of like what what is
known and what is not known and it they
come back with some pretty funny looking
Maps yeah they come back with like a
little bit of extra territory like U
explored and then they they tell people
and that's a little bit like what's
happening right now in mathematics you
have a very small number of like Elite
math professors that are able to really
ask the right questions and then
actually like prove useful qualities and
it kind of grows and it's grown that way
and that's that's how it's been going so
far right if you if we can train a model
that can actually like essentially ask
like it can pose the right questions of
what you could say is like the space of
all uh useful mathematics it's kind of
an infinite space you don't want this to
go and like off like a fractal on on
kind of uninteresting questions but if
if there was some notion of like all the
set of like interesting mathematical
questions if it can like keep posing new
on and then if it's very very strong and
it can solve those then at some point
you know maybe like right now we have a
a pretty strong kind of math uh model
and at some point maybe it will be
Professor level maybe one day it'll be
Terry to level and then you have like a
million of them and then now it's like
now I think then you could hope to maybe
complete the map and then what would
like completing the map look like that
could be one of the greatest
contributions to kind of uh science you
would now physics chemistry You' now
like be able to uh have like a very deep
deep understanding of like any useful
mathematics um I I think that would be
like a very very exciting thing whether
it's possible uh I'm not sure but I
would agree maybe the the the key Crooks
is like can it ask uh in like actually
novel questions the question posing
thing seems to be the hardest part in my
opinion the solving I feel very
confident we will get
there but maybe mathematics is infinite
yeah I mean it's definitely infinite so
then we we can we can do so much better
Yeah. I guess let's talk about the culture of AI research a little bit. Noam, you were obviously a leading part of the original Transformer paper, and both of you have been part of so many breakthroughs. People like to write thought pieces about the culture that drove these innovations. What are the main takeaways you have after having done this research, through some periods of great success and some periods, maybe, of more frustration? What lessons do you take away about what actually works, from a cultural and team-structure perspective,
to drive this stuff forward?

Maybe AI research is kind of where chemistry was in the 15th century. More alchemy? It's alchemy: we don't know why it works. It's highly experimental. You get some idea, but the proof is always in trying it out, and then you make various observations and come to hypotheses about why this thing works, what the key thing is. Sometimes you're right, sometimes you're wrong, and it usually takes more experimentation to find out. Or you can just claim that it works because of my magic XYZ, and you have your assistant swallowing frogs or something, or the equivalent.

I find researchers naturally love to share and get excited about what they're doing, and it's always important to try to credit people liberally, because it's often very complicated to figure out which idea led to which idea and what the key insight was. I think at some point we'll just have to give up on credit assignment temporarily and take a superintelligence-based approach: we'll wait for the superintelligence to sort it all out.

You'll get credit, for once. Once we get it, the superintelligence will write some great thought pieces on the culture that drove these transformations. But it is super fun working at Google with this group of brilliant people and
collaborating with everybody.

One thing I'm struck by in the Transformer story: it involved you hearing, almost randomly, that some people were working on it. When you tell the story, it sounds like it could easily not have happened. If you hadn't walked down that hall one day, does it feel like someone would inevitably have figured it out six months later? Or how much random happenstance was there that led us down this path, just from people in this building colliding in different ways?

Oh, interesting. Yeah, we would all still be using LSTMs or something. I guess it's like asking whether, if somebody hadn't invented the internal combustion engine, we'd still be using steam engines at this point. I think someone would have come up with something like the Transformer.

From my vantage point over in London, it did feel like people were circling around it. There was the Neural GPU, which also had the key idea that we're not going to have an RNN anymore, we're going to parallelize, but it's not just going to be a convnet; there's going to be some notion where depth is some kind of function of your sequence length. That was an idea in the Neural GPU. It didn't quite work out, but the key idea of getting rid of RNNs was there; that vibe was there.

Yeah, right, because there was all this work on convnets running around, everyone wanted to kill the LSTM, and attention had been floating around from the translation models. So it kind of needed to come
together.

Oh, I love that. Also on the culture side: one of the interesting things about how I understand Google works is that there's this bottom-up compute allocation, right? Folks get to do different projects and convince other people to come over and allocate compute for them. Obviously that's one model; other places say "we're going to go all in on one thing" and are much more top-down on the compute side. How do you think about the trade-offs between those two models?
Yeah, I mean, we've been through both, or I've been through both, I guess. Google Brain, when I was at Google previously, was mostly bottom-up, as you describe, and DeepMind has been mostly top-down. Yeah, it was a bit more top-down. Different philosophies, and there are pluses and minuses to both. I think top-down can be good for getting people to collaborate and for getting larger training runs working. But bottom-up I think is also great for collaboration, because then when you bring someone new onto your project, it doesn't mean you have fewer resources per person; you have more total resources. So that's great. And there are so many abstraction-breaking ideas that there's no great way to categorize them. If you say, this is the compute for pre-training and this is the compute for post-training, well, then you've got something completely different that doesn't fall into those buckets nicely, and then it falls between the cracks. So we are bringing back a good measure of bottom-up, because I think that's super important.

How does the balance feel? Yeah,
I guess I worked at OpenAI briefly, and I liked the way concentrated bets were made there; that obviously paid off well for them in certain areas. I also liked the way concentrated bets were made at DeepMind. There was a bit of a vision sometimes. Take AlphaGo: I was asked to join the AlphaGo team right at the beginning, because they needed an engineer and I was a research engineer, and I just didn't get it. I was like, why would anyone care about a board game? So you do really need good research leads with vision to drive these things; you can't expect everyone, from a vantage point that isn't always the best, to know where the impact is going to be. Right now, within thinking, for example, the area of research we work in, which is incredibly important, we have a reasonable, non-trivial investment in pure bottom-up research, where we're really not dictating anything. Then it's a fun meta-research process: how can we make these people maximally efficient, how do you make your baselines lightweight, how do you make them signal-bearing, how can people move as fast as possible? That's very important. And then it's also very important that it can't all be that; some of it needs to be "here is a mandate of top-down bets that we have to deliver on." But it's always humbling. I think we have a very good view of everything that's happening, and we get a sense of which areas are going to be really important, and then usually you get humbled by the bottom-up research: a thing you weren't even thinking about ends up being way more impactful than you thought. So always keeping that running is super impactful.
Reflecting back on the past decade, are there some of these inflection points or decision points, maybe hard 51/49 decisions, that ended up being super impactful?

I think for both of us, going all in on large language models was good. Yeah, I'd say that was probably a good call. It kind of seems obvious now, but it kind of wasn't. I found it was in a state of being not obvious to most people, but obvious, I felt, to a small number of collaborators, that this was definitely going to be a big thing, and we kind of had to go against the grain for a while. Yeah, that's true. There was definitely a time when people were not excited about language models. It's hard to remember now. It always seemed like the best problem on Earth to me, but the deep learning people at the time maybe thought machine translation was still a bit cooler, or computer vision. Yeah, vision was exciting for a while. Why was everyone in vision? I don't know; I guess the ImageNet thing, and there's that picture of a cat or something. Oh, that was a good one, the cat. Did you work on the cat thing? No, no, never actually did much
in vision.

And now you've come full circle with these Gemini 2.0 models. I see all these demos that are vision-based, and you're doing cooler things in vision now than certainly identifying cats.

I think for almost every early LLM person, a world model was on their mind. I actually don't feel like the early LLM researchers were really from a language-oriented background; they weren't really linguists. I don't even feel like understanding language was really part of the motivation for that early group. It was more: train unsupervised learning at scale, do language first because it's the most knowledge-compressed, but then gobble everything up into a big generative model and understand everything. It's very, very cool to see that continually proving out. Yesterday we launched native image generation, and it's amazing. I think a lot of image generation right now is focused purely on getting absolutely maximally aesthetic images, but having native image generation also lets you do a lot more with images: understanding, editing, interleaved sequences of images and text. And once again, it's just: train a generative model on lots of data
to arrive at that.

You guys are obviously both big believers in these models becoming more and more general-purpose. A question some folks are asking: when you think about domains, you talked about healthcare earlier, Noam, what does the model that ultimately becomes our AI doctor look like? Is it just a continuation of what we're doing in some giant model? Is there a healthcare-specific version of the model that ends up being released, that only has some set of data fed into it, or just a bunch of guardrails? Paint that picture for me, of what you think our AI doctor or AI biology researcher ultimately looks like.

Yeah, I don't really think you'd need very task-specific models for something that high-value, because you probably pay something like a dollar a token for talking to your doctor, so the LLM is way, way cheaper at this point. To my mind, the only reason for task-specific models is price. So if there are things where you wouldn't pay a dollar a token, yeah, something more targeted: say, something to analyze vast quantities of data for marginal value, then maybe you want something task-specific.

There's always this notion that there'll be tons of negative transfer out there, so you should compartmentalize things. I don't really feel like that has ended up being the case. If it could be measured, that would be a good reason to compartmentalize models, but if there's no negative transfer, if there's positive transfer, then just have one big model. That's my personal philosophy; this is not a thing people have uniform agreement on, though. It's a continuously active area of research: how much do you want to specialize and spin off these expert models? But the way I see it, it's very simple: if there's positive transfer, put it in the same model, as long as it doesn't then become too expensive to serve.
Obviously, you've both been at the cutting edge for a while. What's one thing you've changed your mind on in the past year?

I feel like timelines have shifted forward, and I don't mean that in a vague sense: I think the rate of progress is much faster right now. A year ago the field was obviously advancing, but whenever you have a new paradigm shift it creates this sudden acceleration. Actually, here's a pretty good one. One thing I've changed my mind on: my mental model of the propagation of information, of how people adopt a scientific advance, has completely changed. When the Transformer came out, I was over in London at DeepMind. People thought it was a cool paper but were a bit suspicious. I eventually implemented it in our codebase, over the holiday break actually, about three months after the paper had come out, and tried it for language modeling, but it was not really getting picked up. I eventually collaborated with someone who wanted to use it for reinforcement learning, but really it was maybe six to nine months from the paper coming out before you saw Transformers dotted around all areas of DeepMind, and that's all within Alphabet, where it's much easier to propagate information. Compare that with the speed at which the field has picked up this test-time compute paradigm: many labs have already trained and released models that look very good, exploring the space. That was very surprising to me. The fact that you can make an announcement, say "this is important," in just a blog post or something, and then have people make breakthroughs in that space and release models on the order of months: that was a wake-up call for me. There's a lot more compute and a lot more smart people working in
AI.

I often think of things through a bit of rose-tinted glasses, about 2016 or something: well, we were very smart then, we were very creative. People are very, very smart and creative now, and they have way more compute, and there are way more of them. So if anything is going to be very impactful, the idea can just spread all across the world and people act on it, which is kind of crazy.

Yeah, it is kind of crazy. And just the amount of compute out there: now a kid in a garage has more compute than was necessary to invent the Transformer. People always worry about how much compute somebody has, but it is definitely possible to make breakthroughs with way less compute than you would imagine.

Noam, anything you've changed your mind on in the past year? I mean, I've been continuously impressed with the success of RL. I'd never really worked with it much before, and it's like, oh, that's actually pretty
good.

Well, you kind of alluded to this, Jack: in the reaction to DeepSeek and all these models that fast-followed in the test-time compute space, going forward, do you expect the open source models to be able to keep up with each subsequent generation of these models? It obviously seemed to happen faster than a lot of people would have expected.

Yeah, that's actually something I'm changing my mind on. I do feel like the ability of open source models to stay very close and competitive with the frontier is persisting. I actually thought we were maybe getting a false sense of assurance, that maybe it felt like it was converging but these things could then pull away again, but actually it has been very impressive. I'm really, really impressed by the performance of Gemma 3, which just got released yesterday; it's amazing, completely incredible, and the team did a really good job. And other open source models too: DeepSeek V3 was a very good model when they released it. It seems like people are very passionate in the open source space, and they're very creative and smart, and they have compute, so I don't really see why they wouldn't be able to continually innovate. What do you think?

Yeah, I mean, it seems like the time gap between closed source and open source has been shrinking. I think the technology will continue to accelerate, so it could be that the quality gap will be large while the time gap will be very small. But we'll have to see how it plays out. It's super exciting to see all of these companies getting great
results.

Switching to some of the broader implications for society of all the AI progress we've been talking about: obviously both of you in the last year have been impressed with the power of RL and surprised by the pace at which we've scaled a lot of this test-time compute. Have you changed anything in your own lives based on this probably greater clarity, or belief, that a lot of this AI-driven future is coming? You both have kids, I know; anything you've adapted? It sounds like no. I know you don't clean your garage, but I guess that was from before this year. I actually do that, very rarely. That does make for a good podcast. Have you thought about not cleaning it? Yeah, I don't worry too much about global warming; we'll have AI to take care of the carbon stuff soon enough anyway. But I guess, anything you've thought about differently in your own lives, or in how you think about the lives your kids will
have?

AI and education: I don't think people are really talking about it enough yet. My son, under supervision, likes to talk to Gemini, and it's actually insane how powerful it is, especially since he can go out to the garden, take pictures of plants, take pictures of lizards, and now he has this very accurate, personalized encyclopedia which can give him information.

And do they adapt to that? I wasn't sure how that would work. They do. My four-year-old son walks around talking in great detail about the plants; he'll use the Latin names. They absorb so much stuff; they're sponges. I feel like I'm seeing a kind of education that I don't really think has ever existed for humanity. AI and education is going to be incredible. He went to school and told his teacher, "oh yeah, I caught a lizard," and his teacher was like, "oh, that's cool, that sounds like a big lizard," and he's like, "no, it's not a big lizard, it's a western fence lizard." He's very particular about his varieties of lizards. And he's like, "I also saw a blue-tailed skink," and she's like, "what's that?" and he's like, "it's an amphibious lizard that..." So he starts reeling these things off. It's just obvious when you see it: children are very curious, they're like sponges for information, and if you can combine that productively with AI, I think that's going to be really incredible. I do feel like the next generation will just seem like smarter people. That's what I'm feeling hopeful about now. Anything you've changed?
Yeah, it's extremely hard to predict what the future will be like. We'll all do our best to make sure that AI is safe and beneficial. But it does make you think: hey, what I do now really, really matters. We don't know whether human labor will be materially necessary in the future, but that just means it makes more of a difference now. If you want to do something that matters materially, go do it now. And other than that, just try to be a good person. Whatever you find spiritually meaningful, go do it, because maybe the purpose of humanity in the future will not be about providing for physical needs. So we've got to figure out where we find meaning in the future, but we'll have plenty of time for...

Well, especially since your mom's already having the deep philosophical conversations with the models; maybe we'll be able to reason our way
to that.

But I am struck: some other people have come on this podcast, like Bob McGrew, chief research officer, who said, look, humanity is always going to have a role in asking the questions, and the models will go off and do things. But to our conversation earlier, I think the big question is: will people always be the best ones to ask the questions, or will the models actually ask better questions over time? That obviously has a ton of implications. I feel like every generation thinks they're living through the most important moment in history, and I guess you're biased when you're going through it, but it does feel like we are
certainly in that.

Yeah, and I think sometimes a technological advancement scares people, and people have a right to feel trepidation at this stage as well. This isn't such a great example, but even with the introduction of the television, people asked: is this going to make us all lose our attention span? Are we going to completely lose the will to go outside and have friends? That was a thing people freaked out about, and obviously now we can see that was unnecessary: it was a small piece of technology that was entertaining, maybe even a net positive to society; it kind of went okay. Right now it's like that, but on steroids. There are very strong signs of how this helps us, and very good reason to be concerned about how this could not help us. You have very demonstrable reasons, already proven out, for how it's helping us, and very concrete arguments for how this could not go well, and I think that makes it a very interesting
time too.

In some ways we have less meaning than in the past, in the sense that in the distant past everyone was at the brink of starvation, and you had meaning in your work because you had to go work hard today and earn some money so your family doesn't starve tomorrow. Today, living in America, nobody's family is starving tomorrow if they don't work hard. So that's less meaning than we used to have, and we've found other sources of meaning. There will be more of that in the future as, hopefully, AI improves our physical
uh situation how worried are you both
about AGI risks I would say moderately
yeah
um it it it is hard it's often difficult
to find examples of creating something
which becomes far more intelligent than
its creator but then like still uh like
acts in predictable and useful ways for
its creater and I think that that like
class of argument is is concerning
there's also just like more practical
like uh kind of AI and Society
implications that we've kind of touched
upon but like making sure that AI is
like constructive to the economy and the
like the and people can kind offload
their lifestyle and we don't have like
sharp changes in in like the employment
landscape and things that's super
important both of those are often on my
minds and then there's just very
pragmatic things when we're always
putting more capable models out we're
very excited as technologists to develop
and ship things but we also I think we
have a pretty good balance of like of
then
internally having another group that's
going to be thinking about this much
more holistic basically and like how can
we make this launch safe what are some
unintended
consequences um which I think has been
pretty good and uh and is like super
important so I'm I'm glad that
happens.

Yeah, I agree. I'm not afraid, but we definitely need to be working on all the safety aspects of this. And there are examples of creating something that becomes smarter and more powerful than us: we have kids, and they go off and they're smarter than us... With detailed encyclopedias of amphibians. Yeah, and then they become teenagers... And then you solve the alignment problem. Yeah, but hopefully, if we respect our parents and treat them well, the AI will learn. We'll have to have a lot of tokens on the internet of people being really respectful toward their parents. Exactly, yeah. We have to stop pushing the robots over. Respect for creators, yeah,
exactly; that feels dangerous.

One thing I did want to ask you about, Noam: in your previous stint between Google jobs, you obviously spent a lot of time building out Character and thought a lot about that product space: AI companions, and the ability for folks to chat with all sorts of different kinds of personas. What do you feel about where that space is today? What kinds of problems do you feel like
we still need to solve there?

Yeah, it's interesting, because the main reason I left Google to start Character was that I thought the biggest thing the LLM industry could use was an application where anybody could go interact with LLMs, use them, and discover the use cases that were good for them. This was before ChatGPT launched, before Gemini launched. So, mission accomplished: everyone's out there talking to LLMs now. Which is kind of different from how things are now, I guess. The other thing was, going into Character, we were not really focused on "hey, this is going to be an entertainment product" or anything else. We kind of just went in with an open mind: we're going to put this out there in a general way, we're going to help people conceptualize it as something that can take on different personas, meaning it's a very, very general technology, and see what you can make it do. And we definitely found a lot of people using it for entertainment, I think partially because, at that point in the technology, nobody had figured out how to make the thing not hallucinate, so people are going to use it for applications where hallucination is actually a feature, such as entertainment. I think that's worked pretty well; I know a lot of people like using
character um what wait what do you think
I guess like what do you think the
future of that obviously like there's
this early Behavior a lot of people
interacting what does it look like like
5 10 years from now that's a good
question I I I I uh I I do not know I
mean I do think you know people are
going to I I think people will always
want relationships with with humans
because it it um you know it's uh
spiritually more more meaningful but um
but you know I I think people will you
know like having um um AIS that are kind
of more in in human form for you know
for uh you know for for things uh they
want I mean like imagine you know you
just got get elected president and you
get your your piece in the cabinet to
like advise you like wherever uh you
know wherever you go I mean that could
cabin yeah you get your own personal AI
cabinet you get your whole uh a good AI
summarization of all the secrets
Yeah, or maybe you're the CEO of your own AI company, so you get a lot more productivity. I guess in a lot of those cases it's less about the personality and more about the productivity. But to the degree that people like interacting with something that feels human, we'll probably see a lot of AI that feels more human in various ways.

And is progress in that space just about the models getting better, or is there a whole other set of human-computer interaction and product questions that need to be solved?

It's a good question. The models getting better is pretty big, but then part of it is for whoever is running the application to decide what we're going to let people do. I think users will be pretty good at specifying what they want from an interface, and it'll mostly be about whether we want to let them specify that.

Well look, both of you, fascinating conversation. We always like to end with a quick-fire round, to get your take on some overly broad questions that we cram into the end.

Sounds good.

Okay, so maybe to start: what do you feel is overhyped in the AI world today, and what's underhyped?

I personally feel like the ARC-AGI eval is overhyped.

Well, that's very spicy.

Yeah. I think maybe the progress there has been quite slow because a lot of researchers just don't feel particularly inspired to do these very specific types of puzzles. We did a lot of that in 2015 and 2016, and we came to feel that if you know the puzzle domain, you spend a lot of time fixing the actual bottlenecks, maybe acting on these large grids is a little bit finicky, and then you make a lot of progress, but you don't necessarily continue on to building something that's really AGI and useful. So I personally went through a transition from all these synthetic tasks to just modeling natural language, and I felt that moving in that direction was much more of a path to AGI in the long run.

Anything else come to mind for overhyped or underhyped?

I don't know, I think AGI is underhyped. AI and LLMs are still massively, massively underhyped. People are still thinking about it like it's only going to be about some silly trillion-dollar products.

Yeah, I heard you say on another pod that a trillion wasn't cool anymore, a quadrillion
was.

Yeah, exactly, you've got to do the Dr. Evil.

Obviously it would be a gross misallocation of societal resources to take you away from building models to build applications, but I am curious: if you were to go build an application today, what do you think would be the most interesting? You've talked about education before, but are there any others that come to mind that you think would be fun to go build on top of these models?

I do think it's actually very cool how many apps have been trying to break into this agentic space, and people then expose them and say, "oh, this is just a wrapper around a known model." But there seems to be a lot of value in actually having the right app experience if you want a model to act and do something useful for you. Within the agentic space I think code is very crowded now, but there are a lot of other things I would find it useful for a model to automate for me, things that go beyond a chat experience, where the model is actually going out and doing useful things.

Yeah, I'd say code is underhyped. I think it's huge, because for one, humans aren't even that good at it; we were not exactly designed for things like code and math. And it's one of these things that will self-accelerate: if you build an automated software engineer and researcher, it'll build the next, better AI. So the combination of engineering and agentic capability, something that can control surfaces broad enough to do the job of an engineer; if I were to focus on applications, that is what I would be focused on.
How different will the infrastructure needs look for test-time compute models versus these massive pre-training runs?

You mean in terms of hardware?

Yeah, in terms of hardware requirements, distributed data centers, all of that.

I mean, it's a pretty rosy story, I would say. If it turns out that building AI becomes mostly an inference problem, that's a problem that can be much more distributed than the large-batch training that happens in pre-training. I think that means we can be much more flexible with our compute: we don't mind the model training across data centers as much, and maybe we can spread out actors that go off and get experience and send that experience back from many different data centers, because they don't all need to have very strong, fast interconnects. That is also going to drive price down, because then we might start to really optimize toward such a setup, which is intrinsically cheaper. The cool thing at Google is that we have this co-design link with the TPU team, so we're always feeding them the profile of how we're spending our compute, which allows them to tweak the chip design and the data center design within a couple of years' time frame, which I think is really motivating.
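The distributed-actor setup described here, inference-only workers that need no fast interconnect shipping experience back to a central learner, can be sketched roughly as below. The names and the queue-based transport are illustrative assumptions, not details from the episode; real systems would send experience over the network rather than an in-process queue.

```python
import queue
import threading

def actor(actor_id: int, experience_queue: queue.Queue, num_rollouts: int) -> None:
    """Simulate one actor generating experience, e.g. in its own data center."""
    for step in range(num_rollouts):
        # A real actor would run model inference here; we fabricate a
        # (actor, step, reward) record as a stand-in for real experience.
        experience_queue.put({"actor": actor_id, "step": step, "reward": 1.0})

def learner(experience_queue: queue.Queue, expected: int) -> int:
    """Drain experience from all actors and count training samples consumed."""
    consumed = 0
    while consumed < expected:
        experience_queue.get()  # blocks until an actor produces experience
        consumed += 1
    return consumed

experience_queue: queue.Queue = queue.Queue()
threads = [
    threading.Thread(target=actor, args=(i, experience_queue, 10))
    for i in range(4)  # four actors standing in for four data centers
]
for t in threads:
    t.start()
total = learner(experience_queue, expected=4 * 10)
for t in threads:
    t.join()
print(total)  # 40: all experience gathered without any fast interconnect
```

The point of the design is that actors only ever push experience forward, so they can live anywhere; only the learner needs the tightly coupled hardware.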
The fact that you can be distributed, as Jack said, gets better. The thing that gets worse about inference compared to training is that you lose a lot of the parallelism in the Transformer: naively using a Transformer, you end up memory-bound, looking at your attention keys and values for every token you're generating. So there's a lot of great work to do in attacking this from both a model architecture perspective and, frankly, a hardware perspective, to get ourselves closer to the point where we're taking the massive computational power of the chips we have and making ourselves able to fully apply it to inference.
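The memory-bound point, that every generated token has to read the entire cached set of attention keys and values, can be made concrete with a back-of-envelope calculation. The model dimensions below are made-up illustrative numbers, not those of any model discussed in the episode.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes of cached attention state each new token must read during decoding."""
    # Factor of 2: both keys and values are cached for every layer and KV head.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical mid-sized transformer decoding at an 8K context with an fp16 cache:
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
print(size / 2**30)  # 1.0 GiB swept from memory for every single generated token
```

Architecture-side attacks on this problem typically shrink `kv_heads` by sharing keys and values across query heads, which cuts the per-token memory traffic proportionally.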
I want to leave the last word to you: where can people go to learn more about you and what you're doing?

Well, we have a new and updated Flash model that applies thinking and is considerably stronger than the last model we released in January. It's out in the Gemini app, and I would definitely encourage people to try it out and give us feedback. We have been incorporating developer feedback and user feedback into each model series, so that would be one thing I would encourage people to do.

What Jack said.

Well, thank you both so much, seriously. It's been such a pleasure to be able to do all of this with you.

A real pleasure, yeah, thanks.
Hey guys, this is Jacob. Just one more thing before you take off: if you enjoyed that conversation, please consider leaving a five-star rating on the show. Doing so helps the podcast reach more listeners and helps us bring on the best guests. This has been an episode of Unsupervised Learning, an AI podcast by Redpoint Ventures, where we probe the sharpest minds in AI about what's real today, what's going to be real in the future, and what it means for businesses and the world. With the fast-moving pace of AI, we aim to help you deconstruct and understand the most important breakthroughs and see a clearer picture of reality. Thank you for listening, and see you next episode.
[Music]