Noam Shazeer and Jack Rae: Scaling Test-time Compute, Reactions to Ilya & AGI
By Unsupervised Learning: Redpoint's AI Podcast
Summary
## Key takeaways

- **Test-time compute improves creative tasks**: Integrating test-time compute into models like Gemini has surprisingly improved performance on creative tasks, not just reasoning tasks like math and code. The models showed an ability to generate and revise essays, demonstrating more engaging and thoughtful output. [02:31]
- **Evals must adapt to avoid saturation**: Standard AI evaluation benchmarks quickly become saturated as models improve. Researchers must constantly develop new, more challenging evals, often keeping them private, to accurately measure progress beyond simple memorization or pattern matching. [04:44]
- **AI can accelerate scientific discovery**: AI's potential extends beyond mimicking existing knowledge; it can drive novel scientific discovery. By identifying and solving complex, previously unasked mathematical or scientific questions, AI could exponentially advance human understanding. [26:04]
- **Open source models are rapidly closing the gap**: The pace at which open-source AI models are catching up to and competing with proprietary models is impressive. Innovations like Gemma 3 and DeepSeek V3 demonstrate that open-source communities, with their creativity and compute resources, can continually innovate. [47:43]
- **AI is revolutionizing education**: AI is creating a new paradigm for education, offering personalized, on-demand encyclopedias. Children can now access highly specific information, like the Latin names of plants or detailed lizard classifications, fostering a deeper and more accurate understanding of the world. [50:07]
- **Test-time compute won't reach AGI alone**: While test-time compute is powerful, it's not sufficient for achieving Artificial General Intelligence (AGI). Significant advancements are still needed in areas like acting within complex environments and developing robust agentic capabilities. [23:30]
Topics Covered
- Why AI benchmarks are increasingly irrelevant.
- AI writing its next generation is the real milestone.
- AI research is still more alchemy than science.
- AI will enable unprecedented education for the next generation.
- AGI and LLMs are still massively underhyped.
Full Transcript
Are there a set of milestones that are meaningful to you? I'd say when Gemini 3.0 writes Gemini 4.0.

I was kind of showing this to my mum a couple of days ago. Mums are the ultimate test of whether something has broken the barrier from the Twitter sphere to the real world. The mum vibe check is a big deal.

What do you feel is overhyped in the AI world today, and what's underhyped? I have a few, but I think they're all too spicy. There's no such thing as too spicy on my podcast.

It's always going to be this kind of whack-a-mole of what is considered actually interpolating known ideas versus creating a completely novel idea — and for that one I'm going to go to Noam.

Noam Shazeer and Jack Rae really
need no introduction. The two are at the forefront of Google's Gemini LLM efforts and have been involved in some of the most important discoveries in AI in the last decade — Noam as one of the co-inventors of the Transformer and mixture of experts, Jack as a key part of many DeepMind breakthroughs. It's a real privilege of the job to get to sit with these two and ask them literally every top-of-mind question in AI today. We talked about how far test-time compute will get us, and the spaces where it will and won't work. We talked about how the infrastructure needs will be different for test-time compute versus the large pre-training paradigm. We also hit on the impressive pace at which open-source models have caught up with closed-source peers, and their reactions to DeepSeek. We talked about their reception of and reaction to Ilya saying that this test-time compute paradigm won't get us all the way to AGI, as well as Yann LeCun saying this current generation of models can't actually have any novel thoughts. We talked about what it actually looks like to do cutting-edge AI research today and what their day-to-days look like, as well as the future model milestones that actually matter to them. And then we also got Noam's reflections on Character.AI, as well as both of their responses to what AGI means for the role of humanity. I think folks are going to love this; it was really just a pleasure to speak with both Noam and Jack. Without further ado, here they
are.

Well, Noam, Jack, thanks so much for coming on the podcast. Oh, thank you. Are we sitting in the very office where the Transformer was invented? No, this is a new building — we were in 196 Charleston, I think. No? Probably about half a mile away. Half a mile, right, so it's in the air. Pretty close. Pretty close, yeah.
Well, many things to dive into today. Obviously I want to start with some of the latest Gemini 2.0 models and all the work you've been doing around test-time compute and Gemini 2.0 Flash Thinking. At the highest level, for our listeners: how do you characterize where these models work today and where they don't work as well? And as you were experimenting with them, what surprised you most about those results?

One surprising thing: when we started the particular concerted effort to build a lot of research into test-time compute into Gemini, and then to think about shipping it, we were really focused on starting out with reasoning tasks — math and code were big areas of focus. And it wasn't really clear, whilst we were sprinting in that domain (we obviously wanted to broaden it naturally over time), how that would work. Would there be any sense of generalization? Would thinking be useful beyond those reasoning tasks if we were just concentrating on those as researchers? It was pretty fun to see one of the early models that had been trained to try and match the style of Gemini Flash — so it had been trained with thinking, but it had also undergone some training to actually have a generally nice style, to be a nice model to talk with. It was very fun seeing thinking interact with and improve creative tasks as well. You could ask the model to compose an essay on a particular topic, and the thought content was very fun to read: it would go through various different ideas, and then revisions of the idea, or things that it should cut. That was kind of fun, and the output also felt really nice. So that was one thing that surprised me. Any surprises for you, Noam?
Well, yeah. In general I'm all for generality — let's train something that's great at everything. That is important. I was skeptical at first of this intense focus on things like math, but it is very important to have good benchmarks that encourage you to be able to reason about difficult tasks, because a lot of things will drop perplexity — like adding more parameters to the model and memorizing more. So it's nice to have the evals that can distinguish better on some of the more difficult problems. I
mean, what evals are even meaningful to you at this point? Obviously I feel like people are trying to hill-climb the same set of evals, which feel increasingly less relevant to day-to-day work. What do you do when you're testing these models — how do you vibe-check them?

It feels like we keep landing on an eval we like: oh, actually we overlooked this eval. Even within math, it's like, okay, we've done a bunch of math evals, but maybe, I don't know, answers-only AIME — they're still considered challenging. And then it's done: they're completely saturated, and we really don't care, and they're small, and we almost think, why did we ever even work on them? It's easy to forget how hard we thought something was: six months ago, or a couple of months ago, it was considered really hard — maybe too hard for the model — and then it instantly snaps to being trivial. So right now, there's always been a lot of concerted effort within DeepMind, and within Google as a whole, to develop useful evals, but it's very nice to see that this is a shared responsibility across many different AI labs — even Scale AI, I think, is really stepping up and developing them. Calling it Humanity's Last Exam was a dangerous title, you know, if every six months we decide it wasn't too hard. A dangerous title. The really, really last exam. Exactly — the 73rd last exam.

It's very, very challenging in general, because evals get leaked. Once people start talking about the evals, there's all this text out there about them, and they're no good anymore, because everyone knows the problems, and all the models will know the problems unless you're very, very careful. So I think there's still a lot of work that goes into having evals that are private. Are
there a set of milestones that are meaningful to you? Like, hey, when Gemini 3.0 can do X, that's a really exciting milestone — whether it's an eval or just something that you've tried with these models that they can't quite do yet.

I'd say when Gemini 3.0 writes Gemini 4.0 — or I should say, when Gemini X writes Gemini X+1. I think these reinforcement loops are probably the most important thing to pay attention to, and there are several reinforcement loops going on. The one I just mentioned is probably the most important: that we can actually use the AI we're building as a tool to make ourselves more productive at building AI. But then there are other reinforcement loops around data flywheels — you have people use these models and provide feedback, and make them better at the things that people care about. I think we'll see huge acceleration from that. And then there's just the global excitement and funding flywheel, which also seems to be kicking up in the last few years. Yes, certainly. So,
to your point about being armed with a thousand AI engineers by your side making you even more productive: where are we? Do you have the equivalent of 0.1 of an FTE today, using some of these Gemini models alongside your research?

Yeah, I think one benefit we have at Google is that we work in a very structured monorepo, and we already have a lot of amazing tooling around contributing to the code base. So there are lots of angles where AI is being pulled in as tooling for our own development. I think Jeff quoted the statistic — I don't remember the exact number — about the number of what we could call pull requests that are AI-generated: useful AI bug fixes, or code reviews attached. That just gets pulled in; one day I noticed it there and thought, oh, that's cool, I can now apply a lot of fixes that it already spots. But that's just one element where we're already pulling AI into our own coding development. We're incredibly excited about agentic coding — it's definitely very important, and trying to get the model to tackle more open-ended and difficult tasks is definitely something we're very excited about. In some ways it's a lot easier to orchestrate when we have a very defined way of defining libraries — we have these build rules and things, and everything gels together in terms of the whole code base very well. So I can imagine, as progress continues, there's going to be a very discrete moment where suddenly lots of libraries can be very quickly iterated on within our code base.

Yeah, and you've got your AI engineers proposing experiments to you — what's good to try. It seems like these
models work super well in easily verifiable domains — coding and math, and you've obviously been among them. How do you think about how, for some of these less easily verifiable domains, these models may end up scaling and being useful?

I mean, they're getting better at that stuff too, but it's definitely harder. In those domains we're going to need either better ways of verifying, or more human feedback loops.

Yeah, and what's good is that, even with the Gemini model series, they're able to follow much more abstract instructions. So we can try to provide a reward signal over a qualitative piece of work where, if a human were going to give a feedback signal, there would be quite a broad set of rubrics or grading criteria — or maybe even a more simplistic sense of what is good style, what is interesting. I think part of the problem is really training models to take on these very broad criteria — to take in a very broad set of criteria and then apply a reward signal. Once we have the reward signal, we can train with reinforcement learning against it. And I think we're already seeing that this makes sense — it's not an abstract thing anymore. It seemed like a very abstract thing maybe a year or two ago.

I'm curious: a year or two ago, did you expect that to work, or did you think this whole path of research was really more suited to the more easily verifiable domains? I feel like you expect it to work one day, and then it feels like there's a very complicated stack of things we need to solve to get there — and then, as is usually the case, it turns out there was a much simpler path. That's how I feel about it, anyway.
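Jack's idea of turning a broad rubric into a scalar reward signal — which can then drive reinforcement learning, or simply rerank samples — can be sketched minimally. Everything below is an illustrative stand-in: the rubric items, the keyword-matching judge, and `best_of_n` are hypothetical; in practice the judge would itself be a capable model prompted with the grading criteria.

```python
# Sketch: reduce a broad grading rubric to a scalar reward, then use it to
# select among sampled responses (best-of-n). The keyword judge is a toy
# stand-in for a model-based grader.

RUBRIC = [
    ("explains the reasoning", ["because", "therefore"]),
    ("gives a concrete example", ["for example", "e.g."]),
    ("acknowledges uncertainty", ["might", "may", "perhaps"]),
]

def rubric_reward(response: str) -> float:
    """Score a response in [0, 1]: fraction of rubric criteria satisfied."""
    text = response.lower()
    hits = sum(any(k in text for k in keywords) for _, keywords in RUBRIC)
    return hits / len(RUBRIC)

def best_of_n(candidates: list[str]) -> str:
    """Pick the candidate with the highest rubric reward."""
    return max(candidates, key=rubric_reward)

samples = [
    "It is good.",
    "It works because X; for example, Y. It might fail under Z.",
]
print(best_of_n(samples))  # picks the second: it satisfies all three criteria
```

The same scalar could instead weight an RL update; best-of-n is just the simplest way to "apply a reward signal" once you have one.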
Yeah, usually there are good surprises — more good surprises than bad surprises.

You've released these models out into the wild. What are some of your favorite ways you've seen people using them, and what would you like to see more people trying and building with these things?

Actually, there's an update happening today in the Gemini app where we're putting in a much stronger model, and it's being integrated with basically the full suite of Gemini app tools — all of the apps that the Gemini app supports, from things like Maps to search integration — and it now has very long context. It's kind of a fully featured Gemini Thinking release in the app, and it's a very enjoyable experience, I think, even for people's day-to-day stuff. It wasn't clear to me whether people would want to pay a bit more latency to get the model to think about stuff before responding. It does seem to be the case that if people are going to pull their phone out of their pocket and type something in, then what we thought was a very long time — maybe a couple of seconds — is actually a very small price to pay for something that feels like a better-quality answer. And they can also sometimes look at the thoughts and inspect them. I was showing this to my mum a couple of days ago — mums are the ultimate test of whether something has broken the barrier from the Twitter sphere to the real world. Yeah, the mum vibe check is a big deal. Yeah. And she asked a lot of what I would consider very generic questions to ask a model, like: what is the meaning of life? Oh, she went for what is the meaning of life. And she really sat and read it for a long time, and then she read the thoughts as well, and she contemplated it. She seemed to very much appreciate the presence of the thoughts — how to even go about such an open-ended question. So, more folks building philosophical conversations
with these models. Well, one thing I think is super impressive about them is the multimodal capabilities, but it still feels like those are very underexplored from an application perspective. I'm not sure totally why that is.

Yeah, I think right now, in some ways, we're quite modest about the multimodal capabilities of Gemini. The model has always been incredibly strong at image input, and image input plus thinking is actually remarkably good, I would say. I see a lot of people red-teaming the model on X, trying difficult or challenging images and visual reasoning problems, and I think that's working pretty well — though some of those are kind of toy evals. But pulling multimodal into agentic tasks is super interesting as well. We launched Mariner last December, which is an agent that uses a browser, and that has a lot of multimodal aspects built in. It was super important that we could get the model to be incredibly strong at not only scanning a screen but really understanding it, and knowing how to act on many, many different types of websites. So pairing that kind of capability — agentic, quite open-ended, where you really need visual understanding of potentially messy scenes — with thinking is something I'm feeling very excited about.

Yeah, that's fun. I mean, it made sense to start with text. A picture is worth a thousand words, but it's a million pixels, so text is still a thousand times more information-dense — just to do math on clichés. And then there's also a lot more in the main training data for text: we have so many examples of text, which actually represents the way humans receive and produce information through language. But we have a lot fewer examples for something like image generation, because you don't have examples of people generating images. So things are a little more challenging there, but, as Jack said, great stuff is happening.

Yeah. I mean, you
mentioned Project Mariner. I'm curious — I think you've both talked before about how, to get agents more widely used, you need to solve both complexity of reasoning and reliability. How would you characterize where we are today in applying Gemini models toward these problems, and what is the actual path to getting better at them?

I mean, there are a lot of answers. One is just: make the model smarter — that will always help. And most likely you need very general solutions for these control problems, just as you need general solutions for the intelligence problems, because people are going to use these things in so many unforeseen ways. You can't anticipate it; the users are smarter than the developers at figuring out what the use cases are. No one envisioned, when they invented the internet, what the internet was going to be for. No one envisioned, when they invented computers, what computers were going to be for. And AI is getting so general these days that it's even more true — we're building a product with billions and billions of unanticipated use cases. So we will need to build general solutions for all of it.
How far away does it feel like we are from that next level of complexity and reliability?

I don't want to give a precise number — is it 6 months, is it 18 months, is it 24 months — but I think a lot of the time, in my opinion, is not really about the core algorithmic AI development. Part of it is just that there are a lot of engineering challenges in really changing the whole way you train, to work in more complex agentic environments. That has an almost constant-time but non-trivial cost in switching how we do research. With agentic research, a lot of the upfront challenge is that it's no longer simple prompts and responses — we're now going to act in an environment, so how you define those environments matters. That angle, at the very least, has been something DeepMind has been bought in on even since I joined in 2014: part of building AGI is figuring out a really good agent, and part is figuring out a really good, general environment. I don't think anyone has solved the perfect environment yet. There are some obvious ones: there's a notion of using a web UI and being able to automate many web tasks, and there's a notion of having a code base and being able to work within it and do many useful things. But if you can pick out a really good environment, then we can accelerate a lot of agentic research in that environment and build really good algorithms. And I feel like that's as big a part of the challenge as any given breakthrough in attention, or in long context, or in reinforcement learning.

How high do you guys think the
ceiling is for continuing on this test-time compute vector? Obviously, Ilya has very publicly come out and said there's an entirely new direction that's needed to really advance AI to the next level. Do you agree with that?

Yes, I pretty much agree, because LLM inference is so cheap — individual operations cost under 10^-18 dollars these days. So if you can infer relatively efficiently, even on a very large model, you're getting over a million tokens per dollar. I guess you can just check the prices on Gemini — or anybody else. So you're getting millions of tokens per dollar.
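Noam's figures can be sanity-checked with quick arithmetic. The dollar amounts below are illustrative assumptions for the comparison, not actual Gemini pricing or real book prices:

```python
# Rough check of the "millions of tokens per dollar" comparison.
# All prices are illustrative assumptions.
import math

price_per_million_tokens = 1.0                   # assumed $ per 1M model tokens
model_tokens_per_dollar = 1_000_000 / price_per_million_tokens

# A paperback: ~100k words, ~1.3 tokens per word, ~$13 -- all rough guesses.
book_tokens_per_dollar = (100_000 * 1.3) / 13.0  # = 10,000 tokens per dollar

gap = math.log10(model_tokens_per_dollar / book_tokens_per_dollar)
print(f"model: {model_tokens_per_dollar:,.0f} tokens/$")
print(f"book:  {book_tokens_per_dollar:,.0f} tokens/$")
print(f"gap:   ~{gap:.0f} orders of magnitude")  # -> ~2, i.e. "a couple"
```

Under these assumptions the model is about two orders of magnitude cheaper per token than the paperback, matching the comparison that follows.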
That's orders of magnitude below the cost of most other things you can think of. If you think of a really cheap pastime — go buy a paperback book and read it — you're paying something like 10,000 tokens per dollar. So we're a couple of orders of magnitude cheaper than reading a book, probably four orders of magnitude cheaper than paying anybody to do anything, and, whatever, six to eight orders of magnitude cheaper than paying a software engineer. So there's a huge margin of difference there to apply more compute and make the thing smarter. And if the value is there — which I think it is — would you pay five cents an hour for the bad engineer, or ten cents an hour for the good engineer? So there's a huge amount of unexploited flops in there to use, if we can find ways to use them. One way to use them straightforwardly is: okay, just train a bigger, better model. We're already doing that, but model training costs tend to go up quadratically with the size of the models, and you still end up with relatively cheap inference if you do it right. So then, of course, what everyone is doing now is applying more compute at inference time, through this chain-of-thought thinking or any other brilliant algorithms we can come up with. And I think we're just going to start seeing a scaling curve there, as we're seeing in a lot of places.

Right, but does that scale us
all the way to the AGI future people envision, or is there some kind of completely adjacent thing that's required — where does it asymptote?

Yeah, I guess that's where we'll see whether it's the humans that invent the next breakthrough, or the AI. But I've given up on organizing my garage and stuff like that, because I'm just waiting for the robots. You think they're coming? I guess you guys had a big robotics release yesterday. Yeah, that was awesome. Is it ready to clean your garage, though? Not that I know of.
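Noam's aside that training costs tend to grow quadratically with model size follows from a common rule of thumb — training FLOPs ≈ 6·N·D for N parameters and D training tokens — combined with the compute-optimal practice of scaling tokens roughly in proportion to parameters. Both the 6·N·D estimate and the 20-tokens-per-parameter ratio below are standard heuristics, not Gemini-specific numbers:

```python
# If training tokens D scale in proportion to parameters N, training compute
# 6*N*D grows quadratically in N: doubling N quadruples the FLOPs.
def train_flops(n_params: float, tokens_per_param: float = 20.0) -> float:
    d_tokens = tokens_per_param * n_params  # compute-optimal-style token count
    return 6.0 * n_params * d_tokens

ratio = train_flops(20e9) / train_flops(10e9)  # 20B- vs 10B-parameter model
print(ratio)  # -> 4.0: twice the parameters, four times the training compute
```

Inference cost, by contrast, grows only linearly in N per token, which is why a bigger model can still be relatively cheap to serve.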
But yeah, I think on the question of how much more there is to give in this test-time compute paradigm — is it all the way to AGI? I don't think it's all the way to AGI. I think we already established there are other components: being able to act in a complex environment is very important, and research into acting agents is a definite investment, and there are many other aspects. But are we seeing test-time compute asymptoting? The Ramanujan example is always in my mind here. I don't want test-time compute to just think longer on any given problem and eventually arrive at a solution; we also want it to think very deeply and actually create maybe-useful knowledge that it's going to incorporate, in its thoughts, to then solve further tasks — and thus dramatically improve data efficiency. If you can have just one math textbook, and you spend most of your time really thinking and playing around with the ideas, and then you can become a world-class mathematician — that's the kind of thing we should strive to achieve with a very deep-thinking model. Are we there yet? No. Do we have a path there? I think there are many directions of gradient towards such a model. We've already been seeing — and this is one thing I think people don't really talk about much with test-time compute — amazing improvements in data efficiency, by training the model to think deeply with reinforcement learning while it's solving the task. So even if we have a fixed bunch of RL data and we're not going to add any more, just moving to the test-time compute paradigm is allowing us to learn a lot more from that data, and I think we could probably push that much further. So there are many particular research angles we're interested in for how this model could think — not just spitting out a couple of thousand tokens to solve the task, but thinking much more deeply, much more like a researcher might think about a hard, open-ended problem.

I think so much of
what you guys have touched on is these models increasingly acting like researchers. What are the early signs of that that you would look for?

I'll give one example, which is math. Right now math is treated a bit strangely: it's often used as benchmarks, kind of like exams, and maybe even math competitions. There's going to be a very important pivot from the math benchmarks to really starting to generate useful math — starting to solve actually important problems that we really care about. I think there are ways there. I thought the FrontierMath eval that was created was very cool — it's trying to provide a kind of gradient towards that. The harder category is basically almost unpublished math findings; the easier category is supposed to be just harder or trickier. I don't know whether that particular eval is the perfect way, but having some kind of ramp of evals that bridges from where we're at now to actually useful scientific contributions — this is something I'm especially excited about: bringing in professors and researchers and saying, okay, use this tool in a non-ironic way to actually accelerate research in this area. What would you do? What's missing? What do we need to advance? And this notion of incrementally harder benchmarks — it might sound like what AI researchers always say, but I think that's basically going to be my metric of progress.

I mean, math is a great
example here, because this is a field where you don't actually need more data. People invented all this math without external input — a lot of the time in a room, just thinking. Yeah, you just go into a room. And then there are examples where, okay, there's some data: Isaac Newton takes a bunch of astronomical observations, goes into quarantine in his house, and out comes physics. So that's an example where there's data, but nobody actually knows physics yet — and then you generate physics. And math is even crazier, because you start with roughly no data and invent something useful. So that could provide a counterproof to the assertion that, hey, this is just learning to mimic people — that the most we can do with AI is to relearn what everybody knows.

The learning-to-mimic-people critique has been around for a while — Yann has said a lot about this. Do you feel like that's entirely disproven at this point — basically the critique that novel discovery and thinking are impossible on these kinds of model architectures?

Well, there's definitely one class of scientific discovery that I think almost no one could argue against, which is that a lot of science is: if you knew about these two disjoint pieces of information and thought about the intersection for a while, you would realize a new property. In materials science it might be that just associating things way better means you actually know way more about, say, what new kind of material may be photovoltaic, but also may have other properties, etc. So interpolation alone would already completely accelerate science. And I'm guessing it's always going to be this kind of whack-a-mole of what is considered actually interpolating known ideas versus creating a completely novel idea — and for that one I'm going to go to Noam.

Yeah, okay — I don't know, but I guess I'd just throw it back at Yann LeCun to prove that he has generated a completely novel idea. We'll see. I don't actually care
about arguing with arguing this stuff uh
let's you know let's just um you know
build AI you know uh greatly increase
the level of technology in the world um
help people that seems good enough for
me even if we go back to the math
example like the thing I'd love to see
is like you know maybe the state of
mathematics right now is kind of like
the the state of kind of geographic
exploration in the 15th century or
something so it's like there's some
known things there's some like fuzzy
area of like we don't really know what's
like Beyond these boundaries and we have
a bit of a guess and then you send a
small number of people to go off in a
boat it's very expensive to try and like
push the boundary of like what what is
known and what is not known and it they
come back with some pretty funny looking
Maps yeah they come back with like a
little bit of extra territory like U
explored and then they they tell people
and that's a little bit like what's
happening right now in mathematics you
have a very small number of like Elite
math professors that are able to really
ask the right questions and then
actually like prove useful qualities and
it kind of grows and it's grown that way
and that's that's how it's been going so
far right if you if we can train a model
that can actually like essentially ask
like it can pose the right questions of
what you could say is like the space of
all uh useful mathematics it's kind of
an infinite space you don't want this to
go and like off like a fractal on on
kind of uninteresting questions but if
if there was some notion of like all the
set of like interesting mathematical
questions if it can like keep posing new
on and then if it's very very strong and
it can solve those then at some point
you know maybe like right now we have a
a pretty strong kind of math uh model
and at some point maybe it will be
Professor level maybe one day it'll be
Terry to level and then you have like a
million of them and then now it's like
now I think then you could hope to maybe
complete the map and then what would
like completing the map look like that
could be one of the greatest
contributions to kind of uh science you
would now physics chemistry You' now
like be able to uh have like a very deep
deep understanding of like any useful
mathematics um I I think that would be
like a very very exciting thing whether
it's possible uh I'm not sure but I
would agree maybe the the the key Crooks
is like can it ask uh in like actually
novel questions the question posing
thing seems to be the hardest part in my
opinion the solving I feel very
confident we will get
there but maybe mathematics is infinite
yeah I mean it's definitely infinite so
then we we can we can do so much better
Yeah. I guess let's talk about the culture of AI research a little bit. Noam, you were obviously a leading part of the original Transformer paper, and both of you have been part of so many breakthroughs. People like to write thought pieces about the culture that drove these innovations. What are the main takeaways you have after having done this research, through some periods of great success and some periods, maybe, of more frustration? What lessons do you take away about what actually works, from a cultural and team-structure perspective,
to drive this stuff forward?

Maybe AI research is kind of where chemistry was in the 15th century. More alchemy? It's alchemy: we don't know why it works. It's highly experimental. You get some idea, but the proof is always in trying it out, and then you make various observations and come to hypotheses about why this thing works, what the key thing is. Sometimes you're right, sometimes you're wrong, and it usually takes more experimentation to find out. Or you can just claim that it works because of my magic XYZ, and you have your assistant swallowing frogs or something, or the equivalent.

I find researchers naturally love to share and get excited about what they're doing, and it's always important to try to credit people liberally, because it's often very complicated to figure out which idea led to which idea and what the key insight was. I think at some point we'll just have to give up on credit assignment temporarily and take a superintelligence-based approach: we'll wait for the superintelligence to sort it all out.

You'll get credit, for once. Once we get it, the superintelligence will write some great thought pieces on the culture that drove these transformations. But it is super fun working at Google with this group of brilliant people and
collaborating with everybody.

One thing I'm struck by in the Transformer story: it involved you hearing, almost randomly, that some people were working on it. When you tell the story, it sounds like it could easily not have happened. If you hadn't walked down that hall one day, does it feel like someone would inevitably have figured it out six months later? Or how much random happenstance was there that led us down this path, just from people in this building colliding in different ways?

Oh, interesting. Yeah, we would all still be using LSTMs or something. I guess it's like asking whether, if somebody hadn't invented the internal combustion engine, we'd still be using steam engines at this point. I think someone would have come up with something like the Transformer.

From my vantage point over in London, it did feel like people were circling around it. There was the Neural GPU, which also had the key idea that we're not going to have an RNN anymore, we're going to parallelize, but it's not just going to be a convnet; there's going to be some notion where depth is some kind of function of your sequence length. That was an idea in the Neural GPU. It didn't quite work out, but the key idea of getting rid of RNNs was there; that vibe was there.

Yeah, right, because there was all this work on convnets running around, everyone wanted to kill the LSTM, and attention had been floating around from the translation models. So it kind of needed to come
together.

Oh, I love that. Also on the culture side: one of the interesting things about how I understand Google works is that there's this bottom-up compute allocation, right? Folks get to do different projects and convince other people to come over and allocate compute for them. Obviously that's one model; other places say "we're going to go all in on one thing" and are much more top-down on the compute side. How do you think about the trade-offs between those two models?
Yeah, I mean, we've been through both, or I've been through both, I guess. Google Brain, when I was at Google previously, was mostly bottom-up, as you describe, and DeepMind has been mostly top-down. Yeah, it was a bit more top-down. Different philosophies, and there are pluses and minuses to both. I think top-down can be good for getting people to collaborate and for getting larger training runs working. But bottom-up I think is also great for collaboration, because then when you bring someone new onto your project, it doesn't mean you have fewer resources per person; you have more total resources. So that's great. And there are so many abstraction-breaking ideas that there's no great way to categorize them. If you say, this is the compute for pre-training and this is the compute for post-training, well, then you've got something completely different that doesn't fall into those buckets nicely, and then it falls between the cracks. So we are bringing back a good measure of bottom-up, because I think that's super important.

How does the balance feel? Yeah,
I guess I worked at OpenAI briefly, and I liked the way concentrated bets were made there; that obviously paid off well for them in certain areas. I also liked the way concentrated bets were made at DeepMind. There was a bit of a vision sometimes. Take AlphaGo: I was asked to join the AlphaGo team right at the beginning, because they needed an engineer and I was a research engineer, and I just didn't get it. I was like, why would anyone care about a board game? So you do really need good research leads with vision to drive these things; you can't expect everyone, from a vantage point that isn't always the best, to know where the impact is going to be. Right now, within thinking, for example, the area of research we work in, which is incredibly important, we have a reasonable, non-trivial investment in pure bottom-up research, where we're really not dictating anything. Then it's a fun meta-research process: how can we make these people maximally efficient, how do you make your baselines lightweight, how do you make them signal-bearing, how can people move as fast as possible? That's very important. And then it's also very important that it can't all be that; some of it needs to be "here is a mandate of top-down bets that we have to deliver on." But it's always humbling. I think we have a very good view of everything that's happening, and we get a sense of which areas are going to be really important, and then usually you get humbled by the bottom-up research: a thing you weren't even thinking about ends up being way more impactful than you thought. So always keeping that running is super impactful.
Reflecting back on the past decade, are there some of these inflection points or decision points, maybe hard 51/49 decisions, that ended up being super impactful?

I think for both of us, going all in on large language models was good. Yeah, I'd say that was probably a good call. It kind of seems obvious now, but it kind of wasn't. I found it was in a state of being not obvious to most people, but obvious, I felt, to a small number of collaborators, that this was definitely going to be a big thing, and we kind of had to go against the grain for a while. Yeah, that's true. There was definitely a time when people were not excited about language models. It's hard to remember now. It always seemed like the best problem on Earth to me, but the deep learning people at the time maybe thought machine translation was still a bit cooler, or computer vision. Yeah, vision was exciting for a while. Why was everyone in vision? I don't know; I guess the ImageNet thing, and there's that picture of a cat or something. Oh, that was a good one, the cat. Did you work on the cat thing? No, no, never actually did much
in vision.

And now you've come full circle with these Gemini 2.0 models. I see all these demos that are vision-based, and you're doing cooler things in vision now than certainly identifying cats.

I think for almost every early LLM person, a world model was on their mind. I actually don't feel like the early LLM researchers were really from a language-oriented background; they weren't really linguists. I don't even feel like understanding language was really part of the motivation for that early group. It was more: train unsupervised learning at scale, do language first because it's the most knowledge-compressed, but then gobble everything up into a big generative model and understand everything. It's very, very cool to see that continually proving out. Yesterday we launched native image generation, and it's amazing. I think a lot of image generation right now is focused purely on getting absolutely maximally aesthetic images, but having native image generation also lets you do a lot more with images: understanding, editing, interleaved sequences of images and text. And once again, it's just: train a generative model on lots of data
to arrive at that.

You guys are obviously both big believers in these models becoming more and more general-purpose. A question some folks are asking: when you think about domains, you talked about healthcare earlier, Noam, what does the model that ultimately becomes our AI doctor look like? Is it just a continuation of what we're doing in some giant model? Is there a healthcare-specific version of the model that ends up being released, that only has some set of data fed into it, or just a bunch of guardrails? Paint that picture for me, of what you think our AI doctor or AI biology researcher ultimately looks like.

Yeah, I don't really think you'd need very task-specific models for something that high-value, because you probably pay something like a dollar a token for talking to your doctor, so the LLM is way, way cheaper at this point. To my mind, the only reason for task-specific models is price. So if there are things where you wouldn't pay a dollar a token, yeah, something more targeted: say, something to analyze vast quantities of data for marginal value, then maybe you want something task-specific.

There's always this notion that there'll be tons of negative transfer out there, so you should compartmentalize things. I don't really feel like that has ended up being the case. If it could be measured, that would be a good reason to compartmentalize models, but if there's no negative transfer, if there's positive transfer, then just have one big model. That's my personal philosophy; this is not a thing people have uniform agreement on, though. It's a continuously active area of research: how much do you want to specialize and spin off these expert models? But the way I see it, it's very simple: if there's positive transfer, put it in the same model, as long as it doesn't then become too expensive to serve.
Obviously, you've both been at the cutting edge for a while. What's one thing you've changed your mind on in the past year?

I feel like timelines have shifted forward, and I don't mean that in a vague sense: I think the rate of progress is much faster right now. A year ago the field was obviously advancing, but whenever you have a new paradigm shift it creates this sudden acceleration. Actually, here's a pretty good one. One thing I've changed my mind on: my mental model of the propagation of information, of how people adopt a scientific advance, has completely changed. When the Transformer came out, I was over in London at DeepMind. People thought it was a cool paper but were a bit suspicious. I eventually implemented it in our codebase, over the holiday break actually, about three months after the paper had come out, and tried it for language modeling, but it was not really getting picked up. I eventually collaborated with someone who wanted to use it for reinforcement learning, but really it was maybe six to nine months from the paper coming out before you saw Transformers dotted around all areas of DeepMind, and that's all within Alphabet, where it's much easier to propagate information. Compare that with the speed at which the field has picked up this test-time compute paradigm: many labs have already trained and released models that look very good, exploring the space. That was very surprising to me. The fact that you can make an announcement, say "this is important," in just a blog post or something, and then have people make breakthroughs in that space and release models on the order of months: that was a wake-up call for me. There's a lot more compute and a lot more smart people working in
AI.

I often think of things through a bit of rose-tinted glasses, about 2016 or something: well, we were very smart then, we were very creative. People are very, very smart and creative now, and they have way more compute, and there are way more of them. So if anything is going to be very impactful, the idea can just spread all across the world and people act on it, which is kind of crazy.

Yeah, it is kind of crazy. And just the amount of compute out there: now a kid in a garage has more compute than was necessary to invent the Transformer. People always worry about how much compute somebody has, but it is definitely possible to make breakthroughs with way less compute than you would imagine.

Noam, anything you've changed your mind on in the past year? I mean, I've been continuously impressed with the success of RL. I'd never really worked with it much before, and it's like, oh, that's actually pretty
good.

Well, you kind of alluded to this, Jack: in the reaction to DeepSeek and all these models that fast-followed in the test-time compute space, going forward, do you expect the open source models to be able to keep up with each subsequent generation of these models? It obviously seemed to happen faster than a lot of people would have expected.

Yeah, that's actually something I'm changing my mind on. I do feel like the ability of open source models to stay very close and competitive with the frontier is persisting. I actually thought we were maybe getting a false sense of assurance, that maybe it felt like it was converging but these things could then pull away again, but actually it has been very impressive. I'm really, really impressed by the performance of Gemma 3, which just got released yesterday; it's amazing, completely incredible, and the team did a really good job. And other open source models too: DeepSeek V3 was a very good model when they released it. It seems like people are very passionate in the open source space, and they're very creative and smart, and they have compute, so I don't really see why they wouldn't be able to continually innovate. What do you think?

Yeah, I mean, it seems like the time gap between closed source and open source has been shrinking. I think the technology will continue to accelerate, so it could be that the quality gap will be large while the time gap will be very small. But we'll have to see how it plays out. It's super exciting to see all of these companies getting great
results.

Switching to some of the broader implications for society of all the AI progress we've been talking about: obviously both of you in the last year have been impressed with the power of RL and surprised by the pace at which we've scaled a lot of this test-time compute. Have you changed anything in your own lives based on this probably greater clarity, or belief, that a lot of this AI-driven future is coming? You both have kids, I know; anything you've adapted? It sounds like no. I know you don't clean your garage, but I guess that was from before this year. I actually do that, very rarely. That does make for a good podcast. Have you thought about not cleaning it? Yeah, I don't worry too much about global warming; we'll have AI to take care of the carbon stuff soon enough anyway. But I guess, anything you've thought about differently in your own lives, or in how you think about the lives your kids will
have?

AI and education: I don't think people are really talking about it enough yet. My son, under supervision, likes to talk to Gemini, and it's actually insane how powerful it is, especially since he can go out to the garden, take pictures of plants, take pictures of lizards, and now he has this very accurate, personalized encyclopedia which can give him information.

And do they adapt to that? I wasn't sure how that would work. They do. My four-year-old son walks around talking in great detail about the plants; he'll use the Latin names. They absorb so much stuff; they're sponges. I feel like I'm seeing a kind of education that I don't really think has ever existed for humanity. AI and education is going to be incredible. He went to school and told his teacher, "oh yeah, I caught a lizard," and his teacher was like, "oh, that's cool, that sounds like a big lizard," and he's like, "no, it's not a big lizard, it's a western fence lizard." He's very particular about his varieties of lizards. And he's like, "I also saw a blue-tailed skink," and she's like, "what's that?" and he's like, "it's an amphibious lizard that..." So he starts reeling these things off. It's just obvious when you see it: children are very curious, they're like sponges for information, and if you can combine that productively with AI, I think that's going to be really incredible. I do feel like the next generation will just seem like smarter people. That's what I'm feeling hopeful about now. Anything you've changed?
Yeah, it's extremely hard to predict what the future will be like. We'll all do our best to make sure that AI is safe and beneficial. But it does make you think: hey, what I do now really, really matters. We don't know whether human labor will be materially necessary in the future, but that just means it makes more of a difference now. If you want to do something that matters materially, go do it now. And other than that, just try to be a good person. Whatever you find spiritually meaningful, go do it, because maybe the purpose of humanity in the future will not be about providing for physical needs. So we've got to figure out where we find meaning in the future, but we'll have plenty of time for...

Well, especially since your mom's already having the deep philosophical conversations with the models; maybe we'll be able to reason our way
to that.

But I am struck: some other people have come on this podcast, like Bob McGrew, chief research officer, who said, look, humanity is always going to have a role in asking the questions, and the models will go off and do things. But to our conversation earlier, I think the big question is: will people always be the best ones to ask the questions, or will the models actually ask better questions over time? That obviously has a ton of implications. I feel like every generation thinks they're living through the most important moment in history, and I guess you're biased when you're going through it, but it does feel like we are
certainly in that.

Yeah, and I think sometimes a technological advancement scares people, and people have a right to feel trepidation at this stage as well. This isn't such a great example, but even with the introduction of the television, people asked: is this going to make us all lose our attention span? Are we going to completely lose the will to go outside and have friends? That was a thing people freaked out about, and obviously now we can see that was unnecessary: it was a small piece of technology that was entertaining, maybe even a net positive to society; it kind of went okay. Right now it's like that, but on steroids. There are very strong signs of how this helps us, and very good reason to be concerned about how this could not help us. You have very demonstrable reasons, already proven out, for how it's helping us, and very concrete arguments for how this could not go well, and I think that makes it a very interesting
time too.

In some ways we have less meaning than in the past, in the sense that in the distant past everyone was at the brink of starvation, and you had meaning in your work because you had to go work hard today and earn some money so your family doesn't starve tomorrow. Today, living in America, nobody's family is starving tomorrow if they don't work hard. So that's less meaning than we used to have, and we've found other sources of meaning. There will be more of that in the future as, hopefully, AI improves our physical
uh situation how worried are you both
about AGI risks I would say moderately
yeah
um it it it is hard it's often difficult
to find examples of creating something
which becomes far more intelligent than
its creator but then like still uh like
acts in predictable and useful ways for
its creater and I think that that like
class of argument is is concerning
there's also just like more practical
like uh kind of AI and Society
implications that we've kind of touched
upon but like making sure that AI is
like constructive to the economy and the
like the and people can kind offload
their lifestyle and we don't have like
sharp changes in in like the employment
landscape and things that's super
important both of those are often on my
minds and then there's just very
pragmatic things when we're always
putting more capable models out we're
very excited as technologists to develop
and ship things but we also I think we
have a pretty good balance of like of
then
internally having another group that's
going to be thinking about this much
more holistic basically and like how can
we make this launch safe what are some
unintended
consequences um which I think has been
pretty good and uh and is like super
important so I'm I'm glad that
happens.

Yeah, I agree. I'm not afraid, but we definitely need to be working on all the safety aspects of this. And there are examples of creating something that becomes smarter and more powerful than us: we have kids, and they go off and they're smarter than us... With detailed encyclopedias of amphibians. Yeah, and then they become teenagers... And then you solve the alignment problem. Yeah, but hopefully, if we respect our parents and treat them well, the AI will learn. We'll have to have a lot of tokens on the internet of people being really respectful toward their parents. Exactly, yeah. We have to stop pushing the robots over. Respect for creators, yeah,
exactly; that feels dangerous.

One thing I did want to ask you about, Noam: in your previous stint between Google jobs, you obviously spent a lot of time building out Character and thought a lot about that product space: AI companions, and the ability for folks to chat with all sorts of different kinds of personas. What do you feel about where that space is today? What kinds of problems do you feel like
we still need to solve there?

Yeah, it's interesting, because the main reason I left Google to start Character was that I thought the biggest thing the LLM industry could use was an application where anybody could go interact with LLMs, use them, and discover the use cases that were good for them. This was before ChatGPT launched, before Gemini launched. So, mission accomplished: everyone's out there talking to LLMs now. Which is kind of different from how things are now, I guess. The other thing was, going into Character, we were not really focused on "hey, this is going to be an entertainment product" or anything else. We kind of just went in with an open mind: we're going to put this out there in a general way, we're going to help people conceptualize it as something that can take on different personas, meaning it's a very, very general technology, and see what you can make it do. And we definitely found a lot of people using it for entertainment, I think partially because, at that point in the technology, nobody had figured out how to make the thing not hallucinate, so people are going to use it for applications where hallucination is actually a feature, such as entertainment. I think that's worked pretty well; I know a lot of people like using
character um what wait what do you think
I guess like what do you think the
future of that obviously like there's
this early Behavior a lot of people
interacting what does it look like like
5 10 years from now that's a good
question I I I I uh I I do not know I
mean I do think you know people are
going to I I think people will always
want relationships with with humans
because it it um you know it's uh
spiritually more more meaningful but um
but you know I I think people will you
know like having um um AIS that are kind
of more in in human form for you know
for uh you know for for things uh they
want I mean like imagine you know you
just got get elected president and you
get your your piece in the cabinet to
like advise you like wherever uh you
know wherever you go I mean that could
cabin yeah you get your own personal AI
cabinet you get your whole uh a good AI
summarization of all the secrets
Yeah, or maybe you're the CEO of your own AI company, so you get a lot more productivity. I guess in a lot of those cases it's less about the personality and more about the productivity. But to the degree that people like interacting with something that feels human, we'll probably see a lot of AI that feels more human in various ways.

And is progress in that space just about the models getting better, or is there a whole other set of human-computer interaction and product questions that need to be solved?

It's a good question. The models getting better is pretty big, but then part of it is for whoever is running the application to decide what we're going to let people do. I think users will be pretty good at specifying what they want from an interface, and it'll mostly be about whether we want to let them specify that.

Well look, both of you, fascinating conversation. We always like to end with a quick-fire round, to get your take on some overly broad questions that we cram into the end.

Sounds good.

Okay, so maybe to start: what do you feel is overhyped in the AI world today, and what's underhyped?

I personally feel like the ARC-AGI eval is overhyped.

Well, that's very spicy.

Yeah. I think maybe the progress there has been quite slow because a lot of researchers just don't feel particularly inspired to do these very specific types of puzzles. We did a lot of that in 2015 and 2016, and we came to feel that if you know the puzzle domain, you spend a lot of time fixing the actual bottlenecks, maybe acting on these large grids is a little bit finicky, and then you make a lot of progress, but you don't necessarily continue on to building something that's really AGI and useful. So I personally went through a transition from all these synthetic tasks to just modeling natural language, and I felt that moving in that direction was much more of a path to AGI in the long run.

Anything else come to mind for overhyped or underhyped?

I don't know, I think AGI is underhyped. AI and LLMs are still massively, massively underhyped. People are still thinking about it like it's only going to be about some silly trillion-dollar products.

Yeah, I heard you say on another pod that a trillion wasn't cool anymore, a quadrillion
was.

Yeah, exactly, you've got to do the Dr. Evil.

Obviously it would be a gross misallocation of societal resources to take you away from building models to build applications, but I am curious: if you were to go build an application today, what do you think would be the most interesting? You've talked about education before, but are there any others that come to mind that you think would be fun to go build on top of these models?

I do think it's actually very cool how many apps have been trying to break into this agentic space, and people then expose them and say, "oh, this is just a wrapper around a known model." But there seems to be a lot of value in actually having the right app experience if you want a model to act and do something useful for you. Within the agentic space I think code is very crowded now, but there are a lot of other things I would find it useful for a model to automate for me, things that go beyond a chat experience, where the model is actually going out and doing useful things.

Yeah, I'd say code is underhyped. I think it's huge, because for one, humans aren't even that good at it; we were not exactly designed for things like code and math. And it's one of these things that will self-accelerate: if you build an automated software engineer and researcher, it'll build the next, better AI. So the combination of engineering and agentic capability, something that can control surfaces broad enough to do the job of an engineer; if I were to focus on applications, that is what I would be focused on.
How different will the infrastructure needs look for test-time compute models versus these massive pre-training runs?

You mean in terms of hardware?

Yeah, in terms of hardware requirements, distributed data centers, all of that.

I mean, it's a pretty rosy story, I would say. If it turns out that building AI becomes mostly an inference problem, that's a problem that can be much more distributed than the large-batch training that happens in pre-training. I think that means we can be much more flexible with our compute: we don't mind the model training across data centers as much, and maybe we can spread out actors that go off and get experience and send that experience back from many different data centers, because they don't all need to have very strong, fast interconnects. That is also going to drive price down, because then we might start to really optimize toward such a setup, which is intrinsically cheaper. The cool thing at Google is that we have this co-design link with the TPU team, so we're always feeding them the profile of how we're spending our compute, which allows them to tweak the chip design and the data center design within a couple of years' time frame, which I think is really motivating.
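The distributed-actor setup described here, inference-only workers that need no fast interconnect shipping experience back to a central learner, can be sketched roughly as below. The names and the queue-based transport are illustrative assumptions, not details from the episode; real systems would send experience over the network rather than an in-process queue.

```python
import queue
import threading

def actor(actor_id: int, experience_queue: queue.Queue, num_rollouts: int) -> None:
    """Simulate one actor generating experience, e.g. in its own data center."""
    for step in range(num_rollouts):
        # A real actor would run model inference here; we fabricate a
        # (actor, step, reward) record as a stand-in for real experience.
        experience_queue.put({"actor": actor_id, "step": step, "reward": 1.0})

def learner(experience_queue: queue.Queue, expected: int) -> int:
    """Drain experience from all actors and count training samples consumed."""
    consumed = 0
    while consumed < expected:
        experience_queue.get()  # blocks until an actor produces experience
        consumed += 1
    return consumed

experience_queue: queue.Queue = queue.Queue()
threads = [
    threading.Thread(target=actor, args=(i, experience_queue, 10))
    for i in range(4)  # four actors standing in for four data centers
]
for t in threads:
    t.start()
total = learner(experience_queue, expected=4 * 10)
for t in threads:
    t.join()
print(total)  # 40: all experience gathered without any fast interconnect
```

The point of the design is that actors only ever push experience forward, so they can live anywhere; only the learner needs the tightly coupled hardware.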
The fact that you can be distributed, as Jack said, gets better. The thing that gets worse about inference compared to training is that you lose a lot of the parallelism in the Transformer: naively using a Transformer, you end up memory-bound, looking at your attention keys and values for every token you're generating. So there's a lot of great work to do in attacking this from both a model architecture perspective and, frankly, a hardware perspective, to get ourselves closer to the point where we're taking the massive computational power of the chips we have and making ourselves able to fully apply it to inference.
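The memory-bound point, that every generated token has to read the entire cached set of attention keys and values, can be made concrete with a back-of-envelope calculation. The model dimensions below are made-up illustrative numbers, not those of any model discussed in the episode.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes of cached attention state each new token must read during decoding."""
    # Factor of 2: both keys and values are cached for every layer and KV head.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical mid-sized transformer decoding at an 8K context with an fp16 cache:
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
print(size / 2**30)  # 1.0 GiB swept from memory for every single generated token
```

Architecture-side attacks on this problem typically shrink `kv_heads` by sharing keys and values across query heads, which cuts the per-token memory traffic proportionally.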
I want to leave the last word to you: where can people go to learn more about you and what you're doing?

Well, we have a new and updated Flash model that applies thinking and is considerably stronger than the last model we released in January. It's out in the Gemini app, and I would definitely encourage people to try it out and give us feedback. We have been incorporating developer feedback and user feedback into each model series, so that would be one thing I would encourage people to do.

What Jack said.

Well, thank you both so much, seriously. It's been such a pleasure to be able to do all of this with you.

A real pleasure, yeah, thanks.
Hey guys, this is Jacob. Just one more thing before you take off: if you enjoyed that conversation, please consider leaving a five-star rating on the show. Doing so helps the podcast reach more listeners and helps us bring on the best guests. This has been an episode of Unsupervised Learning, an AI podcast by Redpoint Ventures, where we probe the sharpest minds in AI about what's real today, what's going to be real in the future, and what it means for businesses and the world. With the fast-moving pace of AI, we aim to help you deconstruct and understand the most important breakthroughs and see a clearer picture of reality. Thank you for listening, and see you next episode.
[Music]