Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333
By Lex Fridman
Summary
## Key takeaways
- **Neural nets: simple math, surprising power**: Neural networks are essentially simple mathematical expressions with many 'knobs' (parameters) that, when trained on complex problems, exhibit surprising emergent behaviors. [01:05], [04:09]
- **Biology vs. AI: different optimization paths**: While inspired by the brain, artificial neural networks evolve through a different optimization process than biological brains, making direct analogies misleading. [06:15], [06:53]
- **Transformers: general-purpose, adaptable computers**: The Transformer architecture is a powerful, general-purpose 'computer' that is expressive, optimizable, and efficient, enabling it to process diverse data like video, images, and text. [34:35], [36:03]
- **Language models learn world knowledge via next-word prediction**: By training on vast internet text to predict the next word, language models implicitly learn about chemistry, physics, and human nature, exhibiting surprising emergent properties. [42:15], [44:01]
- **Software 2.0: programming via data and objectives**: The future of software development ('Software 2.0') involves programming not with explicit code, but by curating data sets and defining objectives to train neural networks. [01:06:19], [01:10:37]
- **Vision is sufficient and necessary for driving**: Cameras are a high-bandwidth, cost-effective sensor, and since the world is designed for human vision, relying solely on cameras for driving is both necessary and sufficient. [01:19:09], [01:32:34]
Topics Covered
- Why modern AI is an "alien artifact," not a brain copy.
- Humans are biological bootloaders for universe-solving AI.
- Can AI discover "exploits" within the universe's physics?
- The Transformer: Expressive, Optimizable, and Efficient.
- Software 2.0: AI writes code using neural network weights.
Full Transcript
think it's possible that physics has
exploits and we should be trying to find
them arranging some kind of a crazy
quantum mechanical system that somehow
gives you buffer overflow somehow gives
you a rounding error in the floating
Point synthetic intelligences are kind
of like the next stage of development
and I don't know where it leads to like
at some point I suspect
the universe is some kind of a puzzle
these synthetic AIS will uncover that
puzzle and
solve it
the following is a conversation with Andrej Karpathy, previously the director of AI at Tesla, and before that at OpenAI and Stanford. He is one of the greatest scientists, engineers, and educators in the history of artificial intelligence. This is the Lex Fridman podcast. To support it, please check out our sponsors. And now, dear friends, here's Andrej Karpathy.
what is a neural network and why does it
seem to uh do such a surprisingly good
job of learning what is a neural network
it's a mathematical abstraction of
the brain I would say that's how it was
originally developed
at the end of the day it's a
mathematical expression and it's a
fairly simple mathematical expression
when you get down to it it's basically a
sequence of Matrix multiplies which are
really dot products mathematically and
some nonlinearities thrown in and so
it's a very simple mathematical
expression and it's got knobs in it many
knobs many knobs and these knobs are
Loosely related to basically the
synapses in your brain they're trainable
they're modifiable and so the idea is
like we need to find the setting of the knobs that makes the neural net do
whatever you want it to do like classify
images and so on and so there's not too
much mystery I would say in it like
um you might think that basically don't
want to endow it with too much meaning
with respect to the brain and how it
works it's really just a complicated
mathematical expression with knobs and
those knobs need a proper setting for it
to do something uh desirable yeah but
poetry is just the collection of letters
with spaces but it can make us feel a
certain way and in that same way when
you get a large number of knobs together
whether it's in a inside the brain or
inside a computer, they seem to surprise us with their power
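As a minimal sketch of what "a simple mathematical expression with many knobs" means in code (purely illustrative, not something from the conversation; the layer sizes and random weights are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# The "knobs": weight matrices and biases, here set at random rather than trained.
W1, b1 = rng.normal(size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)), np.zeros(10)

def forward(x):
    # A sequence of matrix multiplies (dot products) with a nonlinearity in between.
    h = np.maximum(0, x @ W1 + b1)   # ReLU nonlinearity
    return h @ W2 + b2               # scores, e.g. for classifying images

x = rng.normal(size=(1, 784))        # a fake flattened image
print(forward(x).shape)              # (1, 10)
```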
yeah I think that's fair so basically
I'm underselling it by a lot because you
definitely do get very surprising emergent behaviors out of these neural nets
when they're large enough and trained on
complicated enough problems like say for
example the next uh word prediction in a
massive data set from the internet and
then these neural nets take on pretty surprising, magical properties. Yeah, I
think it's kind of interesting how much
you can get out of even very simple
mathematical formalism when your brain
right now I was talking is it doing next
word prediction
or is it doing something more
interesting well definitely some kind of
a generative model that's a gpt-like and
prompted by you
um yeah so you're giving me a prompt and
I'm kind of like responding to it in a
generative way and by yourself perhaps a
little bit like are you adding extra
prompts from your own memory inside your
head
automatically feels like you're
referencing some kind of a declarative
structure of like memory and so on and
then uh you're putting that together
with your prompt and giving away some
messages like how much of what you just
said has been said by you before
uh nothing basically right no but if you
actually look at all the words you've
ever said in your life and you do a
search you'll probably said a lot of the
same words in the same order before yeah
it could be I mean I'm using phrases
that are common Etc but I'm remixing it
into a pretty uh sort of unique sentence
at the end of the day but you're right
definitely there's like a ton of
remixing. But it's like Magnus Carlsen saying, I'm rated 2900 or whatever, which is pretty decent. I
think you're talking very uh you're not
giving enough credit to neural Nets here
why do they seem to
what's your best intuition
about this emergent Behavior I mean it's
kind of interesting because I'm
simultaneously underselling them but I
also feel like there's an element to
which I'm over like it's actually kind
of incredible that you can get so much
emergent magical Behavior out of them
despite them being so simple
mathematically so I think those are kind
of like two surprising statements that
are kind of just juxtapose together
and I think basically what it is is we
are actually fairly good at optimizing
these neural Nets and when you give them
a hard enough problem they are forced to
learn very interesting Solutions in the
optimization, and those solutions basically have these emergent properties that are very interesting
there's wisdom and knowledge
in the knobs
and so what's this representation that's
in the knobs does it make sense to you
intuitively the large number of knobs
can hold the representation that
captures some deep wisdom about the data
it has looked at
it's a lot of knobs it's a lot of knobs
and somehow you know so speaking
concretely
um one of the neural Nets that people
are very excited about right now are are
gpts which are basically just next word
prediction networks so you consume a
sequence of words from the internet and
you try to predict the next word and uh
once you train these on a large enough
data set
um, they... you can basically prompt these neural nets in arbitrary ways
and you can ask them to solve problems
and they will so you can just tell them
you can you can make it look like you're
trying to um
solve some kind of a mathematical
problem and they will continue what they
think is the solution based on what
they've seen on the internet and very
often those Solutions look very
remarkably consistent look correct
potentially
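A rough sketch of the next-word prediction objective being described, written as illustrative PyTorch rather than anything referenced in the episode; the toy vocabulary, random tokens, and the embedding-plus-linear stand-in for a real GPT are all assumptions:

```python
import torch
import torch.nn.functional as F

# Toy setup: a vocabulary of 1000 "words" and a small batch of token sequences.
vocab_size, seq_len, batch = 1000, 32, 4
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Stand-in for a GPT-like model: maps token ids to next-token logits.
# In practice this would be a Transformer; here it is just an embedding plus a linear layer.
embed = torch.nn.Embedding(vocab_size, 64)
head = torch.nn.Linear(64, vocab_size)

logits = head(embed(tokens[:, :-1]))        # predict positions 1..T from positions 0..T-1
targets = tokens[:, 1:]                      # the "next word" at every position
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())                           # this is what gets minimized over internet-scale text
```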
do you still think about the brain side
of it so as neural Nets is an
abstraction or mathematical abstraction
of the brain you still draw wisdom
from from the biological neural networks
or even the bigger question so you're a
big fan of biology and biological
computation
what impressive thing is biology doing that computers are not yet doing? That gap.
I would say I'm definitely
much more hesitant with the
analogies to the brain than I think you
would see potentially in the field
um and I kind of feel like
certainly the way neural network started
is everything stemmed from inspiration
by the brain but at the end of the day
the artifacts that you get after
training they are arrived at by a very
different optimization process than the
optimization process that gave rise to
the brain and so I think uh
I kind of think of it as a very
complicated alien artifact
um, it's something different. I'm not...
Sorry, the neural nets that we're training?
Okay. They are complicated alien artifacts. I do not make
analogies to the brain because I think
the optimization process that gave rise
to it is very different from the brain
so there was no multi-agent self-play
kind of uh setup uh and evolution it was
an optimization that is basically what amounts to a compression objective on a
massive amount of data okay so
artificial neural networks are doing
compression
and biological neural networks
are not, to survive, and they're not really doing any... they're an agent in a multi-agent self-play system
that's been running for a very very long
time that said Evolution has found that
it is very useful to to predict and have
a predictive model in the brain and so I
think our brain utilizes something that
looks like that as a part of it but it
has a lot more you know gadgets and
gizmos and uh value functions and
ancient nuclei that are all trying to, like, make us survive and reproduce and
everything else and the whole thing
through embryogenesis is built from a
single cell I mean it's just the code is
inside the DNA and it just builds it up
like the entire organism
yes and like it does it pretty well
it should not be possible so there's
some learning going on there's some
there's some there's some kind of
computation going through that building
process I mean I I don't know where
if you were just to look at the entirety
of history of life on Earth
where do you think is the most
interesting invention is it the origin
of life itself
is it just jumping to eukaryotes is it
mammals is it humans themselves Homo
sapiens the the origin of intelligence
or highly complex intelligence
or or is it all just in continuation the
same kind of process
certainly I would say it's an extremely
remarkable story that I'm only like
briefly learning about recently all the
way from
um actually like you almost have to
start at the formation of Earth and all
of its conditions and the entire solar
system and how everything is arranged
with Jupiter and Moon and the habitable
zone and everything and then you have an
active Earth
that's turning over material
and then you start with abiogenesis and everything, and so it's
all like a pretty remarkable story I'm
not sure that
I can pick like a single Unique Piece of
it that I find most interesting
um
I guess for me as an artificial
intelligence researcher it's probably
the last piece we have lots of animals
that uh you know are are not building
technological Society but we do and um
it seems to have happened very quickly
it seems to have happened very recently
and uh
something very interesting happened
there that I don't fully understand I
almost understand everything else kind
of I think intuitively uh but I don't
understand exactly that part and how
quick it was both explanations would be
interesting one is that this is just a
continuation of the same kind of process
there's nothing special about humans
that would be deeply understanding that
would be very interesting that we think
of ourselves as special but it was
obvious all it was already written in
the in the code that you would have
greater and greater intelligence
emerging and then the other explanation
which is something truly special
happened something like a rare event
whether it's like crazy rare event like
uh, Space Odyssey. What would it be? Say, like, the invention of fire, or, as Richard Wrangham says, the beta males deciding on a clever way to kill the alpha males by collaborating. So just
optimizing the collaborations really the
multi-agent aspect of the multi-agent
and that really being constrained on
resources and trying to survive
the collaboration aspect is what created
the complex intelligence but it seems
like it's a natural outgrowth of the
evolution process like what could
possibly be a magical thing that
happened like a rare thing that would
say that humans are actually human level
intelligence is actually a really rare
thing in the universe
yeah I'm hesitant to say that it is rare
by the way but it definitely seems like
it's kind of like a punctuated
equilibrium where you have lots of
exploration and then you have certain
leaps sparse leaps in between so of
course like origin of life would be one
um, you know, DNA, sex, eukaryotic systems, eukaryotic life, the endosymbiosis event where the archaeon ate a little bacterium, you know, just the whole thing, and then of course
emergence of Consciousness and so on so
it seems like definitely there are
sparse events where mass amount of
progress was made but yeah it's kind of
hard to pick one so you don't think
humans are unique. Gotta ask you: how many intelligent alien civilizations do you think are out there? And is their
intelligence
different or similar to ours
yeah I've been preoccupied with this
question quite a bit recently uh
basically the Fermi Paradox, and just
thinking through and and the reason
actually that I am very interested in uh
the origin of life is fundamentally
trying to understand how common it is
that there are technological societies
out there uh
um in space and the more I study it the
more I think that um
uh there should be quite a few quite a
lot why haven't we heard from them
because I I agree with you it feels like
I just don't see
why what we did here on Earth is so
difficult to do yeah and especially when
you get into the details of it I used to
think origin of life was very
um
it was this magical, rare event, but then you read books like, for example, Nick Lane's The Vital Question, Life Ascending, etc., and he really gets in and
he really makes you believe that this is
not that rare basic chemistry you have
an active Earth and you have your
alkaline vents and you have lots of alkaline waters mixing with the ocean, and you have your proton gradients, and you have the little porous
pockets of these alkaline vents that
concentrate chemistry and um basically
as he steps through all of these little
pieces you start to understand that
actually this is not that crazy you
could see this happen on other systems
um and he really takes you from just a
geology to primitive life and he makes
it feel like it's actually pretty
plausible, and also, like, the origin of life was actually fairly fast after the formation of Earth,
um if I remember correctly just a few
hundred million years or something like
that after basically when it was
possible life actually arose and so that
makes me feel like that is not the
constraint that is not the limiting
variable and that life should actually
be fairly common
um and then it you know where the
drop-offs are is very
um is very interesting to think about I
currently think that there's no major
drop-offs basically and so there should
be quite a lot of life and basically
what it where that brings me to then is
the only way to reconcile the fact that
we haven't found anyone and so on is
that um we just can't we can't see them
we can't observe them just a quick brief
comment Nick Lane and a lot of
biologists I talked to they really seem
to think that the jump from bacteria to
more complex organisms is the hardest jump, the eukaryotic one.
Yeah.
Which, I don't... I get it, they're much more
knowledgeable uh than me about like the
intricacies of biology but that seems
like crazy because how much how many
single cell organisms are there like and
how much time you have surely it's not
that difficult like in a billion years
it's not even that long
of a time really just all these bacteria
under constrained resources battling it
out I'm sure they can invent more
complex again I don't understand it's
like how to move from a hello world
program to like like invent a function
or something like that I don't yeah I I
so I don't yeah so I'm with you I just
feel like I don't see any if the origin
of life that would be my intuition
that's the hardest thing but if that's
not the hardest thing because it happens
so quickly then it's got to be
everywhere and yeah maybe we're just too
dumb to see it well it's just we don't
have really good mechanisms for seeing
this life I mean uh by what
how um so I'm not an expert just to
preface this but just said it was I want
to meet an expert on alien intelligence
and how to communicate I'm very
suspicious of our ability to to find
these intelligences out there and to
find these Earths like radio waves for
example are terrible, their power drops off as basically one over r squared,
uh so I remember reading that our
current radio waves would not be uh the
ones that we we are broadcasting would
not be uh measurable by our devices
today only like was it like one tenth of
a light year away like not even
basically tiny distance because you
really need like a targeted transmission
of massive power directed somewhere for
this to be picked up on long distances
and so I just think that our ability to
measure is um is not amazing I think
there's probably other civilizations out
there, and then the big question is why don't they build von Neumann probes and why
don't they Interstellar travel across
the entire galaxy and my current answer
is it's probably Interstellar travel is
like really hard uh you have the
interstellar medium. If you want to move at close to the speed of light, you're going to be encountering bullets along the way,
because even like tiny hydrogen atoms
and little particles of dust are
basically have like massive kinetic
energy at those speeds and so basically
you need some kind of shielding you need
you have all the cosmic radiation uh
it's just like brutal out there it's
really hard and so my thinking is maybe
Interstellar travel is just extremely
hard
to build hard
it feels like uh it feels like we're not
a billion years away from doing that it
just might be that it's very you have to
go very slowly potentially as an example
through space
right, as opposed to close to the speed of light. So I'm suspicious basically of
our ability to measure life and I'm
suspicious of the ability to um just
permeate all of space in the Galaxy or
across galaxies, and that's the only way that I can currently see around it.
Yeah, it's kind of
mind-blowing to think that there's
trillions of intelligent alien
civilizations out there kind of slowly
traveling through space
to meet each other and some of them meet
some of them go to war some of them
collaborate or they're all just uh
independent they are all just like
little pockets I don't know well
statistically if there's like
if it's there's trillions of them surely
some of them some of the pockets are
close enough to get some of them happen
to be close yeah in the close enough to
see each other and then once you see
once you see something that is
definitely complex life like if we see
something, yeah, we're probably going to be severely, like, intensely, aggressively motivated to figure out what the hell
that is and try to meet them what would
be your first instinct to try to like at
a generational level meet them or defend
against them or what would be your uh
Instinct as a president of the United
States
and the scientists
I don't know which hat you prefer in
this question
yeah I think the the question it's
really hard
um
I will say like for example for us
um we have lots of primitive life forms
on Earth
um next to us we have all kinds of ants
and everything else and we share space
with them, and we are hesitant to impact them, and we're trying to protect them by default, because they
are amazing interesting dynamical
systems that took a long time to evolve
and they are interesting and special and
I don't know that you want to
um destroy that by default and so I like
complex dynamical systems that took a
lot of time to evolve I think
I'd like to I like to preserve it if I
can afford to and I'd like to think that
the same would be true about uh the
galactic resources and that uh they
would think that we're kind of
incredible interesting story that took
time it took a few billion years to
unravel and you don't want to just
destroy it I could see two aliens
talking about Earth right now and saying
I'm a big fan of complex dynamical systems, so I think there was a value to preserve these. And we basically are a video game they watch, or a show, a TV show that they watch.
yeah I think uh you would need like a
very good reason I think to
to destroy it uh like why don't we
destroy these ant farms and so on it's
because we're not actually like really
in direct competition with them right
now uh we do it accidentally and so on
but
um
there's plenty of resources and so why
would you destroy something that is so
interesting and precious well from a
scientific perspective you might probe
it yeah you might interact with it later
you might want to learn something from
it right so I wonder there's could be
certain physical phenomena that we think
is a physical phenomena but it's
actually interacting with us to like
poke the finger and see what happens I
think it should be very interesting to
scientists other alien scientists what
happened here
um and you know it's a what we're seeing
today is a snapshot basically it's a
result of a huge amount of computation
uh of over like billion years or
something like that so it could have
been initiated by aliens. This could be a computer running a program.
Like, okay, if you had the power to do this, would you?
Okay, for sure, at least I would. I would pick an Earth-like planet that has the conditions, based on my understanding of the chemistry prerequisites for life, and I would seed it with life and run it.
Right? Like, yeah, wouldn't you 100% do that? And observe it and then protect it.
I mean that that's not just a hell of a
good TV show it's it's a good scientific
experiment yeah and
And it's a physical simulation, right? Maybe the evolution, like actually running it, is the most efficient way to understand computation, or to compute stuff, or to understand life, or, you know,
what life looks like and uh what
branches it can take it does make me
kind of feel weird that we're part of a
science experiment, but maybe everything's a science experiment. Does that change anything for us, if we're a science experiment?
um I don't know two descendants of Apes
talking about being inside of a science
experiment.
I'm suspicious of this idea of, like, a deliberate panspermia, as you described it, sort of, and I don't see
a divine intervention in some way in the
in the historical record right now I do
feel like
um the story in these in these books
like Nick Lane's books and so on sort of
makes sense uh and it makes sense how
life arose on Earth uniquely, and, yeah, I don't need to reach for more exotic explanations right now.
Sure, but NPCs inside a video game
don't
don't don't observe any divine
intervention either and we might just be
all NPCs running a kind of code maybe
eventually they will currently NPCs are
really dumb but once they're running
gpts um maybe they will be like hey this
is really suspicious what the hell so
you famously tweeted: it looks like if you bombard Earth with photons for a while, you can emit a Roadster.
So, like in Hitchhiker's Guide to the Galaxy, if we would summarize the story of Earth, so in that book it's "mostly harmless", what do you think are all the possible stories, like a paragraph long or a sentence long, that Earth could be summarized as once it's done with its computation? So, like, all the possible... if Earth is a book, right?
Yeah, uh,
probably there has to be an ending I
mean there's going to be an end to Earth
and it could end in all kinds of ways it
could end soon it can end later what do
you think are the possible stories well
definitely there seems to be
yeah you're sort of
it's pretty incredible that these
self-replicating systems will basically
arise from the Dynamics and then they
perpetuate themselves and become more
complex and eventually become conscious
and build a society and I kind of feel
like in some sense it's kind of like a
deterministic wave uh that you know that
kind of just like happens on any you
know any sufficiently well arranged
system like Earth
and so I kind of feel like there's a
certain sense of inevitability in it
um and it's really beautiful and it ends
somehow, right? So it's a chemically diverse environment
where complex dynamical systems can
evolve and become further and
further complex but then there's a
certain
um
what is it there's certain terminating
conditions yeah I don't know what the
terminating conditions are but
definitely there's a trend line of
something and we're part of that story
and like where does that where does it
go so you know we're famously described
often as a biological Bootloader for AIS
and that's because humans I mean you
know we're an incredible
uh biological system and we're capable
of computation and uh you know and love
and so on
um but we're extremely inefficient as
well like we're talking to each other
through audio it's just kind of
embarrassing, honestly. We're manipulating, like, seven symbols serially, we're using vocal cords, it's all happening over, like, multiple seconds. It's just kind of embarrassing when you step down to the frequencies at which computers operate, or are able to communicate on, and
so basically it does seem like
um synthetic intelligences are kind of
like the next stage of development and
um I don't know where it leads to like
at some point I suspect uh the universe
is some kind of a puzzle
and these synthetic AIS will uncover
that puzzle and um solve it
and then what happens after right like
what because if you just like Fast
Forward Earth many billions of years
it's, like, it's quiet, and then it's like turmoil, you see, like, city lights and stuff like that. And then what happens, like, at the end? Is it... or is it like a calming, is it an explosion? Is it, like, Earth will open up, like a giant... because you said emit Roadsters, like, we'll start emitting, like, a giant number of, like, satellites.
Yes, it's some kind of a crazy
explosion and we're living we're like
we're stepping through a explosion and
we're like living day to day and it
doesn't look like it but it's actually
if you I saw a very cool animation of
Earth uh and life on Earth and basically
nothing happens for a long time and then
the last like two seconds like basically
cities and everything and just in the
low earth orbit just gets cluttered and
just the whole thing happens in the last
two seconds, and you're like, this is exploding, this is a state of explosion.
So if you play
yeah yeah if you play it at normal speed
yeah it'll just look like an explosion
it's a firecracker we're living in a
firecracker where it's going to start
emitting all kinds of interesting things
yeah and then so explosion doesn't it
might actually look like a little
explosion with with lights and fire and
energy emitted all that kind of stuff
but when you look inside the details of
the explosion there's actual complexity
happening where there's like uh yeah
human life or some kind of life we hope
it's not destructive firecracker it's
kind of like a constructive uh
firecracker.
All right, so given that, as hilarious as that discussion is, it is really interesting to think about, like, what the puzzle of the universe is. Did the creator of the universe give us a message? Like, for example, in the book Contact by Carl Sagan, there's a message for humanity, for any civilization, in the digits of the expansion of pi in base 11, eventually, which is kind of
interesting thought uh maybe maybe we're
supposed to be giving a message to our
creator maybe we're supposed to somehow
create some kind of a quantum mechanical
system that alerts them to our
intelligent presence here because if you
think about it from their perspective
it's just say like Quantum field Theory
massive like cellular automaton like
thing and like how do you even notice
that we exist you might not even be able
to pick us up in that simulation and so
how do you uh how do you prove that you
exist that you're intelligent and that
you're a part of the universe so this is
like a Turing test for intelligence
from Earth yeah the Creator is uh I mean
maybe this is uh like trying to complete
the next word in a sentence this is a
complicated way of that like Earth is
just is basically sending a message back
yeah the puzzle is basically like
alerting the Creator that we exist or
maybe the puzzle is just to just break
out of the system and just uh you know
stick it to the Creator in some way uh
basically like if you're playing a video
game you can um
you can somehow find an exploit and find a way to execute arbitrary code on the host machine. For example, I believe someone got a game of Mario to play Pong just by exploiting it, and then basically writing code and being able to execute arbitrary code in the game. And so maybe we should
be maybe that's the puzzle is that we
should be um
find a way to exploit it. So I think, like, some of these synthetic AIs will eventually find the universe to be
some kind of a puzzle and then solve it
in some way and that's kind of like the
end game somehow do you often think
about it as a as a simulation so as the
universe being a kind of computation
that has might have bugs and exploits
yes yeah I think so is that what physics
is essentially I think it's possible
that physics has exploits and we should
be trying to find them arranging some
kind of a crazy quantum mechanical
system that somehow gives you buffer
overflow, somehow gives you a rounding error in the floating point
uh yeah that's right and like more and
more sophisticated exploits like those
are jokes but that could be actually
very close yeah we'll find some way to
extract infinite energy for example when
you train a reinforcement learning
agents um and physical simulations and
you ask them to say run quickly on the
flat ground they'll end up doing all
kinds of like weird things
um in part of that optimization right
they'll get on their back legs and they will slide across the floor, and it's because the reinforcement learning optimization on that agent has figured out a way to extract infinite energy from the friction forces, and basically their poor implementation, and they found a way to generate infinite energy and just slide
across the surface and it's not what you
expected it's just a it's sort of like a
perverse solution and so maybe we can
find something like that. Maybe we can be that little dog in this physical simulation that cracks or escapes the intended consequences of the physics that the universe came up with. We'll figure out
some kind of shortcut to some weirdness
yeah and then oh man but see the problem
with that weirdness is the first person
to discover the weirdness like sliding
in the back legs
that's all we're going to do.
Yeah, it's very quickly, because everybody does that thing. So, like, the paperclip maximizer is a ridiculous idea, but that very well, you know, could be it, and then we'll just all switch to that because it's so fun.
Well, no person will
Discover it I think by the way I think
it's going to have to be uh some kind of
a super intelligent AGI of a third
generation
like we're building the first generation
AGI you know
Third generation.
Yeah, so the bootloader for an AI, that AI will be a bootloader for another AI.
Yeah.
And then there's no way for us to introspect what that might even...
I think it's very likely that these things, for example, like, say you have these AGIs, it's very likely, for example, they will be completely inert. I like
these kinds of sci-fi books sometimes
where these things are just completely
inert they don't interact with anything
and I find that kind of beautiful
because uh they probably they've
probably figured out the meta game of
the universe in some way potentially
they're they're doing something
completely beyond our imagination
um and uh they don't interact with
simple chemical life forms like why
would you do that so I find those kinds
of ideas compelling.
What's their source of fun? What are they doing? What's the source of... solving the universe?
But inert, so can you define what it means, inert? So they escape?
As in, um,
they will behave in some very like
strange way to us because they're uh
they're beyond they're playing The Meta
game uh and The Meta game is probably
say like arranging quantum mechanical
systems in some very weird ways to
extract Infinite Energy uh solve the
digital expansion of Pi to whatever
amount uh they will build their own like
little Fusion reactors or something
crazy like they're doing something
Beyond Comprehension and uh not
understandable to us and actually
brilliant under the hood what if quantum
mechanics itself is the system and we're
just thinking it's physics
but we're really parasites on... or not parasites, we're not really hurting physics, we're just living on this organism, and we're like
trying to understand it but really it is
an organism and with a deep deep
intelligence maybe physics itself is
uh the the organism that's doing a super
interesting thing and we're just like
one little thing yeah ant sitting on top
of it trying to get energy from it we're
just kind of like these particles in a
wave that I feel like is mostly
deterministic and takes uh Universe from
some kind of a big bang to some kind of
a super intelligent replicator some kind
of a stable point in the universe given
these laws of physics you don't think uh
as Einstein said God doesn't play dice
so you think it's mostly deterministic
there's no randomness in the thing?
I think it's deterministic. Oh, there's tons of... well, I want to be careful with randomness.
Pseudo random?
Yeah, I don't like random. I think maybe the laws of physics are deterministic. Yeah, I think they're deterministic.
You just got really uncomfortable with this question.
do you have anxiety about whether the
universe is random or not
What, there's no randomness? You say you like Good Will Hunting: it's not your fault, Andrej. It's not your fault, man.
um so you don't like Randomness uh yeah
I think it's uh unsettling I think it's
a deterministic system I think that
things that look random like say the uh
collapse of the wave function Etc I
think they're actually deterministic
just entanglement uh and so on and uh
some kind of a Multiverse Theory
something something okay so why does it
feel like we have a free will like if I
raise the hand I chose to do this now
um what
that doesn't feel like a deterministic
thing it feels like I'm making a choice
it feels like it okay so it's all
feelings it's just feelings yeah so when
an RL agent is making a choice is that
um
it's not really making a choice the
choices are all already there yeah
you're interpreting the choice and
you're creating a narrative for or
having made it yeah and now we're
talking about the narrative it's very
meta looking back what is the most
beautiful or surprising idea in deep
learning or AI in general that you've
come across you've seen this field
explode
and grow in interesting ways. Just what cool ideas, like, made you sit back and go, hmm, big or small?
well the one that I've been thinking
about recently the most probably is the
the Transformer architecture
um so basically uh neural networks have
a lot of architectures that were trendy
have come and gone for different sensory
modalities. Like for vision, audio, text, you would process them with different-looking neural nets, and recently we've seen this convergence towards one architecture, the Transformer, and you can
feed it video or you can feed it you
know images or speech or text and it
just gobbles it up and it's kind of like
a bit of a general purpose uh computer
that is also trainable and very
efficient to run on our Hardware
and so uh this paper came out in 2016 I
want to say
Attention Is All You Need.
Attention Is All You Need. You criticized the paper title in retrospect, that it didn't foresee the bigness of the impact, yeah, that it was going to have.
yeah I'm not sure if the authors were
aware of the impact that that paper
would go on to have probably they
weren't but I think they were aware of
some of the motivations and design
decisions beyond the Transformer and
they chose not to I think expand on it
in that way in the paper and so I think
they had an idea that there was more
um than just the surface of just like oh
we're just doing translation and here's
a better architecture you're not just
doing translation this is like a really
cool differentiable optimizable
efficient computer that you've proposed
and maybe they didn't have all of that
foresight but I think is really
interesting isn't it funny sorry to
interrupt that title is memeable that
they went for such a profound idea they
went with the I don't think anyone used
that kind of title before right
Attention is all you need. Yeah, it's like a meme or something.
Exactly.
Isn't it funny that, like, maybe if it was a more serious title it wouldn't
have the impact honestly I yeah there is
an element of me that honestly agrees
with you and prefers it this way yes
if it was too grand it would overpromise and then underdeliver,
potentially so you want to just uh meme
your way to greatness
That should be a t-shirt. So you tweeted: the Transformer is a magnificent neural network architecture because it
is a general purpose differentiable
computer it is simultaneously expressive
in the forward pass optimizable via back
propagation gradient descent and
efficient High parallelism compute graph
can you discuss some of those details
expressive optimizable efficient
yeah for memory or or in general
whatever comes to your heart you want to
have a general purpose computer that you
can train on arbitrary problems like say
the task of next word prediction or
detecting if there's a cat in the image
or something like that
and you want to train this computer so
you want to set its weights and I think
there's a number of design criteria that
sort of overlap in the Transformer
simultaneously that made it very
successful and I think the authors were
kind of uh deliberately trying to make
this really powerful architecture and um
so basically it's very powerful in the
forward pass because it's able to
express
um very general computation as a sort of
something that looks like message
passing you have nodes and they all
store vectors, and these nodes get to basically look at each other, at each other's vectors, and they get to communicate, and basically nodes get to
broadcast hey I'm looking for certain
things and then other nodes get to
broadcast hey these are the things I
have those are the keys and the values
So it's not just attention.
Yeah, exactly. The Transformer is much more than
just the attention component. It's got many architectural pieces that went into it: the residual connections, the way it's arranged, there's a multi-layer perceptron in there, the way it's stacked, and so on.
um but basically there's a message
passing scheme where nodes get to look
at each other decide what's interesting
and then update each other and uh so I
think the um when you get to the details
of it I think it's a very expressive
function uh so it can express lots of
different types of algorithms in the forward pass. Not only that, but the way
it's designed with the residual
connections layer normalizations the
softmax attention and everything it's
also optimizable this is a really big
deal because there's lots of computers
that are powerful that you can't
optimize or they're not easy to optimize
using the techniques that we have, which is backpropagation and gradient descent. These are first-order methods, very simple optimizers really, and so
um you also need it to be optimizable
um and then lastly you want it to run
efficiently in the hardware our Hardware
is a massive throughput machine like
gpus they prefer lots of parallelism so
you don't want to do lots of sequential operations; you want to do a lot of operations in parallel, and the Transformer is designed with that in mind as well,
and so it's designed for our hardware
and it's designed to both be very
expressive in a forward pass but also
very optimizable in the backward pass
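A minimal sketch of the "nodes broadcasting what they're looking for and what they have" idea as a single head of scaled dot-product self-attention (illustrative PyTorch only, ignoring masking, multiple heads, and the residual and MLP pieces mentioned above; the sizes and random weights are arbitrary assumptions):

```python
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    # x: (tokens, dim). Each node (token) stores a vector.
    q = x @ Wq                                   # "here's what I'm looking for" (queries)
    k = x @ Wk                                   # "here's what I have" (keys)
    v = x @ Wv                                   # "here's what I'll share" (values)
    scores = q @ k.T / (k.shape[-1] ** 0.5)      # how relevant is every other node
    weights = F.softmax(scores, dim=-1)
    return weights @ v                           # each node updates itself from the others

dim = 16
x = torch.randn(8, dim)                          # 8 tokens
Wq, Wk, Wv = (torch.randn(dim, dim) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)       # torch.Size([8, 16])
```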
And you said that the residual connections support a kind of ability to learn short algorithms fast and first, and then gradually extend them longer during training.
Yeah.
What's the idea of learning short algorithms?
Right,
think of it as a so basically a
Transformer is a series of uh blocks
right and these blocks have attention
and a little multi-layer perceptron and
so you you go off into a block and you
come back to this residual pathway and
then you go off and you come back and
then you have a number of layers
arranged sequentially and so the way to
look at it I think is because of the
residual pathway in the backward path
the gradients uh sort of flow along it
uninterrupted because addition
distributes the gradient equally to all
of its branches. So the gradient from the supervision at the top just flows directly to the first layer, and all
the residual connections are arranged so
that in the beginning during
initialization they contribute nothing
to the residual pathway
So what it kind of looks like is: imagine the Transformer is kind of like a Python function, like a def, and you get to do various kinds of, like, lines of code. Say you have a hundred-layer-deep Transformer; typically they would be much shorter, say 20. So you have 20 lines of code and you can
do something in them and so think of
during the optimization basically what
it looks like is first you optimize the
first line of code and then the second
line of code can kick in and the third
line of code can and I kind of feel like
because of the residual pathway and the
Dynamics of the optimization you can
sort of learn a very short algorithm that gets the approximate answer, but
then the other layers can sort of kick
in and start to create a contribution
and at the end of it you're you're
optimizing over an algorithm that is 20
lines of code
except these lines of code are very
complex because it's an entire block of
a transformer you can do a lot in there
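A minimal sketch of that picture, assuming a pre-norm block in PyTorch: each block is one "line of code" branching off the residual pathway and adding its contribution back (illustrative only; the sizes and the use of nn.MultiheadAttention are my choices, not code from the episode):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm Transformer block: one 'line of code' on the residual pathway."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # Each sub-layer branches off the residual stream and adds its result back in.
        # Addition routes gradients straight down the stack, and the idea described above
        # is that the branches contribute little early on, so training can first settle a
        # short algorithm and then let deeper "lines" kick in.
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

model = nn.Sequential(*[Block(64) for _ in range(20)])   # "20 lines of code"
print(model(torch.randn(1, 10, 64)).shape)                # torch.Size([1, 10, 64])
```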
what's really interesting is that this
Transformer architecture actually has been remarkably resilient. Basically,
the Transformer that came out in 2016 is
the Transformer you would use today
except you reshuffle some of the layer
norms the layer normalizations have been
reshuffled to a pre-norm formulation and
so it's been remarkably stable but
there's a lot of bells and whistles that
people have attached on and try to uh
improve it I do think that basically
it's a it's a big step in simultaneously
optimizing for lots of properties of a
desirable neural network architecture
and I think people have been trying to
change it but it's proven remarkably
resilient but I do think that there
should be even better architectures
potentially.
But you admire the resilience here.
Yeah, there's
something profound about this
architecture, that at least...
So maybe everything can be turned into a problem that Transformers can solve.
Currently it definitely looks like the Transformer is taking over AI, and you
can feed basically arbitrary problems
into it and it's a general
differentiable computer and it's
extremely powerful and uh this
convergence in AI has been really
interesting to watch uh for me
personally what else do you think could
be discovered here about Transformers
like, what's a surprising thing, or is it a stable... has it gone to a stable place? Is there something interesting we might discover
about Transformers like aha moments
maybe has to do with memory uh maybe
knowledge representation that kind of
stuff
Definitely the Zeitgeist today is just pushing, like, basically right now the Zeitgeist is: do not touch the Transformer, touch everything else.
Yes.
So people are
scaling up the data sets making them
much much bigger they're working on the
evaluation making the evaluation much
much bigger and uh
um they're basically keeping the
architecture unchanged and that's how
we've um that's the last five years of
progress in AI kind of
what do you think about one flavor of it
which is language models
have you been surprised
uh
has your sort of imagination been
captivated by you mentioned GPT and all
the bigger and bigger and bigger
language models
and uh what are the limits
of those models do you think
So just the task of natural language: basically the way GPT is trained, right, is you just download a massive amount of text data from the internet and you try to predict the next word in a sequence, roughly speaking. You're predicting word chunks, but roughly speaking
that's it and what's been really
interesting to watch is
uh basically it's a language model
language models have actually existed
for a very long time
um there's papers on language modeling
from 2003, even earlier.
Can you explain, in that case, what a language model is?
Yeah, so a language model, basically the rough idea is just predicting the next word in a sequence, roughly speaking.
uh so there's a paper from for example
Bengio and the team from 2003, where for
the first time they were using a neural
network to take say like three or five
words and predict the
um next word and they're doing this on
much smaller data sets, and the neural net is not a Transformer, it's a multi-layer perceptron, but it's the first
time that a neural network has been
applied in that setting but even before
neural networks, there were language models, except they were using n-gram models. So n-gram models are just count-based models. So if you start to take
two words and predict the third one you
just count up how many times you've seen
any two word combinations and what came
next and what you predict that's coming
next is just what you've seen the most
of in the training set
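A tiny illustration of such a count-based n-gram model (a 3-gram variant on a toy corpus; purely a sketch, not code discussed here):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# For each pair of preceding words, count what came next.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def predict(w1, w2):
    # Predict whatever followed this two-word context most often in the training set.
    following = counts[(w1, w2)]
    return following.most_common(1)[0][0] if following else None

print(predict("the", "cat"))   # 'sat' here, since ties fall back to the first-counted word
```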
and so language modeling has been around
for a long time neural networks have
done language modeling for a long time
so really what's new or interesting or
exciting is just realizing that when you
scale it up
with a powerful enough neural net
Transformer you have all these emergent
properties where basically what happens
is if you have a large enough data set
of text
you are in the task of predicting the
next word you are multitasking a huge
amount of different kinds of problems
you are multitasking understanding of
you know chemistry physics human nature
lots of things are sort of clustered in
that objective it's a very simple
objective but actually you have to
understand a lot about the world to make
that prediction you just said the U word
understanding uh are you in terms of
chemistry and physics and so on what do
you feel like it's doing is it searching
for the right context
uh in in like what what is it what is
the actual process Happening Here Yeah
So basically it gets a thousand words and it's trying to predict the thousand and first, and in order to do that very, very well over the entire data set available on the internet, you actually have to basically kind of understand the context of what's going on in there.
yeah
um and uh it's a sufficiently hard
problem that you uh if you have a
powerful enough computer like a
Transformer you end up with uh
interesting solutions, and you can ask it to do all kinds of things, and it shows a lot of emergent properties, like in-context learning. That
was the big deal with GPT and the
original paper when they published it is
that you can just sort of uh prompt it
in various ways and ask it to do various
things and it will just kind of complete
the sentence but in the process of just
completing the sentence it's actually
solving all kinds of really uh
interesting problems that we care about
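As a small illustration of "prompt it in various ways and it just completes the sentence", here is a hedged sketch assuming the Hugging Face transformers library and the small public gpt2 checkpoint (neither is referenced in the episode):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small public checkpoint, used purely to illustrate prompting a next-word predictor.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: What is the chemical symbol for gold?\nA:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10, do_sample=False)  # greedy completion
print(tok.decode(out[0], skip_special_tokens=True))
```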
do you think it's doing something like
understanding
like and when we use the word
understanding for us humans
I think it's doing some understanding it
in its weights it understands I think a
lot about the world and it has to in
order to predict the next word in a
sequence
so let's train on the data from the
internet
uh what do you think about this this
approach in terms of data sets of using
data from the internet do you think the
internet has enough structured data to
teach AI about human civilization
yeah so I think the internet has a huge
amount of data I'm not sure if it's a
complete enough set I don't know that uh
text is enough for having a sufficiently
powerful AGI as an outcome
um of course there is audio and video
and images and all that kind of stuff
yeah so text by itself I'm a little bit
suspicious about there's a ton of things
we don't put in text in writing uh just
because they're obvious to us about how
the world works and the physics of it
and that things fall. We don't put that stuff in text because why would you, we share that understanding. And so text is a communication medium
between humans and it's not a
all-encompassing medium of knowledge
about the world but as you pointed out
we do have video and we have images and
we have audio and so I think that that
definitely helps a lot but we haven't
trained models uh sufficiently uh across
both across all those modalities yet so
I think that's what a lot of people are
interested in but I wonder what that
shared understanding of like well we
might call Common Sense
has to be learned
inferred in order to complete the
sentence correctly so maybe the fact
that it's implied on the internet the
model is going to have to learn that not
by reading about it by inferring it in
the representation. So, like, common sense, I don't think we learn common sense, like, nobody tells us explicitly, we just figure it all out by interacting with the world, right. So here's a model that's reading about the way people interact with the world; it might have to infer that.
I wonder yeah uh you you briefly worked
on a project called World of Bits, training an RL system to take actions on the internet
versus just consuming the internet like
we talked about do you think there's a
future for that kind of system
interacting with the internet to help
the learning yes I think that's probably
the uh the final frontier for a lot of
these models because
um so as you mentioned I was at open AI
I was working on this project world of
bits and basically it was the idea of
giving neural networks access to a
keyboard and a mouse.
The idea, what could possibly go wrong.
So basically, you perceive the input of the screen pixels,
and basically the state of the computer
is sort of visualized for human
consumption in images of the web browser
and stuff like that and then you give
the neural network the ability to press
keyboards and use the mouse and we're
trying to get it to for example complete
bookings and you know interact with user
interfaces.
And what did you learn from that experience? Like, what was some fun stuff? This is a super cool idea.
Yeah, I
mean it's like
Yeah, I mean, the step from observer to actor is a super fascinating step.
Yeah, well, it's the universal interface in the digital realm, I would say, and there's a universal interface in, like, the physical realm, which in my mind
is a humanoid form factor kind of thing
we can later talk about Optimus and so
on. But I feel like there's kind of like a similar philosophy in some way, where the physical world is designed for the human form, and the digital world is designed for the human form of seeing the screen and using keyboard and mouse. And so it's the universal interface that can basically command
the digital infrastructure we've built
up for ourselves and so it feels like a
very powerful interface to to command
and to build on top of now to your
question as to like what I learned from
that it's interesting because the world
of bits was basically uh too early I
think at open AI at the time
this is around 2015 or so and the
Zeitgeist at that time was very
different in AI from the Zeitgeist today
at the time everyone was super excited
about reinforcement learning from
scratch this is the time of the Atari
paper where uh neural networks were
playing Atari games and beating humans
in some cases uh alphago and so on so
everyone's very excited about train
training neural networks from scratch
using reinforcement learning
um directly
It turns out that reinforcement learning is an extremely inefficient way of training
neural networks because you're taking
all these actions and all these
observations and you get some sparse
rewards once in a while so you do all
this stuff based on all these inputs and
once in a while you're like told you did
a good thing you did a bad thing and
It's just an extremely hard problem. You can't learn from that; you can burn a forest and you can sort of brute force through it, and we saw that, I think, with, you know, Go and Dota and so on, and it does work, but it's extremely
inefficient uh and not how you want to
approach problems uh practically
speaking and so that's the approach that
at the time we also took to World of
Bits: we would have an agent initialized randomly, so with keyboard-mashing and mouse-mashing, try to make a booking, and it just, like, revealed the insanity of that approach very quickly,
where you have to stumble by the correct
booking in order to get a reward of you
did it correctly and you're never going
to stumble by it by chance at random
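A schematic of the setup being described, with a purely hypothetical env object standing in for a World of Bits-style environment (screen pixels in, keyboard and mouse actions out, sparse reward); this is an illustration of why random exploration almost never finds the reward, not an actual API:

```python
import random

def random_agent(obs):
    # An agent starting from scratch: keyboard-mash and mouse-mash at random.
    return {"key": random.choice("abcdefghijklmnopqrstuvwxyz\t\n"),
            "mouse": (random.randint(0, 799), random.randint(0, 599), random.random() < 0.1)}

def run_episode(env, agent, max_steps=1000):
    # `env` is hypothetical: reset() returns screen pixels, step() takes a key/mouse action
    # and returns (pixels, reward, done), with reward almost always 0.0.
    obs, total_reward = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(agent(obs))
        total_reward += reward
        if done:
            break
    return total_reward  # a random agent essentially never stumbles into the booking reward
```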
so even with a simple web interface
there's too many options there's just
too many options, and it's too sparse of a reward signal, and you're starting from scratch at the time, and so
you don't know how to read you don't
understand pictures images buttons you
don't understand what it means to like
make a booking but now what's happened
is, it is time to revisit that, and OpenAI is interested in this, companies like Adept are interested in this, and so on, and the idea is coming
back because the interface is very
powerful but now you're not training an
agent from scratch you are taking the
GPT as an initialization so GPT is
pre-trained on all of text and it
understands what's a booking it
understands what's a submit it
understands um quite a bit more and so
it already has those representations
they are very powerful and that makes
all the training significantly more
efficient and makes the problem
tractable.
Should the interaction be with, like, the way humans see it, with the buttons and the language, or should it be with the HTML, JavaScript, and the CSS? What do you think is better?
So today, all of this interaction is mostly on the level of HTML, CSS, and so on. That's done because of computational constraints, but I think ultimately
everything is designed for human visual
consumption and so at the end of the day
there's all the additional information
is in the layout of the web page and
what's next to you and what's a red
background and all this kind of stuff
and what it looks like visually so I
think that's the final frontier as we
are taking in pixels and we're giving
out keyboard mouse commands but I think
it's impractical still today do you
worry about bots on the internet
given given these ideas given how
exciting they are do you worry about
bots on Twitter being not the the stupid
boss that we see now with the cryptobots
but the Bots that might be out there
actually that we don't see that they're
interacting in interesting ways so this
kind of system feels like it should be
able to pass the I'm not a robot click
button whatever
Do you actually understand how that test works? I don't quite... like, there's a checkbox or whatever that you click, and it's presumably tracking...
Oh, I see, like mouse movement and the timing and so on.
Yeah. So exactly
this kind of system we're talking about
should be able to pass that so yeah what
do you feel about
um Bots that are language models Plus
have some interact ability and are able
to tweet and reply and so on do you
worry about that world
uh yeah I think it's always been a bit
of an arms race uh between sort of the
attack and the defense uh so the attack
will get stronger but the defense will
get stronger as well our ability to
detect that how do you defend how do you detect how do you know that your Karpathy account on Twitter is human
how do you approach that like if people were to claim you know how would you defend yourself in a court of law that I'm a human
um this account is yeah at some point I think it might be I think society will evolve a little bit like we might start digitally signing some of our correspondence or you know things that we create right now it's
not necessary but maybe in the future it
might be I do think that we are going
towards the world where we share
we share the digital space with uh AIS
synthetic beings yeah and uh they will
get much better and they will share our
digital realm and they'll eventually
share our Physical Realm as well it's
much harder uh but that's kind of like
the world we're going towards and most
of them will be benign and helpful and
some of them will be malicious and it's
going to be an arms race trying to
detect them so I mean the worst isn't the AIs the worst is the AIs pretending to be human I mean I don't know if it's always malicious there's
obviously a lot of malicious
applications but yeah it could also be
you know if I was an AI I would try very
hard to pretend to be human because
we're in a human world yeah I wouldn't
get any respect as an AI yeah I want to
get some love and respect I don't think
the problem is intractable people are thinking about proof of personhood yes and we might start digitally signing our stuff and we might all end up having basically some solution for proof of personhood
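As a minimal sketch of what digitally signing your posts could look like, here is an Ed25519 example using the cryptography library; the genuinely hard parts, distributing keys and binding a key to a real person, are exactly what is left open in the conversation and are not shown.

```python
# Minimal sketch: sign a post with Ed25519 so anyone can verify it came from
# the holder of a given key. Binding that key to a person is the unsolved part.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()   # kept secret by the author
public_key = private_key.public_key()        # published, e.g. in a profile

post = b"this tweet was written by a human"
signature = private_key.sign(post)           # attached alongside the post

try:
    public_key.verify(signature, post)       # anyone can check authenticity
    print("signature valid")
except InvalidSignature:
    print("signature invalid")
```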
it doesn't seem to be intractable it's just something that we haven't had to do until now but I think once the need really starts to emerge which is soon people will think about it much more but that too will be a race because
obviously you can probably spoof or fake the proof of personhood so you have to try to figure out how to prevent that I mean it's weird
that we have like Social Security numbers and like passports and stuff it seems like it's harder to fake stuff in the physical space than the digital space it just feels like it's going to be very tricky very tricky because it seems to be pretty low cost to fake stuff what are you gonna put an AI in jail for like trying to use a fake personhood proof I mean okay fine you'll put a lot of AIs in jail but there'll be more AIs arbitrarily like exponentially more the
cost of creating a bot is very low
unless there's some kind of way to track accurately like you're not allowed to create any program without tying yourself to that program like any program that runs on the internet you'll be able to trace every single human programmer that was involved with that
program yeah maybe you have to start
declaring when uh you know we have to
start drawing those boundaries and
keeping track of okay what are digital entities versus human entities and what is the
ownership of human entities and digital
entities and uh
something like that
um
I don't know but I think I'm optimistic
that this is uh this is uh possible and
at some in some sense we're currently in
like the worst time of it because
um all these Bots suddenly have become
very capable but we don't have defenses
yet built up as a society and but I
think uh that doesn't seem to be
intractable it's just something that we
have to deal with it seems weird that the Twitter bots like really crappy Twitter bots are so numerous I presume that the engineers at Twitter are very good
so what I would infer from that is it seems like a hard problem and they're probably catching a lot all right if I were to sort of steelman the case it's a hard problem and there's a huge cost to false positives
to removing a post by somebody that's not a bot because it creates a very bad user experience so they're very cautious about removing and maybe the bots are really good at learning what gets removed and not
such that they can stay ahead of the
removal process very quickly my
impression of it honestly is there's a lot of low-hanging fruit I mean yeah it's not subtle that's my impression of it it's not subtle yeah that's my impression as well but it feels like maybe you're seeing the tip of the iceberg maybe
the number of bots is in like the
trillions and you have to like
just it's a constant assault of bots and
yeah you yeah I don't know
um you have to steelman the case because the bots I'm seeing are pretty like obvious I could write a few lines of code that catch these bots I mean definitely there's a lot of low-hanging
fruit but I will say I agree that if you
are a sophisticated actor you could
probably create a pretty good bot right
now
um you know using tools like gpts
because it's a language model you can
generate faces that look quite good now
uh and you can do this at scale and so I
think um yeah it's quite plausible and
it's going to be hard to defend there
was a Google engineer that claimed that LaMDA was sentient do you think there's any inkling of truth to what he felt and more importantly to me at least do you think language models will achieve sentience or the illusion of sentience soonish yeah to me it's a little bit of a canary in a coal mine kind of moment honestly a little bit because
so this engineer spoke to like a chatbot
at Google and uh became convinced that
uh this bot is sentient yeah as there's
some existential philosophical questions
and it gave like reasonable answers and
looked real and uh and so on so to me
it's a uh
he was he was uh he wasn't sufficiently
trying to stress the system I think and
uh exposing the truth of it as it is
today
um
but uh I think this will be increasingly
harder over time uh so uh yeah I think
more and more people will basically uh
become
um
yeah I think more and more there will be
more people like that over time as this
gets better like form an emotional
connection to to an AI yeah perfectly
plausible in my mind I think these AIS
are actually quite good at human human
connection human emotion a ton of text
on the Internet is about humans and
connection and love and so on so I think
they have a very good understanding in
some in some sense of of how people
speak to each other about this and um
they're very capable of creating a lot
of that kind of text the um
there's a lot of like sci-fi from 50s
and 60s that imagined AIS in a very
different way they are calculating cold
vulcan-like machines that's not what
we're getting today we're getting pretty
emotional AIs that actually are very competent and capable of generating you know plausible sounding text with
respect to all of these topics see I'm
really hopeful about AI systems that are
like companions that help you grow
develop as a human being help you
maximize long-term happiness but I'm
also very worried about AI systems that
figure out from the internet that humans get attracted to drama and so these would just be like shit talking AIs that just constantly say did you hear they'll do gossip they'll
try to plant seeds of Suspicion to like
other humans that you love and trust and
just kind of mess with people uh in the
you know because because that's going to
get a lot of attention so drama maximize
drama on the path to maximizing uh
engagement and US humans will feed into
that machine yeah and get it'll be a
giant drama shitstorm
so I'm worried about that so it's the
objective function really defines the
way that human civilization progresses
with AIs in it yeah I think right now at least today it's not correct to really think of them as goal seeking agents that want to do something they have no long-term memory or anything a good approximation of it is you get a thousand words and you're trying to predict the thousand and first and then you continue feeding it in and you are free
to prompt it in whatever way you want so
in text so you say okay you are a
psychologist and you are very good and
you love humans and here's a
conversation between you and another
human human colon something you colon something and then it just continues the
pattern and suddenly you're having a
conversation with a fake psychologist
who's not trying to help you and so it's
still kind of like in a realm of a tool
it is a um people can prompt their
arbitrary ways and it can create really
incredible text but it doesn't have long-term goals over long periods of time it doesn't try to so it doesn't look that way right now
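To show the prompt-as-program pattern he describes, here is a small sketch; the complete function is a hypothetical stand-in for whatever text-completion model you have access to, not a specific API.

```python
# Sketch of the prompting pattern described above: the "program" is just text
# that sets up a role, and the model continues the pattern turn by turn.
def complete(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: plug in a language model here

prompt = (
    "You are a psychologist. You are very good and you love humans.\n"
    "Here is a conversation between you and another human.\n"
    "Human: I've been feeling anxious lately.\n"
    "You:"
)
reply = complete(prompt)  # the model continues as the "psychologist"
```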
yeah but you can do short-term goals that have long-term effects so if my prompting short-term goal is to get Andrej Karpathy to respond to me on Twitter like I think the AI might figure out that's the goal but it might figure out that talking shit to you would be the best way in a highly sophisticated interesting way
and then you build up a relationship
when you respond once and then it
like over time it gets to not be
sophisticated and just
like just talk shit
and okay maybe it won't get to Andrej but it might get to another celebrity it
might get into other big accounts and
then it'll just so with just that simple
goal get them to respond yeah maximize
the probability of actual response yeah
I mean you could prompt a powerful model like this for its opinion about how to do any possible thing you're interested in so they will discuss it they're kind of on track to become these oracles I sort of think of it that way they are oracles currently it's just text but they will have calculators they will have access to Google search they will have all kinds of gadgets and gizmos they will be able to operate the internet and find
different information and
um
yeah in some sense
that's kind of like currently what it
looks like in terms of the development
do you think it'll be an improvement
eventually over what Google is for
access to human knowledge like it'll be
a more effective search engine to access
human knowledge I think there's definite
scope in building a better search engine
today and I think Google they have all
the tools all the people they have
everything they need they have all the
puzzle pieces they have people training
Transformers at scale they have all the
data uh it's just not obvious if they
are capable as an organization to
innovate on their search engine right
now and if they don't someone else will
there's absolute scope for building a
significantly better search engine built
on these tools it's so interesting a large company where for search there's already an infrastructure it works and it brings in a lot of money so where structurally inside a company is there motivation to pivot yeah to say we're
going to build a new search engine yep
that's really hard so it's usually going
to come from a startup right that's um
that would be yeah or some other more
competent organization
um so I don't know so currently for example maybe Bing has another shot at it you know Microsoft Edge as we were talking about offline
um I mean it's really interesting because search engines used to be about okay here's some query here are web pages that look like the stuff that you have but you could just directly go to the answer and then have supporting evidence
um and these uh these models basically
they've read all the texts and they've
read all the web pages and so sometimes
when you see yourself going over to
search results and sort of getting like
a sense of like the average answer to
whatever you're interested in uh like
that just directly comes out you don't
have to do that work
um
so they're kind of like yeah I think they have a way of distilling all that knowledge into like some level of insight basically do
you think of prompting as a kind of
teaching and learning like this whole
process like another layer
you know because maybe that's what
humans are we already have that
background model and then your the world
is prompting you yeah exactly I think
the way we are programming these
computers now like gpts is is converging
to how you program humans I mean how do
I program humans via prompts I go to people and I prompt them to do things I prompt them for information and so
natural language prompt is how we
program humans and we're starting to
program computers directly in that
interface it's like pretty remarkable
honestly so you've spoken a lot about
the idea of software 2.0
um all good ideas
become like cliches so quickly like the
terms it's kind of hilarious
um it's like I think Eminem once said
that like if he gets annoyed by a song
He's written very quickly that means
it's going to be a big hit because it's
it's too catchy but uh can you describe
this idea and how you're thinking about
it has evolved over the months and years
since since you coined it yeah
yeah so I had a blog post on software
2.0 I think several years ago now
um
and the reason I wrote that post is
because I kept I kind of saw something
remarkable happening in
like software development and how a lot
of code was being transitioned to be written not in sort of like C++ and so on but in the weights of a neural net basically just saying that
neural Nets are taking over software the
realm of software and uh taking more and
more tasks and at the time I think not
many people understood uh this uh deeply
enough that this is a big deal it's a
big transition uh neural networks were
seen as one of multiple classification
algorithms you might use for your data
set problem on kaggle like this is not
that this is a change in how we program
computers
and I saw neural nets as this is going to take over the way we program computers is going to change it's not going to be people writing software in C++ or something like that and directly programming the software it's
going to be accumulating training sets
and data sets and crafting these
objectives by which we train these
neural Nets and at some point there's
going to be a compilation process from
the data sets and the objective and the
architecture specification into the
binary which is really just the neural net weights and the forward pass of the neural net and then you can deploy that binary
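As a toy illustration of that "compilation" step, here is a generic PyTorch sketch, not Tesla's actual stack: the dataset, the objective, and the architecture are specified, and optimization fills in the weights that become the deployable "binary".

```python
# Software 2.0 in miniature: dataset + objective + architecture -> weights.
import torch
import torch.nn as nn

# 1) the "source code": a dataset of (input, desired label) pairs
x = torch.randn(1024, 16)
y = (x.sum(dim=1) > 0).long()

# 2) the architecture: a rough hint of what the algorithm should look like
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

# 3) the objective and the "compiler": a loss function plus an optimizer
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# 4) the "binary": the trained weights plus the forward pass, ready to deploy
torch.save(model.state_dict(), "compiled_program.pt")
```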
and so I was talking about that sort of transition and that's what the post is about and
I saw this sort of play out in a lot of
fields uh you know autopilot being one
of them but also just a simple image
classification people thought originally
you know in the 80s and so on that they
would write the algorithm for detecting
a dog in an image and they had all these
ideas about how the brain does it and
first we detected corners and then we
detect lines and then we stitched them
up and they were like really going at it
they were like thinking about how
they're going to write the algorithm and
this is not the way you build it
and there was a smooth transition where
okay first we thought we were going to
build everything then we were building
the features so like HOG features and things like that that detect these
little statistical patterns from image
patches and then there was a little bit
of learning on top of it like a support
Vector machine or binary classifier for
cat versus dog and images on top of the
features so we wrote the features but we
trained the last layer sort of the the
classifier and then people are like
actually let's not even design the
features because we can't honestly we're
not very good at it so let's also learn
the features and then you end up with
basically a convolutional neural net
where you're learning most of it you're
just specifying the architecture and the
architecture has tons of fill in the
blanks which is all the knobs and you
let the optimization write most of it
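To make the transition he just walked through concrete, here is a small contrast between the two eras: hand-written HOG features with only the classifier learned, versus a convnet where the features themselves are learned. It uses scikit-image, scikit-learn, and PyTorch, and assumes you already have arrays of grayscale images and matching labels.

```python
# Era 1: hand-designed features (HOG), learn only the final classifier.
# Era 2: learn the features too (a small convnet).
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC
import torch.nn as nn

def hog_plus_svm(images, labels):
    feats = np.array([hog(im, pixels_per_cell=(8, 8)) for im in images])
    clf = LinearSVC().fit(feats, labels)   # only the "last layer" is trained
    return clf

# learn everything: features and classifier are both filled in by optimization
convnet = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 2),
)
```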
and so this transition is happening
across the industry everywhere and uh
suddenly we end up with a ton of code
that is written in neural net weights
and I was just pointing out that the
analogy is actually pretty strong and we
have a lot of developer environments for
software 1.0 like we have IDEs how you work with code how you debug code how you run code how you maintain code we have GitHub so I
was trying to make those analogies in
the new realm like what is the GitHub of software 2.0 it turns out it's something that looks like Hugging Face right now
uh you know and so I think some people
took it seriously and built cool
companies and uh many people originally
attacked the post it actually was not
well received when I wrote it and I
think maybe it has something to do with
the title but the post was not well
received and I think more people sort of
have been coming around to it over time
yeah so you were the director of AI at
Tesla where I think this idea
was really implemented at scale which is
how you have engineering teams doing
software 2.0 so can you sort of linger on that idea I think we're in the really early stages of everything you just said which is like GitHub IDEs
like how do we build engineering teams
that that work in software 2.0 systems
and and the the data collection and the
data annotation which is
all part of that software 2.0 like what
do you think is the task of programming
a software 2.0 is it debugging in the
space of hyper parameters or is it also
debugging the space of data yeah the way
by which you program the computer and
influence its algorithm is not by
writing the commands yourself you're
changing mostly the data set uh you're
changing the um loss functions of like
what the neural net is trying to do how
it's trying to predict things but yeah
basically the data sets and the
architectures of the neural net and um
so in the case of the autopilot a lot of
the data sets have to do with for
example detection of objects and Lane
line markings and traffic lights and so
on So You accumulate massive data sets
of here's an example here's the desired
label and then uh here's roughly how the
architecture here's roughly what the algorithm should look like and that's a convolutional neural net so the
specification of the architecture is
like a hint as to what the algorithm
should roughly look like and then to
fill in the blanks process of
optimization is the training process
and then you take your neural net that
was trained it gives all the right
answers on your data set and you deploy
it
so in that case and perhaps in all machine learning cases there's a lot of tasks so is coming up with formulating a task like
uh for a multi-headed neural network is
formulating a task part of the
programming yeah very much so how you
break down a problem into a set of tasks
yeah
on a high level I would say if you look at the software running in the autopilot I gave a number of talks on this topic I would say originally a lot of it was written in software 1.0 there's imagine lots of C++ all
right and then gradually there was a
tiny neural net that was for example
predicting given a single image is there
like a traffic light or not or is there a lane line marking or not and this
neural net didn't have too much to do in
this in the scope of the software it was
making tiny predictions on individual
little image and then the rest of the
system stitched it up so okay we actually don't have just a single camera we have eight cameras we actually have eight cameras over time and so what
do you do with these predictions how do
you put them together how do you do the
fusion of all that information and how
do you act on it all of that was written
by humans in C++
and then we decided okay we don't
actually want uh to do all of that
fusion in C++ code because we're
actually not good enough to write that
algorithm we want the neural Nets to
write the algorithm and we want to Port
uh all of that software into the 2.0
stack
and so then we actually had neural Nets
that now take all the eight camera
images simultaneously and make
predictions for all of that
so
and actually they don't make predictions in the space of images they now make predictions directly in 3D in three dimensions around the car and now actually we don't manually fuse the predictions in 3D over time we don't trust ourselves to
write that tracker so actually we give
the neural net uh the information over
time so it takes these videos now and
makes those predictions and so your sort
of just like putting more and more power
into the neural network processing and
at the end of it the eventual sort of
goal is to have most of the software
potentially be in the 2.0 land
because it works significantly better humans are just not very good at writing software basically
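Here is a heavily simplified PyTorch sketch of the idea just described, per-camera features fused across eight cameras and over time into predictions in 3D; it is purely illustrative and not the autopilot network, and all layer choices are assumptions.

```python
# Simplified sketch of "eight cameras, over time, predict in 3D":
# per-camera features -> fuse across cameras -> fuse over time -> 3D outputs.
import torch
import torch.nn as nn

class MultiCamNet(nn.Module):
    def __init__(self, num_cams=8, feat=64):
        super().__init__()
        self.backbone = nn.Sequential(            # shared per-camera encoder
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.GRU(num_cams * feat, 128, batch_first=True)
        self.head_3d = nn.Linear(128, 7)          # e.g. (x, y, z, w, l, h, yaw)

    def forward(self, clips):                     # clips: (B, T, num_cams, 3, H, W)
        B, T, C, ch, H, W = clips.shape
        feats = self.backbone(clips.reshape(B * T * C, ch, H, W))
        feats = feats.reshape(B, T, -1)           # concatenate cameras per frame
        fused, _ = self.temporal(feats)           # fuse information over time
        return self.head_3d(fused[:, -1])         # predictions in 3D space

out = MultiCamNet()(torch.randn(2, 4, 8, 3, 64, 96))   # -> shape (2, 7)
```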
so the prediction space is happening in this like 4D land yeah the three-dimensional world over time yeah how do you
do annotation in that world the data annotation whether it's self-supervised or manual by humans is a big part of this software 2.0 world right I would say by
far in the industry if you're like
talking about the industry and how what
is the technology of what we have
available everything is supervised
learning so you need data sets of input
desired output and you need lots of it
and um there are three properties of it
that you need you need it to be very
large you need it to be accurate No
mistakes and you need it to be diverse
you don't want to uh just have a lot of
correct examples of one thing you need
to really cover the space of possibility
as much as you can and the more you can
cover the space of possible inputs the
better the algorithm will work at the
end now once you have really good data
sets that you're collecting curating
um and cleaning you can train uh your
neural net
um on top of that so a lot of the work
goes into cleaning those data sets now
as you pointed out it's probably it
could be the question is how do you
achieve a ton of uh if you want to
basically predict in 3D you need data in
3D to back that up so in this video we
have eight videos coming from all the
cameras of the system and this is what
they saw and this is the truth of what
actually was around there was this car
there was this car this car these are
the lane line markings this is geometry
of the road there's a traffic light in
this three-dimensional position you need
the ground truth
um and so the big question that the team
was solving of course is how do you how
do you arrive at that ground truth
because once you have a million of it
and it's large clean and diverse then
training a neural network on it works
extremely well and you can ship that
into the car
and uh so there's many mechanisms by
which we collected that training data
you can always go for human annotation
you can go for simulation as a source of
ground truth you can also go for what we
call the offline tracker
um
that we've spoken about at the AI day
and so on which is basically an
automatic reconstruction process for
taking those videos and recovering the
three-dimensional sort of reality of
what was around that car so basically
think of doing like a three-dimensional
reconstruction as an offline thing and
then understanding that okay there's 10
seconds of video this is what we saw and therefore here are all the lane lines cars and so on and then once you have that annotation you can train your neural nets to imitate it
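Written out as a schematic, the offline auto-labeling idea looks roughly like the sketch below; every function name is a hypothetical placeholder for a large system, not a real pipeline or API.

```python
# Offline auto-labeling: with the full clip and unlimited offline compute you
# reconstruct the 3D "truth", then use it as supervision for the in-car net.
def run_big_offline_nets(clip):
    """Hypothetical: detectors far too heavy to run in the car in real time."""
    raise NotImplementedError

def reconstruct_3d(detections, clip):
    """Hypothetical: offline, non-causal 3D reconstruction over the whole clip."""
    raise NotImplementedError

def auto_label(clip):
    # recover lane lines, cars, traffic lights, etc. in 3D for ~1 minute of video
    detections = run_big_offline_nets(clip)
    return reconstruct_3d(detections, clip)

def build_training_set(clips):
    # the smaller online network is then trained to imitate these offline labels
    return [(clip, auto_label(clip)) for clip in clips]
```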
and how difficult is the 3D reconstruction it's difficult but it can be done so
there's so the there's overlap between
the cameras and you do the
Reconstruction and there's uh
perhaps if there's any inaccuracy so
that's caught in The annotation step
uh yes the nice thing about The
annotation is that it is fully offline
you have infinite time you have a chunk
of one minute and you're trying to just
offline in a super computer somewhere
figure out where were the positions of
all the cars all the people and you have
your full one minute of video from all
the Angles and you can run all the
neural nets you want and they can be very efficient massive neural nets there can be neural nets that can't even run in the car later at test time so they can be even more powerful neural nets than what you can eventually deploy so you
can do anything you want
three-dimensional reconstruction neural
Nets uh anything you want just to
recover that truth and then you
supervise that truth
what have you learned you said no mistakes about humans doing annotation because I assume humans there's like a range of things they're good at in terms of clicking stuff on screen how interesting is that to you as a problem of designing an annotator where humans are accurate enjoy it like what even are the metrics are they efficient or productive all that kind of stuff yeah so I grew the
annotation team at Tesla from basically
zero to a thousand uh while I was there
that was really interesting you know my
background is a PhD student researcher so growing that kind of organization was pretty crazy but yeah I think it's
extremely interesting and part of the
design process very much behind the
autopilot as to where you use humans
humans are very good at certain kinds of
annotations they're very good for
example at two-dimensional annotations
of images they're not good at annotating
uh cars over time in three-dimensional
space very very hard and so that's why
we were very careful to design the tasks
that are easy to do for humans versus
things that should be left to the
offline tracker like maybe the maybe the
computer will do all the triangulation
and 3D reconstruction but the human will
say exactly these pixels of the image
are car exactly these pixels are human
and so co-designing the the data
annotation pipeline was very much bread
and butter was what I was doing daily do
you think there's still a lot of open
problems in that space
um just in general annotation where the
stuff the machines are good at machines
do and the humans do what they're good
at and there's maybe some iterative
process right I think to a very large
extent we went through a number of
iterations and we learned a ton about
how to create these data sets I'm not
seeing big open problems like originally
when I joined I was like I was really
not sure how this would turn out yeah
but by the time I left I was much more
secure in actually we sort of understand
the philosophy of how to create these
data sets and I was pretty comfortable
with where that was at the time so what are the strengths and limitations of cameras for the driving task in your understanding when you formulate the
driving task as a vision task with eight
cameras
you've seen that the entire you know
most of the history of the computer
vision field when it has to do with
neural networks what just if you step
back what are the strengths and
limitations of pixels of using pixels to
drive yeah pixels I think are a beautiful sensor I
would say the thing is like cameras are
very very cheap and they provide a ton
of information ton of bits uh so it's uh
extremely cheap sensor for a ton of bits
and each one of these bits as a
constraint on the state of the world and
so you get lots of megapixel images uh
very cheap and it just gives you all
these constraints for understanding
what's actually out there in the world
so vision is probably the highest
bandwidth sensor
it's a very high bandwidth sensor and I love that pixels are a constraint on the world it's a highly complex high bandwidth constraint on the state of the world that's
fascinating it's not just that but also the real importance is that it's the sensor that humans use
therefore everything is designed for
that sensor yeah the text the writing
the flashing signs everything is
designed for vision and so and you just
find it everywhere and so that's why
that is the interface you want to be in
um talking again about these Universal
interfaces and uh that's where we
actually want to measure the world as
well and then develop software uh for
that sensor but there's other
constraints on the state of the world
that humans use to understand the world
I mean Vision ultimately is the main one
but we're like we're like referencing
our understanding of human behavior and
some common sense
physics that could be inferred from
vision from from a perception
perspective but it feels like we're
using some kind of reasoning
to predict the world yeah not just the pixels I mean you have a powerful prior for how the world evolves
over time Etc so it's not just about the
likelihood term coming up from the data
itself telling you about what you are
observing but also the prior term of like where are the likely things to see and how do they likely move and so on
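One schematic way to write down that likelihood-versus-prior framing, purely illustrative and not a formula given in the conversation, is:

```latex
p(\text{world} \mid \text{pixels}) \;\propto\;
\underbrace{p(\text{pixels} \mid \text{world})}_{\text{likelihood from the observed data}}
\;\times\;
\underbrace{p(\text{world})}_{\text{prior over likely scenes and how they move}}
```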
and the question is how complex is the range of possibilities that might happen in the driving task right
that's still is is that to you still an
open problem of how difficult is driving
like philosophically speaking
like do you all the time you've worked
on driving do you understand how hard
driving is yeah driving is really hard
because it has to do with the
predictions of all these other agents
and the theory of mind and you know what
they're gonna do and are they looking at
you are they where are they looking what
are they thinking yeah there's a lot
that goes there at the at the full tail
of you know the the expansion of the
nines that we have to be comfortable
with eventually the final problems are
of that form I don't think those are the
problems that are very common uh I think
eventually they're important but it's
like really in the tail end in the tail
and the rare edge cases
from the vision perspective what are the
toughest parts of the vision problem of
driving
um
well basically the sensor is extremely
powerful but you still need to process
that information
and so going from brightnesses of these pixel values to hey here's the three-dimensional world is extremely
hard and that's what the neural networks
are fundamentally doing and so
um the difficulty really is in just
doing an extremely good job of
engineering the entire pipeline uh the
entire data engine having the capacity to train these neural nets having the
ability to evaluate the system and
iterate on it uh so I would say just
doing this in production at scale is
like the hard part it's an execution
problem so the data engine but also the
um the sort of deployment of the system
such that has low latency performance so
it has to do all these steps yeah for
the neural net specifically just making
sure everything fits into the chip on
the car yeah and uh you have a finite
budget of flops that you can perform and
uh and memory bandwidth and other
constraints and you have to make sure it
flies and you can squeeze in as much
compute as you can into the tiny what
have you learned from that process
because it maybe that's one of the
bigger like new things coming from a
research background
where there's there's a system that has
to run under heavily constrained
resources right has to run really fast
what what kind of insights have you uh
learned from that
yeah I'm not sure if it's if there's too
many insights you're trying to create a
neural net that will fit in what you
have available and you're always trying
to optimize it and we talked a lot about
it on the AI day and uh basically the
the triple backflips that the team is
doing to make sure it all fits and
utilizes the engine uh so I think it's
extremely good engineering
um and then there's also all kinds of
little insights peppered in on how to do
it properly let's actually zoom out
because I don't think we talked about
the data engine the entirety of the
layout of this idea that I think is just
beautiful with humans in the loop can
you describe the data engine
yeah the data engine is what I call the
almost biological feeling like process
by which you uh perfect the training
sets for these neural networks
um so because most of the programming
now is in the level of these data sets
and make sure they're large diverse and
clean oh basically you have a data set
that you think is good you train your
neural net you deploy it and then you
observe how well it's performing and
you're trying to uh always increase the
quality of your data set so you're
trying to catch scenarios basically
that are basically rare and it is in
these scenarios that the neural Nets
will typically struggle in because they
weren't told what to do in those rare
cases in the data set but now you can
close the loop because if you can now
collect all those at scale you can then
feed them back into the Reconstruction
process I described and uh reconstruct
the truth in those cases and add it to
the data set and so the whole thing ends
up being like a staircase of improvement
of perfecting your training set and you
have to go through deployments so that
you can mine uh the parts that are not
yet represented well in the data set so
your data set is basically imperfect it needs to be diverse it has pockets that are missing and you need to pad out the pockets you can sort of think of it that way
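The data engine loop he describes can be written out as a schematic like the sketch below; every helper here is a hypothetical placeholder for a large engineering system, not a real API.

```python
# One turn of the data engine: train, deploy, mine failures, re-label, grow the set.
def train(dataset):                       raise NotImplementedError
def deploy(model, fleet):                 raise NotImplementedError
def mine_rare_scenarios(fleet, model):    raise NotImplementedError
def offline_reconstruction(failures):     raise NotImplementedError

def data_engine_iteration(dataset, fleet):
    model = train(dataset)                          # train on the current data
    deploy(model, fleet)                            # ship it to the fleet
    failures = mine_rare_scenarios(fleet, model)    # rare cases it struggles on
    new_labels = offline_reconstruction(failures)   # recover the "truth" offline
    return dataset + new_labels                     # pad out the missing pockets

# repeating this is the "staircase of improvement" of the training set
```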
in the data what role do humans play in
this so what's the uh this biological
system like a human body is made up of
cells what what role like how do you
optimize the human uh system the the
multiple Engineers collaborating
figuring out what to focus on what to
contribute which which task to optimize
in this neural network
uh who's in charge of figuring out which
task needs more data
can you speak to the hyper parameters
the human system right it really just comes down to extremely good execution from an engineering team that knows what they're doing they understand
intuitively the philosophical insights
underlying the data engine and the
process by which the system improves and
uh how to again like delegate the
strategy of the data collection and how
that works and then just making sure
it's all extremely well executed and
that's where most of the work is is not
even the philosophizing or the research
or the ideas of it it's just extremely
good execution it's so hard when you're
dealing with data at that scale so your
role in the data engine executing well
on it it is difficult and extremely
important is there a priority of like uh
like a vision board of saying like
we really need to get better at stop
lights
yeah like the prioritization of tasks is that essentially and that comes from the data that comes to a very large extent from what we are trying to achieve in the product roadmap the release we're trying to get out and the feedback from the QA team where the system is struggling
or not the things we're trying to
improve and the QA team gives some
signal some information
in aggregate about the performance of
the system in various conditions and
then of course all of us drive it and we
can also see it it's really nice to work
with the system that you can also
experience yourself you know it drives
you home it's is there some insight you
can draw from your individual experience
that you just can't quite get from an
aggregate statistical analysis of data
yeah it's so weird right yes it's it's
not scientific in a sense because you're
just one anecdotal sample yeah I think
there's a ton of uh it's a source of
truth it's your interaction with the
system yeah and you can see it you can
play with it you can perturb it you can
get a sense of it you have an intuition
for it I think numbers just like have a way of you know numbers and plots and graphs are much harder yeah they hide a lot it's like if you train a language model a really powerful way to understand it is by you interacting with it yeah 100 percent try to build up an intuition yeah I think like
Elon also like he always wanted to drive
the system himself he drives a lot and
uh I'm gonna say almost daily so uh he
also sees this as a source of Truth you
driving the system uh and it performing
and yeah so what do you think tough
questions here uh so Tesla last year
removed radar from um from the sensor
suite and now just announced that it's
going to remove all ultrasonic sensors
relying solely on Vision so camera only
does that make the perception problem
harder or easier
I would almost reframe the question in
some way so the thing is basically you
would think that additional sensors by
the way can I just interrupt good I
wonder if a language model will ever do
that if you prompt it let me reframe
your question that would be epic sorry it's a little bit of the wrong question because
basically you would think that these
sensors are an asset to you yeah but if
you fully consider the entire product in
its entirety
these sensors are actually potentially a liability
because these sensors aren't free they
don't just appear on your car you need
something you need to have an entire
supply chain you have people procuring
it there can be problems with them they
may need replacement they are part of
the manufacturing process they can hold
back the line in production you need to
Source them you need to maintain them
you have to have teams that write the
firmware all of it and then you also
have to incorporate and fuse them into
the system in some way and so it actually like bloats the organization a lot and I think Elon is really good at simplifying the best part is no part
and he always tries to throw away things
that are not essential because he
understands the entropy in organizations
and approach and I think uh in this case
the cost is high and you're not
potentially seeing it if you're just a
computer vision engineer and I'm just
trying to improve my network and you
know is it more useful or less useful
how useful is it and the thing is if
once you consider the full cost of a
sensor it actually is potentially a
liability and you need to be really sure
that it's giving you extremely useful
information in this case we looked at
using it or not using it and the Delta
was not massive and so it's not useful
does it also bloat the data engine like having more sensors
is a distraction and these sensors you
know they can change over time for
example you can have one type of say
radar you can have other type of radar
they change over time I suddenly need to
worry about it now suddenly you have a
column in your sqlite telling you oh
which sensor type was it and they all
have different distributions and then uh
they can they just they contribute noise
and entropy into everything and they
bloat stuff and also organizationally
has been really fascinating to me that
it can be very distracting
um if all you want to get to work is vision all the resources are
on it and you're building out a data
engine and you're actually making
forward progress because that is the the
sensor with the most bandwidth the most
constraints on the world and you're
investing fully into that and you can
make that extremely good you only have a finite amount of sort of spend of focus across different facets
of the system and uh this kind of
reminds me of Rich Sutton's bitter lesson it just seems like simplifying
the system yeah
in the long run now of course you don't know what the long run is it seems to be always the right solution yeah yes in that case it was for RL but it seems to apply generally across all systems that
do computation yeah so where uh what do
you think about the lidar as a crutch
debate
uh the battle between point clouds and
pixels
yeah I think this debate is always like
slightly confusing to me because it
seems like the actual debate should be
about like do you have the fleet or not
that's like the really important thing
about whether you can achieve a really
good functioning of an AI system at this
scale so data collection systems yeah do
you have a fleet or not it's
significantly more important whether you
have lidar or not it's just another
sensor
um and uh
yeah I think similar to the radar
discussion basically I don't think it offers extra information it's extremely costly it has
all kinds of problems you have to worry
about it you have to calibrate it Etc it
creates bloat and entropy you have to be
really sure that you need this uh this
um sensor in this case I basically don't
think you need it and I think honestly I
will make a stronger statement I think
the others some of the other uh
companies are using it are probably
going to drop it yeah so you have to consider the sensor in the full picture of considering can you build a big fleet
that collects a lot of data and can you
integrate that sensor with that that
data and that sensor into a data engine
that's able to quickly find different
parts of the data that then continuously
improves whatever the model that you're
using yeah another way to look at it is like vision is necessary in the sense that the world is designed for human visual consumption so vision is necessary and then also it is sufficient because it has all the information that you need for driving and humans obviously use vision to drive so it's both necessary and sufficient so you want to focus
resources and you have to be really sure
if you're going to bring in other
sensors you could add sensors to infinity at some point
you need to draw the line and I think in
this case you have to really consider
the full cost of any One sensor that
you're adopting and do you really need
it and I think the answer in this case
is no so what do you think about the
idea of the that the other companies
are forming high resolution maps and
constraining heavily the geographic
regions in which they operate is that
approach not in your in your view
um not going to scale over time to the
entirety of the United States I think
as you mentioned like they pre-map all the environments and they need to refresh the map and they have a
perfect centimeter level accuracy map of
everywhere they're going to drive it's
crazy how are you going to
when we're talking about autonomy
actually changing the world we're
talking about the deployment
on a on a global scale of autonomous
systems for transportation and if you
need to maintain a centimeter accurate
map for Earth or like for many cities
and keep them updated it's a huge
dependency that you're taking on huge
dependency
it's a massive massive dependency and
now you need to ask yourself do you
really need it
and humans don't need it
um right so it's it's very useful to
have a low-level map of like okay the
connectivity of your road you know that
there's a fork coming up when you drive
an environment you sort of have that
high level understanding it's like a
small Google Map and Tesla uses Google
Map like similar kind of resolution
information in the system but it will not pre-map environments to centimeter level accuracy it's a crutch it's a
distraction it costs entropy and it
diffuses the team it dilutes the team
and you're not focusing on what's
actually necessary which is the computer
vision problem
what did you learn about machine
learning about engineering about life
about yourself as one human being from
working with Elon Musk
I think the most I've learned is about
how to sort of run organizations
efficiently and how to
create efficient organizations and how
to fight entropy in an organization so
human Engineering in the fight against
entropy yeah there's a there's a I think
Elon is a very efficient warrior in the
fight against entropy in organizations
what is the entropy in an organization
look like exactly it's process it's
it's process and inefficiencies and that
kind of stuff yeah meetings he hates
meetings he keeps telling people to skip
meetings if they're not useful
um he basically runs the world's biggest
uh startups I would say uh Tesla SpaceX
are the world's biggest startups Tesla
actually has multiple startups I think
it's better to look at it that way and
so I think he's he's extremely good at
at that and yeah he has a very good intuition for streamlining processes making everything efficient best part is no part simplifying focusing
um and just kind of removing barriers uh
moving very quickly making big moves all
this is a very startupy sort of seeming
things but at scale so strong drive to
simplify for me from your perspective I
mean that
um that also probably applies to just
designing systems and machine learning
and otherwise yeah like simplify
simplify yes
what do you think is the secret to
maintaining the startup culture in a
company that grows is there
can you introspect that
I do think you need someone in a
powerful position with a big hammer like
Elon who's like the cheerleader for that
idea and ruthless ruthlessly pursues it
if no one has a big enough Hammer
everything turns into committees
democracy within the company uh process
talking to stakeholders decision making
just everything just crumbles yeah if
you have a big person who's also really
smart and has a big hammer things move
quickly so you said your favorite scene
in interstellar is the intense docking
scene with the AI and Cooper talking
saying uh Cooper what are you doing
docking it's not possible no it's
necessary
such a good line by the way just so many
questions there why the AI in that scene which presumably is supposed to be able to compute a lot more than the human is saying it's not possible why the human I mean that's a movie but shouldn't the AI know much better than the human
anyway uh what do you think is the value
of setting seemingly impossible goals
so like uh
our initial intuition which seems like
something that
you have taken on that Elon espouses
that where the initial intuition of the
community might say this is very
difficult and then you take it on anyway
with a crazy deadline you're just from a
human engineering perspective
um
uh have you seen the value of that
I wouldn't say that setting impossible
goals exactly is is a good idea but I
think setting very ambitious goals is a
good idea I think there's a what I call
sublinear scaling of difficulty uh which
means that 10x problems are not 10x hard usually a 10x harder problem is like 2 or 3x harder to execute on because if you want to actually like if you want to improve the system by 10 percent it costs some amount of work and if you want to 10x improve the system it doesn't cost you
know 100x amount of the work and it's
because you fundamentally change the
approach and it if you start with that
constraint then some approaches are
obviously dumb and not going to work and
it forces you to reevaluate
um and I think it's a very interesting
way of approaching problem solving but
it requires a weird kind of thinking
it's just going back to your like PhD
days it's like how do you think which
ideas in in the machine Learning
Community are solvable yes it's uh it
requires what is that I mean there's the
cliche of first prince people's thinking
but like it requires to basically ignore
what the community is saying because
doesn't the community doesn't a community in science usually draw lines of what is and isn't possible right and
like it's very hard to break out of that
without going crazy yep I mean I think a
good example here is you know the Deep
learning revolution in some sense
because you could be in computer vision
at that time when during the Deep
learning sort of revolution of 2012 and
so on you could be improving your computer vision stack by 10 percent or you could just be saying actually all this is useless and how do I do 10x better computer vision well it's probably not by tuning a HOG feature detector I need
a different approach
um I need something that is scalable
going back to uh Richard Sutton's um and
understanding sort of like the
philosophy of the uh bitter lesson and
then being like actually I need a much
more scalable system like a neural
network that in principle works and then
having some deep Believers that can
actually execute on that mission and
make it work so that's the 10x solution
what do you think is the timeline to
solve the problem of autonomous driving
this still in part an open question
yeah I think the tough thing with
timelines of self-driving obviously is
that no one has created self-driving
yeah so it's not like what do you think
is the timeline to build this bridge well we've built a million bridges before here's how long that takes it's you
know it's uh no one has built autonomy
it's not obvious uh some parts turn out
to be much easier than others so it's
really hard to forecast you do your best
based on trend lines and so on and based
on intuition but that's why
fundamentally it's just really hard to
forecast this no one has even still like
being inside of it is hard to uh to do
yes some things turn out to be much
harder and some things turn out to be
much easier
do you try to avoid making forecasts
because like Elon doesn't avoid them
right and heads of car companies in the
past have not avoided it either uh Ford
and other places have made predictions
that we're going to solve at level four
driving by 2020 2021 whatever and now
they're all kind of Backtrack on that
prediction
and you as an AI person do you for yourself privately make predictions or do they get in the way of like your actual ability to think about
a thing
yeah I would say like what's easy to say is that this problem is tractable and that's an easy prediction to make it's tractable it's going to work yes it's
just really hard some things turn out to
be harder than some things turn out to
be easier uh so uh but it definitely
feels tractable and it feels like at
least the team at Tesla which is what I
saw internally is definitely on track to
that how do you form
a uh strong representation that allows
you to make a prediction about
tractability so like you're the leader
of a lot a lot of humans
you have to kind of say this is actually
possible
like how do you build up that intuition
it doesn't have to be even driving it
could be other tasks it could be um and
I wonder what difficult tasks did you
work on in your life I mean classification achieving on ImageNet a certain level of superhuman performance yeah expert intuition
it's just intuition it's belief
so just like thinking about it long
enough like studying looking at sample
data like you said driving
my intuition is really flawed on this like I don't have a good intuition
about tractability it could be either it
could be anything it could be solvable
like uh you know the driving task could
could be simplified into something quite
trivial like uh the solution to the
problem would be quite trivial and at
scale more and more cars driving
perfectly
might make the problem much easier Yeah
the more cars you have driving like
people learn how to drive correctly not
correctly but in a way that's more
optimal for a heterogeneous system of
autonomous and semi-autonomous and
manually driven cars that could change
stuff then again also I've spent a
ridiculous number of hours just staring
at pedestrians crossing streets thinking
about humans and it feels like the way
we use our eye contact
it sends really strong signals and
there's certain quirks and edge cases of
behavior and of course a lot of the
fatalities that happen have to do with
drunk driving and
um both on The Pedestrian side and the
driver's side so there's that problem of
driving at night and all that kind of
yeah so I wonder you know it's like the
space
of possible solution to autonomous
driving includes so many human factor
issues
that it's almost impossible to predict
there could be super clean nice
Solutions yeah I would say definitely
like to use a game analogy there's some
fog of War but you definitely also see
the frontier of improvement and you can
measure historically how much you've
made progress and I think for example at
least what I've seen in uh roughly five
years at Tesla when I joined it barely kept lane on the highway I think going
up from Palo Alto to SF was like three
or four interventions anytime the road
would do anything geometrically or turn
too much it would just like not work and
so going from that to like a pretty
competent system in five years and
seeing what happens also under the hood
and what the scale which the team is
operating now with respect to data and
compute and everything else uh is just a
massive progress
so you're climbing a mountain and it's foggy but you're making a lot of progress it's foggy but you're making progress and
you see what the next directions are and
you're looking at some of the remaining
challenges and they're not like uh
they're not perturbing you and they're
not changing your philosophy and you're
not contorting yourself you're like
actually these are the things that we
still need to do yeah the fundamental
components of solving the problems seem
to be there for the data engine to the
compute to the the computer on the car
to the compute for the training all that
kind of stuff
so you've done
uh over the years you've been a test
you've done a lot of amazing uh
breakthrough ideas and Engineering all
of it
um from the data engine to The Human
Side all of it can you speak to why you
chose to leave Tesla basically as I described I think over time during those five years I've kind of gotten myself into a little bit of a
managerial position most of my days were
you know meetings and growing the
organization and making decisions about
sort of high level strategic decisions
about the team and what it should be
working on and so on and uh
it's kind of like a corporate executive
role and I can do it I think I'm okay at
it but it's not like fundamentally what
I what I enjoy and so I think uh when I
joined uh there was no computer vision
team because Tesla was just going from
the transition of using mobileye a
third-party vendor for all of its
computer vision to having to build its
computer vision system so when I showed
up there were two people training deep neural networks and they were training them on a computer at their desk like a kind of basic classification task yeah
and so
I kind of like grew that into what I
think is a fairly respectable deep
learning team a massive compute cluster
a very good um data annotation
organization and uh I was very happy
with where that was it became quite
autonomous and so I kind of stepped away
and I uh you know I'm very excited to do
much more technical things again
yeah and kind of like refocus on AGI
what was this soul searching like
because you took a little time off and
think like what um how many mushrooms
did you take no I'm just uh I mean what
what was going through your mind the
human lifetime is finite yeah you did a few incredible things you're one of the best teachers of AI in the world
you're one of the best and I don't mean
that I mean that in the best possible
way you're one of the best tinkerers in
the AI world meaning like understanding
the fundamental fundamentals of how
something works by building it from
scratch and playing with it with the
basic intuitions it's like Einstein Feynman were all really good at this
kind of stuff like a small example of a
thing to to play with it to try to
understand it so that and obviously now at Tesla you helped build a team of machine learning engineers and a system that
actually accomplishes something in the
real world so given all that like what
was the soul searching like
well it was hard because obviously I
love the company a lot and I love I love
Elon I love Tesla I want um
it was hard to leave I love the team
basically
um but
yeah I think actually I would
potentially like interested in
revisiting it maybe coming back at some
point working on Optimus working on AGI at Tesla I think Tesla is going
to do incredible things it's basically
like
uh it's a massive large-scale robotics
kind of company with a ton of In-House
talent for doing really incredible
things and I think uh
humanoid robots are going to be amazing I
think autonomous transportation is going
to be amazing all this is happening at
Tesla so I think it's just a really
amazing organization so being part of it
and helping it along I think was very
basically I enjoyed that a lot yeah it
was basically difficult for those
reasons because I love the company uh
but you know I'm happy to potentially at
some point come back for act two but I
felt like at this stage
I built the team it felt autonomous and
uh I became a manager and I wanted to do
a lot more technical stuff I wanted to
learn stuff I wanted to teach stuff and
uh I just kind of felt like it was a
good time for for a change of pace a
little bit what do you think is uh the
best movie sequel of all time speaking
of part two because like because most of
them suck in movie sequels yeah and you
tweet about movies so just in a tiny
tangent is there what's your what was
like a favorite movie sequel
Godfather Part Two
um are you a fan of Godfather because
you didn't even tweet or mention the
Godfather yeah I don't love that movie I
know it hasn't edit that out we're gonna
edit out the hate towards the Godfather
how dare you just I think I will make a
strong statement I don't know why I
don't know why but I basically don't
like any movie before 1995
something like that didn't you mention
Terminator two okay okay that's like uh
Terminator 2 was a little bit later 1990
no I think Terminator 2 was '91 I
like Terminator one as well so okay so
like a few exceptions but by and large
for some reason I don't like movies
before 1995 or something they feel very
slow the camera is like zoomed out it's
boring it's kind of naive it's kind of
weird and also Terminator was very much
ahead of its time yes and The Godfather
there's like no AGI
[Laughter]
I mean but you have Good Will Hunting
was one of the movies you mentioned and
that doesn't have any AGI either I guess
that's mathematics yeah I guess
occasionally I do enjoy movies that
don't feature it, or like Anchorman, that has no AGI. It's so good I don't understand
um speaking of AGI because I don't
understand why Will Ferrell is so funny
it doesn't make sense it doesn't compute
there's just something about him and
he's a singular human because you don't
get that many comedies
these days and I wonder if it has to do
about the culture uh or the like the
machine of Hollywood or does it have to
do with just we got lucky with certain
people and comedy it came together
because he is a singular human
that was a ridiculous tangent I
apologize but you mentioned humanoid
robot so what do you think about Optimus
about Tesla bot do you think we'll have
robots in the factory in in the home in
10 20 30 40 50 years yeah I think it's a
very hard project I think it's going to
take a while but who else is going to
build humanoid robots at scale yeah and I
think it is a very good form factor to
go after because like I mentioned the
the world is designed for humanoid form
factor these things would be able to
operate our machines they would be able
to sit down in chairs uh potentially
even drive cars uh basically the world
is designed for humans that's the form
factor you want to invest into and make
work over time uh I think you know
there's another school of thought which
is okay pick a problem and design a
robot to it but actually designing a
robot and getting a whole data engine
and everything behind it to work is
actually an incredibly hard problem so
it makes sense to go after General
interfaces that uh okay they are not perfect for any one given task but they actually have the generality of, just with a prompt in English, being able to do something across tasks and so I think it makes
a lot of sense to go after a general uh
interface
um in the physical world and I think
it's a very difficult project I think
it's going to take time but I see no
other no other company that can execute
on that Vision I think it's going to be
amazing like uh basically physical labor
like if you think transportation is a
large Market try physical labor insane
well but it's not just physical labor to
me the thing that's also exciting is the
social robotics so the the relationship
we'll have on different levels with
those robots that's why I was really
excited to see Optimus like um people
have criticized me for the excitement
but I've I've worked with uh uh a lot of
research Labs that do humanoid legged
robots Boston Dynamics Unitree a lot
there's a lot of companies that do
legged robots but that's the the
Elegance of the movement is a tiny tiny
part of the big picture so integrating
the two big exciting things to me about
Tesla doing humanoid or any legged robots
is
clearly integrating it into the data
engine so the the data engine aspect so
the actual intelligence for the
perception and the and the control and
the planning and all that kind of stuff
integrating into this huge the fleet
that you mentioned right
um and then speaking of Fleet the second
thing is the mass manufacturers Just
knowing
uh culturally
uh driving towards a simple robot that's
cheap to produce at scale yeah and doing
that well having experience to do that
well that changes everything that's why
that's a very different culture and
style than Boston Dynamics who by the
way those those robots are just the the
way they move it's uh like it'll be a
very long time before Tesla could
achieve the smoothness of movement but
that's not what it's about it's it's
about uh it's about the entirety of the
system like we talked about the data
engine and the fleet that's super
exciting even the initial sort of models
uh but that too was really surprising
that in a few months you can get a
prototype yep and the reason that
happened very quickly is as you alluded
to there's a ton of copy paste from
what's happening in the autopilot yes a
lot the amount of expertise that like
came out of the woodwork at Tesla for building the humanoid robot was incredible
to see like basically Elon said at one
point we're doing this and then
next day basically like all these CAD
models started to appear and people talk
about like the supply chain and
Manufacturing and uh people showed up
with like screwdrivers and everything
like the other day and started to like
put together the body and I was like
whoa like all these people exist at
Tesla and fundamentally building a car
is actually not that different from
building a robot the same and that is
true uh not just for uh the hardware
pieces and also let's not forget
Hardware not just for a demo but
manufacturing of that Hardware at scale
is like a whole different thing but for
software as well basically this robot
currently thinks it's a car
uh it's gonna have a midlife crisis at
some point it thinks it's a car
um some of the earlier demos actually we
were talking about potentially doing
them outside in the parking lot because
that's where all of the computer vision
that was like working out of the box
instead of like inside
um but all the operating system
everything just copy pastes uh computer
vision mostly copy paste I mean you have
to retrain the neural Nets but the
approach and everything in data engine
and offline trackers and the way we go
about the occupancy tracker and so on
everything copy paste you just need to
retrain the neural nets uh and then the
planning control of course has to change
quite a bit but there's a ton of copy
paste from what's happening at Tesla and
so if you were to if you were to go with
goal of like okay let's build a million
humanoid robots and you're not Tesla that's
that's a lot to ask if you're a Tesla
it's actually like
it's not it's not that crazy and then
the the follow-up question is and how
difficult just like we're driving how
difficult is the manipulation task uh
such that it can have an impact at scale
I think
depending on the context, the really nice thing about robotics is, unless you do manufacturing and that kind of stuff, there's more room for error. Driving is so safety critical and also time critical. A robot is allowed to move slower, which is nice yes
I think it's going to take a long time
but the way you want to structure the
development is you need to say okay it's
going to take a long time how can I set
up the uh product development roadmap so
that I'm making Revenue along the way
I'm not setting myself up for a zero one
loss function where it doesn't work
until it works you don't want to be in
that position you want to make it useful
almost immediately and then you want to
slowly deploy it uh and uh at scale and
you want to set up your data engine your
improvement Loops the Telemetry the
evaluation the harness and everything
and you want to improve the product over
time incrementally and you're making
Revenue along the way that's extremely
important because otherwise you cannot
build these these uh large undertakings
just like don't make sense economically
and also from the point of view of the
team working on it they need the
dopamine along the way they're not just
going to make a promise about this being
useful this is going to change the world
in 10 years when it works this is not
where you want to be you want to be in a
place like I think autopilot is today
where it's offering increased safety and
um and uh convenience of driving today
people pay for it people like it people
purchase it and then you also have the
greater mission that you're working
towards
and you see that so the dopamine for the
team that that was a source of Happiness
yes you're deploying this people like it
people drive it people pay for it they
care about it there's all these YouTube
videos your grandma drives it she gives
you feedback people like it people
engage with it you engage with it huge
do uh people that drive Teslas like
recognize you and give you love like uh
like hey thanks for the for the this
nice feature that it's doing yeah I
think the tricky thing is like some
people really love you some people
unfortunately like you're working on
something that you think is extremely
valuable useful Etc some people do hate
you there's a lot of people who like
hate me and the team and whatever the
whole project and you'd think they're Tesla drivers but in many cases they're not actually. Yeah that actually
makes me sad about humans or the current
the ways that humans interact I think
that's actually fixable I think humans
want to be good to each other I think
Twitter and social media is part of the
mechanism that actually somehow makes
the negativity more viral, but it doesn't deserve it, it disproportionately adds like a viral boost to negativity. But like I wish people would
just get excited about uh so suppress
some of the jealousy some of the ego and
just get excited for others and then
there's a Karma aspect to that you get
excited for others they'll get excited
for you same thing in Academia if you're
not careful there's a like a dynamical
system there if you if you think of in
silos and get jealous of somebody else
being successful that actually perhaps
counterintuitively uh leads to less
productivity of you as a community and
you individually I feel like if you keep
celebrating others that actually makes
you more successful yeah I think people
haven't in depending on the industry
haven't quite learned that yet yeah some
people are also very negative and very
vocal so they're very prominently
featured but actually there's a ton of
people who are cheerleaders but they're
silent cheerleaders and uh
when you talk to people just in the
world they will all tell you it's
amazing it's great especially like
people who understand how difficult it
is to get this stuff working like people
who have built products and makers
entrepreneurs, like, making this work and changing something is incredibly hard those people are
more likely to cheerlead you well one of
the things that makes me sad is some
folks in the robotics Community uh don't
do the cheerleading and they should
there's uh because they know how
difficult it is well they actually
sometimes don't know how difficult it is
to create a product at scale right and actually deploy it in the real world a lot
of the
development of robots and AI systems is
done on very specific small benchmarks
um and as opposed to real world
conditions yes
yeah I think it's really hard to work on
robotics in academic setting or AI
systems that apply in the real world. You've criticized, well, you uh flourished in and loved, for a time, ImageNet, the famed ImageNet data set, and you've recently
had some words uh of criticism that the
academic research ml Community gives a
little too much love still to the
imagenet or like those kinds of
benchmarks can you speak to the
strengths and weaknesses of data sets
used in machine learning research
actually I don't know that I recall the
specific instance where I was uh unhappy
or criticizing imagenet I think imagenet
has been extremely valuable uh it was
basically a benchmark that allowed the
Deep Learning Community to demonstrate
that deep neural networks actually work
it was uh there's a massive value in
that um so I think imagenet was useful
but um basically it's become a bit of an
MNIST at this point so MNIST is like the 28 by 28 grayscale digits there's
kind of a joke data set that everyone
like just crushes. There's no papers written on MNIST now though right? Maybe there should be strong papers, like papers
that focus on like how do we learn with
a small amount of data that kind of
stuff yeah I could see that being
helpful but not in sort of like Mainline
computer vision research anymore of
course I think the way I've heard you
somewhere maybe I'm just imagining
things but I think you said like image
that was a huge contribution to the
community for a long time and now it's
time to move past those kinds of well
image that has been crushed I mean you
know the error rates are
uh
yeah we're getting like 90 percent accuracy in 1,000-way classification prediction and I've seen those images and it's like really hard. That's really good if I remember
correctly the top five error rate is now
like one percent or something given your
experience with a gigantic real world
data set would you like to see
benchmarks move in certain directions
that the research Community uses
unfortunately I don't think academics
currently have the next imagenet uh
We've obviously I think we've crushed
mnist we've basically kind of crushed
imagenet uh and there's no next sort of
big Benchmark that the entire Community
rallies behind and uses
um
you know for further development of
these networks uh yeah what it takes for
data set to Captivate the imagination of
everybody like where they all get behind
it that that could also need like a
viral like a leader right you know
somebody with popularity I mean that
yeah why did image of that take off
is there or is it just the accident of
History it was the right amount of
difficult uh it was the right amount of
difficult and simple and uh interesting
enough it just kind of like it was it
was the right time for that kind of a
data set
question from Reddit
uh what are your thoughts on the role
that synthetic data and game engines
will play in the future of neural net
model development
I think
um as neural Nets converge to humans
uh the value of simulation to neural
Nets will be similar to value of
simulation to humans
so people use simulation for uh
people use simulation because they can
learn something in that kind of a system
and without having to actually
experience it
um but are you referring to the
simulation we're doing our head no sorry
simulation I mean like video games or uh
you know other forms of simulation for
various professionals well so let me
push back on that because maybe their
simulation that we do in our heads like
simulate if I do this
what do I think will happen Okay that's
like internal simulation yeah internal
isn't that what we're doing, let's say, before we act? Oh yeah but
that's independent from like the use of
uh simulation in the sense of like
computer games or using simulation for
training set creation or you know is it
independent or is it just Loosely
correlated because like uh
isn't that useful to do like um
counterfactual or like Edge case
simulation to like
you know what happens if there's a
nuclear war
what happens if there's you know like
those kinds of things yeah that's a
different simulation from like Unreal
Engine that's how I interpreted the
question uh so like
simulation of the average case
is that what Unreal Engine is? What do you mean by Unreal
Engine so
simulating a world yeah physics of that
world
why is that different like because you
also can add Behavior to that world and
you can try all kinds of stuff right
like you could throw all kinds of weird
things into it so Unreal Engine is not
just about simulating, I mean I guess it is about simulating the physics of the
world it's also doing something with
that
yeah the graphics the physics and the
Agents that you put into the environment
and stuff like that yeah see I think you
I feel like you said that it's not that
important I guess for the future of AI
development is that is that correct to
interpret you that way uh I think
humans use uh simulators
for um humans use simulators and they
find them useful and so computers will
use simulators and find them useful
okay so you're saying it's not I I don't
use simulators very often I play a video
game every once in a while but I don't
think I derive any wisdom about my own
existence from from those video games
it's a momentary escape from reality
versus a source of wisdom about reality
so I don't so I think that's a very
polite way of saying simulation is not
that useful
yeah maybe maybe not I don't see it as
like a fundamental really important part
of like training neural Nets currently
uh but I think uh as neural Nets become
more and more powerful I think you will
need fewer examples to train additional
behaviors and uh simulation is of course
there's a domain Gap in a simulation
that's not the real world there's
slightly something different but uh with
a powerful enough neural net uh you need
um The Domain Gap can be bigger I think
because neural network will sort of
understand that even though it's not the
real world it like has all this high
level structure that I'm supposed to be
able to learn from so then you'll know
we'll actually
yeah you'll be able to Leverage
the synthetic data better. Yes. By closing the gap, by better understanding in which ways this is not real data. Exactly.
uh right to do better questions next
time that was that was a question but
I'm just kidding all right um
so is it possible do you think speaking
of MNIST to construct neural nets and
training processes that require very
little data
so we've been talking about huge data
sets like the internet for training I
mean one way to say that is like you
said like the querying itself is another
level of training I guess and that
requires a little data yeah but do you
see any uh value in doing research and
kind of going down the direction of can
we use very little data to train to
construct a knowledge base 100 percent I just
think like at some point you need a
massive data set and then when you
pre-train your massive neural net and
get something that you know is like a
GPT or something then you're able to be
very efficient at training any arbitrary
new task uh so a lot of these gpts you
know you can do tasks like sentiment
analysis or translation or so on just by
being prompted with very few examples
here's the kind of thing I want you to
do like here's an input sentence here's
the translation into German input
sentence translation to German input
sentence blank and the neural network
will complete the translation to German
just by looking at sort of the example
you've provided and so that's an example of very few-shot uh learning in the activations of the neural net instead of in the weights of the neural net.
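To make the in-context few-shot idea concrete, here is a minimal Python sketch of how such a translation prompt might be assembled. The example pairs are made up and no specific model API is assumed; the point is just that the "training" lives in the prompt the model reads, not in its weights.

```python
# A minimal sketch of few-shot prompting for translation (illustrative only).
# The "learning" happens in the model's activations as it reads the examples;
# no weights are updated.

examples = [
    ("I like coffee.", "Ich mag Kaffee."),
    ("Where is the train station?", "Wo ist der Bahnhof?"),
]
query = "The weather is nice today."

prompt_lines = ["Translate English to German."]
for english, german in examples:
    prompt_lines.append(f"English: {english}\nGerman: {german}")
prompt_lines.append(f"English: {query}\nGerman:")  # the model completes this line

prompt = "\n\n".join(prompt_lines)
print(prompt)
# Send `prompt` to a large language model of your choice; a capable model
# will typically complete the final line with the German translation.
```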
And so I think basically uh just like humans neural
Nets will become very data efficient at
learning any other new task but at some
point you need a massive data set to
pre-train your network
to get that and probably we humans have
something like that do we do we have
something like that do we have a passive
in the background
background model constructing thing that
just runs all the time in a
self-supervised way we're not conscious
of it I think humans definitely I mean
obviously we have uh we learn a lot
during during our life span but also we
have a ton of hardware that helps us, an initialization coming from
sort of evolution and so I think that's
also a really big a big component a lot
of people in the field I think they just
talk about the amounts of like seconds
and the you know that a person has lived
pretending that this is a tabula rasa
sort of like a zero initialization of a
neural net and it's not like you can
look at a lot of animals like for
example zebras zebras get born and they
see and they can run, there's zero training
data in their lifespan they can just do
that so somehow I have no idea how
Evolution has found a way to encode
these algorithms and these neural net
initializations that are extremely good in the ATCGs and I have no idea how this works
but apparently it's possible because
here's a proof by existence there's
something magical about going from a
single cell to an organism that is born
to the first few years of life I kind of
like the idea that the reason we don't
remember anything about the first few
years of our life is that it's a really
painful process like it's a very
difficult challenging
training process yeah like
intellectually like
and maybe yeah I mean I don't why don't
we remember any of that there might be
some crazy training going on and the
that maybe that's the background model
training that uh is is very painful and
so it's best for the system once it's
trained not to remember how it's
constructed I think it's just like the
hardware for long-term memory is just
not fully developed sure I kind of feel
like the first few years of uh of
infants is not actually like learning
it's brain maturing yeah
um we're born premature
um and there's a theory along those
lines because of the birth canal and the
swelling of the brain and so we're born
premature and then the first few years
we're just the brains maturing and then
there's some learning eventually
um
it's my current view on it what do you
think do you think neural Nets can have
long-term memory
like that approach is something like
humans do you think you know do you
think there needs to be another meta
architecture on top of it to add
something like a knowledge base that
learns facts about the world and all
that kind of stuff yes but I don't know
to what extent it will be explicitly
constructed
um it might take unintuitive forms where
you are telling the GPT like hey you
have a you have a declarative memory
bank to which you can store and retrieve
data from and whenever you encounter
some information that you find useful
just save it to your memory bank and
here's an example of something you have
retrieved and here's how you save to it and
here's how you load from it you just say
load whatever you teach it in text in
English and then it might learn to use a
memory bank from that. Oh so the
neural net is the architecture for the
background model the the base thing and
then yeah everything else is just on top
of this it's not just a text right it's
you're giving it gadgets and gizmos so
uh you're teaching it some kind of a special language by which it can save arbitrary information and retrieve it at a later time and you're telling it
about these special tokens and how to
arrange them to use these interfaces
it's like hey you can use a calculator
here's how you use it, just do 5 3 + 4 1 = and when the equals is
there uh a calculator will actually read
out the answer and you don't have to
calculate it yourself, and you just like tell it in English. This might actually work.
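A minimal sketch of what that kind of text-level tool interface could look like in practice. The CALC: marker and the scanning logic are hypothetical conventions chosen for illustration, not an established protocol; the idea is only that the model is told, in plain English, how to emit a request, and surrounding code fills in the answer.

```python
import re

# Hypothetical convention: the model is told in plain English that it can
# emit a span like "CALC: 53 + 41 =" and the answer will be filled in for it.
CALC_PATTERN = re.compile(r"CALC:\s*(\d+)\s*([+\-*])\s*(\d+)\s*=")

def fill_in_calculator_results(model_output: str) -> str:
    """Scan generated text for calculator requests and append the result."""
    def evaluate(match: re.Match) -> str:
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        result = {"+": a + b, "-": a - b, "*": a * b}[op]
        return f"{match.group(0)} {result}"
    return CALC_PATTERN.sub(evaluate, model_output)

print(fill_in_calculator_results("The total is CALC: 53 + 41 = so we are done."))
# -> "The total is CALC: 53 + 41 = 94 so we are done."
```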
Do you think in that sense Gato is
interesting, the DeepMind system
that it's not just new language but
actually throws it all
uh in the same pile images actions all
that kind of stuff that's basically what
we're moving towards yeah I think so so
gato is uh is very much a kitchen sink
approach to like
um reinforcement learning lots of
different environments with a single
fixed Transformer model right
um I think it's a very sort of early
result in that in that realm but I think
uh yeah it's along the lines of what I
think things will eventually look like
right so this is the early days of a
system that eventually will look like
this, like, from a Rich Sutton
perspective yeah I'm not super huge fan
of I think all these interfaces that
like look very different
um I would want everything to be
normalized into the same API so for
example screen pixels as the same API, instead of having like different world environments with very different
physics and Joint configurations and
appearances and whatever and you're
having some kind of special tokens for
different games that you can plug I'd
rather just normalize everything to a
single interface so it looks the same to
the neural net if that makes sense so
it's all going to be pixel based pong in
the end I think so
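A rough sketch of what normalizing everything to a single interface might mean in code. The class and method names here are hypothetical; the point is that every environment, whether a game, a robot camera feed, or a rendered text screen, is exposed to the agent as the same pixel observation type.

```python
from typing import Protocol, Tuple
import numpy as np

class PixelEnvironment(Protocol):
    """Hypothetical unified interface: every environment looks the same to the
    neural net: an RGB pixel observation in, an action in, a new pixel
    observation and a scalar reward out."""

    def reset(self) -> np.ndarray:
        """Return an initial observation of shape (height, width, 3), uint8."""
        ...

    def step(self, action: int) -> Tuple[np.ndarray, float, bool]:
        """Apply an action; return (next_observation, reward, done)."""
        ...

def run_episode(env: PixelEnvironment, policy, max_steps: int = 1000) -> float:
    """The agent-side code never needs to know which environment it is."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)          # pixels in, action out
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```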
okay uh let me ask you about your own
personal life
a lot of people want to know you're one
of the most productive and brilliant
people in the history of AI what is a
productive day in the life of Andre
capathi look like
what time do you wake up because imagine
um some kind of dance between the
average productive day and a perfect
productive day so the perfect productive
day is the thing we strive
towards in the average is kind of what
it kind of converges to given all the
mistakes and human eventualities and so
on yeah so what times you wake up are
you morning person I'm not a morning
person I'm a night owl for sure I think
stable or not that's semi-stable like a
eight or nine or something like that
during my PhD it was even later I used
to go to sleep usually at 3am I think uh
the am hours are are precious and very
interesting time to work because
everyone is asleep
um at 8 AM or 7 A.M the east coast is
awake so there's already activity
there's already some text messages
whatever there's stuff happening you can
go in like some news website and there's
stuff happening it's distracting uh at
3am everything is totally quiet and so
you're not going to be bothered and you
have solid chunks of time to do your
work
um so I like those periods Night Owl by
default and then I think like productive
time basically
um what I like to do is you need you
need to like build some momentum on the
problem without too much distraction and
um you need to load your Ram uh your
working memory with that problem
and then you need to be obsessed with it
when you're taking shower when you're
falling asleep you need to be obsessed
with the problem and it's fully in your
memory and you're ready to wake up and
work on it right there so there's a
scale of uh is this in a scale temporal
scale of a single day or a couple of
days a week a month so I can't talk
about one day basically in isolation
because it's a whole process when I want
to get when I want to get productive in
the problem I feel like I need a span of
a few days where I can really get in on
that problem and I don't want to be
interrupted and I'm going to just uh be
completely obsessed with that problem
and that's where I do most of my good
work
you've done a bunch of cool like little
projects in a very short amount of time
very quickly so that that requires you
just focusing on it yeah basically I
need to load my working memory with the
problem and I need to be productive
because there's always like a huge fixed
cost to approaching any problem uh you
know like I was struggling with this for
example at Tesla because I want to work
on like small side projects but okay you
first need to figure out okay I need to
SSH into my cluster I need to bring up a
vs code editor so I can like work on
this I need to I run into some stupid
error because of some reason like you're
not at a point where you can be just
productive right away you are facing
barriers and so it's about uh really
removing all that barrier and you're
able to go into the problem and you have
the full problem loaded in your memory
and somehow avoiding distractions of all
different forms like uh news stories
emails but also distractions from other
interesting projects that you previously
worked on are currently working on and
so on you just want to really focus your
mind and I mean I can take some time off
for distractions and in between but I
think it can't be too much uh you know
most of your day is sort of like spent
on that problem and then you know I
drink coffee I have my morning routine I
look at some news uh Twitter Hacker News
Wall Street Journal Etc
so basically you wake up you have some
coffee are you trying to get to work as
quickly as possible or do you take in this diet of like what the hell's
happening in the world first I am I do
find it interesting to know about the
world I don't know that it's useful or
good but it is part of my routine right
now so I do read through a bunch of news
articles and I want to be informed and
um I'm suspicious of it I'm suspicious
of the practice but currently that's
where I am Oh you mean suspicious about
the positive effect yeah of that
practice on your productivity and your
well-being my well-being psychologically
uh and also on your ability to deeply
understand the world because how there's
a bunch of sources of information you're
not really focused on deeply integrating
yeah it's a little bit distracting or
yeah in terms of a perfectly productive
day for how long of a stretch of time
in one session do you try to work and
focus on a thing it's a couple hours is
it one hour or 30 minutes is it 10 minutes
I can probably go like a small few hours
and then I need some breaks in between
for like food and stuff and uh
yeah but I think like uh it's still
really hard to accumulate hours I was
using a Tracker that told me exactly how
much time I've spent coding any one day
and even on a very productive day I
still spent only like six or eight hours
yeah and it's just because there's so
much padding commute talking to people
food Etc there's like the cost of life
just living and sustaining and
homeostasis and just maintaining
yourself as a human is very high and and
there seems to be a desire within the
human mind to to uh to participate in
society that creates that padding yeah
because I yeah the most productive days
I've ever had is just completely from
start to finish just tuning out
everything yep and just sitting there
and then and then you could do more than
six and eight hours yeah is there some
wisdom about what gives you strength to
do like uh tough days of long Focus
yeah just like whenever I get obsessed
about a problem something just needs to
work something just needs to exist it
needs to exist and you so you're able to
deal with bugs and programming issues
and technical issues and uh design
decisions that turn out to be the wrong
ones you're able to think through all of
that, given that you want the thing to exist. Yeah, it needs to exist and then
I think to me also a big factor is uh
you know are other humans are going to
appreciate it are they going to like it
that's a big part of my motivation if
I'm helping humans and they seem happy
they say nice things uh they tweet about
it or whatever that gives me pleasure
because I'm doing something useful so
like you do see yourself sharing it with
the world like with yes on GitHub with a
blog post or through videos yeah I was
thinking about it like suppose I did all
these things but did not share them I
don't think I would have the same amount
of motivation that I can build up you
enjoy the feeling of other people
uh gaining value and happiness from the
stuff you've created yeah
uh what about diet
is there, I saw you playing with intermittent fasting, do you fast, does that
help with everything
well, of the things you've played with, what's been most beneficial to your ability to mentally focus on a thing and just the mental productivity and happiness
you still fast yeah I still fast but I
do intermittent fasting but really what
it means at the end of the day is I skip
breakfast yeah so I do uh 18 6 roughly
by default when I'm in my steady state
if I'm traveling or doing something else
I will break the rules but in my steady
state I do 18 6 so I eat only from 12 to
6. not a hard Rule and I break it often
but that's my default and then um yeah
I've done a bunch of random experiments
for the most part right now uh where
I've been for the last year and a half I
want to say is I'm um plant-based or
plant forward I heard plant forward it
sounds better exactly I didn't actually
know the differences but it sounds
better in my mind but it just means I
prefer plant-based food and raw or
cooked or I prefer cooked uh and plant-based so plant-based
oh forgive me I don't actually know how
wide the category of plant-based entails. Well, it just means that you're not strict about it uh and you can flex and uh you just prefer
to eat plants and you know you're not
making you're not trying to influence
other people and if someone is you come
to someone's house party and they serve
you a steak that they're really proud of
you will eat it yes right there's just no judgment oh that's beautiful I mean
that's
um on the flip side of that but I'm very
sort of flexible have you tried doing
one meal a day uh I have uh accidentally
not consistently but I've accidentally
had that I don't I don't like it I think
it makes me feel uh not good it's too
it's too much too much of a hit yeah and
uh So currently I have about two meals a
day 12 and six I do that non-stop I'm
doing it now I'm doing one meal a day
okay so it's interesting it's a
interesting feeling have you ever fasted
longer than a day yeah I've done a bunch
of water fasts because I was curious
what happens uh anything interesting
yeah I would say so I mean you know
what's interesting is that you're hungry
for two days and then starting day three
or so you're not hungry it's like such a
weird feeling because you haven't eaten
in a few days and you're not hungry
isn't that weird it's really one of the
many weird things about human biology, it figures something out, it finds
another source of energy or something
like that or uh relaxes the system I
don't know how yeah the body is like
you're hungry you're hungry and then it
just gives up it's like okay I guess
we're fasting now there's nothing and
then it just kind of like focuses on
trying to make you not hungry uh and you
know not feel the the damage of that and
uh trying to give you some space to
figure out the food situation
so are you still to this day most
productive uh at night I would say I am
but it is really hard to maintain my PhD
schedule
um especially when I was say working at
Tesla and so on it's a non-starter so
but even now like you know people want
to meet for
various events they Society lives in a
certain period of time and you sort of
have to like work so that's it's hard to
like do a social thing and then after
that return and do work yeah it's just
really hard
uh that's why I try to do social things
I try not to do too uh too much drinking
so I can return and continue doing work
um but at Tesla is there, is there convergence at Tesla or at any company
is there a convergence towards the
schedule or is there more
is that how humans behave when they
collaborate I need to learn about this
yeah do they try to keep a consistent
schedule you're all awake at the same
time I mean I do try to create a routine
and I try to create a steady state in
which I'm uh comfortable in uh so I have
a morning routine I have a day routine I
try to keep things to do a steady state
and um things are predictable and then
you can sort of just like your body just
sort of like sticks to that and if you
try to stress that a little too much it
will create uh you know when you're
traveling and you're dealing with jet
lag you're not able to really Ascend to
you know where you need to go yeah yeah
that's weird as humans with the habits
and stuff uh what are your thoughts on
work-life balance throughout a human
lifetime
so Tesla in part was known for sort
of pushing people to their limits
in terms of what they're able to do in
terms of what they're uh trying to do in
terms of how much they work all that
kind of stuff yeah I mean I will say
Tesla gets a little too much of a bad rep for
this because what's happening is Tesla
is a it's a bursting environment uh so I
would say the Baseline uh my only point
of reference is Google where I've
interned three times and I saw what it's
like inside Google and and deepmind
um I would say the Baseline is higher
than that but then there's a punctuated
equilibrium where once in a while
there's a fire and uh someone like
people work really hard and so it's
spiky and bursty and then all the
stories get collected about the bursts
yeah and then it gives the appearance of
like total insanity but actually it's
just a bit more intense environment and
there are fires and Sprints and so I
think uh you know definitely though I I
would say
um it's a more intense environment than
something you would get elsewhere. But forget all of that, just in your
own personal life
um what do you think about
the happiness of a human being a
brilliant person like yourself
about finding a balance between work and
life or is it such a thing not a good
thought experiment
yeah I think I think balance is good but
I also love to have Sprints that are out
of distribution and that's when I think I've been pretty uh creative as well. Sprints out of distribution
means that most of the time
you have a yeah quote-unquote balance I
have balance most of the time yes I like
being obsessed with something once in a
while once in a while is what once a
week once a month once a year yeah
probably like say once a month or
something yeah and that's when we get a
new GitHub repo come on yeah that's when
you like really care about a problem it
must exist this will be awesome you're
obsessed with it and now you can't just
do it on that day you need to pay the
fixed cost of getting into the groove
and then you need to stay there for a
while and then Society will come and
they will try to mess with you and they
will try to distract you yeah yeah the
worst thing is like a person who's like
I just need five minutes of your time
yeah the cost of that is not
five minutes and Society needs to change
how it thinks about just five minutes of
your time right it's never it's never
just one minute it's just 30 it's just a
quick what's the big deal why are you
being so yeah no
uh what's your computer setup what uh
what's like the perfect are you somebody
that's flexible to no matter what laptop
four screens yeah uh or do you uh prefer
a certain setup that you're most
productive um I guess the one that I'm
familiar with is one large screen uh 27
inch
um and my laptop on the side with
operating system I do Macs that's my
primary for all tasks I would say OS X
but when you're working on deep learning
everything is Linux, you're SSH'd into a
cluster and you're working remotely but
what about the actual development like
that using the IDE yeah you would use uh
I think a good way is you just run vs
code
um my favorite editor right now on your
Mac but you are actually you have a
remote folder through SSH
um so the actual files that you're
manipulating are on the cluster
somewhere else so what's the best IDE
uh vs code what else do people so I use
emacs still that's cool uh so it may be
cool I don't know if it's maximum
productivity
um so what what do you recommend in
terms of editors you worked with a lot
of software Engineers editors for
Python, C++, machine learning
applications I think the current answer
is vs code currently I believe that's
the best
um IDE it's got a huge amount of
extensions it has a GitHub co-pilot
um uh integration which I think is very
valuable what do you think about the the
co-pilot integration I was actually uh I
got to talk a bunch with Guido van Rossum who's the creator of Python and he loves Copilot, he like he programs a lot
with it yeah uh do you
yeah I use Copilot, I love it and uh it's
free for me but I would pay for it yeah
I think it's very good and the utility
that I found with it was, I
would say there is a learning curve and
you need to figure out when it's helpful
and when to pay attention to its outputs
and when it's not going to be helpful
where you should not pay attention to it
because if you're just reading its
suggestions all the time it's not a good
way of interacting with it but I think I
was able to sort of like mold myself to
it I find it's very helpful number one
in copy paste and replace some parts so
I don't um when the pattern is clear
it's really good at completing the
pattern and number two sometimes it
suggests apis that I'm not aware of so
it tells you about something that you
didn't know so and that's an opportunity
to discover and you it's an opportunity
to see I would never take copilot code
as given I almost always uh copy paste
this into a Google Search and you see
what this function is doing and then
you're like oh it's actually
exactly what I need thank you copilot so
you learned something so it's in part a
search engine and part maybe getting the exact syntax correct, that once you see
it yep it's that NP hard thing it's like
once you see it you know yes exactly
correct exactly you yourself you can
struggle you can verify efficiently but
you you can't generate efficiently and
copilot really I mean it's it's
autopilot for programming right and
currently it's doing the lane following
which is like the simple copy paste and
sometimes suggest uh but over time it's
going to become more and more autonomous
and so the same thing will play out in
not just coding but actually across many
many different things probably but
coding is an important one right like
writing programs yeah what how do you
see the future of that developing uh the
program synthesis like being able to
write programs that are more and more
complicated because right now it's human
supervised in interesting ways yes like
what it feels like the transition will
be very painful
my mental model for it is the same thing
will happen as with the autopilot uh So
currently it's doing lane following, it's
doing some simple stuff and eventually
we'll be doing autonomy and people will
have to intervene less and less and
there could be like you like testing
mechanisms
like if it writes a function and that
function looks pretty damn correct but
how do you know it's correct because
you're like getting lazier and lazier as
a programmer like your ability to
because, like, little bugs, but I guess it won't make little bugs? No it will, Copilot will make uh off-by-one subtle bugs, it has done that to me.
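For instance, the kind of subtle off-by-one a code completion can introduce. This is a made-up illustration, not an actual Copilot output.

```python
def moving_sum(values, window):
    """Sum of each consecutive window of `values`."""
    # A plausible-looking completion with an off-by-one bug:
    # range(len(values) - window) stops one window short.
    buggy = [sum(values[i:i + window]) for i in range(len(values) - window)]

    # Correct version: the last full window starts at len(values) - window.
    correct = [sum(values[i:i + window]) for i in range(len(values) - window + 1)]
    return buggy, correct

buggy, correct = moving_sum([1, 2, 3, 4], window=2)
print(buggy)    # [3, 5]    -- silently drops the final window (3 + 4)
print(correct)  # [3, 5, 7]
```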
But do you think future systems will, or is it
really the off by one is actually a
fundamental challenge of programming in
that case it wasn't fundamental and I
think things can improve but uh yeah I
think humans have to supervise I am
nervous about people not supervising
what comes out and what happens to for
example the proliferation of bugs in all
of our systems I'm nervous about that
but I think and there will probably be
some other copilots for bug finding and
stuff like that at some point because
there will be like a lot more automation
for uh oh man
so it's like a program a co-pilot that
generates a compiler for one that does a
linter yes one that does like a type
Checker yes
it's a committee of like a GPT sort of
like and then they'll be like a manager
for the committee yeah and then there'll
be somebody that says a new version of
this is needed we need to regenerate it
yeah there were 10 GPTs that did a forward pass and gave 50 suggestions, another one looked at it and picked a few that it liked, a bug one looked at it and it was like it's probably a bug
they got re-ranked by some other thing
and then a final ensemble uh GPT comes in, it's like okay given everything you guys have told me, this is probably the next token.
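A sketch of that speculative generate, review, and re-rank pipeline, with stub functions standing in for the models. Everything here, from the function names to the scoring scheme, is hypothetical and only meant to make the shape of the idea concrete.

```python
from typing import Callable, List

# Stubs standing in for different specialized models in the "committee".
Generator = Callable[[str], List[str]]   # prompt -> candidate code snippets
Reviewer = Callable[[str], float]        # candidate -> score (higher is better)

def committee_complete(prompt: str,
                       generators: List[Generator],
                       reviewers: List[Reviewer],
                       keep_top: int = 3) -> str:
    """Gather candidates from several generators, score them with several
    reviewers (e.g. a bug-finder, a linter, a type-checker), and return the
    candidate the committee likes best."""
    candidates = [c for generate in generators for c in generate(prompt)]
    scored = [(sum(review(c) for review in reviewers), c) for c in candidates]
    scored.sort(reverse=True, key=lambda pair: pair[0])
    shortlist = [c for _, c in scored[:keep_top]]
    # A final "ensemble" model could pick among the shortlist; here we just
    # return the top-scored candidate.
    return shortlist[0]

# Trivial stubs for illustration:
gens = [lambda p: [f"# TODO: implement {p}", f"def solve():\n    pass  # {p}"]]
revs = [lambda c: float(len(c))]  # placeholder scorer: prefer longer candidates
print(committee_complete("sort a list", gens, revs))
```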
You know, the feeling is, the number of programmers in the world has
been growing and growing very quickly do
you think it's possible that it'll
actually level out and drop to like a
very low number with this kind of world
because then you'll be doing software
2.0 programming
um and you'll be doing this kind of
generation of copilot type systems
programming but you won't be doing the
old school
software 1.0 programming I don't currently
think that they're just going to replace
human programmers
um
it's I'm so hesitant saying stuff like
this right because this is going to be replayed in five years I don't know it's
going to show that like this is where we
thought because I I agree with you but I
think we might be very surprised
right like what are the next
I I what's your sense of what we're
seeing with language models like does it
feel like the beginning or the middle or
the end the beginning 100 I think the
big question in my mind is for sure GPT
will be able to program quite well
competently and so on how do you steer
the system you still have to provide
some guidance to what you actually are
looking for and so how do you steer it
and how do you say how do you talk to it
how do you um
audit it and verify that what is done is
correct and how do you like work with
this and it's as much not just an AI
problem but a UI ux problem yeah
um so beautiful fertile ground for so
much interesting work, for VS Code++, where you're not just, it's not just
human programming anymore it's amazing
yeah so you're interacting with the
system so not just one prompt but it's
iterative prompting yeah you're trying
to figure out having a conversation with
the system yeah that actually I mean to
me that's super exciting to have a
conversation with the program I'm
writing
yeah maybe at some point uh you're just
conversing with it it's like okay here's
what I want to do actually this variable
maybe it's not even that low level as
variable but you can also Imagine like
can you translate this to C++ and back to Python, yeah that already kind of exists, no, but just like doing it as
part of the program experience like I
think I'd like to write this function in
C++
or or like you just keep changing for
different uh different programs because
they're different syntax maybe I want to
convert this into a functional language
and so like you get to become
multilingual as a programmer and dance
back and forth efficiently yeah I mean I
think the UI ux of it though is like
still very hard to think through because
it's not just about writing code on a
page you have an entire developer
environment you have a bunch of hardware
on it uh you have some environmental
variables you have some scripts that are
running in a cron job, like there's a
lot going on to like working with
computers and how do these uh systems
set up environment flags and work across
multiple machines and set up screen
sessions and automate different
processes like how all that works and
it's auditable by humans and so on is
like massive question at the moment
You've built arXiv Sanity. What is arXiv and what is the future
of academic research publishing that you
would like to see? So arXiv is this pre-print server, so if you have a paper
you can submit it for publication to
journals or conferences and then wait
six months and then maybe get a decision
pass or fail or you can just upload it
to arXiv
and then people can tweet about it three
minutes later and then everyone sees it
everyone reads it and everyone can
profit from it uh in their own ways you
can cite it and it has an official look
to it it feels like a pub like it feels
like a publication process yeah it feels
different than you if you just put in a
blog post oh yeah yeah I mean it's a
paper and usually the the bar is higher
for something that you would expect on
arXiv as opposed to something you
would see in a blog post well the
culture created the bar because you
could probably, yes, post a pretty crappy paper on arXiv
um so what's that make you feel like
what's that make you feel about peer
review so rigorous peer review by two
three experts versus the peer review of
the community right as it's written yeah
basically I think the community is very
well able to peer review things very
quickly on Twitter and I think maybe it
just has to do something with AI machine
learning field specifically though I
feel like things are more easily
auditable um and the verification is is
easier potentially than the verification
somewhere else so it's kind of like um
you can think of these uh scientific
publications as like little
blockchains where everyone's building on
each other's work and citing each other
and you sort of have ai which is kind of
like this much faster and loose
blockchain but then you have and any one
individual entry is like very um very
cheap to make and then you have other
fields where maybe that model doesn't
make as much sense
um and so I think in AI at least things
are pretty easily verifiable and so
that's why when people upload papers
that are a really good idea and so on
people can try it out like the next day
and they can be the final Arbiter
whether it works or not on their problem
and the whole thing just moves
significantly faster so I kind of feel
like Academia still has a place sorry
this like conference Journal process
still has a place but it's sort of like
an um it lags behind I think and it's a
bit more maybe higher quality process
but it's not sort of the place where you
will discover Cutting Edge work anymore
yeah it used to be the case when I was
starting my PhD that you go to
conferences and journals and you discuss
all the latest research now when you go
to a conference or Journal like no one
discusses anything that's there because
it's already like three generations ago
irrelevant yes which makes me sad about
like deepmind for example where they
they still they still publish in nature
and these big prestigious I mean there's
still value as opposed to The Prestige
that comes with these big venues but the
the result is that they they'll announce
some breakthrough performance and it'll
take like a year to actually publish the
details I mean and those details in if
they were published immediately would
Inspire the community to move in certain
directions with that yeah it would speed
up the rest of the community but I don't
know to what extent that's part of their
objective function also that's true so
it's not just the prestige a little bit
of the delay is uh is part yeah they
certainly deepmind specifically has been
um working in the regime of having a
slightly higher quality basically
process and latency and uh publishing
those papers that way another question
from Reddit do you or have you suffered
from imposter syndrome being the
director of AI at Tesla, being this person
when you're at Stanford where like the
world looks at you as the expert in AI
to teach the world about machine
learning when I was leaving Tesla after
five years I spent a ton of time in
meeting rooms uh and you know I would
read papers in the beginning when I
joined Tesla I was writing code and then
I was writing less and less code and I was reading code and then I was reading less and less code and so this is just a
natural progression that happens I think
and uh definitely I would say near the
tail end that's when it sort of like
starts to hit you a bit more that you're
supposed to be an expert but actually
the source of Truth is the code that
people are writing the GitHub and the
actual the actual code itself and you're
not as familiar with that as you used to
be and so I would say maybe there's some
like insecurity there yeah that's
actually pretty profound that a lot of
the insecurity has to do with not
writing the code in the computer science
space like that because that is the
truth that that right there code is the
source of Truth the papers and
everything else it's a high level
summary I don't uh yeah just a high
level summary but at the end of the day
you have to read code it's impossible to
translate all that code into actual uh
you know paper form uh so when when
things come out especially when they
have a source code available that's my
favorite place to go so like I said
you're one of the greatest teachers of
machine learning AI ever uh from cs231n
to today what advice would you give to
beginners interested in getting into
machine learning
beginners are often focused on like what
to do and I think the focus should be
more like how much you do, so I'm kind of a believer, on a high level, in this 10,000 hours kind of concept where
you just kind of have to just pick the
things where you can spend time and you
you care about and you're interested in
you literally have to put in 10 000
hours of work
um it doesn't even like matter as much
like where you put it and your you'll
iterate and you'll improve and you'll
waste some time I don't know if there's
a better way you need to put in 10 000
hours but I think it's actually really
nice because I feel like there's some
sense of determinism about uh being an
expert at a thing if you spend ten
thousand hours you can literally pick an
arbitrary thing and I think if you spend
ten thousand hours of deliberate effort
and work you actually will become an
expert at it and so I think it's kind of
like a nice thought
um and so uh basically I would focus
more on like are you spending 10 000
hours that's what I focus on so and then
thinking about what kind of mechanisms
maximize your likelihood of getting to
ten thousand hours exactly which for
us silly humans means probably forming a
daily habit of like every single day
actually doing the thing whatever helps
you so I do think to a large extent is a
psychological problem for yourself uh
one other thing that I think
is helpful for the psychology of it is
many times people compare themselves to
others in the area I think this is very
harmful only compare yourself to you
from some time ago like say a year ago
are you better than you year ago this is
the only way to think
um and I think this then you can see
your progress and it's very motivating
that's so interesting that focus on the
quantity of hours because I think a lot
of people uh in the beginner stage but
actually throughout get paralyzed
uh by uh the choice like which one do I
pick this path or this path yeah like
they'll literally get paralyzed by like
which IDE to use well they're worried
yeah they're worried about all these
things but the thing is some of the you
you will waste time doing something
wrong yes you will eventually figure out
it's not right you will accumulate scar
tissue and next time you'll grow
stronger because next time you'll have
the scar tissue and next time you'll
learn from it and now next time you come
into a similar situation you'll be like
all right
I messed up I've spent a lot of time
working on things that never materialize
into anything and I have all that scar
tissue and I have some intuitions about
what was useful what wasn't useful how
things turned out uh so all those
mistakes were uh were not dead work you
know so I just think you should just
focus on working what have you done what
have you done last week
uh that's a good question actually to
ask for for a lot of things not just
machine learning
um it's a good way to cut the
the, I forgot what term we use, but the fluff, the blubber, whatever the uh
the inefficiencies in life uh what do
you love about teaching you seem to find
yourself
often in the like drawn to teaching
you're very good at it but you're also
drawn to it I mean I don't think I love
teaching I love happy humans and happy
humans like when I teach yes I I
wouldn't say I hate teaching I tolerate
teaching but it's not like the act of
teaching that I like it's it's that um
you know I I have some I have something
I'm actually okay at it yes I'm okay at
teaching and people appreciate it a lot
yeah and uh so I'm just happy to try to
be helpful and uh teaching itself is not
like the most I mean it's really it can
be really annoying frustrating I was
working on a bunch of lectures just now
I was reminded back to my days of 231
and just how much work it is to create
some of these materials and make them
good the amount of iteration and thought
and you go down blind alleys and just
how much you change it so creating
something good
um in terms of like educational value is
really hard and uh it's not fun it's
difficult so for people should
definitely go watch your new stuff you
put out there are lectures where you're
actually building the thing like from
like you said the code is the truth so
discussing back propagation by building
it by looking through and just the whole
thing so how difficult is that to
prepare for I think that's a really
powerful way to teach how did you have
to prepare for that or are you just live
thinking through it I will typically do
like say three takes and then I take
like the the better take uh so I do
multiple takes and I take some of the
better takes and then I just build out a
lecture that way uh sometimes I have to
delete 30 minutes of content because it
just went down a blind alley that I didn't
like too much there's about a bunch of
iteration and it probably takes me you
know somewhere around 10 hours to create
one hour of content to give one hour
it's interesting I mean is it difficult
to go back to the like the basics do you
draw a lot of like wisdom from going
back to the basics yeah going back to
back propagation loss functions where
they come from and one thing I like
about teaching a lot honestly is it
definitely strengthens your
understanding uh so it's not a purely
altruistic activity it's a way to learn
if you have to explain something to
someone uh you realize you have gaps in
knowledge uh and so I even surprised
myself in those lectures like oh the
result will obviously look like this and
then the result doesn't look like it and
I'm like okay I thought I understood
this yeah
but that's why it's really cool to
literally code you run it in a notebook
and it gives you a result and you're
like oh wow and like actual numbers
actual input act you know actual code
yeah it's not mathematical symbols Etc
the source of Truth is the code it's not
slides it's just like let's build it
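To make the "code as the source of truth" point concrete, here is a minimal, illustrative sketch in the same spirit, not taken from the lectures: backpropagation written out by hand for a tiny expression, then checked against a numerical gradient so the numbers verify themselves.

```python
# Minimal sketch: backprop through f(a, b) = (a * b + a) ** 2, by hand,
# checked against a numerical gradient. Purely illustrative.

def f(a, b):
    c = a * b          # intermediate node
    d = c + a          # intermediate node
    e = d ** 2         # output
    return e

def grad_f(a, b):
    # forward pass, keeping intermediates
    c = a * b
    d = c + a
    # backward pass: chain rule, node by node
    de_dd = 2 * d              # d(e)/d(d)
    dc_da = b                  # c = a * b
    dc_db = a
    # 'a' reaches the output through two paths: via c, and directly in d = c + a
    da = de_dd * (dc_da + 1.0)
    db = de_dd * dc_db
    return da, db

a, b = 2.0, -3.0
da, db = grad_f(a, b)

# numerical check: the code is the source of truth
h = 1e-6
num_da = (f(a + h, b) - f(a - h, b)) / (2 * h)
num_db = (f(a, b + h) - f(a, b - h)) / (2 * h)
print(da, num_da)   # should match closely
print(db, num_db)
```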
it's beautiful you're a rare human in
that sense uh what advice would you give
to researchers uh trying to develop and
publish ideas that have a big impact in
the world of AI so maybe
um undergrads maybe early graduate
students yep I mean I would say like
they definitely have to be a little bit
more strategic than I had to be as a PhD
student because of the way AI is
evolving it's going the way of physics
where you know in physics you used to be
able to do experiments on your benchtop
and everything was great and you could
make progress and now you have to work
in like LHC or like CERN and
and so AI is going in that direction as
well
um so there's certain kinds of things
that's just not possible to do on the
bench top anymore and uh
I think um that didn't used to be the
case at the time do you still think that
there's like
GAN-type papers to be written where like
uh like very simple idea that requires
just one computer to illustrate a simple
example I mean one example that's been
very influential recently is diffusion
models diffusion models are amazing
diffusion models are six years old for the
longest time people were kind of
ignoring them as far as I can tell and
they're an amazing generative model
especially in uh in images and so stable
diffusion and so on it's all diffusion
based diffusion is new it was not there
and came from well it came from Google
but a researcher could have come up with
it in fact some of the first
actually no those came from Google as
well but a researcher could come up with
that in an academic Institution
yeah what do you find Most Fascinating
about diffusion models so from the
societal impact to the the technical
architecture what I like about
diffusion is it works so well
is that surprising to you the amount of
the variety almost the novelty of the
synthetic data is generating yeah so the
stable diffusion images are incredible
it's the speed of improvement in
generating images has been insane uh we
went very quickly from generating like
tiny digits to the tiny faces and it all
looked messed up and now we have stable
diffusion and that happened very quickly
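For context, the core training recipe behind diffusion models is simple enough to sketch in a few lines; this is a DDPM-style toy written for illustration, where the tiny MLP and the 2-D stand-in data are assumptions, not any particular paper's code.

```python
# Illustrative DDPM-style objective: corrupt data x0 with Gaussian noise at a
# random timestep, train a model to predict that noise, minimize MSE.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

model = nn.Sequential(nn.Linear(2 + 1, 128), nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def loss_fn(x0):
    t = torch.randint(0, T, (x0.shape[0],))
    a_bar = alphas_bar[t].unsqueeze(1)
    noise = torch.randn_like(x0)
    # forward (noising) process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # condition the model on the (normalized) timestep, here just concatenated
    pred = model(torch.cat([x_t, t.unsqueeze(1) / T], dim=1))
    return ((pred - noise) ** 2).mean()

x0 = torch.randn(64, 2)          # stand-in for real data (e.g. image patches)
for _ in range(100):
    opt.zero_grad()
    loss_fn(x0).backward()
    opt.step()
```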
there's a lot that Academia can still
contribute you know for example um
FlashAttention is a very efficient kernel for
running the attention operation inside
the Transformer that came from academic
environment it's a very clever way to
structure the kernel that does the
calculation so it doesn't materialize
the attention Matrix
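As a rough illustration of the idea being described (not the actual fused GPU kernel), here is a NumPy sketch of attention computed block by block with an online softmax, so the full T-by-T attention matrix is never held in memory; the shapes and block size are arbitrary choices for the example.

```python
# Illustrative only: blockwise attention with an online (streaming) softmax,
# avoiding materialization of the full (T, T) score matrix.
import numpy as np

def blockwise_attention(Q, K, V, block=64):
    T, d = Q.shape
    acc = np.zeros_like(V, dtype=np.float64)   # unnormalized output
    denom = np.zeros(T)                        # running softmax denominator
    m = np.full(T, -np.inf)                    # running max, for stability
    for s0 in range(0, K.shape[0], block):
        Kb, Vb = K[s0:s0 + block], V[s0:s0 + block]
        scores = Q @ Kb.T / np.sqrt(d)         # only (T, block) at a time
        m_new = np.maximum(m, scores.max(axis=1))
        correction = np.exp(m - m_new)         # rescale old accumulators
        p = np.exp(scores - m_new[:, None])
        acc = acc * correction[:, None] + p @ Vb
        denom = denom * correction + p.sum(axis=1)
        m = m_new
    return acc / denom[:, None]

# sanity check against the naive version that does materialize the matrix
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
naive = np.exp(Q @ K.T / np.sqrt(32))
naive = (naive / naive.sum(axis=1, keepdims=True)) @ V
assert np.allclose(blockwise_attention(Q, K, V), naive)
```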
um and so there's I think there's still
like lots of things to contribute but
you have to be just more strategic do
you think neural networks could be made
to reason
uh yes
do you think they already reason yes
what's your definition of reasoning
uh information processing
so in a way that humans think through a
problem and come up with novel ideas
it it feels like a reasoning yeah so the
the novelty
I don't want to say but out of
distribution ideas
you think it's possible yes and I think
we're seeing that already in the current
neural Nets you're able to remix the
training set information into true
generalization in some sense that
doesn't appear in the training set
like you're doing something interesting
algorithmically you're manipulating
you know some symbols and you're coming
up with some
correct a unique answer in a new setting
what would uh illustrate to you holy
shit this thing is definitely thinking
to me thinking or reasoning is just
information processing and
generalization and I think the neural
Nets already do that today so being able
to perceive the world or perceive the
whatever the inputs are and to make
uh predictions based on that or actions
based on that that's that's reasoning
yeah you're giving correct answers in
novel settings
by manipulating information you've
learned the correct algorithm you're not
doing just some kind of a lookup table
or nearest neighbor search
let me ask you about AGI what what are
some moonshot ideas you think might make
significant progress towards AGI or
maybe in other ways what are big
blockers that we're missing now so
basically I am fairly bullish on our
ability to build AGIs uh basically
automated systems that we can interact
with and are very human-like and we can
interact with them in a digital realm or
Physical Realm currently it seems most
of the models that sort of do these sort
of magical tasks are in a text Realm
um I think uh as I mentioned I'm
suspicious that the text realm is not
enough to actually build full
understanding of the world I do actually
think you need to go into pixels and
understand the physical world and how it
works so I do think that we need to
extend these models to consume images
and videos and train on a lot more data
that is multimodal in that way
do you think you need to touch the world
to understand it also well that's the
big open question I would say in my mind
is if you also require the embodiment
and the ability to uh sort of interact
with the world run experiments and have
a data of that form then you need to go
to Optimus or something like that and so
I would say Optimus in some way is like
a hedge
in AGI because it seems to me that it's
possible that just having data from the
internet is not enough
if that is the case then Optimus may
lead to AGI because Optimus to
me there's nothing beyond Optimus you
have like this humanoid form factor that
can actually like do stuff in the world
you can have millions of them
interacting with humans and so on and uh
if that doesn't give rise to AGI at
some point I'm not sure what
will
um so from a completeness perspective I
think that's the uh that's a really good
platform but it's a much harder platform
because uh you are dealing with atoms
and you need to actually like build
these things and integrate them into
society so I think that path takes
longer but it's much more certain and
then there's a path of the internet and
just like training these compression
models effectively uh on uh trying to
compress all the internet
and that might also give these agents as
well compress the internet but also
interact with the internet yeah so it's
not obvious to me
in fact I suspect you can reach AGI
without ever entering the physical world
and which is a little bit more
uh concerning because
it might result in it happening
faster
so it just feels like we're again in
boiling water we won't know as it's
happening
I would like to I'm not afraid of AGI
I'm excited about it there's always
concerns but I would like to know when
it happens
yeah or and have like hints about when
it happens like a year from now it will
happen that kind of thing yeah I just
feel like in the digital realm it just
might happen yeah I think all we have
available to us because no one has built
AGI again so all we have available to us
is uh is there enough fertile ground on
the periphery I would say yes and we
have the progress so far which has been
very rapid and uh there are next steps
that are available and so
I would say uh yeah it's quite likely
that we'll be interacting with digital
entities how will you know that
somebody's built it it's going to be a
slow I think it's going to be a slow
incremental transition is going to be
product based and focused it's going to
be GitHub co-pilot getting better and
then uh GPTs are helping you write and
then these oracles that you can go to
with mathematical problems I think we're
on a on the verge of being able to ask
very complex
questions in chemistry physics math of
these oracles and have them complete
Solutions
so AGI to you is primarily focused on
intelligence so Consciousness doesn't
enter
into uh into it
so in my mind Consciousness is not a
special thing you will you will figure
out and bolt on I think it's an emergent
phenomenon of a large enough and complex
enough
um generative model sort of so
um if you have a complex enough world
model that understands the world then it
also understands its predicament in the
world as being a language model which to
me is a form of Consciousness or
self-awareness and so in order to
understand the world deeply you probably
have to integrate yourself into the
world yeah and in order to interact with
humans and other living beings
Consciousness is a very useful tool I
think Consciousness is like a modeling
insight
modeling Insight yeah it's a you have a
powerful enough model of understanding
the world that you actually understand
that you are an entity in it yeah but
there's also this um perhaps just a
narrative we tell ourselves there's a it
feels like something to experience the
world the hard problem of Consciousness
yeah but that could be just the
narrative that we tell ourselves yeah I
don't think what yeah I think it will
emerge I think it's going to be
something uh very boring like we'll be
talking to these uh digital AIS they
will claim they're conscious they will
appear conscious they will do all the
things that you would expect of other
humans and uh it's going to just be a
stalemate I I think there would be a lot
of actual fascinating ethical questions
like Supreme Court level questions
of whether you're allowed to turn off a
conscious AI if you're allowed to build
the conscious AI
maybe there would have to be the same
kind of debates that you have around
um sorry to bring up a political topic
but you know abortion which is the
deeper question with abortion
is what is life and the Deep question
with AI is also what is life and what is
conscious and I think that'll be very
fascinating
to bring up it might become illegal to
build systems that are capable that like
of such a level of intelligence that
Consciousness would emerge and therefore
the capacity to suffer would emerge and
some AI system that says no please don't
kill me well that's what the LaMDA
chatbot already told
um this Google engineer right like it it
was talking about not wanting to die or
so on so that might become illegal to do
that right
I because otherwise you might have a lot
of a lot of creatures that don't want to
die and they will uh you can just spawn
Infinity of them on a cluster
and then that might lead to like
horrible consequences because then there
might be a lot of people that secretly
love murder and they'll start practicing
murder on those systems I mean there's
just I to me all of this stuff just
brings a beautiful mirror to The Human
Condition and human nature we'll get to
explore it and that's what like the best
of uh the Supreme Court of all the
different debates we have about ideas of
what it means to be human we get to ask
those deep questions that we've been
asking throughout human history there's
always been the other in human history
uh we're the good guys and they're the
bad guys and we're going to uh you know
throughout human history let's murder
the bad guys and the same will probably
happen with robots it'll be the other at
first and then we'll get to ask
questions of what does it mean to be
alive what does it mean to be conscious
yeah and I think there's some Canary in
the coal mines even with what we have
today
um and uh you know for example these
there's these like waifus that you like
work with and some people are trying to
like this company is going to shut down
but this person really like yeah love
their waifu and like is trying to like
Port it somewhere else and like it's not
possible and like I think like
definitely uh people will have feelings
towards uh towards these um systems
because in some sense they are like a
mirror of humanity because they are like
sort of like a big average of humanity
yeah in a way that it's trained but we
can that average we can actually watch
it's nice to be able to interact with
the big average of humanity yeah and do
like a search query on it yeah yeah it's
very fascinating and uh we can also of
course also like shape it it's not just
a pure average we can mess with the
training data we can mess with the
objective we can fine tune them in
various ways so we have some
um you know impact on what those systems
look like if you were to achieve AGI
um and you could have a conversation
with her and ask her uh talk about
anything maybe ask her a question what
kind of stuff would you would you ask I
would have some practical questions in
my mind like uh do I or my loved ones
really have to die uh what can we do
about that
do you think it will answer clearly or
would it answer poetically
I would expect it to give Solutions I
would expect it to be like well I've
read all of these textbooks and I know
all these things that you've produced
and it seems to me like here are the
experiments that I think it would be
useful to run next and here are some gene
therapies that I think would be helpful
and uh here are the kinds of experiments
that you should run okay let's go
start the experiments okay
imagine that mortality is actually uh
like a prerequisite for happiness so if
we become immortal we'll actually become
deeply unhappy and the model is able to
know that so what is this supposed to
tell you stupid human about it yes you
can become immortal but you will become
deeply unhappy
if if the model is if the AGI system
is trying to empathize with you human
what is this supposed to tell you that
yes you don't have to die but you're
really not going to like it because that
is it going to be deeply honest like
there's uh Interstellar what is it the AI
says like humans want 90 percent honesty
so like you have to pick how honest I
want to answer these practical questions
yeah I love Yeah Interstellar by the way
I think it's like such a sidekick to the
entire story but
at the same time it's like really
interesting it's kind of limited in
certain ways right yeah it's limited and
I think that's totally fine by the way I
don't think uh I think it's
fine to have limited and
imperfect AGIs
is that a feature almost as an example
like it has a fixed amount of compute on
its physical body and it might just be
that even though you can have a super
amazing Mega brain super intelligent AI
you also can have like you know less
intelligent AIS that you can deploy in a
power efficient way and then they're not
perfect they might make mistakes no I
meant more like say you had infinite
compute and it's still good to make
mistakes sometimes
like in order to integrate yourself like
um
what is it going back to Goodwill
Hunting uh Robin Williams character
says like the human imperfections that's
the good stuff right isn't it isn't that
this like we don't want perfect we want
flaws in part to form connections with
each other because it feels like
something you can attach your feelings
to
the the flaws in that same way you want
an AI That's flawed I don't know I feel
like perfectionist but then you're
saying okay yeah but that's not AGI but
see AGI would need to be intelligent
enough to give answers to humans that
humans don't understand and I think
perfect is something humans can't
understand because even science doesn't
give perfect answers there's always gaps
and Mysteries and I don't know I I don't
know if humans want perfect
yeah I could imagine just uh having a
conversation with this kind of Oracle
entity as you'd imagine them and uh yeah
maybe it can tell you about
you know based on my analysis of Human
Condition uh you might not want this and
here are some of the things that might
but every dumb human will say yeah
yeah trust me give me the truth I
can handle it but that's the beauty a
lot of people can choose uh so but then
the old marshmallow test with the kids
and so on I feel like too many people
uh like it can't handle the truth
probably including myself like the Deep
truth of The Human Condition I don't I
don't know if I can handle it like what
if there's some dark stuff what if we
are an alien science experiment and it
realizes that what if it had I mean
I mean this is the Matrix you know all
over again
I don't know I would what would I talk
about I don't even yeah I
uh probably I will go with the safe
scientific questions at first that have
nothing to do with my own personal life
yeah immortality just like about physics
and so on yeah uh to build up like let's
see where it's at or maybe see if it has
a sense of humor that's another question
would it be able to uh presumably
if it understands humans deeply
would it be able to generate
uh yep to generate humor yeah I think
that's actually a wonderful Benchmark
almost like is it able I think that's a
really good point basically to make you
laugh yeah if it's able to be like a
very effective stand-up comedian that is
doing something very interesting
computationally I think being funny is
extremely hard yeah
because
it's hard in a way like the Turing test
the original intent of the Turing test
is hard because you have to convince
humans and there's nothing that's why
when comedians talk about this
this is deeply honest
because people can't help but laugh
and if they don't laugh that means
you're not funny and if they laugh that's funny
and you're showing you need a lot of
knowledge to create to create humor
about like the occupational Human
Condition and so on and then you need to
be clever with it
uh you mentioned a few movies you
tweeted movies that I've seen five plus
times but I'm ready and willing to keep
watching Interstellar Gladiator contact
Goodwill Hunting The Matrix Lord of the
Rings all three Avatar Fifth Element so
on goes on Terminator two Mean Girls I'm
not gonna ask about that
mean girls is great
um what are some that jump out to your
memory that you love
and why like you mentioned the Matrix
as a computer person why do you love The
Matrix
there's so many properties that make it
like beautiful and interesting so
there's all these philosophical
questions but then there's also agis and
there's simulation and it's cool and
there's you know the black uh you know
uh the look of it the feel of it the
look of it the feel of it the action the
bullet time it was just like innovating
in so many ways
and then uh Good Will Hunting why
do you like that one yeah I just I
really like this uh tortured genius sort
of character who's like grappling with
whether or not he has like any
responsibility or like what to do with
this gift that he was given or like how
to think about the whole thing and uh
there's also a dance between the genius
and the the personal like what it means
to love another human being and there's
a lot of themes there it's just a
beautiful movie and then the fatherly
figure The Mentor in the in the
psychiatrist and the it like really like
uh
it messes with you you know there's some
movies that's just like really mess with
you uh on a deep level do you relate to
that movie at all no it's not your fault
doctor as I said Lord of the Rings
that's self-explanatory Terminator 2
which is interesting
you rewatch that a lot is that better
than Terminator one
you like you like Arnold I do like
Terminator one as well uh I like
Terminator 2 a little bit more but in
terms of like its surface properties
[Laughter]
do you think Skynet is at all a
possibility oh yes
well like the actual sort of uh
autonomous uh weapon system kind of
thing do you worry about that uh stuff
I 100 percent worry about it and so the I mean
the uh you know some of these uh fears
of AGIs and how this will play out I mean
these will be like very powerful
entities probably at some point and so
um for a long time they're going to be
tools in the hands of humans uh you know
people talk about like alignment of AGIs
and how to make the problem is like even
humans are not aligned uh so
uh how this will be used and what this
is going to look like is um yeah it's
troubling so
do you think it'll happen so slowly
enough that we'll be able to
as a human civilization think through
the problems yes that's my hope is that
it happens slowly enough and in an open
enough way where a lot of people can see
and participate in it just figure out
how to deal with this transition I think
which is going to be interesting I draw
a lot of inspiration from nuclear
weapons because I sure thought it would
be it would be fucked once they develop
nuclear weapons but like it's almost
like
uh when the systems are not so
dangerous that they destroy human
civilization we deploy them and learn
the lessons and then if it's
too dangerous we
might still deploy it uh but you very
quickly learn not to use them and so
there'll be like this balance that you
humans are very clever as a species it's
interesting we exploit the resources as
much as we can but we don't we avoid
destroying ourselves it seems like
well I don't know about that actually I
hope it continues
um
I mean I'm definitely like concerned
about nuclear weapons and so on not just
as a result of the recent conflict even
before that uh that's probably like my
number one concern for society so if
Humanity uh destroys itself
or destroys you know 90 percent of people that
would be because of nukes I think so
um and it's not even about full
destruction to me it's bad enough if we
reset society that would be like
terrible it would be really bad and I
can't believe we're like so close to it
yeah it's like so crazy to me it feels
like we might be a few tweets away from
something like that yep basically it's
extremely unnerving but and has been for
me for a long time
it seems unstable that world leaders
just having a bad mood
can like um
take one step towards a bad Direction
and it escalates yeah and because of a
collection of bad moods it can escalate
without being able to
um stop
yeah it's just it's a huge amount of uh
Power and then also with the
proliferation and basically I don't I
don't actually really see I don't
actually know what the good outcomes are
here
uh so I'm definitely worried about that
a lot and then AGI is not currently
there but I think at some point we'll
more and more become uh something like
it the danger with AGI even is that I
think it's even less likely worse in a
sense that
uh there are good outcomes of AGI and
then the bad outcomes are like in an
absolute way like a tiny bit away and so
I think um capitalism and humanity and
so on will drive for the positive
uh ways of using that technology but
then if bad outcomes are just like a
tiny like flipping minus sign away uh
that's a really bad position to be in a
tiny perturbation of the system results
in the destruction of the human species
it's a weird line to walk yeah I think
in general what's really weird about
like the Dynamics of humanity and this
explosion we talked about is just like
the insane coupling afforded by
technology yeah and uh just the
instability of the whole dynamical
system I think it's just it doesn't look
good honestly
yes that explosion could be destructive
and constructive and the probabilities
are non-zero in both both senses I'm
going to have to I do feel like I have
to try to be optimistic and so on and
yes I think even in this case I still am
predominantly optimistic but there's
definitely
me too uh do you think we'll become a
multiplanetary species
probably yes but I don't know if it's
dominant feature of uh future Humanity
uh there might be some people on some
planets and so on but I'm not sure if
it's like
yeah if it's like a major player in our
culture and so on we still have to solve
the drivers of self-destruction here on
Earth so just having a backup on Mars is
not going to solve the problem so by the
way I love the backup on Mars I think
that's amazing you should absolutely do
that yes and I'm so thankful uh and
would you go to Mars uh personally no I
do like Earth quite a lot okay uh I'll
go to Mars I'll go for you and I'll
tweet at you from there maybe eventually
I would once it's uh safe enough but I
don't actually know if it's on my
lifetime scale unless I can extend it by
a lot
I do think that for example a lot of
people might disappear into
um virtual realities and stuff like that
and I think that could be the major
thrust of
um sort of the cultural development of
humanity if it survives uh so it might
not be it's just really hard to work in
Physical Realm and go out there and I
think ultimately all your experiences
are in your brain yeah and so it's much
easier to disappear into digital Realm
and I think people will find them more
compelling easier safer
more interesting so you're a little bit
captivated by Virtual Reality by the
possible worlds whether it's the
metaverse or some other manifestation of
that yeah
yeah it's really interesting it's uh
I'm I'm interested just just talking a
lot to Carmack where's the
where's the thing that's currently
preventing that yeah I mean to be clear
I think what's interesting about the
future is
um it's not that
I kind of feel like
the variance in The Human Condition
grows that's the primary thing that's
changing it's not as much the mean of
the distribution it's like the variance
of it so there will probably be people
on Mars and there will be people in VR
and they're all people here on Earth
it's just like there will be so many
more ways of being
and so I kind of feel like I see it as
like a spreading out of a human
experience there's something about the
internet that allows you to discover
those little groups and you you
gravitate each other something about
your biology likes that kind of world
and that you find each other yeah and
we'll have transhumanists and then we'll
have the Amish and they're gonna
everything is just gonna coexist you
know the cool thing about it because
I've interacted with a bunch of Internet
communities is
um they don't know about each other like
you can have a very happy existence just
like having a very close-knit community
and not knowing about each other I mean
even even since this just having
traveled to Ukraine there's they they
don't know so many things about America
you you like when you travel across the
world I think you experience this too
there are certain cultures they're like
they have their own thing going on they
don't and so you can see that happening
more and more and more and more in the
future we have little communities yeah
yeah I think so that seems to be the
that seems to be how it's going right
now and I don't see that Trend like
really reversing I think people are
diverse and they're able to choose their
own like path and existence and I sort
of like celebrate that
um and so will you spend so much time in
the metaverse in the virtual reality or
which Community are you are you the
physicalist uh the the the physical
reality enjoyer or uh do you see drawing
a lot of uh pleasure and fulfillment in
the digital world
yeah I think well currently the virtual
reality is not that compelling I do
think it can improve a lot but I don't
really know to what extent maybe you
know there's actually like even more
exotic things you can think about with
like Neuralink or stuff like that so
um currently I kind of see myself as
mostly a team human person I love nature
yeah I love Harmony I love people I love
Humanity I love emotions of humanity
um and I I just want to be like in this
like solar Punk little Utopia that's my
happy place yeah my happy place is like
uh people I love thinking about cool
problems surrounded by a lush beautiful
Dynamic nature yeah yeah and secretly
high tech in places that count places
like they use technology to empower that
love for other humans and nature yeah I
think a technology used like very
sparingly uh I don't love when it sort
of gets in the way of humanity in many
ways uh I like just people being humans
in a way we sort of like slightly
evolved and prefer I think just by
default people kept asking me because
they they know you love reading are
there particular books
that you enjoyed that had an impact on
you
for silly or for profound reasons that
you would recommend
you mentioned The Vital Question
many of course I think in biology as an
example The Vital Question is a good one
anything by Nick Lane really uh Life
Ascending I would say is like a bit more
potentially uh representative as like a
summary
of a lot of the things he's been talking
about I was very impacted by the selfish
Gene I thought that was a really good
book that helped me understand altruism
as an example and where it comes from
and just realizing that you know the
selection is in the level of genes was a
huge insight for me at the time and it
sort of like cleared up a lot of things
for me what do you think about
the idea that ideas are the organisms
the memes yes love it 100 percent
[Laughter]
are you able to walk around with that
notion for a while that that there is an
evolutionary kind of process with ideas
as well there absolutely is there's
memes just like genes and they compete
and they live in our brains it's
beautiful are we silly humans thinking
that we're the organisms is it possible
that the primary
organisms are the ideas
yeah I would say like the the ideas kind
of live in the software of like our
civilization in the in the minds and so
on we think as humans that the hardware
is the fundamental thing I human is a
hardware entity yeah but it could be the
software right yeah
yeah I would say like there needs to be
some grounding at some point to like a
physical reality yeah but if we clone an
Andre
the software is the thing
like is this thing that makes that thing
special right yeah I guess I you're
right but then cloning might be
exceptionally difficult like there might
be a deep integration between the
software and the hardware in ways we
don't quite understand well from the
evolution point of view like what makes
me special is more like the the gang of
genes that are riding in my chromosomes
I suppose right like they're the they're
the replicating unit I suppose and no
but that's just for you the thing that
makes you special sure wow
the reality is what makes you special is
your ability to survive
based on the software that runs on the
hardware that was built by the genes
um so the software is the thing that
makes you survive not the hardware
all right yeah it's just like a second
layer it's a new second layer that
hasn't been there before the brain they
both they both coexist but there's also
layers of the software I mean it's it's
not it's a it's a abstraction that's uh
on top of abstractions but okay so
Selfish Gene um Nick Lane I would say
sometimes books are like not sufficient
I like to reach for textbooks sometimes
um I kind of feel like books are for too
much of a general consumption sometime
and they just kind of like uh they're
too high up in the level of abstraction
and it's not good enough yeah so I like
textbooks I like the cell I think the
cell was pretty cool
uh that's why also I like the writing of
uh Nick Lane is because he's pretty willing
to step one level down and he doesn't uh
yeah he's sort of he's willing to go
there
but he's also willing to sort of be
throughout the stack so he'll go down to
a lot of detail but then he will come
back up and I think he has a yeah
basically I really appreciate that
that's why I love college early college
even high school but just textbooks on
the basics yeah of Computer Science and
Mathematics of of biology of chemistry
yes those are they condense down like uh
uh it's sufficiently General that you
can understand the both the philosophy
and the details but also like you get
homework problems and you you get to
play with it as much as you would if you
weren't yeah programming stuff yeah and
then I'm also suspicious of textbooks
honestly because as an example in deep
learning uh there's no like amazing
textbooks and I feel this changing very
quickly I imagine the same is true and
say uh synthetic biology and so on these
books like this cell are kind of
outdated they're still high level like
what is the actual real source of truth
it's people in wet Labs working with
cells yeah you know sequencing genomes
and
yeah actually working with working with
it and uh I don't have that much
exposure to that or what that looks like
so currently I'm reading through
the cell and it's kind of interesting
and I'm learning but it's still not
sufficient I would say in terms of
understanding well it's a clean
summarization of the mainstream
narrative
yeah but you have to learn that before
you break out yeah towards The Cutting
Edge yeah what is the actual process of
working with these cells and growing
them and incubating them and you know
it's kind of like a massive cooking
recipe so making sure your cells grow
and proliferate and then you're
sequencing them running experiments and
uh just how that works I think is kind
of like the source of truth of at the
end of the day what's really useful in
terms of creating therapies and so on
yeah I wonder in the future AI textbooks
will be because you know there's
Artificial Intelligence A Modern
Approach I actually haven't read if it's
come out the recent version
there's been a recent edition I also saw
there's a science of deep learning book
I'm waiting for textbooks that worth
recommending worth reading it's It's
tricky because it's like papers
and code code honestly papers are quite
good I especially like the appendix
appendix of any paper as well it's like
it's like the most detail it can have
it doesn't have to be cohesive or
connected to anything else you just
describe to me a very specific way you
solved a particular thing yeah many
times papers can be actually quite
readable not always but sometimes the
introduction in the abstract is readable
even for someone outside of the field uh
not this is not always true and
sometimes I think unfortunately
scientists use complex terms even when
it's not necessary I think that's
harmful I think there there's no reason
for that and papers sometimes are longer
than they need to be in this in the
parts that
don't matter yeah appendix would be long
but then the paper itself you know look
at Einstein make it simple
yeah but certainly I've come across
papers I would say in say like synthetic
biology or something that I thought were
quite readable for the abstract and the
introduction and then you're reading the
rest of it and you don't fully
understand but you kind of are getting a
gist and I think it's cool
what uh advice would you give to folks
interested in machine learning and
research but in General Life advice to a
young person High School
um Early College about how to have a
career they can be proud of or a life
they can be proud of
yeah I think I'm very hesitant to give
general advice I think it's really hard
I've mentioned like some of the stuff
I've mentioned is fairly General I think
like focus on just the amount of work
you're spending on like a thing
uh compare yourself only to yourself not
to others that's good I think those are
fairly General how do you pick the thing
uh you just have like a deep interest in
something uh or like try to like find
the argmax over like the things that
you're interested in argmax at that
moment and stick with it how do you not
get distracted and switch to another
thing uh you can if you like
um well if you do an argmax repeatedly
every week it doesn't converge it
doesn't it's a problem yeah you can like
low pass filter yourself uh in terms of
like what has consistently been true for
you
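Taking the argmax and low-pass-filter language above completely literally, purely as an illustration (every name and number here is made up): smooth the weekly signal and pick the argmax of the smoothed values rather than of any single week.

```python
# Playful, literal reading of the advice: low-pass filter (exponential moving
# average) your weekly interest levels, then argmax the smoothed scores.
weekly_energy = {
    "building neural nets": [7, 9, 6, 8, 9],
    "reading bio papers":   [9, 3, 4, 5, 4],
    "new shiny framework":  [2, 10, 2, 1, 2],   # one exciting week, then nothing
}

def low_pass(xs, alpha=0.3):
    smoothed = xs[0]
    for x in xs[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

scores = {k: low_pass(v) for k, v in weekly_energy.items()}
pick = max(scores, key=scores.get)   # argmax over the filtered signal
print(scores, "->", pick)
```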
um but yeah I definitely see how it can
be hard but I would say like you're
going to work the hardest on the thing
that you care about the most also a low
pass filter yourself and really
introspect in your past were the things
that gave you energy and what are the
things that took energy away from you
concrete examples and usually uh from
those concrete examples sometimes
patterns can emerge I like I like it when
things look like this when I'm these
positions so that's not necessarily the
field but the kind of stuff you're doing
in a particular field so for you it
seems like you were energized by
implementing stuff building actual
things yeah being low level learning and
then also uh communicating so that
others can go through the same
realizations and shortening that Gap
um because I usually have to do way too
much work to understand the thing and
then I'm like okay this is actually like
okay I think I get it and like why was
it so much work it should have been much
less work and that gives me a lot of
frustration and that's why I sometimes
go teach so aside from the teaching
you're doing now uh putting out videos
aside from a potential uh Godfather part
two
uh with the AGI at Tesla and Beyond uh
what does the future for Andrej Karpathy
hold have you figured that out yet or no
I mean uh as you see through the fog of
war that is all of our future
um do you do you start seeing
silhouettes of the what that possible
future could look like
the consistent thing I've been always
interested in for me at least is is AI
and um
uh that's probably where I'm spending my
rest of my life on because I just care
about a lot and I actually care about
like many other problems as well like
say aging which I basically view as
disease and uh
um I care about that as well but I don't
think it's a good idea to go after it
specifically I don't actually think that
humans will be able to come up with the
answer I think the correct thing to do
is to ignore those problems and you
solve Ai and then use that to solve
everything else and I think there's a
chance that this will work I think it's
a very high chance and uh that's kind of
like the the way I'm betting at least so
when you think about AI are you
interested in all kinds of applications
all kinds of domains and any domain you
focus on will allow you to get insights
to the big problem of AGI yeah for me
it's the ultimate meta problem I don't
want to work on any one specific problem
there's too many problems so how can you
work on all problems simultaneously you
solve The Meta problem which to me is
just intelligence and how do you
automate it are there cool small projects
like arxiv-sanity and so on that
you're thinking about that
the ML world can anticipate
there's some always like some fun side
projects yeah um arxiv-sanity is one
basically like there's way too many
arxiv papers how can I organize them and
recommend papers and so on uh I
transcribed all of your yeah podcasts
what did you learn from that experience
uh from transcribing the process of like
you like consuming audiobooks and and
podcasts and so on and here's the
process that achieves
um closer to human level performance and
annotation yeah well I definitely was
like surprised that uh transcription
with OpenAI's Whisper was working so
well compared to what I'm familiar with
from Siri and like a few other systems I
guess it works so well and
uh that's what gave me some energy to
like try it out and I thought it could
be fun to run on podcasts it's kind of
not obvious to me why whisper is so much
better compared to anything else because
I feel like there should be a lot of
incentive for a lot of companies to
produce transcription systems and that
they've done so over a long time whisper
is not a super exotic model it's a
Transformer it takes mel spectrograms
and you know just outputs tokens of text
it's not crazy uh the model and
everything has been around for a long
time
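For reference, running the open-source Whisper package on an episode looks roughly like this; the file name and model size are placeholders, and this is just a usage sketch, not a claim about how any particular transcript was produced.

```python
# Rough sketch of transcribing an episode with the openai-whisper package
# (pip install openai-whisper); file name and model size are placeholders.
import whisper

model = whisper.load_model("medium")              # sizes range from tiny to large
result = model.transcribe("lex_episode_333.mp3")  # mel spectrogram in, text tokens out

print(result["text"][:500])                       # the full transcript as one string
for seg in result["segments"][:3]:                # timestamped segments
    print(f'[{seg["start"]:.1f}s -> {seg["end"]:.1f}s] {seg["text"]}')
```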
I'm not actually 100 sure why yeah it's
not obvious to me either it makes me
feel like I'm missing something I'm
missing something yeah because there's a
huge even at Google and so on YouTube uh
transcription yeah
um yeah it's unclear but some of it is
also integrating into a bigger system
yeah that so the user interface how it's
deployed and all that kind of stuff
maybe running it as an independent thing
is much easier like an order of
magnitude easier than deploying to a
large integrated system like YouTube
transcription or
um anything like meetings like Zoom has
trans transcription that's kind of
crappy but creating uh interface where
it detects the different individual
speakers it's able to
um
display it in compelling ways Run in
real time all that kind of stuff maybe
that's difficult but that's the only
explanation I have because like um
I'm currently paying uh quite a bit for
human uh transcription human caption
right annotation and like it seems like
uh there's a huge incentive to automate
that yeah it's very confusing and I
think I mean I don't know if you looked
at some of the whisper transcripts but
they're quite good they're good and
especially in tricky cases yeah I've
seen
uh Whispers performance on like super
tricky cases and it does incredibly well
so I don't know a podcast is pretty
simple it's like high quality audio and
you're speaking usually pretty clearly
and so I don't know it uh I don't know
what open ai's plans are yeah either but
yeah there's always like fun fun
projects basically and stable diffusion
also is opening up a huge amount of
experimentation I would say in the
visual realm and generating
images and videos and even movies now
and so that's going to be pretty crazy
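As a concrete example of the kind of experimentation being described, generating an image from a text prompt with an open Stable Diffusion checkpoint via the Hugging Face diffusers library looks roughly like this; the model id and prompt are just examples, and a GPU is assumed.

```python
# Illustrative: text-to-image with an open Stable Diffusion checkpoint via the
# Hugging Face diffusers library; model id and prompt are example choices.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a solarpunk village, lush nature, quietly high-tech").images[0]
image.save("scene.png")
```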
uh that's going to that's going to
almost certainly work and it's going to
be really interesting when the cost of
content creation is going to fall to
zero you used to need a painter for a
few months to paint a thing and now it's
going to be speak to your phone to get
your video so if Hollywood will start
using that to generate scenes
which completely opens up yeah so you
can make a like a movie like Avatar
eventually for under a million dollars
much less maybe just by talking to your
phone I mean I know it sounds kind of
crazy
and then there'd be some voting
mechanism like how do you have a like
would there be a show on Netflix that's
generated completely uh automatedly
potentially yeah and what does it look
like also when you can just generate It
On Demand and it's uh and there's
Infinity of it yeah
oh man
all the synthetic content I mean it's
humbling because we we treat ourselves
as special for being able to generate
art and ideas and all that kind of stuff
if that can be done in an automated Way
by AI yeah I think it's fascinating to
me how these uh the predictions of AI
and what it's going to look like and
what it's going to be capable of are
completely inverted and wrong and the
Sci-Fi of the 50s and 60s was just like
totally not right they imagined AIs as
like super calculating theorem provers
and we're getting things that can talk
to you about emotions they can do art
it's just like weird are you excited
about that future just AIs like hybrid
systems heterogeneous systems of humans
and AIS talking about emotions Netflix
and chill with an AI system legit where
the Netflix thing you watch is also
generated by AI
I think it's uh it's going to be
interesting for sure and I think I'm
cautiously optimistic but it's not it's
not obvious
well the sad thing is your brain and
mine developed in a time where
um before Twitter before the before the
internet so I wonder people that are
born inside of it might have a different
experience
um like maybe you and I will still
resist it uh and the people born now
will not
well I do feel like humans are extremely
malleable yeah
and uh you're probably right
what is the meaning of life Andre
we we talked about sort of
the universe having a conversation with
us humans or with the systems we create
to try to answer for the universe for
the creator of the universe to notice us
we're trying to create systems that are
loud enough
just answer back
I don't know if that's the meaning of
life that's like meaning of life for
some people the first level answer I
would say is anyone can choose their own
meaning of life because we are conscious
entity and it's beautiful number one but
uh I do think that like a deeper meaning
of life if someone is interested is uh
or along the lines of like what the hell
is All This and like why and if you look
at the into fundamental physics and the
quantum field Theory and a standard
model they're like very complicated and
um
there's this like you know 19 free
parameters of our universe and
like what's going on with all this stuff
and why is it here and can I hack it can
I work with it is there a message for me
am I supposed to create a message
and so I think there's some fundamental
answers there but I think there's
actually even like you can't actually
really make a dent in those without more
time and so to me also there's a big
question around just getting more time
honestly
yeah that's kind of like what I think
about quite a bit as well so kind of the
ultimate
or at least first way to sneak up to the
why question is to try to escape
uh the system the universe yeah and then
for that you sort of uh backtrack and
say okay for that that's going to be
take a very long time so the why
question boils down from an engineering
perspective to how do we extend yeah I
think that's the question number one
practically speaking because you can't
uh you're not gonna calculate the answer
to the deeper questions in the time you
have and that could be extending your
own lifetime or extending just the
lifetime of human civilization of
whoever wants to not many people might
not want that but I think people who do
want that I think um I think it's
probably possible uh and I don't I don't
know that people
fully realize this I kind of feel like
people think of death as an
inevitability but at the end of the day
this is a physical system some things go
wrong uh it makes sense why things like
this happen evolutionarily speaking and
uh there's most certainly interventions
that uh that mitigate it that would be
interesting if death is eventually
looked at as
as a fascinating thing that used to
happen to humans I don't think it's
unlikely I think it's I think it's
likely
and it's up to our imagination to try to
predict what the world without death
looks like yeah it's hard to I think the
values will completely change
could be I don't I don't really buy all
these ideas that oh without that there's
no meaning there's nothing as
I don't intuitively buy all those
arguments I think there's plenty of
meaning plenty of things to learn
they're interesting exciting I want to
know I want to calculate uh I want to
improve the condition of
all the humans and organisms that are
alive yet the way we find meaning might
change we there is a lot of humans
probably including myself that finds
meaning in the finiteness of things but
that doesn't mean that's the only source
of meaning yeah I do think many people
will will go with that which I think is
great I love the idea that people can
just choose their own adventure like you
you are born as a conscious free entity
by default I'd like to think and um you
have your unalienable rights for Life uh
in the pursuit of happiness I don't know
if you have that in the nature the
landscape of happiness you can choose
your own adventure mostly and that's not
it's not fully true but I still am
pretty sure I'm an NPC but
um an NPC can't know it's an NPC
there could be different degrees and
levels of consciousness I don't think
there's a more beautiful way to end it
uh Andre you're an incredible person I'm
really honored you would talk with me
everything you've done for the machine
learning world for the AI world
to just inspire people to educate
millions of people it's been it's been
great and I can't wait to see what you
do next it's been an honor man thank you
so much for talking today awesome thank
you
thanks for listening to this
conversation with Andrej Karpathy to
support this podcast please check out
our sponsors in the description and now
let me leave you with some words from
Samuel Karlin
the purpose of models is not to fit the
data but to sharpen the questions
thanks for listening and hope to see you
next time