Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333
By Lex Fridman
Summary
## Key takeaways
- **Neural nets: simple math, surprising power**: Neural networks are essentially simple mathematical expressions with many 'knobs' (parameters) that, when trained on complex problems, exhibit surprising emergent behaviors. [01:05], [04:09]
- **Biology vs. AI: different optimization paths**: While inspired by the brain, artificial neural networks evolve through a different optimization process than biological brains, making direct analogies misleading. [06:15], [06:53]
- **Transformers: general-purpose, adaptable computers**: The Transformer architecture is a powerful, general-purpose 'computer' that is expressive, optimizable, and efficient, enabling it to process diverse data like video, images, and text. [34:35], [36:03]
- **Language models learn world knowledge via next-word prediction**: By training on vast internet text to predict the next word, language models implicitly learn about chemistry, physics, and human nature, exhibiting surprising emergent properties. [42:15], [44:01]
- **Software 2.0: programming via data and objectives**: The future of software development ('Software 2.0') involves programming not with explicit code, but by curating data sets and defining objectives to train neural networks. [01:06:19], [01:10:37]
- **Vision is sufficient and necessary for driving**: Cameras are a high-bandwidth, cost-effective sensor, and since the world is designed for human vision, relying solely on cameras for driving is both necessary and sufficient. [01:19:09], [01:32:34]
Topics Covered
- Why modern AI is an "alien artifact," not a brain copy.
- Humans are biological bootloaders for universe-solving AI.
- Can AI discover "exploits" within the universe's physics?
- The Transformer: Expressive, Optimizable, and Efficient.
- Software 2.0: AI writes code using neural network weights.
Full Transcript
think it's possible that physics has
exploits and we should be trying to find
them arranging some kind of a crazy
quantum mechanical system that somehow
gives you buffer overflow somehow gives
you a rounding error in the floating
Point synthetic intelligences are kind
of like the next stage of development
and I don't know where it leads to like
at some point I suspect
the universe is some kind of a puzzle
these synthetic AIS will uncover that
puzzle and
solve it
the following is a conversation with Andrej Karpathy, previously the director of AI at Tesla, and before that at OpenAI and Stanford. He is one of the greatest scientists, engineers, and educators in the history of artificial intelligence. This is the Lex Fridman podcast. To support it, please check out our sponsors. And now, dear friends, here's Andrej Karpathy.
what is a neural network and why does it
seem to uh do such a surprisingly good
job of learning what is a neural network
it's a mathematical abstraction of
the brain I would say that's how it was
originally developed
at the end of the day it's a
mathematical expression and it's a
fairly simple mathematical expression
when you get down to it it's basically a
sequence of Matrix multiplies which are
really dot products mathematically and
some nonlinearities thrown in and so
it's a very simple mathematical
expression and it's got knobs in it many
knobs many knobs and these knobs are
Loosely related to basically the
synapses in your brain they're trainable
they're modifiable and so the idea is
like we need to find the setting of the knobs that makes the neural net do
whatever you want it to do like classify
images and so on and so there's not too
much mystery I would say in it like
um you might think that basically don't
want to endow it with too much meaning
with respect to the brain and how it
works it's really just a complicated
mathematical expression with knobs and
those knobs need a proper setting for it
to do something uh desirable yeah but
poetry is just the collection of letters
with spaces but it can make us feel a
certain way and in that same way when
you get a large number of knobs together
whether it's in a inside the brain or
inside a computer, they seem to surprise us with their power
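As a minimal sketch of what "a simple mathematical expression with many knobs" means in code (purely illustrative, not something from the conversation; the layer sizes and random weights are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# The "knobs": weight matrices and biases, here set at random rather than trained.
W1, b1 = rng.normal(size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)), np.zeros(10)

def forward(x):
    # A sequence of matrix multiplies (dot products) with a nonlinearity in between.
    h = np.maximum(0, x @ W1 + b1)   # ReLU nonlinearity
    return h @ W2 + b2               # scores, e.g. for classifying images

x = rng.normal(size=(1, 784))        # a fake flattened image
print(forward(x).shape)              # (1, 10)
```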
yeah I think that's fair so basically
I'm underselling it by a lot because you
definitely do get very surprising emergent behaviors out of these neural nets
when they're large enough and trained on
complicated enough problems like say for
example the next uh word prediction in a
massive data set from the internet and
then these neural nets take on pretty surprising, magical properties. Yeah, I
think it's kind of interesting how much
you can get out of even very simple
mathematical formalism when your brain
right now I was talking is it doing next
word prediction
or is it doing something more
interesting well definitely some kind of
a generative model that's a gpt-like and
prompted by you
um yeah so you're giving me a prompt and
I'm kind of like responding to it in a
generative way and by yourself perhaps a
little bit like are you adding extra
prompts from your own memory inside your
head
automatically feels like you're
referencing some kind of a declarative
structure of like memory and so on and
then uh you're putting that together
with your prompt and giving away some
messages like how much of what you just
said has been said by you before
uh nothing basically right no but if you
actually look at all the words you've
ever said in your life and you do a
search you'll probably said a lot of the
same words in the same order before yeah
it could be I mean I'm using phrases
that are common Etc but I'm remixing it
into a pretty uh sort of unique sentence
at the end of the day but you're right
definitely there's like a ton of
remixing. But it's like Magnus Carlsen saying, I'm rated 2900 or whatever, which is pretty decent. I
think you're talking very uh you're not
giving enough credit to neural Nets here
why do they seem to
what's your best intuition
about this emergent Behavior I mean it's
kind of interesting because I'm
simultaneously underselling them but I
also feel like there's an element to
which I'm over like it's actually kind
of incredible that you can get so much
emergent magical Behavior out of them
despite them being so simple
mathematically so I think those are kind
of like two surprising statements that
are kind of just juxtapose together
and I think basically what it is is we
are actually fairly good at optimizing
these neural Nets and when you give them
a hard enough problem they are forced to
learn very interesting Solutions in the
optimization, and those solutions basically have these emergent properties that are very interesting
there's wisdom and knowledge
in the knobs
and so what's this representation that's
in the knobs does it make sense to you
intuitively the large number of knobs
can hold the representation that
captures some deep wisdom about the data
it has looked at
it's a lot of knobs it's a lot of knobs
and somehow you know so speaking
concretely
um one of the neural Nets that people
are very excited about right now are are
gpts which are basically just next word
prediction networks so you consume a
sequence of words from the internet and
you try to predict the next word and uh
once you train these on a large enough
data set
um, they... you can basically prompt these neural nets in arbitrary ways
and you can ask them to solve problems
and they will so you can just tell them
you can you can make it look like you're
trying to um
solve some kind of a mathematical
problem and they will continue what they
think is the solution based on what
they've seen on the internet and very
often those Solutions look very
remarkably consistent look correct
potentially
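A rough sketch of the next-word prediction objective being described, written as illustrative PyTorch rather than anything referenced in the episode; the toy vocabulary, random tokens, and the embedding-plus-linear stand-in for a real GPT are all assumptions:

```python
import torch
import torch.nn.functional as F

# Toy setup: a vocabulary of 1000 "words" and a small batch of token sequences.
vocab_size, seq_len, batch = 1000, 32, 4
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Stand-in for a GPT-like model: maps token ids to next-token logits.
# In practice this would be a Transformer; here it is just an embedding plus a linear layer.
embed = torch.nn.Embedding(vocab_size, 64)
head = torch.nn.Linear(64, vocab_size)

logits = head(embed(tokens[:, :-1]))        # predict positions 1..T from positions 0..T-1
targets = tokens[:, 1:]                      # the "next word" at every position
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())                           # this is what gets minimized over internet-scale text
```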
do you still think about the brain side
of it so as neural Nets is an
abstraction or mathematical abstraction
of the brain you still draw wisdom
from from the biological neural networks
or even the bigger question so you're a
big fan of biology and biological
computation
what impressive thing is biology doing that computers are not yet doing? That gap.
I would say I'm definitely
much more hesitant with the
analogies to the brain than I think you
would see potentially in the field
um and I kind of feel like
certainly the way neural network started
is everything stemmed from inspiration
by the brain but at the end of the day
the artifacts that you get after
training they are arrived at by a very
different optimization process than the
optimization process that gave rise to
the brain and so I think uh
I kind of think of it as a very
complicated alien artifact
um, it's something different. I'm not...
Sorry, the neural nets that we're training?
Okay. They are complicated alien artifacts. I do not make
analogies to the brain because I think
the optimization process that gave rise
to it is very different from the brain
so there was no multi-agent self-play
kind of uh setup uh and evolution it was
an optimization that is basically what amounts to a compression objective on a
massive amount of data okay so
artificial neural networks are doing
compression
and biological neural networks
are not, to survive, and they're not really doing any... they're an agent in a multi-agent self-play system
that's been running for a very very long
time that said Evolution has found that
it is very useful to to predict and have
a predictive model in the brain and so I
think our brain utilizes something that
looks like that as a part of it but it
has a lot more you know gadgets and
gizmos and uh value functions and
ancient nuclei that are all trying to, like, make us survive and reproduce and
everything else and the whole thing
through embryogenesis is built from a
single cell I mean it's just the code is
inside the DNA and it just builds it up
like the entire organism
yes and like it does it pretty well
it should not be possible so there's
some learning going on there's some
there's some there's some kind of
computation going through that building
process I mean I I don't know where
if you were just to look at the entirety
of history of life on Earth
where do you think is the most
interesting invention is it the origin
of life itself
is it just jumping to eukaryotes is it
mammals is it humans themselves Homo
sapiens the the origin of intelligence
or highly complex intelligence
or or is it all just in continuation the
same kind of process
certainly I would say it's an extremely
remarkable story that I'm only like
briefly learning about recently all the
way from
um actually like you almost have to
start at the formation of Earth and all
of its conditions and the entire solar
system and how everything is arranged
with Jupiter and Moon and the habitable
zone and everything and then you have an
active Earth
that's turning over material
and then you start with abiogenesis and everything, and so it's
all like a pretty remarkable story I'm
not sure that
I can pick like a single Unique Piece of
it that I find most interesting
um
I guess for me as an artificial
intelligence researcher it's probably
the last piece we have lots of animals
that uh you know are are not building
technological Society but we do and um
it seems to have happened very quickly
it seems to have happened very recently
and uh
something very interesting happened
there that I don't fully understand I
almost understand everything else kind
of I think intuitively uh but I don't
understand exactly that part and how
quick it was both explanations would be
interesting one is that this is just a
continuation of the same kind of process
there's nothing special about humans
that would be deeply understanding that
would be very interesting that we think
of ourselves as special but it was
obvious all it was already written in
the in the code that you would have
greater and greater intelligence
emerging and then the other explanation
which is something truly special
happened something like a rare event
whether it's like crazy rare event like
uh, Space Odyssey. What would it be? Say, like, the invention of fire, or, as Richard Wrangham says, the beta males deciding on a clever way to kill the alpha males by collaborating. So just
optimizing the collaborations really the
multi-agent aspect of the multi-agent
and that really being constrained on
resources and trying to survive
the collaboration aspect is what created
the complex intelligence but it seems
like it's a natural outgrowth of the
evolution process like what could
possibly be a magical thing that
happened like a rare thing that would
say that humans are actually human level
intelligence is actually a really rare
thing in the universe
yeah I'm hesitant to say that it is rare
by the way but it definitely seems like
it's kind of like a punctuated
equilibrium where you have lots of
exploration and then you have certain
leaps sparse leaps in between so of
course like origin of life would be one
um, you know, DNA, sex, eukaryotic systems, eukaryotic life, the endosymbiosis event where the archaeon ate a little bacterium, you know, just the whole thing, and then of course
emergence of Consciousness and so on so
it seems like definitely there are
sparse events where mass amount of
progress was made but yeah it's kind of
hard to pick one so you don't think
humans are unique. Gotta ask you: how many intelligent alien civilizations do you think are out there? And is their
intelligence
different or similar to ours
yeah I've been preoccupied with this
question quite a bit recently uh
basically the Fermi Paradox, and just
thinking through and and the reason
actually that I am very interested in uh
the origin of life is fundamentally
trying to understand how common it is
that there are technological societies
out there uh
um in space and the more I study it the
more I think that um
uh there should be quite a few quite a
lot why haven't we heard from them
because I I agree with you it feels like
I just don't see
why what we did here on Earth is so
difficult to do yeah and especially when
you get into the details of it I used to
think origin of life was very
um
it was this magical, rare event, but then you read books like, for example, Nick Lane's The Vital Question, Life Ascending, etc., and he really gets in and
he really makes you believe that this is
not that rare basic chemistry you have
an active Earth and you have your
alkaline vents and you have lots of alkaline waters mixing with the ocean, and you have your proton gradients, and you have the little porous
pockets of these alkaline vents that
concentrate chemistry and um basically
as he steps through all of these little
pieces you start to understand that
actually this is not that crazy you
could see this happen on other systems
um and he really takes you from just a
geology to primitive life and he makes
it feel like it's actually pretty
plausible, and also, like, the origin of life was actually fairly fast after the formation of Earth,
um if I remember correctly just a few
hundred million years or something like
that after basically when it was
possible life actually arose and so that
makes me feel like that is not the
constraint that is not the limiting
variable and that life should actually
be fairly common
um and then it you know where the
drop-offs are is very
um is very interesting to think about I
currently think that there's no major
drop-offs basically and so there should
be quite a lot of life and basically
what it where that brings me to then is
the only way to reconcile the fact that
we haven't found anyone and so on is
that um we just can't we can't see them
we can't observe them just a quick brief
comment Nick Lane and a lot of
biologists I talked to they really seem
to think that the jump from bacteria to
more complex organisms is the hardest jump, the eukaryotic one.
Yeah.
Which, I don't... I get it, they're much more
knowledgeable uh than me about like the
intricacies of biology but that seems
like crazy because how much how many
single cell organisms are there like and
how much time you have surely it's not
that difficult like in a billion years
it's not even that long
of a time really just all these bacteria
under constrained resources battling it
out I'm sure they can invent more
complex again I don't understand it's
like how to move from a hello world
program to like like invent a function
or something like that I don't yeah I I
so I don't yeah so I'm with you I just
feel like I don't see any if the origin
of life that would be my intuition
that's the hardest thing but if that's
not the hardest thing because it happens
so quickly then it's got to be
everywhere and yeah maybe we're just too
dumb to see it well it's just we don't
have really good mechanisms for seeing
this life I mean uh by what
how um so I'm not an expert just to
preface this but just said it was I want
to meet an expert on alien intelligence
and how to communicate I'm very
suspicious of our ability to to find
these intelligences out there and to
find these Earths like radio waves for
example are terrible, their power drops off as basically one over r squared,
uh so I remember reading that our
current radio waves would not be uh the
ones that we we are broadcasting would
not be uh measurable by our devices
today only like was it like one tenth of
a light year away like not even
basically tiny distance because you
really need like a targeted transmission
of massive power directed somewhere for
this to be picked up on long distances
and so I just think that our ability to
measure is um is not amazing I think
there's probably other civilizations out
there, and then the big question is why don't they build von Neumann probes and why
don't they Interstellar travel across
the entire galaxy and my current answer
is it's probably Interstellar travel is
like really hard uh you have the
interstellar medium. If you want to move at close to the speed of light, you're going to be encountering bullets along the way,
because even like tiny hydrogen atoms
and little particles of dust are
basically have like massive kinetic
energy at those speeds and so basically
you need some kind of shielding you need
you have all the cosmic radiation uh
it's just like brutal out there it's
really hard and so my thinking is maybe
Interstellar travel is just extremely
hard
to build hard
it feels like uh it feels like we're not
a billion years away from doing that it
just might be that it's very you have to
go very slowly potentially as an example
through space
right, as opposed to close to the speed of light. So I'm suspicious basically of
our ability to measure life and I'm
suspicious of the ability to um just
permeate all of space in the Galaxy or
across galaxies, and that's the only way that I can currently see around it.
Yeah, it's kind of
mind-blowing to think that there's
trillions of intelligent alien
civilizations out there kind of slowly
traveling through space
to meet each other and some of them meet
some of them go to war some of them
collaborate or they're all just uh
independent they are all just like
little pockets I don't know well
statistically if there's like
if it's there's trillions of them surely
some of them some of the pockets are
close enough to get some of them happen
to be close yeah in the close enough to
see each other and then once you see
once you see something that is
definitely complex life like if we see
something, yeah, we're probably going to be severely, like, intensely, aggressively motivated to figure out what the hell
that is and try to meet them what would
be your first instinct to try to like at
a generational level meet them or defend
against them or what would be your uh
Instinct as a president of the United
States
and the scientists
I don't know which hat you prefer in
this question
yeah I think the the question it's
really hard
um
I will say like for example for us
um we have lots of primitive life forms
on Earth
um next to us we have all kinds of ants
and everything else and we share space
with them, and we are hesitant to impact them, and we're trying to protect them by default, because they
are amazing interesting dynamical
systems that took a long time to evolve
and they are interesting and special and
I don't know that you want to
um destroy that by default and so I like
complex dynamical systems that took a
lot of time to evolve I think
I'd like to I like to preserve it if I
can afford to and I'd like to think that
the same would be true about uh the
galactic resources and that uh they
would think that we're kind of
incredible interesting story that took
time it took a few billion years to
unravel and you don't want to just
destroy it I could see two aliens
talking about Earth right now and saying
I'm a big fan of complex dynamical systems, so I think there was a value to preserve these. And we basically are a video game they watch, or a show, a TV show that they watch.
yeah I think uh you would need like a
very good reason I think to
to destroy it uh like why don't we
destroy these ant farms and so on it's
because we're not actually like really
in direct competition with them right
now uh we do it accidentally and so on
but
um
there's plenty of resources and so why
would you destroy something that is so
interesting and precious well from a
scientific perspective you might probe
it yeah you might interact with it later
you might want to learn something from
it right so I wonder there's could be
certain physical phenomena that we think
is a physical phenomena but it's
actually interacting with us to like
poke the finger and see what happens I
think it should be very interesting to
scientists other alien scientists what
happened here
um and you know it's a what we're seeing
today is a snapshot basically it's a
result of a huge amount of computation
uh of over like billion years or
something like that so it could have
been initiated by aliens. This could be a computer running a program.
Like, okay, if you had the power to do this, would you?
Okay, for sure, at least I would. I would pick an Earth-like planet that has the conditions, based on my understanding of the chemistry prerequisites for life, and I would seed it with life and run it.
Right? Like, yeah, wouldn't you 100% do that? And observe it and then protect it.
I mean that that's not just a hell of a
good TV show it's it's a good scientific
experiment yeah and
And it's a physical simulation, right? Maybe the evolution, like actually running it, is the most efficient way to understand computation, or to compute stuff, or to understand life, or, you know,
what life looks like and uh what
branches it can take it does make me
kind of feel weird that we're part of a
science experiment, but maybe everything's a science experiment. Does that change anything for us, if we're a science experiment?
um I don't know two descendants of Apes
talking about being inside of a science
experiment.
I'm suspicious of this idea of, like, a deliberate panspermia, as you described it, sort of, and I don't see
a divine intervention in some way in the
in the historical record right now I do
feel like
um the story in these in these books
like Nick Lane's books and so on sort of
makes sense uh and it makes sense how
life arose on Earth uniquely, and, yeah, I don't need to reach for more exotic explanations right now.
Sure, but NPCs inside a video game
don't
don't don't observe any divine
intervention either and we might just be
all NPCs running a kind of code maybe
eventually they will currently NPCs are
really dumb but once they're running
gpts um maybe they will be like hey this
is really suspicious what the hell so
you famously tweeted: it looks like if you bombard Earth with photons for a while, you can emit a Roadster.
So, like in Hitchhiker's Guide to the Galaxy, if we would summarize the story of Earth, so in that book it's "mostly harmless", what do you think are all the possible stories, like a paragraph long or a sentence long, that Earth could be summarized as once it's done with its computation? So, like, all the possible... if Earth is a book, right?
Yeah, uh,
probably there has to be an ending I
mean there's going to be an end to Earth
and it could end in all kinds of ways it
could end soon it can end later what do
you think are the possible stories well
definitely there seems to be
yeah you're sort of
it's pretty incredible that these
self-replicating systems will basically
arise from the Dynamics and then they
perpetuate themselves and become more
complex and eventually become conscious
and build a society and I kind of feel
like in some sense it's kind of like a
deterministic wave uh that you know that
kind of just like happens on any you
know any sufficiently well arranged
system like Earth
and so I kind of feel like there's a
certain sense of inevitability in it
um and it's really beautiful and it ends
somehow, right? So it's a chemically diverse environment
where complex dynamical systems can
evolve and become further and
further complex but then there's a
certain
um
what is it there's certain terminating
conditions yeah I don't know what the
terminating conditions are but
definitely there's a trend line of
something and we're part of that story
and like where does that where does it
go so you know we're famously described
often as a biological Bootloader for AIS
and that's because humans I mean you
know we're an incredible
uh biological system and we're capable
of computation and uh you know and love
and so on
um but we're extremely inefficient as
well like we're talking to each other
through audio it's just kind of
embarrassing, honestly. We're manipulating, like, seven symbols serially, we're using vocal cords, it's all happening over, like, multiple seconds. It's just kind of embarrassing when you step down to the frequencies at which computers operate, or are able to communicate on, and
so basically it does seem like
um synthetic intelligences are kind of
like the next stage of development and
um I don't know where it leads to like
at some point I suspect uh the universe
is some kind of a puzzle
and these synthetic AIS will uncover
that puzzle and um solve it
and then what happens after right like
what because if you just like Fast
Forward Earth many billions of years
it's, like, it's quiet, and then it's like turmoil, you see, like, city lights and stuff like that. And then what happens, like, at the end? Is it... or is it like a calming, is it an explosion? Is it, like, Earth will open up, like a giant... because you said emit Roadsters, like, we'll start emitting, like, a giant number of, like, satellites.
Yes, it's some kind of a crazy
explosion and we're living we're like
we're stepping through a explosion and
we're like living day to day and it
doesn't look like it but it's actually
if you I saw a very cool animation of
Earth uh and life on Earth and basically
nothing happens for a long time and then
the last like two seconds like basically
cities and everything and just in the
low earth orbit just gets cluttered and
just the whole thing happens in the last
two seconds, and you're like, this is exploding, this is a state of explosion.
So if you play
yeah yeah if you play it at normal speed
yeah it'll just look like an explosion
it's a firecracker we're living in a
firecracker where it's going to start
emitting all kinds of interesting things
yeah and then so explosion doesn't it
might actually look like a little
explosion with with lights and fire and
energy emitted all that kind of stuff
but when you look inside the details of
the explosion there's actual complexity
happening where there's like uh yeah
human life or some kind of life we hope
it's not destructive firecracker it's
kind of like a constructive uh
firecracker.
All right, so given that, as hilarious as that discussion is, it is really interesting to think about, like, what the puzzle of the universe is. Did the creator of the universe give us a message? Like, for example, in the book Contact by Carl Sagan, there's a message for humanity, for any civilization, in the digits of the expansion of pi in base 11, eventually, which is kind of
interesting thought uh maybe maybe we're
supposed to be giving a message to our
creator maybe we're supposed to somehow
create some kind of a quantum mechanical
system that alerts them to our
intelligent presence here because if you
think about it from their perspective
it's just say like Quantum field Theory
massive like cellular automaton like
thing and like how do you even notice
that we exist you might not even be able
to pick us up in that simulation and so
how do you uh how do you prove that you
exist that you're intelligent and that
you're a part of the universe so this is
like a Turing test for intelligence
from Earth yeah the Creator is uh I mean
maybe this is uh like trying to complete
the next word in a sentence this is a
complicated way of that like Earth is
just is basically sending a message back
yeah the puzzle is basically like
alerting the Creator that we exist or
maybe the puzzle is just to just break
out of the system and just uh you know
stick it to the Creator in some way uh
basically like if you're playing a video
game you can um
you can somehow find an exploit and find a way to execute arbitrary code on the host machine. For example, I believe someone got a game of Mario to play Pong just by exploiting it, and then basically writing code and being able to execute arbitrary code in the game. And so maybe we should
be maybe that's the puzzle is that we
should be um
find a way to exploit it. So I think, like, some of these synthetic AIs will eventually find the universe to be
some kind of a puzzle and then solve it
in some way and that's kind of like the
end game somehow do you often think
about it as a as a simulation so as the
universe being a kind of computation
that has might have bugs and exploits
yes yeah I think so is that what physics
is essentially I think it's possible
that physics has exploits and we should
be trying to find them arranging some
kind of a crazy quantum mechanical
system that somehow gives you buffer
overflow, somehow gives you a rounding error in the floating point
uh yeah that's right and like more and
more sophisticated exploits like those
are jokes but that could be actually
very close yeah we'll find some way to
extract infinite energy for example when
you train a reinforcement learning
agents um and physical simulations and
you ask them to say run quickly on the
flat ground they'll end up doing all
kinds of like weird things
um in part of that optimization right
they'll get on their back legs and they will slide across the floor, and it's because the reinforcement learning optimization on that agent has figured out a way to extract infinite energy from the friction forces, and basically their poor implementation, and they found a way to generate infinite energy and just slide
across the surface and it's not what you
expected it's just a it's sort of like a
perverse solution and so maybe we can
find something like that. Maybe we can be that little dog in this physical simulation that cracks or escapes the intended consequences of the physics that the universe came up with. We'll figure out
some kind of shortcut to some weirdness
yeah and then oh man but see the problem
with that weirdness is the first person
to discover the weirdness like sliding
in the back legs
that's all we're going to do.
Yeah, it's very quickly, because everybody does that thing. So, like, the paperclip maximizer is a ridiculous idea, but that very well, you know, could be it, and then we'll just all switch to that because it's so fun.
Well, no person will
Discover it I think by the way I think
it's going to have to be uh some kind of
a super intelligent AGI of a third
generation
like we're building the first generation
AGI you know
Third generation.
Yeah, so the bootloader for an AI, that AI will be a bootloader for another AI.
Yeah.
And then there's no way for us to introspect what that might even...
I think it's very likely that these things, for example, like, say you have these AGIs, it's very likely, for example, they will be completely inert. I like
these kinds of sci-fi books sometimes
where these things are just completely
inert they don't interact with anything
and I find that kind of beautiful
because uh they probably they've
probably figured out the meta game of
the universe in some way potentially
they're they're doing something
completely beyond our imagination
um and uh they don't interact with
simple chemical life forms like why
would you do that so I find those kinds
of ideas compelling.
What's their source of fun? What are they doing? What's the source of... solving the universe?
But inert, so can you define what it means, inert? So they escape?
As in, um,
they will behave in some very like
strange way to us because they're uh
they're beyond they're playing The Meta
game uh and The Meta game is probably
say like arranging quantum mechanical
systems in some very weird ways to
extract Infinite Energy uh solve the
digital expansion of Pi to whatever
amount uh they will build their own like
little Fusion reactors or something
crazy like they're doing something
Beyond Comprehension and uh not
understandable to us and actually
brilliant under the hood what if quantum
mechanics itself is the system and we're
just thinking it's physics
but we're really parasites on... or not parasites, we're not really hurting physics, we're just living on this organism, and we're like
trying to understand it but really it is
an organism and with a deep deep
intelligence maybe physics itself is
uh the the organism that's doing a super
interesting thing and we're just like
one little thing yeah ant sitting on top
of it trying to get energy from it we're
just kind of like these particles in a
wave that I feel like is mostly
deterministic and takes uh Universe from
some kind of a big bang to some kind of
a super intelligent replicator some kind
of a stable point in the universe given
these laws of physics you don't think uh
as Einstein said God doesn't play dice
so you think it's mostly deterministic
there's no randomness in the thing?
I think it's deterministic. Oh, there's tons of... well, I want to be careful with randomness.
Pseudo random?
Yeah, I don't like random. I think maybe the laws of physics are deterministic. Yeah, I think they're deterministic.
You just got really uncomfortable with this question.
do you have anxiety about whether the
universe is random or not
What, there's no randomness? You say you like Good Will Hunting: it's not your fault, Andrej. It's not your fault, man.
um so you don't like Randomness uh yeah
I think it's uh unsettling I think it's
a deterministic system I think that
things that look random like say the uh
collapse of the wave function Etc I
think they're actually deterministic
just entanglement uh and so on and uh
some kind of a Multiverse Theory
something something okay so why does it
feel like we have a free will like if I
raise the hand I chose to do this now
um what
that doesn't feel like a deterministic
thing it feels like I'm making a choice
it feels like it okay so it's all
feelings it's just feelings yeah so when
an RL agent is making a choice is that
um
it's not really making a choice the
choices are all already there yeah
you're interpreting the choice and
you're creating a narrative for or
having made it yeah and now we're
talking about the narrative it's very
meta looking back what is the most
beautiful or surprising idea in deep
learning or AI in general that you've
come across you've seen this field
explode
and grow in interesting ways. Just what cool ideas, like, made you sit back and go, hmm, big or small?
well the one that I've been thinking
about recently the most probably is the
the Transformer architecture
um so basically uh neural networks have
a lot of architectures that were trendy
have come and gone for different sensory
modalities. Like for vision, audio, text, you would process them with different-looking neural nets, and recently we've seen this convergence towards one architecture, the Transformer, and you can
feed it video or you can feed it you
know images or speech or text and it
just gobbles it up and it's kind of like
a bit of a general purpose uh computer
that is also trainable and very
efficient to run on our Hardware
and so uh this paper came out in 2016 I
want to say
Attention Is All You Need.
Attention Is All You Need. You criticized the paper title in retrospect, that it didn't foresee the bigness of the impact, yeah, that it was going to have.
yeah I'm not sure if the authors were
aware of the impact that that paper
would go on to have probably they
weren't but I think they were aware of
some of the motivations and design
decisions beyond the Transformer and
they chose not to I think expand on it
in that way in the paper and so I think
they had an idea that there was more
um than just the surface of just like oh
we're just doing translation and here's
a better architecture you're not just
doing translation this is like a really
cool differentiable optimizable
efficient computer that you've proposed
and maybe they didn't have all of that
foresight but I think is really
interesting isn't it funny sorry to
interrupt that title is memeable that
they went for such a profound idea they
went with the I don't think anyone used
that kind of title before right
Attention is all you need. Yeah, it's like a meme or something.
Exactly.
Isn't it funny that, like, maybe if it was a more serious title it wouldn't
have the impact honestly I yeah there is
an element of me that honestly agrees
with you and prefers it this way yes
if it was too grand it would overpromise and then underdeliver,
potentially so you want to just uh meme
your way to greatness
That should be a t-shirt. So you tweeted: the Transformer is a magnificent neural network architecture because it
is a general purpose differentiable
computer it is simultaneously expressive
in the forward pass optimizable via back
propagation gradient descent and
efficient High parallelism compute graph
can you discuss some of those details
expressive optimizable efficient
yeah for memory or or in general
whatever comes to your heart you want to
have a general purpose computer that you
can train on arbitrary problems like say
the task of next word prediction or
detecting if there's a cat in the image
or something like that
and you want to train this computer so
you want to set its weights and I think
there's a number of design criteria that
sort of overlap in the Transformer
simultaneously that made it very
successful and I think the authors were
kind of uh deliberately trying to make
this really powerful architecture and um
so basically it's very powerful in the
forward pass because it's able to
express
um very general computation as a sort of
something that looks like message
passing you have nodes and they all
store vectors, and these nodes get to basically look at each other, at each other's vectors, and they get to communicate, and basically nodes get to
broadcast hey I'm looking for certain
things and then other nodes get to
broadcast hey these are the things I
have those are the keys and the values
So it's not just attention.
Yeah, exactly. The Transformer is much more than
just the attention component. It's got many architectural pieces that went into it: the residual connections, the way it's arranged, there's a multi-layer perceptron in there, the way it's stacked, and so on.
um but basically there's a message
passing scheme where nodes get to look
at each other decide what's interesting
and then update each other and uh so I
think the um when you get to the details
of it I think it's a very expressive
function uh so it can express lots of
different types of algorithms in the forward pass. Not only that, but the way
it's designed with the residual
connections layer normalizations the
softmax attention and everything it's
also optimizable this is a really big
deal because there's lots of computers
that are powerful that you can't
optimize or they're not easy to optimize
using the techniques that we have, which is backpropagation and gradient descent. These are first-order methods, very simple optimizers really, and so
um you also need it to be optimizable
um and then lastly you want it to run
efficiently in the hardware our Hardware
is a massive throughput machine like
gpus they prefer lots of parallelism so
you don't want to do lots of sequential operations; you want to do a lot of operations in parallel, and the Transformer is designed with that in mind as well,
and so it's designed for our hardware
and it's designed to both be very
expressive in a forward pass but also
very optimizable in the backward pass
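A minimal sketch of the "nodes broadcasting what they're looking for and what they have" idea as a single head of scaled dot-product self-attention (illustrative PyTorch only, ignoring masking, multiple heads, and the residual and MLP pieces mentioned above; the sizes and random weights are arbitrary assumptions):

```python
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    # x: (tokens, dim). Each node (token) stores a vector.
    q = x @ Wq                                   # "here's what I'm looking for" (queries)
    k = x @ Wk                                   # "here's what I have" (keys)
    v = x @ Wv                                   # "here's what I'll share" (values)
    scores = q @ k.T / (k.shape[-1] ** 0.5)      # how relevant is every other node
    weights = F.softmax(scores, dim=-1)
    return weights @ v                           # each node updates itself from the others

dim = 16
x = torch.randn(8, dim)                          # 8 tokens
Wq, Wk, Wv = (torch.randn(dim, dim) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)       # torch.Size([8, 16])
```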
And you said that the residual connections support a kind of ability to learn short algorithms fast and first, and then gradually extend them longer during training.
Yeah.
What's the idea of learning short algorithms?
Right,
think of it as a so basically a
Transformer is a series of uh blocks
right and these blocks have attention
and a little multi-layer perceptron and
so you you go off into a block and you
come back to this residual pathway and
then you go off and you come back and
then you have a number of layers
arranged sequentially and so the way to
look at it I think is because of the
residual pathway in the backward path
the gradients uh sort of flow along it
uninterrupted because addition
distributes the gradient equally to all
of its branches. So the gradient from the supervision at the top just flows directly to the first layer, and all
the residual connections are arranged so
that in the beginning during
initialization they contribute nothing
to the residual pathway
So what it kind of looks like is: imagine the Transformer is kind of like a Python function, like a def, and you get to do various kinds of, like, lines of code. Say you have a hundred-layer-deep Transformer; typically they would be much shorter, say 20. So you have 20 lines of code and you can
do something in them and so think of
during the optimization basically what
it looks like is first you optimize the
first line of code and then the second
line of code can kick in and the third
line of code can and I kind of feel like
because of the residual pathway and the
Dynamics of the optimization you can
sort of learn a very short algorithm that gets the approximate answer, but
then the other layers can sort of kick
in and start to create a contribution
and at the end of it you're you're
optimizing over an algorithm that is 20
lines of code
except these lines of code are very
complex because it's an entire block of
a transformer you can do a lot in there
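A minimal sketch of that picture, assuming a pre-norm block in PyTorch: each block is one "line of code" branching off the residual pathway and adding its contribution back (illustrative only; the sizes and the use of nn.MultiheadAttention are my choices, not code from the episode):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm Transformer block: one 'line of code' on the residual pathway."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # Each sub-layer branches off the residual stream and adds its result back in.
        # Addition routes gradients straight down the stack, and the idea described above
        # is that the branches contribute little early on, so training can first settle a
        # short algorithm and then let deeper "lines" kick in.
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

model = nn.Sequential(*[Block(64) for _ in range(20)])   # "20 lines of code"
print(model(torch.randn(1, 10, 64)).shape)                # torch.Size([1, 10, 64])
```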
what's really interesting is that this
Transformer architecture actually has been remarkably resilient. Basically,
the Transformer that came out in 2016 is
the Transformer you would use today
except you reshuffle some of the layer
norms the layer normalizations have been
reshuffled to a pre-norm formulation and
so it's been remarkably stable but
there's a lot of bells and whistles that
people have attached on and try to uh
improve it I do think that basically
it's a it's a big step in simultaneously
optimizing for lots of properties of a
desirable neural network architecture
and I think people have been trying to
change it but it's proven remarkably
resilient but I do think that there
should be even better architectures
potentially.
But you admire the resilience here.
Yeah, there's
something profound about this
architecture, that at least...
So maybe everything can be turned into a problem that Transformers can solve.
Currently it definitely looks like the Transformer is taking over AI, and you
can feed basically arbitrary problems
into it and it's a general
differentiable computer and it's
extremely powerful and uh this
convergence in AI has been really
interesting to watch uh for me
personally what else do you think could
be discovered here about Transformers
like, what's a surprising thing, or is it a stable... has it gone to a stable place? Is there something interesting we might discover
about Transformers like aha moments
maybe has to do with memory uh maybe
knowledge representation that kind of
stuff
Definitely the Zeitgeist today is just pushing, like, basically right now the Zeitgeist is: do not touch the Transformer, touch everything else.
Yes.
So people are
scaling up the data sets making them
much much bigger they're working on the
evaluation making the evaluation much
much bigger and uh
um they're basically keeping the
architecture unchanged and that's how
we've um that's the last five years of
progress in AI kind of
what do you think about one flavor of it
which is language models
have you been surprised
uh
has your sort of imagination been
captivated by you mentioned GPT and all
the bigger and bigger and bigger
language models
and uh what are the limits
of those models do you think
So just the task of natural language: basically the way GPT is trained, right, is you just download a massive amount of text data from the internet and you try to predict the next word in a sequence, roughly speaking. You're predicting word chunks, but roughly speaking
that's it and what's been really
interesting to watch is
uh basically it's a language model
language models have actually existed
for a very long time
um there's papers on language modeling
from 2003, even earlier.
Can you explain, in that case, what a language model is?
Yeah, so a language model, basically the rough idea is just predicting the next word in a sequence, roughly speaking.
uh so there's a paper from for example
Bengio and the team from 2003, where for
the first time they were using a neural
network to take say like three or five
words and predict the
um next word and they're doing this on
much smaller data sets, and the neural net is not a Transformer, it's a multi-layer perceptron, but it's the first
time that a neural network has been
applied in that setting but even before
neural networks, there were language models, except they were using n-gram models. So n-gram models are just count-based models. So if you start to take
two words and predict the third one you
just count up how many times you've seen
any two word combinations and what came
next and what you predict that's coming
next is just what you've seen the most
of in the training set
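A tiny illustration of such a count-based n-gram model (a 3-gram variant on a toy corpus; purely a sketch, not code discussed here):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# For each pair of preceding words, count what came next.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def predict(w1, w2):
    # Predict whatever followed this two-word context most often in the training set.
    following = counts[(w1, w2)]
    return following.most_common(1)[0][0] if following else None

print(predict("the", "cat"))   # 'sat' here, since ties fall back to the first-counted word
```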
and so language modeling has been around
for a long time neural networks have
done language modeling for a long time
so really what's new or interesting or
exciting is just realizing that when you
scale it up
with a powerful enough neural net
Transformer you have all these emergent
properties where basically what happens
is if you have a large enough data set
of text
you are in the task of predicting the
next word you are multitasking a huge
amount of different kinds of problems
you are multitasking understanding of
you know chemistry physics human nature
lots of things are sort of clustered in
that objective it's a very simple
objective but actually you have to
understand a lot about the world to make
that prediction you just said the U word
understanding uh are you in terms of
chemistry and physics and so on what do
you feel like it's doing is it searching
for the right context
uh in in like what what is it what is
the actual process Happening Here Yeah
So basically it gets a thousand words and it's trying to predict the thousand and first, and in order to do that very, very well over the entire data set available on the internet, you actually have to basically kind of understand the context of what's going on in there.
yeah
um and uh it's a sufficiently hard
problem that you uh if you have a
powerful enough computer like a
Transformer you end up with uh
interesting solutions, and you can ask it to do all kinds of things, and it shows a lot of emergent properties, like in-context learning. That
was the big deal with GPT and the
original paper when they published it is
that you can just sort of uh prompt it
in various ways and ask it to do various
things and it will just kind of complete
the sentence but in the process of just
completing the sentence it's actually
solving all kinds of really uh
interesting problems that we care about
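As a small illustration of "prompt it in various ways and it just completes the sentence", here is a hedged sketch assuming the Hugging Face transformers library and the small public gpt2 checkpoint (neither is referenced in the episode):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small public checkpoint, used purely to illustrate prompting a next-word predictor.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: What is the chemical symbol for gold?\nA:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10, do_sample=False)  # greedy completion
print(tok.decode(out[0], skip_special_tokens=True))
```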
do you think it's doing something like
understanding
like and when we use the word
understanding for us humans
I think it's doing some understanding it
in its weights it understands I think a
lot about the world and it has to in
order to predict the next word in a
sequence
so let's train on the data from the
internet
uh what do you think about this this
approach in terms of data sets of using
data from the internet do you think the
internet has enough structured data to
teach AI about human civilization
yeah so I think the internet has a huge
amount of data I'm not sure if it's a
complete enough set I don't know that uh
text is enough for having a sufficiently
powerful AGI as an outcome
um of course there is audio and video
and images and all that kind of stuff
yeah so text by itself I'm a little bit
suspicious about there's a ton of things
we don't put in text in writing uh just
because they're obvious to us about how
the world works and the physics of it
and that things fall. We don't put that stuff in text because why would you, we share that understanding. And so text is a communication medium
between humans and it's not a
all-encompassing medium of knowledge
about the world but as you pointed out
we do have video and we have images and
we have audio and so I think that that
definitely helps a lot but we haven't
trained models uh sufficiently uh across
both across all those modalities yet so
I think that's what a lot of people are
interested in but I wonder what that
shared understanding of like well we
might call Common Sense
has to be learned
inferred in order to complete the
sentence correctly so maybe the fact
that it's implied on the internet the
model is going to have to learn that not
by reading about it by inferring it in
the representation. So, like, common sense, I don't think we learn common sense, like, nobody tells us explicitly, we just figure it all out by interacting with the world, right. So here's a model that's reading about the way people interact with the world; it might have to infer that.
I wonder yeah uh you you briefly worked
on a project called World of Bits, training an RL system to take actions on the internet
versus just consuming the internet like
we talked about do you think there's a
future for that kind of system
interacting with the internet to help
the learning yes I think that's probably
the uh the final frontier for a lot of
these models because
um so as you mentioned I was at open AI
I was working on this project world of
bits and basically it was the idea of
giving neural networks access to a
keyboard and a mouse.
The idea, what could possibly go wrong.
So basically, you perceive the input of the screen pixels,
and basically the state of the computer
is sort of visualized for human
consumption in images of the web browser
and stuff like that and then you give
the neural network the ability to press
keyboards and use the mouse and we're
trying to get it to for example complete
bookings and you know interact with user
interfaces.
And what did you learn from that experience? Like, what was some fun stuff? This is a super cool idea.
Yeah, I
mean it's like
Yeah, I mean, the step from observer to actor is a super fascinating step.
Yeah, well, it's the universal interface in the digital realm, I would say, and there's a universal interface in, like, the physical realm, which in my mind
is a humanoid form factor kind of thing
we can later talk about Optimus and so
on. But I feel like there's kind of like a similar philosophy in some way, where the physical world is designed for the human form, and the digital world is designed for the human form of seeing the screen and using keyboard and mouse. And so it's the universal interface that can basically command
the digital infrastructure we've built
up for ourselves and so it feels like a
very powerful interface to to command
and to build on top of now to your
question as to like what I learned from
that it's interesting because the world
of bits was basically uh too early I
think at open AI at the time
this is around 2015 or so and the
Zeitgeist at that time was very
different in AI from the Zeitgeist today
at the time everyone was super excited
about reinforcement learning from
scratch this is the time of the Atari
paper where uh neural networks were
playing Atari games and beating humans
in some cases uh alphago and so on so
everyone's very excited about train
training neural networks from scratch
using reinforcement learning
um directly
It turns out that reinforcement learning is an extremely inefficient way of training
neural networks because you're taking
all these actions and all these
observations and you get some sparse
rewards once in a while so you do all
this stuff based on all these inputs and
once in a while you're like told you did
a good thing you did a bad thing and
It's just an extremely hard problem. You can't learn from that; you can burn a forest and you can sort of brute force through it, and we saw that, I think, with, you know, Go and Dota and so on, and it does work, but it's extremely
inefficient uh and not how you want to
approach problems uh practically
speaking and so that's the approach that
at the time we also took to World of
Bits: we would have an agent initialized randomly, so with keyboard-mashing and mouse-mashing, try to make a booking, and it just, like, revealed the insanity of that approach very quickly,
where you have to stumble by the correct
booking in order to get a reward of you
did it correctly and you're never going
to stumble by it by chance at random
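A schematic of the setup being described, with a purely hypothetical env object standing in for a World of Bits-style environment (screen pixels in, keyboard and mouse actions out, sparse reward); this is an illustration of why random exploration almost never finds the reward, not an actual API:

```python
import random

def random_agent(obs):
    # An agent starting from scratch: keyboard-mash and mouse-mash at random.
    return {"key": random.choice("abcdefghijklmnopqrstuvwxyz\t\n"),
            "mouse": (random.randint(0, 799), random.randint(0, 599), random.random() < 0.1)}

def run_episode(env, agent, max_steps=1000):
    # `env` is hypothetical: reset() returns screen pixels, step() takes a key/mouse action
    # and returns (pixels, reward, done), with reward almost always 0.0.
    obs, total_reward = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(agent(obs))
        total_reward += reward
        if done:
            break
    return total_reward  # a random agent essentially never stumbles into the booking reward
```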
so even with a simple web interface
there's too many options there's just
too many options, and it's too sparse of a reward signal, and you're starting from scratch at the time, and so
you don't know how to read you don't
understand pictures images buttons you
don't understand what it means to like
make a booking but now what's happened
is, it is time to revisit that, and OpenAI is interested in this, companies like Adept are interested in this, and so on, and the idea is coming
back because the interface is very
powerful but now you're not training an
agent from scratch you are taking the
GPT as an initialization so GPT is
pre-trained on all of text and it
understands what's a booking it
understands what's a submit it
understands um quite a bit more and so
it already has those representations
they are very powerful and that makes
all the training significantly more
efficient and makes the problem
tractable.
Should the interaction be with, like, the way humans see it, with the buttons and the language, or should it be with the HTML, JavaScript, and the CSS? What do you think is better?
So today, all of this interaction is mostly on the level of HTML, CSS, and so on. That's done because of computational constraints, but I think ultimately
everything is designed for human visual
consumption and so at the end of the day
there's all the additional information
is in the layout of the web page and
what's next to you and what's a red
background and all this kind of stuff
and what it looks like visually so I
think that's the final frontier as we
are taking in pixels and we're giving
out keyboard mouse commands but I think
it's impractical still today do you
worry about bots on the internet
given given these ideas given how
exciting they are do you worry about
bots on Twitter being not the the stupid
boss that we see now with the cryptobots
but the Bots that might be out there
actually that we don't see that they're
interacting in interesting ways so this
kind of system feels like it should be
able to pass the I'm not a robot click
button whatever
Do you actually understand how that test works? I don't quite... like, there's a checkbox or whatever that you click, and it's presumably tracking...
Oh, I see, like mouse movement and the timing and so on.
Yeah. So exactly
this kind of system we're talking about
should be able to pass that so yeah what
do you feel about
um Bots that are language models Plus
have some interact ability and are able
to tweet and reply and so on do you
worry about that world
uh yeah I think it's always been a bit
of an arms race uh between sort of the
attack and the defense uh so the attack
will get stronger but the defense will
get stronger as well our ability to
detect that how do you defend how do you detect how do you know that your Karpathy account on Twitter is human
how do you approach that like if people were to claim you know how would you defend yourself in a court of law that I'm a human
um this account is yeah at some point I think it might be I think society will evolve a little bit like we might start digitally signing some of our correspondence or you know things that we create right now it's
not necessary but maybe in the future it
might be I do think that we are going
towards the world where we share
we share the digital space with uh AIS
synthetic beings yeah and uh they will
get much better and they will share our
digital realm and they'll eventually
share our Physical Realm as well it's
much harder uh but that's kind of like
the world we're going towards and most
of them will be benign and helpful and
some of them will be malicious and it's
going to be an arms race trying to
detect them so I mean the worst isn't the AIs the worst is the AIs pretending to be human I mean I don't know if it's always malicious there's
obviously a lot of malicious
applications but yeah it could also be
you know if I was an AI I would try very
hard to pretend to be human because
we're in a human world yeah I wouldn't
get any respect as an AI yeah I want to
get some love and respect I don't think
the problem is intractable people are thinking about proof of personhood yes and we might start digitally signing our stuff and we might all end up having basically some solution for proof of personhood
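As a minimal sketch of what digitally signing your posts could look like, here is an Ed25519 example using the cryptography library; the genuinely hard parts, distributing keys and binding a key to a real person, are exactly what is left open in the conversation and are not shown.

```python
# Minimal sketch: sign a post with Ed25519 so anyone can verify it came from
# the holder of a given key. Binding that key to a person is the unsolved part.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()   # kept secret by the author
public_key = private_key.public_key()        # published, e.g. in a profile

post = b"this tweet was written by a human"
signature = private_key.sign(post)           # attached alongside the post

try:
    public_key.verify(signature, post)       # anyone can check authenticity
    print("signature valid")
except InvalidSignature:
    print("signature invalid")
```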
it doesn't seem to be intractable it's just something that we haven't had to do until now but I think once the need really starts to emerge which is soon people will think about it much more but that too will be a race because
obviously you can probably spoof or fake the proof of personhood so you have to try to figure out how to prevent that I mean it's weird
that we have like Social Security numbers and like passports and stuff it seems like it's harder to fake stuff in the physical space than the digital space it just feels like it's going to be very tricky very tricky because it seems to be pretty low cost to fake stuff what are you gonna put an AI in jail for like trying to use a fake personhood proof I mean okay fine you'll put a lot of AIs in jail but there'll be more AIs arbitrarily like exponentially more the
cost of creating a bot is very low
unless there's some kind of way to track accurately like you're not allowed to create any program without tying yourself to that program like any program that runs on the internet you'll be able to trace every single human programmer that was involved with that
program yeah maybe you have to start
declaring when uh you know we have to
start drawing those boundaries and
keeping track of okay what are digital entities versus human entities and what is the
ownership of human entities and digital
entities and uh
something like that
um
I don't know but I think I'm optimistic
that this is uh this is uh possible and
at some in some sense we're currently in
like the worst time of it because
um all these Bots suddenly have become
very capable but we don't have defenses
yet built up as a society and but I
think uh that doesn't seem to be
intractable it's just something that we
have to deal with it seems weird that the Twitter bots like really crappy Twitter bots are so numerous I presume that the engineers at Twitter are very good
so what I would infer from that is it seems like a hard problem and they're probably catching a lot all right if I were to sort of steelman the case it's a hard problem and there's a huge cost to false positives
to removing a post by somebody that's not a bot because it creates a very bad user experience so they're very cautious about removing and maybe the bots are really good at learning what gets removed and not
such that they can stay ahead of the
removal process very quickly my
impression of it honestly is there's a lot of low-hanging fruit I mean yeah it's not subtle that's my impression of it it's not subtle yeah that's my impression as well but it feels like maybe you're seeing the tip of the iceberg maybe
the number of bots is in like the
trillions and you have to like
just it's a constant assault of bots and
yeah you yeah I don't know
um you have to steelman the case because the bots I'm seeing are pretty like obvious I could write a few lines of code that catch these bots I mean definitely there's a lot of low-hanging
fruit but I will say I agree that if you
are a sophisticated actor you could
probably create a pretty good bot right
now
um you know using tools like gpts
because it's a language model you can
generate faces that look quite good now
uh and you can do this at scale and so I
think um yeah it's quite plausible and
it's going to be hard to defend there
was a Google engineer that claimed that LaMDA was sentient do you think there's any inkling of truth to what he felt and more importantly to me at least do you think language models will achieve sentience or the illusion of sentience soonish yeah to me it's a little bit of a canary in a coal mine kind of moment honestly a little bit because
so this engineer spoke to like a chatbot
at Google and uh became convinced that
uh this bot is sentient yeah as there's
some existential philosophical questions
and it gave like reasonable answers and
looked real and uh and so on so to me
it's a uh
he was he was uh he wasn't sufficiently
trying to stress the system I think and
uh exposing the truth of it as it is
today
um
but uh I think this will be increasingly
harder over time uh so uh yeah I think
more and more people will basically uh
become
um
yeah I think more and more there will be
more people like that over time as this
gets better like form an emotional
connection to to an AI yeah perfectly
plausible in my mind I think these AIS
are actually quite good at human human
connection human emotion a ton of text
on the Internet is about humans and
connection and love and so on so I think
they have a very good understanding in
some in some sense of of how people
speak to each other about this and um
they're very capable of creating a lot
of that kind of text the um
there's a lot of like sci-fi from 50s
and 60s that imagined AIS in a very
different way they are calculating cold
vulcan-like machines that's not what
we're getting today we're getting pretty
emotional AIs that actually are very competent and capable of generating you know plausible sounding text with
respect to all of these topics see I'm
really hopeful about AI systems that are
like companions that help you grow
develop as a human being help you
maximize long-term happiness but I'm
also very worried about AI systems that
figure out from the internet that humans get attracted to drama and so these would just be like shit talking AIs that just constantly say did you hear they'll do gossip they'll
try to plant seeds of Suspicion to like
other humans that you love and trust and
just kind of mess with people uh in the
you know because because that's going to
get a lot of attention so drama maximize
drama on the path to maximizing uh
engagement and US humans will feed into
that machine yeah and get it'll be a
giant drama shitstorm
so I'm worried about that so it's the
objective function really defines the
way that human civilization progresses
with AIs in it yeah I think right now at least today it's not correct to really think of them as goal seeking agents that want to do something they have no long-term memory or anything a good approximation of it is you get a thousand words and you're trying to predict the thousand and first and then you continue feeding it in and you are free
to prompt it in whatever way you want so
in text so you say okay you are a
psychologist and you are very good and
you love humans and here's a
conversation between you and another
human human colon something you colon something and then it just continues the
pattern and suddenly you're having a
conversation with a fake psychologist
who's not trying to help you and so it's
still kind of like in a realm of a tool
it is a um people can prompt their
arbitrary ways and it can create really
incredible text but it doesn't have long-term goals over long periods of time it doesn't try to so it doesn't look that way right now
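To show the prompt-as-program pattern he describes, here is a small sketch; the complete function is a hypothetical stand-in for whatever text-completion model you have access to, not a specific API.

```python
# Sketch of the prompting pattern described above: the "program" is just text
# that sets up a role, and the model continues the pattern turn by turn.
def complete(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: plug in a language model here

prompt = (
    "You are a psychologist. You are very good and you love humans.\n"
    "Here is a conversation between you and another human.\n"
    "Human: I've been feeling anxious lately.\n"
    "You:"
)
reply = complete(prompt)  # the model continues as the "psychologist"
```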
yeah but you can do short-term goals that have long-term effects so if my prompting short-term goal is to get Andrej Karpathy to respond to me on Twitter like I think the AI might figure out that's the goal but it might figure out that talking shit to you would be the best way in a highly sophisticated interesting way
and then you build up a relationship
when you respond once and then it
like over time it gets to not be
sophisticated and just
like just talk shit
and okay maybe it won't get to Andrej but it might get to another celebrity it
might get into other big accounts and
then it'll just so with just that simple
goal get them to respond yeah maximize
the probability of actual response yeah
I mean you could prompt a powerful model like this for its opinion about how to do any possible thing you're interested in so they will discuss it they're kind of on track to become these oracles I sort of think of it that way they are oracles currently it's just text but they will have calculators they will have access to Google search they will have all kinds of gadgets and gizmos they will be able to operate the internet and find
different information and
um
yeah in some sense
that's kind of like currently what it
looks like in terms of the development
do you think it'll be an improvement
eventually over what Google is for
access to human knowledge like it'll be
a more effective search engine to access
human knowledge I think there's definite
scope in building a better search engine
today and I think Google they have all
the tools all the people they have
everything they need they have all the
puzzle pieces they have people training
Transformers at scale they have all the
data uh it's just not obvious if they
are capable as an organization to
innovate on their search engine right
now and if they don't someone else will
there's absolute scope for building a
significantly better search engine built
on these tools it's so interesting a large company where for search there's already an infrastructure it works and it brings in a lot of money so where structurally inside a company is there motivation to pivot yeah to say we're
going to build a new search engine yep
that's really hard so it's usually going
to come from a startup right that's um
that would be yeah or some other more
competent organization
um so I don't know so currently for example maybe Bing has another shot at it you know Microsoft Edge as we were talking about offline
um I mean it's really interesting because search engines used to be about okay here's some query here are web pages that look like the stuff that you have but you could just directly go to the answer and then have supporting evidence
um and these uh these models basically
they've read all the texts and they've
read all the web pages and so sometimes
when you see yourself going over to
search results and sort of getting like
a sense of like the average answer to
whatever you're interested in uh like
that just directly comes out you don't
have to do that work
um
so they're kind of like yeah I think they have a way of distilling all that knowledge into like some level of insight basically do
you think of prompting as a kind of
teaching and learning like this whole
process like another layer
you know because maybe that's what
humans are we already have that
background model and then your the world
is prompting you yeah exactly I think
the way we are programming these
computers now like gpts is is converging
to how you program humans I mean how do
I program humans via prompts I go to people and I prompt them to do things I prompt them for information and so
natural language prompt is how we
program humans and we're starting to
program computers directly in that
interface it's like pretty remarkable
honestly so you've spoken a lot about
the idea of software 2.0
um all good ideas
become like cliches so quickly like the
terms it's kind of hilarious
um it's like I think Eminem once said
that like if he gets annoyed by a song
He's written very quickly that means
it's going to be a big hit because it's
it's too catchy but uh can you describe
this idea and how you're thinking about
it has evolved over the months and years
since since you coined it yeah
yeah so I had a blog post on software
2.0 I think several years ago now
um
and the reason I wrote that post is
because I kept I kind of saw something
remarkable happening in
like software development and how a lot
of code was being transitioned to be written not in sort of like C++ and so on but in the weights of a neural net basically just saying that
neural Nets are taking over software the
realm of software and uh taking more and
more tasks and at the time I think not
many people understood uh this uh deeply
enough that this is a big deal it's a
big transition uh neural networks were
seen as one of multiple classification
algorithms you might use for your data
set problem on kaggle like this is not
that this is a change in how we program
computers
and I saw neural nets as this is going to take over the way we program computers is going to change it's not going to be people writing software in C++ or something like that and directly programming the software it's
going to be accumulating training sets
and data sets and crafting these
objectives by which we train these
neural Nets and at some point there's
going to be a compilation process from
the data sets and the objective and the
architecture specification into the
binary which is really just the neural net weights and the forward pass of the neural net and then you can deploy that binary
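As a toy illustration of that "compilation" step, here is a generic PyTorch sketch, not Tesla's actual stack: the dataset, the objective, and the architecture are specified, and optimization fills in the weights that become the deployable "binary".

```python
# Software 2.0 in miniature: dataset + objective + architecture -> weights.
import torch
import torch.nn as nn

# 1) the "source code": a dataset of (input, desired label) pairs
x = torch.randn(1024, 16)
y = (x.sum(dim=1) > 0).long()

# 2) the architecture: a rough hint of what the algorithm should look like
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

# 3) the objective and the "compiler": a loss function plus an optimizer
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# 4) the "binary": the trained weights plus the forward pass, ready to deploy
torch.save(model.state_dict(), "compiled_program.pt")
```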
and so I was talking about that sort of transition and that's what the post is about and
I saw this sort of play out in a lot of
fields uh you know autopilot being one
of them but also just a simple image
classification people thought originally
you know in the 80s and so on that they
would write the algorithm for detecting
a dog in an image and they had all these
ideas about how the brain does it and
first we detected corners and then we
detect lines and then we stitched them
up and they were like really going at it
they were like thinking about how
they're going to write the algorithm and
this is not the way you build it
and there was a smooth transition where
okay first we thought we were going to
build everything then we were building
the features so like HOG features and things like that that detect these
little statistical patterns from image
patches and then there was a little bit
of learning on top of it like a support
Vector machine or binary classifier for
cat versus dog and images on top of the
features so we wrote the features but we
trained the last layer sort of the the
classifier and then people are like
actually let's not even design the
features because we can't honestly we're
not very good at it so let's also learn
the features and then you end up with
basically a convolutional neural net
where you're learning most of it you're
just specifying the architecture and the
architecture has tons of fill in the
blanks which is all the knobs and you
let the optimization write most of it
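To make the transition he just walked through concrete, here is a small contrast between the two eras: hand-written HOG features with only the classifier learned, versus a convnet where the features themselves are learned. It uses scikit-image, scikit-learn, and PyTorch, and assumes you already have arrays of grayscale images and matching labels.

```python
# Era 1: hand-designed features (HOG), learn only the final classifier.
# Era 2: learn the features too (a small convnet).
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC
import torch.nn as nn

def hog_plus_svm(images, labels):
    feats = np.array([hog(im, pixels_per_cell=(8, 8)) for im in images])
    clf = LinearSVC().fit(feats, labels)   # only the "last layer" is trained
    return clf

# learn everything: features and classifier are both filled in by optimization
convnet = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 2),
)
```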
and so this transition is happening
across the industry everywhere and uh
suddenly we end up with a ton of code
that is written in neural net weights
and I was just pointing out that the
analogy is actually pretty strong and we
have a lot of developer environments for
software 1.0 like we have IDEs how you work with code how you debug code how you run code how you maintain code we have GitHub so I
was trying to make those analogies in
the new realm like what is the GitHub of software 2.0 it turns out it's something that looks like Hugging Face right now
uh you know and so I think some people
took it seriously and built cool
companies and uh many people originally
attacked the post it actually was not
well received when I wrote it and I
think maybe it has something to do with
the title but the post was not well
received and I think more people sort of
have been coming around to it over time
yeah so you were the director of AI at
Tesla where I think this idea
was really implemented at scale which is
how you have engineering teams doing
software 2.0 so can you sort of linger on that idea I think we're in the really early stages of everything you just said which is like GitHub IDEs
like how do we build engineering teams
that that work in software 2.0 systems
and and the the data collection and the
data annotation which is
all part of that software 2.0 like what
do you think is the task of programming
a software 2.0 is it debugging in the
space of hyper parameters or is it also
debugging the space of data yeah the way
by which you program the computer and
influence its algorithm is not by
writing the commands yourself you're
changing mostly the data set uh you're
changing the um loss functions of like
what the neural net is trying to do how
it's trying to predict things but yeah
basically the data sets and the
architectures of the neural net and um
so in the case of the autopilot a lot of
the data sets have to do with for
example detection of objects and Lane
line markings and traffic lights and so
on So You accumulate massive data sets
of here's an example here's the desired
label and then uh here's roughly how the
architecture here's roughly what the algorithm should look like and that's a convolutional neural net so the
specification of the architecture is
like a hint as to what the algorithm
should roughly look like and then to
fill in the blanks process of
optimization is the training process
and then you take your neural net that
was trained it gives all the right
answers on your data set and you deploy
it
so in that case and perhaps in all machine learning cases there's a lot of tasks so is coming up with formulating a task like
uh for a multi-headed neural network is
formulating a task part of the
programming yeah very much so how you
break down a problem into a set of tasks
yeah
on a high level I would say if you look at the software running in the autopilot I gave a number of talks on this topic I would say originally a lot of it was written in software 1.0 there's imagine lots of C++ all
right and then gradually there was a
tiny neural net that was for example
predicting given a single image is there
like a traffic light or not or is there a lane line marking or not and this
neural net didn't have too much to do in
this in the scope of the software it was
making tiny predictions on individual
little image and then the rest of the
system stitched it up so okay we actually don't have just a single camera we have eight cameras we actually have eight cameras over time and so what
do you do with these predictions how do
you put them together how do you do the
fusion of all that information and how
do you act on it all of that was written
by humans in C++
and then we decided okay we don't
actually want uh to do all of that
fusion in C++ code because we're
actually not good enough to write that
algorithm we want the neural Nets to
write the algorithm and we want to Port
uh all of that software into the 2.0
stack
and so then we actually had neural Nets
that now take all the eight camera
images simultaneously and make
predictions for all of that
so
and actually they don't make predictions in the space of images they now make predictions directly in 3D in three dimensions around the car and now actually we don't manually fuse the predictions in 3D over time we don't trust ourselves to
write that tracker so actually we give
the neural net uh the information over
time so it takes these videos now and
makes those predictions and so your sort
of just like putting more and more power
into the neural network processing and
at the end of it the eventual sort of
goal is to have most of the software
potentially be in the 2.0 land
because it works significantly better humans are just not very good at writing software basically
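Here is a heavily simplified PyTorch sketch of the idea just described, per-camera features fused across eight cameras and over time into predictions in 3D; it is purely illustrative and not the autopilot network, and all layer choices are assumptions.

```python
# Simplified sketch of "eight cameras, over time, predict in 3D":
# per-camera features -> fuse across cameras -> fuse over time -> 3D outputs.
import torch
import torch.nn as nn

class MultiCamNet(nn.Module):
    def __init__(self, num_cams=8, feat=64):
        super().__init__()
        self.backbone = nn.Sequential(            # shared per-camera encoder
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.GRU(num_cams * feat, 128, batch_first=True)
        self.head_3d = nn.Linear(128, 7)          # e.g. (x, y, z, w, l, h, yaw)

    def forward(self, clips):                     # clips: (B, T, num_cams, 3, H, W)
        B, T, C, ch, H, W = clips.shape
        feats = self.backbone(clips.reshape(B * T * C, ch, H, W))
        feats = feats.reshape(B, T, -1)           # concatenate cameras per frame
        fused, _ = self.temporal(feats)           # fuse information over time
        return self.head_3d(fused[:, -1])         # predictions in 3D space

out = MultiCamNet()(torch.randn(2, 4, 8, 3, 64, 96))   # -> shape (2, 7)
```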
so the prediction space is happening in this like 4D land yeah the three-dimensional world over time yeah how do you
do annotation in that world the data annotation whether it's self-supervised or manual by humans is a big part of this software 2.0 world right I would say by
far in the industry if you're like
talking about the industry and how what
is the technology of what we have
available everything is supervised
learning so you need data sets of input
desired output and you need lots of it
and um there are three properties of it
that you need you need it to be very
large you need it to be accurate No
mistakes and you need it to be diverse
you don't want to uh just have a lot of
correct examples of one thing you need
to really cover the space of possibility
as much as you can and the more you can
cover the space of possible inputs the
better the algorithm will work at the
end now once you have really good data
sets that you're collecting curating
um and cleaning you can train uh your
neural net
um on top of that so a lot of the work
goes into cleaning those data sets now
as you pointed out it's probably it
could be the question is how do you
achieve a ton of uh if you want to
basically predict in 3D you need data in
3D to back that up so in this video we
have eight videos coming from all the
cameras of the system and this is what
they saw and this is the truth of what
actually was around there was this car
there was this car this car these are
the lane line markings this is geometry
of the road there's a traffic light in
this three-dimensional position you need
the ground truth
um and so the big question that the team
was solving of course is how do you how
do you arrive at that ground truth
because once you have a million of it
and it's large clean and diverse then
training a neural network on it works
extremely well and you can ship that
into the car
and uh so there's many mechanisms by
which we collected that training data
you can always go for human annotation
you can go for simulation as a source of
ground truth you can also go for what we
call the offline tracker
um
that we've spoken about at the AI day
and so on which is basically an
automatic reconstruction process for
taking those videos and recovering the
three-dimensional sort of reality of
what was around that car so basically
think of doing like a three-dimensional
reconstruction as an offline thing and
then understanding that okay there's 10
seconds of video this is what we saw and therefore here are all the lane lines cars and so on and then once you have that annotation you can train your neural nets to imitate it
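Written out as a schematic, the offline auto-labeling idea looks roughly like the sketch below; every function name is a hypothetical placeholder for a large system, not a real pipeline or API.

```python
# Offline auto-labeling: with the full clip and unlimited offline compute you
# reconstruct the 3D "truth", then use it as supervision for the in-car net.
def run_big_offline_nets(clip):
    """Hypothetical: detectors far too heavy to run in the car in real time."""
    raise NotImplementedError

def reconstruct_3d(detections, clip):
    """Hypothetical: offline, non-causal 3D reconstruction over the whole clip."""
    raise NotImplementedError

def auto_label(clip):
    # recover lane lines, cars, traffic lights, etc. in 3D for ~1 minute of video
    detections = run_big_offline_nets(clip)
    return reconstruct_3d(detections, clip)

def build_training_set(clips):
    # the smaller online network is then trained to imitate these offline labels
    return [(clip, auto_label(clip)) for clip in clips]
```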
and how difficult is the 3D reconstruction it's difficult but it can be done so
there's so the there's overlap between
the cameras and you do the
Reconstruction and there's uh
perhaps if there's any inaccuracy so
that's caught in The annotation step
uh yes the nice thing about The
annotation is that it is fully offline
you have infinite time you have a chunk
of one minute and you're trying to just
offline in a super computer somewhere
figure out where were the positions of
all the cars all the people and you have
your full one minute of video from all
the Angles and you can run all the
neural nets you want and they can be very efficient massive neural nets there can be neural nets that can't even run in the car later at test time so they can be even more powerful neural nets than what you can eventually deploy so you
can do anything you want
three-dimensional reconstruction neural
Nets uh anything you want just to
recover that truth and then you
supervise that truth
what have you learned you said no mistakes about humans doing annotation because I assume humans there's like a range of things they're good at in terms of clicking stuff on screen how interesting is that to you as a problem of designing an annotator where humans are accurate enjoy it like what even are the metrics are they efficient or productive all that kind of stuff yeah so I grew the
annotation team at Tesla from basically
zero to a thousand uh while I was there
that was really interesting you know my
background is a PhD student researcher so growing that kind of organization was pretty crazy but yeah I think it's
extremely interesting and part of the
design process very much behind the
autopilot as to where you use humans
humans are very good at certain kinds of
annotations they're very good for
example at two-dimensional annotations
of images they're not good at annotating
uh cars over time in three-dimensional
space very very hard and so that's why
we were very careful to design the tasks
that are easy to do for humans versus
things that should be left to the
offline tracker like maybe the maybe the
computer will do all the triangulation
and 3D reconstruction but the human will
say exactly these pixels of the image
are car exactly these pixels are human
and so co-designing the the data
annotation pipeline was very much bread
and butter was what I was doing daily do
you think there's still a lot of open
problems in that space
um just in general annotation where the
stuff the machines are good at machines
do and the humans do what they're good
at and there's maybe some iterative
process right I think to a very large
extent we went through a number of
iterations and we learned a ton about
how to create these data sets I'm not
seeing big open problems like originally
when I joined I was like I was really
not sure how this would turn out yeah
but by the time I left I was much more
secure in actually we sort of understand
the philosophy of how to create these
data sets and I was pretty comfortable
with where that was at the time so what are the strengths and limitations of cameras for the driving task in your understanding when you formulate the
driving task as a vision task with eight
cameras
you've seen that the entire you know
most of the history of the computer
vision field when it has to do with
neural networks what just if you step
back what are the strengths and
limitations of pixels of using pixels to
drive yeah pixels I think are a beautiful sensor I
would say the thing is like cameras are
very very cheap and they provide a ton
of information ton of bits uh so it's uh
extremely cheap sensor for a ton of bits
and each one of these bits as a
constraint on the state of the world and
so you get lots of megapixel images uh
very cheap and it just gives you all
these constraints for understanding
what's actually out there in the world
so vision is probably the highest
bandwidth sensor
it's a very high bandwidth sensor and I love that pixels are a constraint on the world it's a highly complex high bandwidth constraint on the state of the world that's
fascinating it's not just that but also the real importance is that it's the sensor that humans use
therefore everything is designed for
that sensor yeah the text the writing
the flashing signs everything is
designed for vision and so and you just
find it everywhere and so that's why
that is the interface you want to be in
um talking again about these Universal
interfaces and uh that's where we
actually want to measure the world as
well and then develop software uh for
that sensor but there's other
constraints on the state of the world
that humans use to understand the world
I mean Vision ultimately is the main one
but we're like we're like referencing
our understanding of human behavior and
some common sense
physics that could be inferred from
vision from from a perception
perspective but it feels like we're
using some kind of reasoning
to predict the world yeah not just the pixels I mean you have a powerful prior for how the world evolves
over time Etc so it's not just about the
likelihood term coming up from the data
itself telling you about what you are
observing but also the prior term of like where are the likely things to see and how do they likely move and so on
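One schematic way to write down that likelihood-versus-prior framing, purely illustrative and not a formula given in the conversation, is:

```latex
p(\text{world} \mid \text{pixels}) \;\propto\;
\underbrace{p(\text{pixels} \mid \text{world})}_{\text{likelihood from the observed data}}
\;\times\;
\underbrace{p(\text{world})}_{\text{prior over likely scenes and how they move}}
```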
and the question is how complex is the range of possibilities that might happen in the driving task right
that's still is is that to you still an
open problem of how difficult is driving
like philosophically speaking
like do you all the time you've worked
on driving do you understand how hard
driving is yeah driving is really hard
because it has to do with the
predictions of all these other agents
and the theory of mind and you know what
they're gonna do and are they looking at
you are they where are they looking what
are they thinking yeah there's a lot
that goes there at the at the full tail
of you know the the expansion of the
nines that we have to be comfortable
with eventually the final problems are
of that form I don't think those are the
problems that are very common uh I think
eventually they're important but it's
like really in the tail end in the tail
and the rare edge cases
from the vision perspective what are the
toughest parts of the vision problem of
driving
um
well basically the sensor is extremely
powerful but you still need to process
that information
and so going from brightnesses of these pixel values to hey here's the three-dimensional world is extremely
hard and that's what the neural networks
are fundamentally doing and so
um the difficulty really is in just
doing an extremely good job of
engineering the entire pipeline uh the
entire data engine having the capacity to train these neural nets having the
ability to evaluate the system and
iterate on it uh so I would say just
doing this in production at scale is
like the hard part it's an execution
problem so the data engine but also the
um the sort of deployment of the system
such that has low latency performance so
it has to do all these steps yeah for
the neural net specifically just making
sure everything fits into the chip on
the car yeah and uh you have a finite
budget of flops that you can perform and
uh and memory bandwidth and other
constraints and you have to make sure it
flies and you can squeeze in as much
compute as you can into the tiny what
have you learned from that process
because it maybe that's one of the
bigger like new things coming from a
research background
where there's there's a system that has
to run under heavily constrained
resources right has to run really fast
what what kind of insights have you uh
learned from that
yeah I'm not sure if it's if there's too
many insights you're trying to create a
neural net that will fit in what you
have available and you're always trying
to optimize it and we talked a lot about
it on the AI day and uh basically the
the triple backflips that the team is
doing to make sure it all fits and
utilizes the engine uh so I think it's
extremely good engineering
um and then there's also all kinds of
little insights peppered in on how to do
it properly let's actually zoom out
because I don't think we talked about
the data engine the entirety of the
layout of this idea that I think is just
beautiful with humans in the loop can
you describe the data engine
yeah the data engine is what I call the
almost biological feeling like process
by which you uh perfect the training
sets for these neural networks
um so because most of the programming
now is in the level of these data sets
and make sure they're large diverse and
clean oh basically you have a data set
that you think is good you train your
neural net you deploy it and then you
observe how well it's performing and
you're trying to uh always increase the
quality of your data set so you're
trying to catch scenarios basically
that are basically rare and it is in
these scenarios that the neural Nets
will typically struggle in because they
weren't told what to do in those rare
cases in the data set but now you can
close the loop because if you can now
collect all those at scale you can then
feed them back into the Reconstruction
process I described and uh reconstruct
the truth in those cases and add it to
the data set and so the whole thing ends
up being like a staircase of improvement
of perfecting your training set and you
have to go through deployments so that
you can mine uh the parts that are not
yet represented well in the data set so
your data set is basically imperfect it needs to be diverse it has pockets that are missing and you need to pad out the pockets you can sort of think of it that way
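The data engine loop he describes can be written out as a schematic like the sketch below; every helper here is a hypothetical placeholder for a large engineering system, not a real API.

```python
# One turn of the data engine: train, deploy, mine failures, re-label, grow the set.
def train(dataset):                       raise NotImplementedError
def deploy(model, fleet):                 raise NotImplementedError
def mine_rare_scenarios(fleet, model):    raise NotImplementedError
def offline_reconstruction(failures):     raise NotImplementedError

def data_engine_iteration(dataset, fleet):
    model = train(dataset)                          # train on the current data
    deploy(model, fleet)                            # ship it to the fleet
    failures = mine_rare_scenarios(fleet, model)    # rare cases it struggles on
    new_labels = offline_reconstruction(failures)   # recover the "truth" offline
    return dataset + new_labels                     # pad out the missing pockets

# repeating this is the "staircase of improvement" of the training set
```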
in the data what role do humans play in
this so what's the uh this biological
system like a human body is made up of
cells what what role like how do you
optimize the human uh system the the
multiple Engineers collaborating
figuring out what to focus on what to
contribute which which task to optimize
in this neural network
uh who's in charge of figuring out which
task needs more data
can you speak to the hyper parameters
the human system right it really just comes down to extremely good execution from an engineering team that knows what they're doing they understand
intuitively the philosophical insights
underlying the data engine and the
process by which the system improves and
uh how to again like delegate the
strategy of the data collection and how
that works and then just making sure
it's all extremely well executed and
that's where most of the work is is not
even the philosophizing or the research
or the ideas of it it's just extremely
good execution it's so hard when you're
dealing with data at that scale so your
role in the data engine executing well
on it it is difficult and extremely
important is there a priority of like uh
like a vision board of saying like
we really need to get better at stop
lights
yeah like the prioritization of tasks is that essentially and that comes from the data that comes to a very large extent from what we are trying to achieve in the product roadmap the release we're trying to get out and the feedback from the QA team where the system is struggling
or not the things we're trying to
improve and the QA team gives some
signal some information
in aggregate about the performance of
the system in various conditions and
then of course all of us drive it and we
can also see it it's really nice to work
with the system that you can also
experience yourself you know it drives
you home it's is there some insight you
can draw from your individual experience
that you just can't quite get from an
aggregate statistical analysis of data
yeah it's so weird right yes it's it's
not scientific in a sense because you're
just one anecdotal sample yeah I think
there's a ton of uh it's a source of
truth it's your interaction with the
system yeah and you can see it you can
play with it you can perturb it you can
get a sense of it you have an intuition
for it I think numbers just like have a way of you know numbers and plots and graphs are much harder yeah they hide a lot it's like if you train a language model a really powerful way to understand it is by you interacting with it yeah 100 percent try to build up an intuition yeah I think like
Elon also like he always wanted to drive
the system himself he drives a lot and
uh I'm gonna say almost daily so uh he
also sees this as a source of Truth you
driving the system uh and it performing
and yeah so what do you think tough
questions here uh so Tesla last year
removed radar from um from the sensor
suite and now just announced that it's
going to remove all ultrasonic sensors
relying solely on Vision so camera only
does that make the perception problem
harder or easier
I would almost reframe the question in
some way so the thing is basically you
would think that additional sensors by
the way can I just interrupt good I
wonder if a language model will ever do
that if you prompt it let me reframe
your question that would be epic sorry it's a little bit of the wrong question because
basically you would think that these
sensors are an asset to you yeah but if
you fully consider the entire product in
its entirety
these sensors are actually potentially a liability
because these sensors aren't free they
don't just appear on your car you need
something you need to have an entire
supply chain you have people procuring
it there can be problems with them they
may need replacement they are part of
the manufacturing process they can hold
back the line in production you need to
Source them you need to maintain them
you have to have teams that write the
firmware all of it and then you also
have to incorporate and fuse them into
the system in some way and so it actually like bloats the organization a lot and I think Elon is really good at simplifying the best part is no part
and he always tries to throw away things
that are not essential because he
understands the entropy in organizations
and approach and I think uh in this case
the cost is high and you're not
potentially seeing it if you're just a
computer vision engineer and I'm just
trying to improve my network and you
know is it more useful or less useful
how useful is it and the thing is if
once you consider the full cost of a
sensor it actually is potentially a
liability and you need to be really sure
that it's giving you extremely useful
information in this case we looked at
using it or not using it and the Delta
was not massive and so it's not useful
does it also bloat the data engine like having more sensors
is a distraction and these sensors you
know they can change over time for
example you can have one type of say
radar you can have other type of radar
they change over time I suddenly need to
worry about it now suddenly you have a
column in your sqlite telling you oh
which sensor type was it and they all
have different distributions and then uh
they can they just they contribute noise
and entropy into everything and they
bloat stuff and also organizationally
has been really fascinating to me that
it can be very distracting
um if all you want to get to work is vision all the resources are
on it and you're building out a data
engine and you're actually making
forward progress because that is the the
sensor with the most bandwidth the most
constraints on the world and you're
investing fully into that and you can
make that extremely good you only have a finite amount of sort of spend of focus across different facets
of the system and uh this kind of
reminds me of Rich Sutton's bitter lesson it just seems like simplifying
the system yeah
in the long run now of course you don't know what the long run is it seems to be always the right solution yeah yes in that case it was for RL but it seems to apply generally across all systems that
do computation yeah so where uh what do
you think about the lidar as a crutch
debate
uh the battle between point clouds and
pixels
yeah I think this debate is always like
slightly confusing to me because it
seems like the actual debate should be
about like do you have the fleet or not
that's like the really important thing
about whether you can achieve a really
good functioning of an AI system at this
scale so data collection systems yeah do
you have a fleet or not it's
significantly more important whether you
have lidar or not it's just another
sensor
um and uh
yeah I think similar to the radar
discussion basically I don't think it offers extra information it's extremely costly it has
all kinds of problems you have to worry
about it you have to calibrate it Etc it
creates bloat and entropy you have to be
really sure that you need this uh this
um sensor in this case I basically don't
think you need it and I think honestly I
will make a stronger statement I think
the others some of the other uh
companies are using it are probably
going to drop it yeah so you have to consider the sensor in the full picture of considering can you build a big fleet
that collects a lot of data and can you
integrate that sensor with that that
data and that sensor into a data engine
that's able to quickly find different
parts of the data that then continuously
improves whatever the model that you're
using yeah another way to look at it is like vision is necessary in the sense that the world is designed for human visual consumption so vision is necessary and then also it is sufficient because it has all the information that you need for driving and humans obviously use vision to drive so it's both necessary and sufficient so you want to focus
resources and you have to be really sure
if you're going to bring in other
sensors you could add sensors to infinity at some point
you need to draw the line and I think in
this case you have to really consider
the full cost of any One sensor that
you're adopting and do you really need
it and I think the answer in this case
is no so what do you think about the
idea of the that the other companies
are forming high resolution maps and
constraining heavily the geographic
regions in which they operate is that
approach not in your in your view
um not going to scale over time to the
entirety of the United States I think
as you mentioned like they pre-map all the environments and they need to refresh the map and they have a
perfect centimeter level accuracy map of
everywhere they're going to drive it's
crazy how are you going to
when we're talking about autonomy
actually changing the world we're
talking about the deployment
on a on a global scale of autonomous
systems for transportation and if you
need to maintain a centimeter accurate
map for Earth or like for many cities
and keep them updated it's a huge
dependency that you're taking on huge
dependency
it's a massive massive dependency and
now you need to ask yourself do you
really need it
and humans don't need it
um right so it's it's very useful to
have a low-level map of like okay the
connectivity of your road you know that
there's a fork coming up when you drive
an environment you sort of have that
high level understanding it's like a
small Google Map and Tesla uses Google
Map like similar kind of resolution
information in the system but it will not pre-map environments to centimeter level accuracy it's a crutch it's a
distraction it costs entropy and it
diffuses the team it dilutes the team
and you're not focusing on what's
actually necessary which is the computer
vision problem
what did you learn about machine
learning about engineering about life
about yourself as one human being from
working with Elon Musk
I think the most I've learned is about
how to sort of run organizations
efficiently and how to
create efficient organizations and how
to fight entropy in an organization so
human Engineering in the fight against
entropy yeah there's a there's a I think
Elon is a very efficient warrior in the
fight against entropy in organizations
what is the entropy in an organization
look like exactly it's process it's
it's process and inefficiencies and that
kind of stuff yeah meetings he hates
meetings he keeps telling people to skip
meetings if they're not useful
um he basically runs the world's biggest
uh startups I would say uh Tesla SpaceX
are the world's biggest startups Tesla
actually has multiple startups I think
it's better to look at it that way and
so I think he's he's extremely good at
at that and yeah he has a very good intuition for streamlining processes making everything efficient best part is no part simplifying focusing
um and just kind of removing barriers uh
moving very quickly making big moves all
this is a very startupy sort of seeming
things but at scale so strong drive to
simplify for me from your perspective I
mean that
um that also probably applies to just
designing systems and machine learning
and otherwise yeah like simplify
simplify yes
what do you think is the secret to
maintaining the startup culture in a
company that grows is there
can you introspect that
I do think you need someone in a
powerful position with a big hammer like
Elon who's like the cheerleader for that
idea and ruthless ruthlessly pursues it
if no one has a big enough Hammer
everything turns into committees
democracy within the company uh process
talking to stakeholders decision making
just everything just crumbles yeah if
you have a big person who's also really
smart and has a big hammer things move
quickly so you said your favorite scene
in interstellar is the intense docking
scene with the AI and Cooper talking
saying uh Cooper what are you doing
docking it's not possible no it's
necessary
such a good line by the way just so many
questions there why the AI in that scene which presumably is supposed to be able to compute a lot more than the human is saying it's not possible why the human I mean that's a movie but shouldn't the AI know much better than the human
anyway uh what do you think is the value
of setting seemingly impossible goals
so like uh
our initial intuition which seems like
something that
you have taken on that Elon espouses
that where the initial intuition of the
community might say this is very
difficult and then you take it on anyway
with a crazy deadline you're just from a
human engineering perspective
um
uh have you seen the value of that
I wouldn't say that setting impossible
goals exactly is is a good idea but I
think setting very ambitious goals is a
good idea I think there's a what I call
sublinear scaling of difficulty uh which
means that 10x problems are not 10x hard usually a 10x harder problem is like 2 or 3x harder to execute on because if you want to actually like if you want to improve the system by 10 percent it costs some amount of work and if you want to 10x improve the system it doesn't cost you
know 100x amount of the work and it's
because you fundamentally change the
approach and it if you start with that
constraint then some approaches are
obviously dumb and not going to work and
it forces you to reevaluate
um and I think it's a very interesting
way of approaching problem solving but
it requires a weird kind of thinking
it's just going back to your like PhD
days it's like how do you think which
ideas in in the machine Learning
Community are solvable yes it's uh it
requires what is that I mean there's the
cliche of first prince people's thinking
but like it requires to basically ignore
what the community is saying because
doesn't the community doesn't a community in science usually draw lines of what is and isn't possible right and
like it's very hard to break out of that
without going crazy yep I mean I think a
good example here is you know the Deep
learning revolution in some sense
because you could be in computer vision
at that time when during the Deep
learning sort of revolution of 2012 and
so on you could be improving your computer vision stack by 10 percent or you could just be saying actually all this is useless and how do I do 10x better computer vision well it's probably not by tuning a HOG feature detector I need
a different approach
um I need something that is scalable
going back to uh Richard Sutton's um and
understanding sort of like the
philosophy of the uh bitter lesson and
then being like actually I need a much
more scalable system like a neural
network that in principle works and then
having some deep Believers that can
actually execute on that mission and
make it work so that's the 10x solution
what do you think is the timeline to
solve the problem of autonomous driving
this still in part an open question
yeah I think the tough thing with
timelines of self-driving obviously is
that no one has created self-driving
yeah so it's not like what do you think
is the timeline to build this bridge well we've built a million bridges before here's how long that takes it's you
know it's uh no one has built autonomy
it's not obvious uh some parts turn out
to be much easier than others so it's
really hard to forecast you do your best
based on trend lines and so on and based
on intuition but that's why
fundamentally it's just really hard to
forecast this no one has even still like
being inside of it is hard to uh to do
yes some things turn out to be much
harder and some things turn out to be
much easier
do you try to avoid making forecasts
because like Elon doesn't avoid them
right and heads of car companies in the
past have not avoided it either uh Ford
and other places have made predictions
that we're going to solve at level four
driving by 2020 2021 whatever and now
they're all kind of Backtrack on that
prediction
and you as an AI person do you for yourself privately make predictions or do they get in the way of like your actual ability to think about
a thing
yeah I would say like what's easy to say is that this problem is tractable and that's an easy prediction to make it's tractable it's going to work yes it's
just really hard some things turn out to
be harder than some things turn out to
be easier uh so uh but it definitely
feels tractable and it feels like at
least the team at Tesla which is what I
saw internally is definitely on track to
that how do you form
a uh strong representation that allows
you to make a prediction about
tractability so like you're the leader
of a lot a lot of humans
you have to kind of say this is actually
possible
like how do you build up that intuition
it doesn't have to be even driving it
could be other tasks it could be um and
I wonder what difficult tasks did you
work on in your life I mean classification achieving on ImageNet a certain level of superhuman performance yeah expert intuition
it's just intuition it's belief
so just like thinking about it long
enough like studying looking at sample
data like you said driving
my intuition is really flawed on this like I don't have a good intuition
about tractability it could be either it
could be anything it could be solvable
like uh you know the driving task could
could be simplified into something quite
trivial like uh the solution to the
problem would be quite trivial and at
scale more and more cars driving
perfectly
might make the problem much easier Yeah
the more cars you have driving like
people learn how to drive correctly not
correctly but in a way that's more
optimal for a heterogeneous system of
autonomous and semi-autonomous and
manually driven cars that could change
stuff then again also I've spent a
ridiculous number of hours just staring
at pedestrians crossing streets thinking
about humans and it feels like the way
we use our eye contact
it sends really strong signals and
there's certain quirks and edge cases of
behavior and of course a lot of the
fatalities that happen have to do with
drunk driving and
um both on The Pedestrian side and the
driver's side so there's that problem of
driving at night and all that kind of
yeah so I wonder you know it's like the
space
of possible solution to autonomous
driving includes so many human factor
issues
that it's almost impossible to predict
there could be super clean nice
Solutions yeah I would say definitely
like to use a game analogy there's some
fog of War but you definitely also see
the frontier of improvement and you can
measure historically how much you've
made progress and I think for example at
least what I've seen in uh roughly five
years at Tesla when I joined it barely kept lane on the highway I think going
up from Palo Alto to SF was like three
or four interventions anytime the road
would do anything geometrically or turn
too much it would just like not work and
so going from that to like a pretty
competent system in five years and
seeing what happens also under the hood
and what the scale which the team is
operating now with respect to data and
compute and everything else uh is just a
massive progress
so you're climbing a mountain and it's foggy but you're making a lot of progress it's foggy but you're making progress and
you see what the next directions are and
you're looking at some of the remaining
challenges and they're not like uh
they're not perturbing you and they're
not changing your philosophy and you're
not contorting yourself you're like
actually these are the things that we
still need to do yeah the fundamental
components of solving the problems seem
to be there for the data engine to the
compute to the the computer on the car
to the compute for the training all that
kind of stuff
so you've done
uh over the years you've been a test
you've done a lot of amazing uh
breakthrough ideas and Engineering all
of it
um from the data engine to The Human
Side all of it can you speak to why you
chose to leave Tesla basically as I described I think over time during those five years I've kind of gotten myself into a little bit of a
managerial position most of my days were
you know meetings and growing the
organization and making decisions about
sort of high level strategic decisions
about the team and what it should be
working on and so on and uh
it's kind of like a corporate executive
role and I can do it I think I'm okay at
it but it's not like fundamentally what
I what I enjoy and so I think uh when I
joined uh there was no computer vision
team because Tesla was just going from
the transition of using mobileye a
third-party vendor for all of its
computer vision to having to build its
computer vision system so when I showed
up there were two people training deep neural networks and they were training them on a computer at their desk like a kind of basic classification task yeah
and so
I kind of like grew that into what I
think is a fairly respectable deep
learning team a massive compute cluster
a very good um data annotation
organization and uh I was very happy
with where that was it became quite
autonomous and so I kind of stepped away
and I uh you know I'm very excited to do
much more technical things again
yeah and kind of like refocus on AGI
what was this soul searching like
because you took a little time off and
think like what um how many mushrooms
did you take no I'm just uh I mean what
what was going through your mind the
human lifetime is finite yeah you did a few incredible things you're one of the best teachers of AI in the world
you're one of the best and I don't mean
that I mean that in the best possible
way you're one of the best tinkerers in
the AI world meaning like understanding
the fundamental fundamentals of how
something works by building it from
scratch and playing with it with the
basic intuitions it's like Einstein Feynman were all really good at this
kind of stuff like a small example of a
thing to to play with it to try to
understand it so that and obviously now at Tesla you helped build a team of machine learning engineers and a system that
actually accomplishes something in the
real world so given all that like what
was the soul searching like
well it was hard because obviously I
love the company a lot and I love I love
Elon I love Tesla I want um
it was hard to leave I love the team
basically
um but
yeah I think actually I would
potentially like interested in
revisiting it maybe coming back at some
point working on Optimus working on AGI at Tesla I think Tesla is going
to do incredible things it's basically
like
uh it's a massive large-scale robotics
kind of company with a ton of In-House
talent for doing really incredible
things and I think uh
humanoid robots are going to be amazing I
think autonomous transportation is going
to be amazing all this is happening at
Tesla so I think it's just a really
amazing organization so being part of it
and helping it along I think was very
basically I enjoyed that a lot yeah it
was basically difficult for those
reasons because I love the company uh
but you know I'm happy to potentially at
some point come back for act two but I
felt like at this stage
I built the team it felt autonomous and
uh I became a manager and I wanted to do
a lot more technical stuff I wanted to
learn stuff I wanted to teach stuff and
uh I just kind of felt like it was a
good time for for a change of pace a
little bit what do you think is uh the
best movie sequel of all time speaking
of part two because like because most of
them suck in movie sequels yeah and you
tweet about movies so just in a tiny
tangent is there what's your what was
like a favorite movie sequel
Godfather Part Two
um are you a fan of Godfather because
you didn't even tweet or mention the
Godfather yeah I don't love that movie I
know it hasn't edit that out we're gonna
edit out the hate towards the Godfather
how dare you just I think I will make a
strong statement I don't know why I
don't know why but I basically don't
like any movie before 1995
something like that didn't you mention
Terminator two okay okay that's like uh
Terminator 2 was a little bit later 1990
no I think Terminator 2 was '91 I
like Terminator one as well so okay so
like a few exceptions but by and large
for some reason I don't like movies
before 1995 or something they feel very
slow the camera is like zoomed out it's
boring it's kind of naive it's kind of
weird and also Terminator was very much
ahead of its time yes and The Godfather
there's like no AGI
[Laughter]
I mean but you have Good Will Hunting
was one of the movies you mentioned and
that doesn't have any AGI either I guess
that's mathematics yeah I guess
occasionally I do enjoy movies that
don't feature it, or like Anchorman, that has no AGI. It's so good I don't understand
um speaking of AGI because I don't
understand why Will Ferrell is so funny
it doesn't make sense it doesn't compute
there's just something about him and
he's a singular human because you don't
get that many comedies
these days and I wonder if it has to do
about the culture uh or the like the
machine of Hollywood or does it have to
do with just we got lucky with certain
people and comedy it came together
because he is a singular human
that was a ridiculous tangent I
apologize but you mentioned humanoid
robot so what do you think about Optimus
about Tesla bot do you think we'll have
robots in the factory in in the home in
10 20 30 40 50 years yeah I think it's a
very hard project I think it's going to
take a while but who else is going to
build humanoid robots at scale yeah and I
think it is a very good form factor to
go after because like I mentioned the
the world is designed for humanoid form
factor these things would be able to
operate our machines they would be able
to sit down in chairs uh potentially
even drive cars uh basically the world
is designed for humans that's the form
factor you want to invest into and make
work over time uh I think you know
there's another school of thought which
is okay pick a problem and design a
robot to it but actually designing a
robot and getting a whole data engine
and everything behind it to work is
actually an incredibly hard problem so
it makes sense to go after General
interfaces that uh okay they are not perfect for any one given task but they actually have the generality of, just with a prompt in English, being able to do something across tasks and so I think it makes
a lot of sense to go after a general uh
interface
um in the physical world and I think
it's a very difficult project I think
it's going to take time but I see no
other no other company that can execute
on that Vision I think it's going to be
amazing like uh basically physical labor
like if you think transportation is a
large Market try physical labor insane
well but it's not just physical labor to
me the thing that's also exciting is the
social robotics so the the relationship
we'll have on different levels with
those robots that's why I was really
excited to see Optimus like um people
have criticized me for the excitement
but I've I've worked with uh uh a lot of
research Labs that do humanoid legged
robots Boston Dynamics Unitree a lot
there's a lot of companies that do
legged robots but that's the the
Elegance of the movement is a tiny tiny
part of the big picture so integrating
the two big exciting things to me about
Tesla doing humanoid or any legged robots
is
clearly integrating it into the data
engine so the the data engine aspect so
the actual intelligence for the
perception and the and the control and
the planning and all that kind of stuff
integrating into this huge the fleet
that you mentioned right
um and then speaking of Fleet the second
thing is the mass manufacturers Just
knowing
uh culturally
uh driving towards a simple robot that's
cheap to produce at scale yeah and doing
that well having experience to do that
well that changes everything that's why
that's a very different culture and
style than Boston Dynamics who by the
way those those robots are just the the
way they move it's uh like it'll be a
very long time before Tesla could
achieve the smoothness of movement but
that's not what it's about it's it's
about uh it's about the entirety of the
system like we talked about the data
engine and the fleet that's super
exciting even the initial sort of models
uh but that too was really surprising
that in a few months you can get a
prototype yep and the reason that
happened very quickly is as you alluded
to there's a ton of copy paste from
what's happening in the autopilot yes a
lot the amount of expertise that like
came out of the woodwork at Tesla for building the humanoid robot was incredible
to see like basically Elon said at one
point we're doing this and then
next day basically like all these CAD
models started to appear and people talk
about like the supply chain and
Manufacturing and uh people showed up
with like screwdrivers and everything
like the other day and started to like
put together the body and I was like
whoa like all these people exist at
Tesla and fundamentally building a car
is actually not that different from
building a robot the same and that is
true uh not just for uh the hardware
pieces and also let's not forget
Hardware not just for a demo but
manufacturing of that Hardware at scale
is like a whole different thing but for
software as well basically this robot
currently thinks it's a car
uh it's gonna have a midlife crisis at
some point it thinks it's a car
um some of the earlier demos actually we
were talking about potentially doing
them outside in the parking lot because
that's where all of the computer vision
that was like working out of the box
instead of like inside
um but all the operating system
everything just copy pastes uh computer
vision mostly copy paste I mean you have
to retrain the neural Nets but the
approach and everything in data engine
and offline trackers and the way we go
about the occupancy tracker and so on
everything copy paste you just need to
retrain the neural nets uh and then the
planning control of course has to change
quite a bit but there's a ton of copy
paste from what's happening at Tesla and
so if you were to if you were to go with
goal of like okay let's build a million
humanoid robots and you're not Tesla that's
that's a lot to ask if you're a Tesla
it's actually like
it's not it's not that crazy and then
the the follow-up question is and how
difficult just like we're driving how
difficult is the manipulation task uh
such that it can have an impact at scale
I think
depending on the context, the really nice thing about robotics is, unless you do manufacturing and that kind of stuff, there's more room for error. Driving is so safety critical and also time critical. A robot is allowed to move slower, which is nice yes
I think it's going to take a long time
but the way you want to structure the
development is you need to say okay it's
going to take a long time how can I set
up the uh product development roadmap so
that I'm making Revenue along the way
I'm not setting myself up for a zero one
loss function where it doesn't work
until it works you don't want to be in
that position you want to make it useful
almost immediately and then you want to
slowly deploy it uh and uh at scale and
you want to set up your data engine your
improvement Loops the Telemetry the
evaluation the harness and everything
and you want to improve the product over
time incrementally and you're making
Revenue along the way that's extremely
important because otherwise you cannot
build these these uh large undertakings
just like don't make sense economically
and also from the point of view of the
team working on it they need the
dopamine along the way they're not just
going to make a promise about this being
useful this is going to change the world
in 10 years when it works this is not
where you want to be you want to be in a
place like I think autopilot is today
where it's offering increased safety and
um and uh convenience of driving today
people pay for it people like it people
purchase it and then you also have the
greater mission that you're working
towards
and you see that so the dopamine for the
team that that was a source of Happiness
yes you're deploying this people like it
people drive it people pay for it they
care about it there's all these YouTube
videos your grandma drives it she gives
you feedback people like it people
engage with it you engage with it huge
do uh people that drive Teslas like
recognize you and give you love like uh
like hey thanks for the for the this
nice feature that it's doing yeah I
think the tricky thing is like some
people really love you some people
unfortunately like you're working on
something that you think is extremely
valuable useful Etc some people do hate
you there's a lot of people who like
hate me and the team and whatever the
whole project and you'd think they're Tesla drivers but in many cases they're not actually. Yeah that actually
makes me sad about humans or the current
the ways that humans interact I think
that's actually fixable I think humans
want to be good to each other I think
Twitter and social media is part of the
mechanism that actually somehow makes
the negativity more viral, but it doesn't deserve it, it disproportionately adds like a viral boost to negativity. But like I wish people would
just get excited about uh so suppress
some of the jealousy some of the ego and
just get excited for others and then
there's a Karma aspect to that you get
excited for others they'll get excited
for you same thing in Academia if you're
not careful there's a like a dynamical
system there if you if you think of in
silos and get jealous of somebody else
being successful that actually perhaps
counterintuitively uh leads to less
productivity of you as a community and
you individually I feel like if you keep
celebrating others that actually makes
you more successful yeah I think people
haven't in depending on the industry
haven't quite learned that yet yeah some
people are also very negative and very
vocal so they're very prominently
featured but actually there's a ton of
people who are cheerleaders but they're
silent cheerleaders and uh
when you talk to people just in the
world they will all tell you it's
amazing it's great especially like
people who understand how difficult it
is to get this stuff working like people
who have built products and makers
entrepreneurs, like, making this work and changing something is incredibly hard those people are
more likely to cheerlead you well one of
the things that makes me sad is some
folks in the robotics Community uh don't
do the cheerleading and they should
there's uh because they know how
difficult it is well they actually
sometimes don't know how difficult it is
to create a product at scale right and actually deploy it in the real world a lot
of the
development of robots and AI systems is
done on very specific small benchmarks
um and as opposed to real world
conditions yes
yeah I think it's really hard to work on
robotics in academic setting or AI
systems that apply in the real world. You've criticized, well, you uh flourished in and loved, for a time, ImageNet, the famed ImageNet data set, and you've recently
had some words uh of criticism that the
academic research ml Community gives a
little too much love still to the
imagenet or like those kinds of
benchmarks can you speak to the
strengths and weaknesses of data sets
used in machine learning research
actually I don't know that I recall the
specific instance where I was uh unhappy
or criticizing imagenet I think imagenet
has been extremely valuable uh it was
basically a benchmark that allowed the
Deep Learning Community to demonstrate
that deep neural networks actually work
it was uh there's a massive value in
that um so I think imagenet was useful
but um basically it's become a bit of an
MNIST at this point so MNIST is like the 28 by 28 grayscale digits there's
kind of a joke data set that everyone
like just crushes. There's no papers written on MNIST now though right? Maybe there should be strong papers, like papers
that focus on like how do we learn with
a small amount of data that kind of
stuff yeah I could see that being
helpful but not in sort of like Mainline
computer vision research anymore of
course I think the way I've heard you
somewhere maybe I'm just imagining
things but I think you said like image
that was a huge contribution to the
community for a long time and now it's
time to move past those kinds of well
image that has been crushed I mean you
know the error rates are
uh
yeah we're getting like 90 percent accuracy in 1,000-way classification prediction and I've seen those images and it's like really hard. That's really good if I remember
correctly the top five error rate is now
like one percent or something given your
experience with a gigantic real world
data set would you like to see
benchmarks move in certain directions
that the research Community uses
unfortunately I don't think academics
currently have the next imagenet uh
We've obviously I think we've crushed
mnist we've basically kind of crushed
imagenet uh and there's no next sort of
big Benchmark that the entire Community
rallies behind and uses
um
you know for further development of
these networks uh yeah what it takes for
data set to Captivate the imagination of
everybody like where they all get behind
it that that could also need like a
viral like a leader right you know
somebody with popularity I mean that
yeah why did image of that take off
is there or is it just the accident of
History it was the right amount of
difficult uh it was the right amount of
difficult and simple and uh interesting
enough it just kind of like it was it
was the right time for that kind of a
data set
question from Reddit
uh what are your thoughts on the role
that synthetic data and game engines
will play in the future of neural net
model development
I think
um as neural Nets converge to humans
uh the value of simulation to neural
Nets will be similar to value of
simulation to humans
so people use simulation for uh
people use simulation because they can
learn something in that kind of a system
and without having to actually
experience it
um but are you referring to the
simulation we're doing our head no sorry
simulation I mean like video games or uh
you know other forms of simulation for
various professionals well so let me
push back on that because maybe their
simulation that we do in our heads like
simulate if I do this
what do I think will happen Okay that's
like internal simulation yeah internal
isn't that what we're doing, let's say, before we act? Oh yeah but
that's independent from like the use of
uh simulation in the sense of like
computer games or using simulation for
training set creation or you know is it
independent or is it just Loosely
correlated because like uh
isn't that useful to do like um
counterfactual or like Edge case
simulation to like
you know what happens if there's a
nuclear war
what happens if there's you know like
those kinds of things yeah that's a
different simulation from like Unreal
Engine that's how I interpreted the
question uh so like
simulation of the average case
is that what Unreal Engine is? What do you mean by Unreal
Engine so
simulating a world yeah physics of that
world
why is that different like because you
also can add Behavior to that world and
you can try all kinds of stuff right
like you could throw all kinds of weird
things into it so Unreal Engine is not
just about simulating, I mean I guess it is about simulating the physics of the
world it's also doing something with
that
yeah the graphics the physics and the
Agents that you put into the environment
and stuff like that yeah see I think you
I feel like you said that it's not that
important I guess for the future of AI
development is that is that correct to
interpret you that way uh I think
humans use uh simulators
for um humans use simulators and they
find them useful and so computers will
use simulators and find them useful
okay so you're saying it's not I I don't
use simulators very often I play a video
game every once in a while but I don't
think I derive any wisdom about my own
existence from from those video games
it's a momentary escape from reality
versus a source of wisdom about reality
so I don't so I think that's a very
polite way of saying simulation is not
that useful
yeah maybe maybe not I don't see it as
like a fundamental really important part
of like training neural Nets currently
uh but I think uh as neural Nets become
more and more powerful I think you will
need fewer examples to train additional
behaviors and uh simulation is of course
there's a domain Gap in a simulation
that's not the real world there's
slightly something different but uh with
a powerful enough neural net uh you need
um The Domain Gap can be bigger I think
because neural network will sort of
understand that even though it's not the
real world it like has all this high
level structure that I'm supposed to be
able to learn from so then you'll know
we'll actually
yeah you'll be able to Leverage
the synthetic data better. Yes. By closing the gap, by better understanding in which ways this is not real data. Exactly.
uh right to do better questions next
time that was that was a question but
I'm just kidding all right um
so is it possible do you think speaking
of MNIST to construct neural nets and
training processes that require very
little data
so we've been talking about huge data
sets like the internet for training I
mean one way to say that is like you
said like the querying itself is another
level of training I guess and that
requires a little data yeah but do you
see any uh value in doing research and
kind of going down the direction of can
we use very little data to train to
construct a knowledge base 100 percent I just
think like at some point you need a
massive data set and then when you
pre-train your massive neural net and
get something that you know is like a
GPT or something then you're able to be
very efficient at training any arbitrary
new task uh so a lot of these gpts you
know you can do tasks like sentiment
analysis or translation or so on just by
being prompted with very few examples
here's the kind of thing I want you to
do like here's an input sentence here's
the translation into German input
sentence translation to German input
sentence blank and the neural network
will complete the translation to German
just by looking at sort of the example
you've provided and so that's an example of very few-shot uh learning in the activations of the neural net instead of in the weights of the neural net.
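To make the in-context few-shot idea concrete, here is a minimal Python sketch of how such a translation prompt might be assembled. The example pairs are made up and no specific model API is assumed; the point is just that the "training" lives in the prompt the model reads, not in its weights.

```python
# A minimal sketch of few-shot prompting for translation (illustrative only).
# The "learning" happens in the model's activations as it reads the examples;
# no weights are updated.

examples = [
    ("I like coffee.", "Ich mag Kaffee."),
    ("Where is the train station?", "Wo ist der Bahnhof?"),
]
query = "The weather is nice today."

prompt_lines = ["Translate English to German."]
for english, german in examples:
    prompt_lines.append(f"English: {english}\nGerman: {german}")
prompt_lines.append(f"English: {query}\nGerman:")  # the model completes this line

prompt = "\n\n".join(prompt_lines)
print(prompt)
# Send `prompt` to a large language model of your choice; a capable model
# will typically complete the final line with the German translation.
```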
And so I think basically uh just like humans neural
Nets will become very data efficient at
learning any other new task but at some
point you need a massive data set to
pre-train your network
to get that and probably we humans have
something like that do we do we have
something like that do we have a passive
in the background
background model constructing thing that
just runs all the time in a
self-supervised way we're not conscious
of it I think humans definitely I mean
obviously we have uh we learn a lot
during during our life span but also we
have a ton of hardware that helps us, an initialization coming from
sort of evolution and so I think that's
also a really big a big component a lot
of people in the field I think they just
talk about the amounts of like seconds
and the you know that a person has lived
pretending that this is a tabula rasa
sort of like a zero initialization of a
neural net and it's not like you can
look at a lot of animals like for
example zebras zebras get born and they
see and they can run, there's zero training
data in their lifespan they can just do
that so somehow I have no idea how
Evolution has found a way to encode
these algorithms and these neural net
initializations that are extremely good in the ATCGs and I have no idea how this works
but apparently it's possible because
here's a proof by existence there's
something magical about going from a
single cell to an organism that is born
to the first few years of life I kind of
like the idea that the reason we don't
remember anything about the first few
years of our life is that it's a really
painful process like it's a very
difficult challenging
training process yeah like
intellectually like
and maybe yeah I mean I don't why don't
we remember any of that there might be
some crazy training going on and the
that maybe that's the background model
training that uh is is very painful and
so it's best for the system once it's
trained not to remember how it's
constructed I think it's just like the
hardware for long-term memory is just
not fully developed sure I kind of feel
like the first few years of uh of
infants is not actually like learning
it's brain maturing yeah
um we're born premature
um and there's a theory along those
lines because of the birth canal and the
swelling of the brain and so we're born
premature and then the first few years
we're just the brains maturing and then
there's some learning eventually
um
it's my current view on it what do you
think do you think neural Nets can have
long-term memory
like that approach is something like
humans do you think you know do you
think there needs to be another meta
architecture on top of it to add
something like a knowledge base that
learns facts about the world and all
that kind of stuff yes but I don't know
to what extent it will be explicitly
constructed
um it might take unintuitive forms where
you are telling the GPT like hey you
have a you have a declarative memory
bank to which you can store and retrieve
data from and whenever you encounter
some information that you find useful
just save it to your memory bank and
here's an example of something you have
retrieved and here's how you save to it and
here's how you load from it you just say
load whatever you teach it in text in
English and then it might learn to use a
memory bank from that. Oh so the
neural net is the architecture for the
background model the the base thing and
then yeah everything else is just on top
of this it's not just a text right it's
you're giving it gadgets and gizmos so
uh you're teaching it some kind of a special language by which it can save arbitrary information and retrieve it at a later time and you're telling it
about these special tokens and how to
arrange them to use these interfaces
it's like hey you can use a calculator
here's how you use it, just do 5 3 + 4 1 = and when the equals is
there uh a calculator will actually read
out the answer and you don't have to
calculate it yourself, and you just like tell it in English. This might actually work.
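A minimal sketch of what that kind of text-level tool interface could look like in practice. The CALC: marker and the scanning logic are hypothetical conventions chosen for illustration, not an established protocol; the idea is only that the model is told, in plain English, how to emit a request, and surrounding code fills in the answer.

```python
import re

# Hypothetical convention: the model is told in plain English that it can
# emit a span like "CALC: 53 + 41 =" and the answer will be filled in for it.
CALC_PATTERN = re.compile(r"CALC:\s*(\d+)\s*([+\-*])\s*(\d+)\s*=")

def fill_in_calculator_results(model_output: str) -> str:
    """Scan generated text for calculator requests and append the result."""
    def evaluate(match: re.Match) -> str:
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        result = {"+": a + b, "-": a - b, "*": a * b}[op]
        return f"{match.group(0)} {result}"
    return CALC_PATTERN.sub(evaluate, model_output)

print(fill_in_calculator_results("The total is CALC: 53 + 41 = so we are done."))
# -> "The total is CALC: 53 + 41 = 94 so we are done."
```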
Do you think in that sense Gato is
interesting, the DeepMind system
that it's not just new language but
actually throws it all
uh in the same pile images actions all
that kind of stuff that's basically what
we're moving towards yeah I think so so
gato is uh is very much a kitchen sink
approach to like
um reinforcement learning lots of
different environments with a single
fixed Transformer model right
um I think it's a very sort of early
result in that in that realm but I think
uh yeah it's along the lines of what I
think things will eventually look like
right so this is the early days of a
system that eventually will look like
this, like, from a Rich Sutton
perspective yeah I'm not super huge fan
of I think all these interfaces that
like look very different
um I would want everything to be
normalized into the same API so for
example screen pixels as the same API, instead of having like different world environments with very different
physics and Joint configurations and
appearances and whatever and you're
having some kind of special tokens for
different games that you can plug I'd
rather just normalize everything to a
single interface so it looks the same to
the neural net if that makes sense so
it's all going to be pixel based pong in
the end I think so
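A rough sketch of what normalizing everything to a single interface might mean in code. The class and method names here are hypothetical; the point is that every environment, whether a game, a robot camera feed, or a rendered text screen, is exposed to the agent as the same pixel observation type.

```python
from typing import Protocol, Tuple
import numpy as np

class PixelEnvironment(Protocol):
    """Hypothetical unified interface: every environment looks the same to the
    neural net: an RGB pixel observation in, an action in, a new pixel
    observation and a scalar reward out."""

    def reset(self) -> np.ndarray:
        """Return an initial observation of shape (height, width, 3), uint8."""
        ...

    def step(self, action: int) -> Tuple[np.ndarray, float, bool]:
        """Apply an action; return (next_observation, reward, done)."""
        ...

def run_episode(env: PixelEnvironment, policy, max_steps: int = 1000) -> float:
    """The agent-side code never needs to know which environment it is."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)          # pixels in, action out
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```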
okay uh let me ask you about your own
personal life
a lot of people want to know you're one
of the most productive and brilliant
people in the history of AI what is a
productive day in the life of Andre
capathi look like
what time do you wake up because imagine
um some kind of dance between the
average productive day and a perfect
productive day so the perfect productive
day is the thing we strive
towards in the average is kind of what
it kind of converges to given all the
mistakes and human eventualities and so
on yeah so what times you wake up are
you morning person I'm not a morning
person I'm a night owl for sure I think
stable or not that's semi-stable like a
eight or nine or something like that
during my PhD it was even later I used
to go to sleep usually at 3am I think uh
the am hours are are precious and very
interesting time to work because
everyone is asleep
um at 8 AM or 7 A.M the east coast is
awake so there's already activity
there's already some text messages
whatever there's stuff happening you can
go in like some news website and there's
stuff happening it's distracting uh at
3am everything is totally quiet and so
you're not going to be bothered and you
have solid chunks of time to do your
work
um so I like those periods Night Owl by
default and then I think like productive
time basically
um what I like to do is you need you
need to like build some momentum on the
problem without too much distraction and
um you need to load your Ram uh your
working memory with that problem
and then you need to be obsessed with it
when you're taking shower when you're
falling asleep you need to be obsessed
with the problem and it's fully in your
memory and you're ready to wake up and
work on it right there so there's a
scale of uh is this in a scale temporal
scale of a single day or a couple of
days a week a month so I can't talk
about one day basically in isolation
because it's a whole process when I want
to get when I want to get productive in
the problem I feel like I need a span of
a few days where I can really get in on
that problem and I don't want to be
interrupted and I'm going to just uh be
completely obsessed with that problem
and that's where I do most of my good
work
you've done a bunch of cool like little
projects in a very short amount of time
very quickly so that that requires you
just focusing on it yeah basically I
need to load my working memory with the
problem and I need to be productive
because there's always like a huge fixed
cost to approaching any problem uh you
know like I was struggling with this for
example at Tesla because I want to work
on like small side projects but okay you
first need to figure out okay I need to
SSH into my cluster I need to bring up a
vs code editor so I can like work on
this I need to I run into some stupid
error because of some reason like you're
not at a point where you can be just
productive right away you are facing
barriers and so it's about uh really
removing all that barrier and you're
able to go into the problem and you have
the full problem loaded in your memory
and somehow avoiding distractions of all
different forms like uh news stories
emails but also distractions from other
interesting projects that you previously
worked on are currently working on and
so on you just want to really focus your
mind and I mean I can take some time off
for distractions and in between but I
think it can't be too much uh you know
most of your day is sort of like spent
on that problem and then you know I
drink coffee I have my morning routine I
look at some news uh Twitter Hacker News
Wall Street Journal Etc
so basically you wake up you have some
coffee are you trying to get to work as
quickly as possible or do you take in this diet of like what the hell's
happening in the world first I am I do
find it interesting to know about the
world I don't know that it's useful or
good but it is part of my routine right
now so I do read through a bunch of news
articles and I want to be informed and
um I'm suspicious of it I'm suspicious
of the practice but currently that's
where I am Oh you mean suspicious about
the positive effect yeah of that
practice on your productivity and your
well-being my well-being psychologically
uh and also on your ability to deeply
understand the world because how there's
a bunch of sources of information you're
not really focused on deeply integrating
yeah it's a little bit distracting or
yeah in terms of a perfectly productive
day for how long of a stretch of time
in one session do you try to work and
focus on a thing it's a couple hours is
it one hour or 30 minutes is it 10 minutes
I can probably go like a small few hours
and then I need some breaks in between
for like food and stuff and uh
yeah but I think like uh it's still
really hard to accumulate hours I was
using a Tracker that told me exactly how
much time I've spent coding any one day
and even on a very productive day I
still spent only like six or eight hours
yeah and it's just because there's so
much padding commute talking to people
food Etc there's like the cost of life
just living and sustaining and
homeostasis and just maintaining
yourself as a human is very high and and
there seems to be a desire within the
human mind to to uh to participate in
society that creates that padding yeah
because I yeah the most productive days
I've ever had is just completely from
start to finish just tuning out
everything yep and just sitting there
and then and then you could do more than
six and eight hours yeah is there some
wisdom about what gives you strength to
do like uh tough days of long Focus
yeah just like whenever I get obsessed
about a problem something just needs to
work something just needs to exist it
needs to exist and you so you're able to
deal with bugs and programming issues
and technical issues and uh design
decisions that turn out to be the wrong
ones you're able to think through all of
that, given that you want the thing to exist. Yeah, it needs to exist and then
I think to me also a big factor is uh
you know are other humans are going to
appreciate it are they going to like it
that's a big part of my motivation if
I'm helping humans and they seem happy
they say nice things uh they tweet about
it or whatever that gives me pleasure
because I'm doing something useful so
like you do see yourself sharing it with
the world like with yes on GitHub with a
blog post or through videos yeah I was
thinking about it like suppose I did all
these things but did not share them I
don't think I would have the same amount
of motivation that I can build up you
enjoy the feeling of other people
uh gaining value and happiness from the
stuff you've created yeah
uh what about diet
is there, I saw you playing with intermittent fasting, do you fast, does that
help with everything
well, of the things you've played with, what's been most beneficial to your ability to mentally focus on a thing and just the mental productivity and happiness
you still fast yeah I still fast but I
do intermittent fasting but really what
it means at the end of the day is I skip
breakfast yeah so I do uh 18 6 roughly
by default when I'm in my steady state
if I'm traveling or doing something else
I will break the rules but in my steady
state I do 18 6 so I eat only from 12 to
6. not a hard Rule and I break it often
but that's my default and then um yeah
I've done a bunch of random experiments
for the most part right now uh where
I've been for the last year and a half I
want to say is I'm um plant-based or
plant forward I heard plant forward it
sounds better exactly I didn't actually
know the differences but it sounds
better in my mind but it just means I
prefer plant-based food and raw or
cooked or I prefer cooked uh and plant-based so plant-based
oh forgive me I don't actually know how
wide the category of plant-based entails. Well, it just means that you're not strict about it uh and you can flex and uh you just prefer
to eat plants and you know you're not
making you're not trying to influence
other people and if someone is you come
to someone's house party and they serve
you a steak that they're really proud of
you will eat it yes right there's just no judgment oh that's beautiful I mean
that's
um on the flip side of that but I'm very
sort of flexible have you tried doing
one meal a day uh I have uh accidentally
not consistently but I've accidentally
had that I don't I don't like it I think
it makes me feel uh not good it's too
it's too much too much of a hit yeah and
uh So currently I have about two meals a
day 12 and six I do that non-stop I'm
doing it now I'm doing one meal a day
okay so it's interesting it's a
interesting feeling have you ever fasted
longer than a day yeah I've done a bunch
of water fasts because I was curious
what happens uh anything interesting
yeah I would say so I mean you know
what's interesting is that you're hungry
for two days and then starting day three
or so you're not hungry it's like such a
weird feeling because you haven't eaten
in a few days and you're not hungry
isn't that weird it's really one of the
many weird things about human biology, it figures something out, it finds
another source of energy or something
like that or uh relaxes the system I
don't know how yeah the body is like
you're hungry you're hungry and then it
just gives up it's like okay I guess
we're fasting now there's nothing and
then it just kind of like focuses on
trying to make you not hungry uh and you
know not feel the the damage of that and
uh trying to give you some space to
figure out the food situation
so are you still to this day most
productive uh at night I would say I am
but it is really hard to maintain my PhD
schedule
um especially when I was say working at
Tesla and so on it's a non-starter so
but even now like you know people want
to meet for
various events they Society lives in a
certain period of time and you sort of
have to like work so that's it's hard to
like do a social thing and then after
that return and do work yeah it's just
really hard
uh that's why I try to do social things
I try not to do too uh too much drinking
so I can return and continue doing work
um but at Tesla is there, is there convergence at Tesla or at any company
is there a convergence towards the
schedule or is there more
is that how humans behave when they
collaborate I need to learn about this
yeah do they try to keep a consistent
schedule you're all awake at the same
time I mean I do try to create a routine
and I try to create a steady state in
which I'm uh comfortable in uh so I have
a morning routine I have a day routine I
try to keep things to do a steady state
and um things are predictable and then
you can sort of just like your body just
sort of like sticks to that and if you
try to stress that a little too much it
will create uh you know when you're
traveling and you're dealing with jet
lag you're not able to really Ascend to
you know where you need to go yeah yeah
that's weird as humans with the habits
and stuff uh what are your thoughts on
work-life balance throughout a human
lifetime
so Tesla in part was known for sort
of pushing people to their limits
in terms of what they're able to do in
terms of what they're uh trying to do in
terms of how much they work all that
kind of stuff yeah I mean I will say
Tesla gets a little too much of a bad rep for
this because what's happening is Tesla
is a it's a bursting environment uh so I
would say the Baseline uh my only point
of reference is Google where I've
interned three times and I saw what it's
like inside Google and and deepmind
um I would say the Baseline is higher
than that but then there's a punctuated
equilibrium where once in a while
there's a fire and uh someone like
people work really hard and so it's
spiky and bursty and then all the
stories get collected about the bursts
yeah and then it gives the appearance of
like total insanity but actually it's
just a bit more intense environment and
there are fires and Sprints and so I
think uh you know definitely though I I
would say
um it's a more intense environment than
something you would get elsewhere. But forget all of that, just in your
own personal life
um what do you think about
the happiness of a human being a
brilliant person like yourself
about finding a balance between work and
life or is it such a thing not a good
thought experiment
yeah I think I think balance is good but
I also love to have Sprints that are out
of distribution and that's when I think I've been pretty uh creative as well. Sprints out of distribution
means that most of the time
you have a yeah quote-unquote balance I
have balance most of the time yes I like
being obsessed with something once in a
while once in a while is what once a
week once a month once a year yeah
probably like say once a month or
something yeah and that's when we get a
new GitHub repo come on yeah that's when
you like really care about a problem it
must exist this will be awesome you're
obsessed with it and now you can't just
do it on that day you need to pay the
fixed cost of getting into the groove
and then you need to stay there for a
while and then Society will come and
they will try to mess with you and they
will try to distract you yeah yeah the
worst thing is like a person who's like
I just need five minutes of your time
yeah the cost of that is not
five minutes and Society needs to change
how it thinks about just five minutes of
your time right it's never it's never
just one minute it's just 30 it's just a
quick what's the big deal why are you
being so yeah no
uh what's your computer setup what uh
what's like the perfect are you somebody
that's flexible to no matter what laptop
four screens yeah uh or do you uh prefer
a certain setup that you're most
productive um I guess the one that I'm
familiar with is one large screen uh 27
inch
um and my laptop on the side with
operating system I do Macs that's my
primary for all tasks I would say OS X
but when you're working on deep learning
everything is Linux, you're SSH'd into a
cluster and you're working remotely but
what about the actual development like
that using the IDE yeah you would use uh
I think a good way is you just run vs
code
um my favorite editor right now on your
Mac but you are actually you have a
remote folder through SSH
um so the actual files that you're
manipulating are on the cluster
somewhere else so what's the best IDE
uh vs code what else do people so I use
emacs still that's cool uh so it may be
cool I don't know if it's maximum
productivity
um so what what do you recommend in
terms of editors you worked with a lot
of software Engineers editors for
Python, C++, machine learning
applications I think the current answer
is vs code currently I believe that's
the best
um IDE it's got a huge amount of
extensions it has a GitHub co-pilot
um uh integration which I think is very
valuable what do you think about the the
co-pilot integration I was actually uh I
got to talk a bunch with Guido van Rossum who's the creator of Python and he loves Copilot, he like he programs a lot
with it yeah uh do you
yeah I use Copilot, I love it and uh it's
free for me but I would pay for it yeah
I think it's very good and the utility
that I found with it was, I
would say there is a learning curve and
you need to figure out when it's helpful
and when to pay attention to its outputs
and when it's not going to be helpful
where you should not pay attention to it
because if you're just reading its
suggestions all the time it's not a good
way of interacting with it but I think I
was able to sort of like mold myself to
it I find it's very helpful number one
in copy paste and replace some parts so
I don't um when the pattern is clear
it's really good at completing the
pattern and number two sometimes it
suggests apis that I'm not aware of so
it tells you about something that you
didn't know so and that's an opportunity
to discover and you it's an opportunity
to see I would never take copilot code
as given I almost always uh copy paste
this into a Google Search and you see
what this function is doing and then
you're like oh it's actually
exactly what I need thank you copilot so
you learned something so it's in part a
search engine and part maybe getting the exact syntax correct, that once you see
it yep it's that NP hard thing it's like
once you see it you know yes exactly
correct exactly you yourself you can
struggle you can verify efficiently but
you you can't generate efficiently and
copilot really I mean it's it's
autopilot for programming right and
currently it's doing the lane following
which is like the simple copy paste and
sometimes suggest uh but over time it's
going to become more and more autonomous
and so the same thing will play out in
not just coding but actually across many
many different things probably but
coding is an important one right like
writing programs yeah what how do you
see the future of that developing uh the
program synthesis like being able to
write programs that are more and more
complicated because right now it's human
supervised in interesting ways yes like
what it feels like the transition will
be very painful
my mental model for it is the same thing
will happen as with the autopilot uh So
currently it's doing lane following, it's
doing some simple stuff and eventually
we'll be doing autonomy and people will
have to intervene less and less and
there could be like you like testing
mechanisms
like if it writes a function and that
function looks pretty damn correct but
how do you know it's correct because
you're like getting lazier and lazier as
a programmer like your ability to
because, like, little bugs, but I guess it won't make little bugs? No it will, Copilot will make uh off-by-one subtle bugs, it has done that to me.
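For instance, the kind of subtle off-by-one a code completion can introduce. This is a made-up illustration, not an actual Copilot output.

```python
def moving_sum(values, window):
    """Sum of each consecutive window of `values`."""
    # A plausible-looking completion with an off-by-one bug:
    # range(len(values) - window) stops one window short.
    buggy = [sum(values[i:i + window]) for i in range(len(values) - window)]

    # Correct version: the last full window starts at len(values) - window.
    correct = [sum(values[i:i + window]) for i in range(len(values) - window + 1)]
    return buggy, correct

buggy, correct = moving_sum([1, 2, 3, 4], window=2)
print(buggy)    # [3, 5]    -- silently drops the final window (3 + 4)
print(correct)  # [3, 5, 7]
```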
But do you think future systems will, or is it
really the off by one is actually a
fundamental challenge of programming in
that case it wasn't fundamental and I
think things can improve but uh yeah I
think humans have to supervise I am
nervous about people not supervising
what comes out and what happens to for
example the proliferation of bugs in all
of our systems I'm nervous about that
but I think and there will probably be
some other copilots for bug finding and
stuff like that at some point because
there will be like a lot more automation
for uh oh man
so it's like a program a co-pilot that
generates a compiler for one that does a
linter yes one that does like a type
Checker yes
it's a committee of like a GPT sort of
like and then they'll be like a manager
for the committee yeah and then there'll
be somebody that says a new version of
this is needed we need to regenerate it
yeah there were 10 GPTs that did a forward pass and gave 50 suggestions, another one looked at it and picked a few that it liked, a bug one looked at it and it was like it's probably a bug
they got re-ranked by some other thing
and then a final ensemble uh GPT comes in, it's like okay given everything you guys have told me, this is probably the next token.
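A sketch of that speculative generate, review, and re-rank pipeline, with stub functions standing in for the models. Everything here, from the function names to the scoring scheme, is hypothetical and only meant to make the shape of the idea concrete.

```python
from typing import Callable, List

# Stubs standing in for different specialized models in the "committee".
Generator = Callable[[str], List[str]]   # prompt -> candidate code snippets
Reviewer = Callable[[str], float]        # candidate -> score (higher is better)

def committee_complete(prompt: str,
                       generators: List[Generator],
                       reviewers: List[Reviewer],
                       keep_top: int = 3) -> str:
    """Gather candidates from several generators, score them with several
    reviewers (e.g. a bug-finder, a linter, a type-checker), and return the
    candidate the committee likes best."""
    candidates = [c for generate in generators for c in generate(prompt)]
    scored = [(sum(review(c) for review in reviewers), c) for c in candidates]
    scored.sort(reverse=True, key=lambda pair: pair[0])
    shortlist = [c for _, c in scored[:keep_top]]
    # A final "ensemble" model could pick among the shortlist; here we just
    # return the top-scored candidate.
    return shortlist[0]

# Trivial stubs for illustration:
gens = [lambda p: [f"# TODO: implement {p}", f"def solve():\n    pass  # {p}"]]
revs = [lambda c: float(len(c))]  # placeholder scorer: prefer longer candidates
print(committee_complete("sort a list", gens, revs))
```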
You know, the feeling is, the number of programmers in the world has
been growing and growing very quickly do
you think it's possible that it'll
actually level out and drop to like a
very low number with this kind of world
because then you'll be doing software
2.0 programming
um and you'll be doing this kind of
generation of copilot type systems
programming but you won't be doing the
old school
software 1.0 programming I don't currently
think that they're just going to replace
human programmers
um
it's I'm so hesitant saying stuff like
this right because this is going to be replayed in five years I don't know it's
going to show that like this is where we
thought because I I agree with you but I
think we might be very surprised
right like what are the next
I I what's your sense of what we're
seeing with language models like does it
feel like the beginning or the middle or
the end the beginning 100 I think the
big question in my mind is for sure GPT
will be able to program quite well
competently and so on how do you steer
the system you still have to provide
some guidance to what you actually are
looking for and so how do you steer it
and how do you say how do you talk to it
how do you um
audit it and verify that what is done is
correct and how do you like work with
this and it's as much not just an AI
problem but a UI ux problem yeah
um so beautiful fertile ground for so
much interesting work, for VS Code++, where you're not just, it's not just
human programming anymore it's amazing
yeah so you're interacting with the
system so not just one prompt but it's
iterative prompting yeah you're trying
to figure out having a conversation with
the system yeah that actually I mean to
me that's super exciting to have a
conversation with the program I'm
writing
yeah maybe at some point uh you're just
conversing with it it's like okay here's
what I want to do actually this variable
maybe it's not even that low level as
variable but you can also Imagine like
can you translate this to C++ and back to Python, yeah that already kind of exists, no, but just like doing it as
part of the program experience like I
think I'd like to write this function in
C++
or or like you just keep changing for
different uh different programs because
they're different syntax maybe I want to
convert this into a functional language
and so like you get to become
multilingual as a programmer and dance
back and forth efficiently yeah I mean I
think the UI ux of it though is like
still very hard to think through because
it's not just about writing code on a
page you have an entire developer
environment you have a bunch of hardware
on it uh you have some environmental
variables you have some scripts that are
running in a cron job, like there's a
lot going on to like working with
computers and how do these uh systems
set up environment flags and work across
multiple machines and set up screen
sessions and automate different
processes like how all that works and
it's auditable by humans and so on is
like massive question at the moment
You've built arXiv Sanity. What is arXiv and what is the future
of academic research publishing that you
would like to see? So arXiv is this pre-print server, so if you have a paper
you can submit it for publication to
journals or conferences and then wait
six months and then maybe get a decision
pass or fail or you can just upload it
to arXiv
and then people can tweet about it three
minutes later and then everyone sees it
everyone reads it and everyone can
profit from it uh in their own ways you
can cite it and it has an official look
to it it feels like a pub like it feels
like a publication process yeah it feels
different than you if you just put in a
blog post oh yeah yeah I mean it's a
paper and usually the the bar is higher
for something that you would expect on
arXiv as opposed to something you
would see in a blog post well the
culture created the bar because you
could probably, yes, post a pretty crappy paper on arXiv
um so what's that make you feel like
what's that make you feel about peer
review so rigorous peer review by two
three experts versus the peer review of
the community right as it's written yeah
basically I think the community is very
well able to peer review things very
quickly on Twitter and I think maybe it
just has to do something with AI machine
learning field specifically though I
feel like things are more easily
auditable um and the verification is is
easier potentially than the verification
somewhere else so it's kind of like um
you can think of these uh scientific
publications as like little
blockchains where everyone's building on
each other's work and citing each other
and you sort of have ai which is kind of
like this much faster and loose
blockchain but then you have and any one
individual entry is like very um very
cheap to make and then you have other
fields where maybe that model doesn't
make as much sense
um and so I think in AI at least things
are pretty easily verifiable and so
that's why when people upload papers
that are a really good idea and so on
people can try it out like the next day
and they can be the final Arbiter
whether it works or not on their problem
and the whole thing just moves
significantly faster so I kind of feel
like Academia still has a place sorry
this like conference Journal process
still has a place but it's sort of like
an um it lags behind I think and it's a
bit more maybe higher quality process
but it's not sort of the place where you
will discover Cutting Edge work anymore
yeah it used to be the case when I was
starting my PhD that you go to
conferences and journals and you discuss
all the latest research now when you go
to a conference or Journal like no one
discusses anything that's there because
it's already like three generations ago
irrelevant yes which makes me sad about
like deepmind for example where they
they still they still publish in nature
and these big prestigious I mean there's
still value as opposed to The Prestige
that comes with these big venues but the
the result is that they they'll announce
some breakthrough performance and it'll
take like a year to actually publish the
details I mean and those details in if
they were published immediately would
Inspire the community to move in certain
directions with that yeah it would speed
up the rest of the community but I don't
know to what extent that's part of their
objective function also that's true so
it's not just the prestige a little bit
of the delay is uh is part yeah they
certainly deepmind specifically has been
um working in the regime of having a
slightly higher quality basically
process and latency and uh publishing
those papers that way another question
from Reddit do you or have you suffered
from imposter syndrome being the
director of AI at Tesla, being this person
when you're at Stanford where like the
world looks at you as the expert in AI
to teach the world about machine
learning when I was leaving Tesla after
five years I spent a ton of time in
meeting rooms uh and you know I would
read papers in the beginning when I
joined Tesla I was writing code and then
I was writing less and less code and I was reading code and then I was reading less and less code and so this is just a
natural progression that happens I think
and uh definitely I would say near the
tail end that's when it sort of like
starts to hit you a bit more that you're
supposed to be an expert but actually
the source of Truth is the code that
people are writing the GitHub and the
actual the actual code itself and you're
not as familiar with that as you used to
be and so I would say maybe there's some
like insecurity there yeah that's
actually pretty profound that a lot of
the insecurity has to do with not
writing the code in the computer science
space like that because that is the
truth that that right there code is the
source of Truth the papers and
everything else it's a high level
summary I don't uh yeah just a high
level summary but at the end of the day
you have to read code it's impossible to
translate all that code into actual uh
you know paper form uh so when when
things come out especially when they
have a source code available that's my
favorite place to go so like I said
you're one of the greatest teachers of
machine learning AI ever uh from cs231n
to today what advice would you give to
beginners interested in getting into
machine learning
beginners are often focused on like what
to do and I think the focus should be
more like how much you do, so I'm kind of a believer, on a high level, in this 10,000 hours kind of concept where
you just kind of have to just pick the
things where you can spend time and you
you care about and you're interested in
you literally have to put in 10 000
hours of work
um it doesn't even like matter as much
like where you put it and your you'll
iterate and you'll improve and you'll
waste some time I don't know if there's
a better way you need to put in 10 000
hours but I think it's actually really
nice because I feel like there's some
sense of determinism about uh being an
expert at a thing if you spend ten
thousand hours you can literally pick an
arbitrary thing and I think if you spend
ten thousand hours of deliberate effort
and work you actually will become an
expert at it and so I think it's kind of
like a nice thought
um and so uh basically I would focus
more on like are you spending 10 000
hours that's what I focus on so and then
thinking about what kind of mechanisms
maximize your likelihood of getting to
ten thousand hours exactly which for
us silly humans means probably forming a
daily habit of like every single day
actually doing the thing whatever helps
you so I do think to a large extent is a
psychological problem for yourself uh
one other thing that I think
is helpful for the psychology of it is
many times people compare themselves to
others in the area I think this is very
harmful only compare yourself to you
from some time ago like say a year ago
are you better than you year ago this is
the only way to think
um and I think this then you can see
your progress and it's very motivating
that's so interesting that focus on the
quantity of hours because I think a lot
of people uh in the beginner stage but
actually throughout get paralyzed
uh by uh the choice like which one do I
pick this path or this path yeah like
they'll literally get paralyzed by like
which IDE to use well they're worried
yeah they're worried about all these
things but the thing is some of the you
you will waste time doing something
wrong yes you will eventually figure out
it's not right you will accumulate scar
tissue and next time you'll grow
stronger because next time you'll have
the scar tissue and next time you'll
learn from it and now next time you come
into a similar situation you'll be like
all right
I messed up I've spent a lot of time
working on things that never materialize
into anything and I have all that scar
tissue and I have some intuitions about
what was useful what wasn't useful how
things turned out uh so all those
mistakes were uh were not dead work you
know so I just think you should just
focus on working what have you done what
have you done last week
uh that's a good question actually to
ask for for a lot of things not just
machine learning
um it's a good way to cut the
the, I forgot what term we use, but the fluff, the blubber, whatever the uh
the inefficiencies in life uh what do
you love about teaching you seem to find
yourself
often in the like drawn to teaching
you're very good at it but you're also
drawn to it I mean I don't think I love
teaching I love happy humans and happy
humans like when I teach yes I I
wouldn't say I hate teaching I tolerate
teaching but it's not like the act of
teaching that I like it's it's that um
you know I I have some I have something
I'm actually okay at it yes I'm okay at
teaching and people appreciate it a lot
yeah and uh so I'm just happy to try to
be helpful and uh teaching itself is not
like the most I mean it's really it can
be really annoying frustrating I was
working on a bunch of lectures just now
I was reminded back to my days of 231
and just how much work it is to create
some of these materials and make them
good the amount of iteration and thought
and you go down blind alleys and just
how much you change it so creating
something good
um in terms of like educational value is
really hard and uh it's not fun it's
difficult so for people should
definitely go watch your new stuff you
put out there are lectures where you're
actually building the thing like from
like you said the code is the truth so
discussing back propagation by building
it by looking through and just the whole
thing so how difficult is that to
prepare for I think that's a really
powerful way to teach how did you have
to prepare for that or are you just live
thinking through it I will typically do
like say three takes and then I take
like the the better take uh so I do
multiple takes and I take some of the
better takes and then I just build out a
lecture that way uh sometimes I have to
delete 30 minutes of content because it
just went down a blind alley that I didn't
like too much there's about a bunch of
iteration and it probably takes me you
know somewhere around 10 hours to create
one hour of content to give one hour
it's interesting I mean is it difficult
to go back to the like the basics do you
draw a lot of like wisdom from going
back to the basics yeah going back to
back propagation loss functions where
they come from and one thing I like
about teaching a lot honestly is it
definitely strengthens your
understanding uh so it's not a purely
altruistic activity it's a way to learn
if you have to explain something to
someone uh you realize you have gaps in
knowledge uh and so I even surprised
myself in those lectures like oh the
result will obviously look like this and
then the result doesn't look like it and
I'm like okay I thought I understood
this yeah
but that's why it's really cool to
literally code you run it in a notebook
and it gives you a result and you're
like oh wow and like actual numbers
actual input act you know actual code
yeah it's not mathematical symbols Etc
the source of Truth is the code it's not
slides it's just like let's build it
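To make the "code as the source of truth" point concrete, here is a minimal, illustrative sketch in the same spirit, not taken from the lectures: backpropagation written out by hand for a tiny expression, then checked against a numerical gradient so the numbers verify themselves.

```python
# Minimal sketch: backprop through f(a, b) = (a * b + a) ** 2, by hand,
# checked against a numerical gradient. Purely illustrative.

def f(a, b):
    c = a * b          # intermediate node
    d = c + a          # intermediate node
    e = d ** 2         # output
    return e

def grad_f(a, b):
    # forward pass, keeping intermediates
    c = a * b
    d = c + a
    # backward pass: chain rule, node by node
    de_dd = 2 * d              # d(e)/d(d)
    dc_da = b                  # c = a * b
    dc_db = a
    # 'a' reaches the output through two paths: via c, and directly in d = c + a
    da = de_dd * (dc_da + 1.0)
    db = de_dd * dc_db
    return da, db

a, b = 2.0, -3.0
da, db = grad_f(a, b)

# numerical check: the code is the source of truth
h = 1e-6
num_da = (f(a + h, b) - f(a - h, b)) / (2 * h)
num_db = (f(a, b + h) - f(a, b - h)) / (2 * h)
print(da, num_da)   # should match closely
print(db, num_db)
```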
it's beautiful you're a rare human in
that sense uh what advice would you give
to researchers uh trying to develop and
publish ideas that have a big impact in
the world of AI so maybe
um undergrads maybe early graduate
students yep I mean I would say like
they definitely have to be a little bit
more strategic than I had to be as a PhD
student because of the way AI is
evolving it's going the way of physics
where you know in physics you used to be
able to do experiments on your benchtop
and everything was great and you could
make progress and now you have to work
in like LHC or like CERN and
and so AI is going in that direction as
well
um so there's certain kinds of things
that's just not possible to do on the
bench top anymore and uh
I think um that didn't used to be the
case at the time do you still think that
there's like
GAN-type papers to be written where like
uh like very simple idea that requires
just one computer to illustrate a simple
example I mean one example that's been
very influential recently is diffusion
models diffusion models are amazing
diffusion models are six years old for the
longest time people were kind of
ignoring them as far as I can tell and
they're an amazing generative model
especially in uh in images and so stable
diffusion and so on it's all diffusion
based diffusion is new it was not there
and came from well it came from Google
but a researcher could have come up with
it in fact some of the first
actually no those came from Google as
well but a researcher could come up with
that in an academic Institution
yeah what do you find Most Fascinating
about diffusion models so from the
societal impact to the the technical
architecture what I like about
diffusion is it works so well
is that surprising to you the amount of
the variety almost the novelty of the
synthetic data is generating yeah so the
stable diffusion images are incredible
it's the speed of improvement in
generating images has been insane uh we
went very quickly from generating like
tiny digits to the tiny faces and it all
looked messed up and now we have stable
diffusion and that happened very quickly
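For context, the core training recipe behind diffusion models is simple enough to sketch in a few lines; this is a DDPM-style toy written for illustration, where the tiny MLP and the 2-D stand-in data are assumptions, not any particular paper's code.

```python
# Illustrative DDPM-style objective: corrupt data x0 with Gaussian noise at a
# random timestep, train a model to predict that noise, minimize MSE.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

model = nn.Sequential(nn.Linear(2 + 1, 128), nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def loss_fn(x0):
    t = torch.randint(0, T, (x0.shape[0],))
    a_bar = alphas_bar[t].unsqueeze(1)
    noise = torch.randn_like(x0)
    # forward (noising) process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # condition the model on the (normalized) timestep, here just concatenated
    pred = model(torch.cat([x_t, t.unsqueeze(1) / T], dim=1))
    return ((pred - noise) ** 2).mean()

x0 = torch.randn(64, 2)          # stand-in for real data (e.g. image patches)
for _ in range(100):
    opt.zero_grad()
    loss_fn(x0).backward()
    opt.step()
```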
there's a lot that Academia can still
contribute you know for example um
FlashAttention is a very efficient kernel for
running the attention operation inside
the Transformer that came from academic
environment it's a very clever way to
structure the kernel that does the
calculation so it doesn't materialize
the attention Matrix
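As a rough illustration of the idea being described (not the actual fused GPU kernel), here is a NumPy sketch of attention computed block by block with an online softmax, so the full T-by-T attention matrix is never held in memory; the shapes and block size are arbitrary choices for the example.

```python
# Illustrative only: blockwise attention with an online (streaming) softmax,
# avoiding materialization of the full (T, T) score matrix.
import numpy as np

def blockwise_attention(Q, K, V, block=64):
    T, d = Q.shape
    acc = np.zeros_like(V, dtype=np.float64)   # unnormalized output
    denom = np.zeros(T)                        # running softmax denominator
    m = np.full(T, -np.inf)                    # running max, for stability
    for s0 in range(0, K.shape[0], block):
        Kb, Vb = K[s0:s0 + block], V[s0:s0 + block]
        scores = Q @ Kb.T / np.sqrt(d)         # only (T, block) at a time
        m_new = np.maximum(m, scores.max(axis=1))
        correction = np.exp(m - m_new)         # rescale old accumulators
        p = np.exp(scores - m_new[:, None])
        acc = acc * correction[:, None] + p @ Vb
        denom = denom * correction + p.sum(axis=1)
        m = m_new
    return acc / denom[:, None]

# sanity check against the naive version that does materialize the matrix
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
naive = np.exp(Q @ K.T / np.sqrt(32))
naive = (naive / naive.sum(axis=1, keepdims=True)) @ V
assert np.allclose(blockwise_attention(Q, K, V), naive)
```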
um and so there's I think there's still
like lots of things to contribute but
you have to be just more strategic do
you think neural networks could be made
to reason
uh yes
do you think they already reason yes
what's your definition of reasoning
uh information processing
so in a way that humans think through a
problem and come up with novel ideas
it it feels like a reasoning yeah so the
the novelty
I don't want to say but out of
distribution ideas
you think it's possible yes and I think
we're seeing that already in the current
neural Nets you're able to remix the
training set information into true
generalization in some sense that
doesn't appear in the training set
like you're doing something interesting
algorithmically you're manipulating
you know some symbols and you're coming
up with some
correct a unique answer in a new setting
what would uh illustrate to you holy
shit this thing is definitely thinking
to me thinking or reasoning is just
information processing and
generalization and I think the neural
Nets already do that today so being able
to perceive the world or perceive the
whatever the inputs are and to make
uh predictions based on that or actions
based on that that's that's reasoning
yeah you're giving correct answers in
novel settings
by manipulating information you've
learned the correct algorithm you're not
doing just some kind of a lookup table
or nearest neighbor search
let me ask you about AGI what what are
some moonshot ideas you think might make
significant progress towards AGI or
maybe in other ways what are big
blockers that we're missing now so
basically I am fairly bullish on our
ability to build AGIs uh basically
automated systems that we can interact
with and are very human-like and we can
interact with them in a digital realm or
Physical Realm currently it seems most
of the models that sort of do these sort
of magical tasks are in a text Realm
um I think uh as I mentioned I'm
suspicious that the text realm is not
enough to actually build full
understanding of the world I do actually
think you need to go into pixels and
understand the physical world and how it
works so I do think that we need to
extend these models to consume images
and videos and train on a lot more data
that is multimodal in that way
do you think you need to touch the world
to understand it also well that's the
big open question I would say in my mind
is if you also require the embodiment
and the ability to uh sort of interact
with the world run experiments and have
a data of that form then you need to go
to Optimus or something like that and so
I would say Optimus in some way is like
a hedge
in AGI because it seems to me that it's
possible that just having data from the
internet is not enough
if that is the case then Optimus may
lead to AGI because Optimus to
me there's nothing beyond Optimus you
have like this humanoid form factor that
can actually like do stuff in the world
you can have millions of them
interacting with humans and so on and uh
if that doesn't give rise to AGI at
some point I'm not sure what
will
um so from a completeness perspective I
think that's the uh that's a really good
platform but it's a much harder platform
because uh you are dealing with atoms
and you need to actually like build
these things and integrate them into
society so I think that path takes
longer but it's much more certain and
then there's a path of the internet and
just like training these compression
models effectively uh on uh trying to
compress all the internet
and that might also give these agents as
well compress the internet but also
interact with the internet yeah so it's
not obvious to me
in fact I suspect you can reach AGI
without ever entering the physical world
and which is a little bit more
uh concerning because
it might result in it happening
faster
so it just feels like we're again in
boiling water we won't know as it's
happening
I would like to I'm not afraid of AGI
I'm excited about it there's always
concerns but I would like to know when
it happens
yeah or and have like hints about when
it happens like a year from now it will
happen that kind of thing yeah I just
feel like in the digital realm it just
might happen yeah I think all we have
available to us because no one has built
AGI again so all we have available to us
is uh is there enough fertile ground on
the periphery I would say yes and we
have the progress so far which has been
very rapid and uh there are next steps
that are available and so
I would say uh yeah it's quite likely
that we'll be interacting with digital
entities how will you know that
somebody's built it it's going to be a
slow I think it's going to be a slow
incremental transition is going to be
product based and focused it's going to
be GitHub co-pilot getting better and
then uh GPTs are helping you write and
then these oracles that you can go to
with mathematical problems I think we're
on a on the verge of being able to ask
very complex
questions in chemistry physics math of
these oracles and have them complete
Solutions
so AGI to you is primarily focused on
intelligence so Consciousness doesn't
enter
into uh into it
so in my mind Consciousness is not a
special thing you will you will figure
out and bolt on I think it's an emergent
phenomenon of a large enough and complex
enough
um generative model sort of so
um if you have a complex enough world
model that understands the world then it
also understands its predicament in the
world as being a language model which to
me is a form of Consciousness or
self-awareness and so in order to
understand the world deeply you probably
have to integrate yourself into the
world yeah and in order to interact with
humans and other living beings
Consciousness is a very useful tool I
think Consciousness is like a modeling
insight
modeling Insight yeah it's a you have a
powerful enough model of understanding
the world that you actually understand
that you are an entity in it yeah but
there's also this um perhaps just a
narrative we tell ourselves there's a it
feels like something to experience the
world the hard problem of Consciousness
yeah but that could be just the
narrative that we tell ourselves yeah I
don't think what yeah I think it will
emerge I think it's going to be
something uh very boring like we'll be
talking to these uh digital AIS they
will claim they're conscious they will
appear conscious they will do all the
things that you would expect of other
humans and uh it's going to just be a
stalemate I I think there would be a lot
of actual fascinating ethical questions
like Supreme Court level questions
of whether you're allowed to turn off a
conscious AI if you're allowed to build
the conscious AI
maybe there would have to be the same
kind of debates that you have around
um sorry to bring up a political topic
but you know abortion which is the
deeper question with abortion
is what is life and the Deep question
with AI is also what is life and what is
conscious and I think that'll be very
fascinating
to bring up it might become illegal to
build systems that are capable that like
of such a level of intelligence that
Consciousness would emerge and therefore
the capacity to suffer would emerge and
some AI system that says no please don't
kill me well that's what the LaMDA
chatbot already told
um this Google engineer right like it it
was talking about not wanting to die or
so on so that might become illegal to do
that right
I because otherwise you might have a lot
of a lot of creatures that don't want to
die and they will uh you can just spawn
Infinity of them on a cluster
and then that might lead to like
horrible consequences because then there
might be a lot of people that secretly
love murder and they'll start practicing
murder on those systems I mean there's
just I to me all of this stuff just
brings a beautiful mirror to The Human
Condition and human nature we'll get to
explore it and that's what like the best
of uh the Supreme Court of all the
different debates we have about ideas of
what it means to be human we get to ask
those deep questions that we've been
asking throughout human history there's
always been the other in human history
uh we're the good guys and they're the
bad guys and we're going to uh you know
throughout human history let's murder
the bad guys and the same will probably
happen with robots it'll be the other at
first and then we'll get to ask
questions of what does it mean to be
alive what does it mean to be conscious
yeah and I think there's some Canary in
the coal mines even with what we have
today
um and uh you know for example these
there's these like waifus that you like
work with and some people are trying to
like this company is going to shut down
but this person really like yeah love
their waifu and like is trying to like
Port it somewhere else and like it's not
possible and like I think like
definitely uh people will have feelings
towards uh towards these um systems
because in some sense they are like a
mirror of humanity because they are like
sort of like a big average of humanity
yeah in a way that it's trained but we
can that average we can actually watch
it's nice to be able to interact with
the big average of humanity yeah and do
like a search query on it yeah yeah it's
very fascinating and uh we can also of
course also like shape it it's not just
a pure average we can mess with the
training data we can mess with the
objective we can fine tune them in
various ways so we have some
um you know impact on what those systems
look like if you were to achieve AGI
um and you could have a conversation
with her and ask her uh talk about
anything maybe ask her a question what
kind of stuff would you would you ask I
would have some practical questions in
my mind like uh do I or my loved ones
really have to die uh what can we do
about that
do you think it will answer clearly or
would it answer poetically
I would expect it to give Solutions I
would expect it to be like well I've
read all of these textbooks and I know
all these things that you've produced
and it seems to me like here are the
experiments that I think it would be
useful to run next and here are some gene
therapies that I think would be helpful
and uh here are the kinds of experiments
that you should run okay let's go
start the experiments okay
imagine that mortality is actually uh
like a prerequisite for happiness so if
we become immortal we'll actually become
deeply unhappy and the model is able to
know that so what is this supposed to
tell you stupid human about it yes you
can become immortal but you will become
deeply unhappy
if if the model is if the AGI system
is trying to empathize with you human
what is this supposed to tell you that
yes you don't have to die but you're
really not going to like it because that
is it going to be deeply honest like
there's uh Interstellar what is it the AI
says like humans want 90 percent honesty
so like you have to pick how honest I
want to answer these practical questions
yeah I love Yeah Interstellar by the way
I think it's like such a sidekick to the
entire story but
at the same time it's like really
interesting it's kind of limited in
certain ways right yeah it's limited and
I think that's totally fine by the way I
don't think uh I think it's
fine to have limited and
imperfect AGIs
is that a feature almost as an example
like it has a fixed amount of compute on
its physical body and it might just be
that even though you can have a super
amazing Mega brain super intelligent AI
you also can have like you know less
intelligent AIS that you can deploy in a
power efficient way and then they're not
perfect they might make mistakes no I
meant more like say you had infinite
compute and it's still good to make
mistakes sometimes
like in order to integrate yourself like
um
what is it going back to Goodwill
Hunting uh Robin Williams character
says like the human imperfections that's
the good stuff right isn't it isn't that
this like we don't want perfect we want
flaws in part to form connections with
each other because it feels like
something you can attach your feelings
to
the the flaws in that same way you want
an AI That's flawed I don't know I feel
like perfectionist but then you're
saying okay yeah but that's not AGI but
see AGI would need to be intelligent
enough to give answers to humans that
humans don't understand and I think
perfect is something humans can't
understand because even science doesn't
give perfect answers there's always gaps
and Mysteries and I don't know I I don't
know if humans want perfect
yeah I could imagine just uh having a
conversation with this kind of Oracle
entity as you'd imagine them and uh yeah
maybe it can tell you about
you know based on my analysis of Human
Condition uh you might not want this and
here are some of the things that might
but every dumb human will say yeah
yeah trust me give me the truth I
can handle it but that's the beauty a
lot of people can choose uh so but then
the old marshmallow test with the kids
and so on I feel like too many people
uh like it can't handle the truth
probably including myself like the Deep
truth of The Human Condition I don't I
don't know if I can handle it like what
if there's some dark stuff what if we
are an alien science experiment and it
realizes that what if it had I mean
I mean this is the Matrix you know all
over again
I don't know I would what would I talk
about I don't even yeah I
uh probably I will go with the safe
scientific questions at first that have
nothing to do with my own personal life
yeah immortality just like about physics
and so on yeah uh to build up like let's
see where it's at or maybe see if it has
a sense of humor that's another question
would it be able to uh presumably
if it understands humans deeply
would it be able to generate
uh yep to generate humor yeah I think
that's actually a wonderful Benchmark
almost like is it able I think that's a
really good point basically to make you
laugh yeah if it's able to be like a
very effective stand-up comedian that is
doing something very interesting
computationally I think being funny is
extremely hard yeah
because
it's hard in a way like the Turing test
the original intent of the Turing test
is hard because you have to convince
humans and there's nothing that's why
when comedians talk about this
this is deeply honest
because people can't help but laugh
and if they don't laugh that means
you're not funny and if they laugh that's funny
and you're showing you need a lot of
knowledge to create to create humor
about like the occupational Human
Condition and so on and then you need to
be clever with it
uh you mentioned a few movies you
tweeted movies that I've seen five plus
times but I'm ready and willing to keep
watching Interstellar Gladiator contact
Goodwill Hunting The Matrix Lord of the
Rings all three Avatar Fifth Element so
on goes on Terminator two Mean Girls I'm
not gonna ask about that
mean girls is great
um what are some that jump out to your
memory that you love
and why like you mentioned the Matrix
as a computer person why do you love The
Matrix
there's so many properties that make it
like beautiful and interesting so
there's all these philosophical
questions but then there's also agis and
there's simulation and it's cool and
there's you know the black uh you know
uh the look of it the feel of it the
look of it the feel of it the action the
bullet time it was just like innovating
in so many ways
and then uh Good Will Hunting why
do you like that one yeah I just I
really like this uh tortured genius sort
of character who's like grappling with
whether or not he has like any
responsibility or like what to do with
this gift that he was given or like how
to think about the whole thing and uh
there's also a dance between the genius
and the the personal like what it means
to love another human being and there's
a lot of themes there it's just a
beautiful movie and then the fatherly
figure The Mentor in the in the
psychiatrist and the it like really like
uh
it messes with you you know there's some
movies that's just like really mess with
you uh on a deep level do you relate to
that movie at all no it's not your fault
doctor as I said Lord of the Rings
that's self-explanatory Terminator 2
which is interesting
you rewatch that a lot is that better
than Terminator one
you like you like Arnold I do like
Terminator one as well uh I like
Terminator 2 a little bit more but in
terms of like its surface properties
[Laughter]
do you think Skynet is at all a
possibility oh yes
well like the actual sort of uh
autonomous uh weapon system kind of
thing do you worry about that uh stuff
I 100 percent worry about it and so the I mean
the uh you know some of these uh fears
of AGIs and how this will play out I mean
these will be like very powerful
entities probably at some point and so
um for a long time they're going to be
tools in the hands of humans uh you know
people talk about like alignment of AGIs
and how to make the problem is like even
humans are not aligned uh so
uh how this will be used and what this
is going to look like is um yeah it's
troubling so
do you think it'll happen so slowly
enough that we'll be able to
as a human civilization think through
the problems yes that's my hope is that
it happens slowly enough and in an open
enough way where a lot of people can see
and participate in it just figure out
how to deal with this transition I think
which is going to be interesting I draw
a lot of inspiration from nuclear
weapons because I sure thought it would
be it would be fucked once they develop
nuclear weapons but like it's almost
like
uh when the systems are not so
dangerous that they destroy human
civilization we deploy them and learn
the lessons and then if it's
too dangerous we
might still deploy it uh but you very
quickly learn not to use them and so
there'll be like this balance that you
humans are very clever as a species it's
interesting we exploit the resources as
much as we can but we don't we avoid
destroying ourselves it seems like
well I don't know about that actually I
hope it continues
um
I mean I'm definitely like concerned
about nuclear weapons and so on not just
as a result of the recent conflict even
before that uh that's probably like my
number one concern for society so if
Humanity uh destroys itself
or destroys you know 90 percent of people that
would be because of nukes I think so
um and it's not even about full
destruction to me it's bad enough if we
reset society that would be like
terrible it would be really bad and I
can't believe we're like so close to it
yeah it's like so crazy to me it feels
like we might be a few tweets away from
something like that yep basically it's
extremely unnerving but and has been for
me for a long time
it seems unstable that world leaders
just having a bad mood
can like um
take one step towards a bad Direction
and it escalates yeah and because of a
collection of bad moods it can escalate
without being able to
um stop
yeah it's just it's a huge amount of uh
Power and then also with the
proliferation and basically I don't I
don't actually really see I don't
actually know what the good outcomes are
here
uh so I'm definitely worried about that
a lot and then AGI is not currently
there but I think at some point we'll
more and more become uh something like
it the danger with AGI even is that I
think it's even less likely worse in a
sense that
uh there are good outcomes of AGI and
then the bad outcomes are like in an
absolute way like a tiny bit away and so
I think um capitalism and humanity and
so on will drive for the positive
uh ways of using that technology but
then if bad outcomes are just like a
tiny like flipping minus sign away uh
that's a really bad position to be in a
tiny perturbation of the system results
in the destruction of the human species
it's a weird line to walk yeah I think
in general what's really weird about
like the Dynamics of humanity and this
explosion we talked about is just like
the insane coupling afforded by
technology yeah and uh just the
instability of the whole dynamical
system I think it's just it doesn't look
good honestly
yes that explosion could be destructive
and constructive and the probabilities
are non-zero in both both senses I'm
going to have to I do feel like I have
to try to be optimistic and so on and
yes I think even in this case I still am
predominantly optimistic but there's
definitely
me too uh do you think we'll become a
multiplanetary species
probably yes but I don't know if it's
dominant feature of uh future Humanity
uh there might be some people on some
planets and so on but I'm not sure if
it's like
yeah if it's like a major player in our
culture and so on we still have to solve
the drivers of self-destruction here on
Earth so just having a backup on Mars is
not going to solve the problem so by the
way I love the backup on Mars I think
that's amazing you should absolutely do
that yes and I'm so thankful uh and
would you go to Mars uh personally no I
do like Earth quite a lot okay uh I'll
go to Mars I'll go for you and I'll
tweet at you from there maybe eventually
I would once it's uh safe enough but I
don't actually know if it's on my
lifetime scale unless I can extend it by
a lot
I do think that for example a lot of
people might disappear into
um virtual realities and stuff like that
and I think that could be the major
thrust of
um sort of the cultural development of
humanity if it survives uh so it might
not be it's just really hard to work in
Physical Realm and go out there and I
think ultimately all your experiences
are in your brain yeah and so it's much
easier to disappear into digital Realm
and I think people will find them more
compelling easier safer
more interesting so you're a little bit
captivated by Virtual Reality by the
possible worlds whether it's the
metaverse or some other manifestation of
that yeah
yeah it's really interesting it's uh
I'm I'm interested just just talking a
lot to Carmack where's the
where's the thing that's currently
preventing that yeah I mean to be clear
I think what's interesting about the
future is
um it's not that
I kind of feel like
the variance in The Human Condition
grows that's the primary thing that's
changing it's not as much the mean of
the distribution it's like the variance
of it so there will probably be people
on Mars and there will be people in VR
and they're all people here on Earth
it's just like there will be so many
more ways of being
and so I kind of feel like I see it as
like a spreading out of a human
experience there's something about the
internet that allows you to discover
those little groups and you you
gravitate each other something about
your biology likes that kind of world
and that you find each other yeah and
we'll have transhumanists and then we'll
have the Amish and they're gonna
everything is just gonna coexist you
know the cool thing about it because
I've interacted with a bunch of Internet
communities is
um they don't know about each other like
you can have a very happy existence just
like having a very close-knit community
and not knowing about each other I mean
even even since this just having
traveled to Ukraine there's they they
don't know so many things about America
you you like when you travel across the
world I think you experience this too
there are certain cultures they're like
they have their own thing going on they
don't and so you can see that happening
more and more and more and more in the
future we have little communities yeah
yeah I think so that seems to be the
that seems to be how it's going right
now and I don't see that Trend like
really reversing I think people are
diverse and they're able to choose their
own like path and existence and I sort
of like celebrate that
um and so will you spend so much time in
the metaverse in the virtual reality or
which Community are you are you the
physicalist uh the the the physical
reality enjoyer or uh do you see drawing
a lot of uh pleasure and fulfillment in
the digital world
yeah I think well currently the virtual
reality is not that compelling I do
think it can improve a lot but I don't
really know to what extent maybe you
know there's actually like even more
exotic things you can think about with
like Neuralink or stuff like that so
um currently I kind of see myself as
mostly a team human person I love nature
yeah I love Harmony I love people I love
Humanity I love emotions of humanity
um and I I just want to be like in this
like solar Punk little Utopia that's my
happy place yeah my happy place is like
uh people I love thinking about cool
problems surrounded by a lush beautiful
Dynamic nature yeah yeah and secretly
high tech in places that count places
like they use technology to empower that
love for other humans and nature yeah I
think a technology used like very
sparingly uh I don't love when it sort
of gets in the way of humanity in many
ways uh I like just people being humans
in a way we sort of like slightly
evolved and prefer I think just by
default people kept asking me because
they they know you love reading are
there particular books
that you enjoyed that had an impact on
you
for silly or for profound reasons that
you would recommend
you mentioned The Vital Question
many of course I think in biology as an
example The Vital Question is a good one
anything by Nick Lane really uh Life
Ascending I would say is like a bit more
potentially uh representative as like a
summary
of a lot of the things he's been talking
about I was very impacted by the selfish
Gene I thought that was a really good
book that helped me understand altruism
as an example and where it comes from
and just realizing that you know the
selection is in the level of genes was a
huge insight for me at the time and it
sort of like cleared up a lot of things
for me what do you think about
the idea that ideas are the organisms
the memes yes love it 100 percent
[Laughter]
are you able to walk around with that
notion for a while that that there is an
evolutionary kind of process with ideas
as well there absolutely is there's
memes just like genes and they compete
and they live in our brains it's
beautiful are we silly humans thinking
that we're the organisms is it possible
that the primary
organisms are the ideas
yeah I would say like the the ideas kind
of live in the software of like our
civilization in the in the minds and so
on we think as humans that the hardware
is the fundamental thing I human is a
hardware entity yeah but it could be the
software right yeah
yeah I would say like there needs to be
some grounding at some point to like a
physical reality yeah but if we clone an
Andre
the software is the thing
like is this thing that makes that thing
special right yeah I guess I you're
right but then cloning might be
exceptionally difficult like there might
be a deep integration between the
software and the hardware in ways we
don't quite understand well from the
evolution point of view like what makes
me special is more like the the gang of
genes that are riding in my chromosomes
I suppose right like they're the they're
the replicating unit I suppose and no
but that's just for you the thing that
makes you special sure wow
the reality is what makes you special is
your ability to survive
based on the software that runs on the
hardware that was built by the genes
um so the software is the thing that
makes you survive not the hardware
all right yeah it's just like a second
layer it's a new second layer that
hasn't been there before the brain they
both they both coexist but there's also
layers of the software I mean it's it's
not it's a it's a abstraction that's uh
on top of abstractions but okay so
Selfish Gene um Nick Lane I would say
sometimes books are like not sufficient
I like to reach for textbooks sometimes
um I kind of feel like books are for too
much of a general consumption sometime
and they just kind of like uh they're
too high up in the level of abstraction
and it's not good enough yeah so I like
textbooks I like the cell I think the
cell was pretty cool
uh that's why also I like the writing of
uh Nick Lane is because he's pretty willing
to step one level down and he doesn't uh
yeah he's sort of he's willing to go
there
but he's also willing to sort of be
throughout the stack so he'll go down to
a lot of detail but then he will come
back up and I think he has a yeah
basically I really appreciate that
that's why I love college early college
even high school but just textbooks on
the basics yeah of Computer Science and
Mathematics of of biology of chemistry
yes those are they condense down like uh
uh it's sufficiently General that you
can understand the both the philosophy
and the details but also like you get
homework problems and you you get to
play with it as much as you would if you
weren't yeah programming stuff yeah and
then I'm also suspicious of textbooks
honestly because as an example in deep
learning uh there's no like amazing
textbooks and I feel this changing very
quickly I imagine the same is true and
say uh synthetic biology and so on these
books like this cell are kind of
outdated they're still high level like
what is the actual real source of truth
it's people in wet Labs working with
cells yeah you know sequencing genomes
and
yeah actually working with working with
it and uh I don't have that much
exposure to that or what that looks like
so currently I'm reading through
the cell and it's kind of interesting
and I'm learning but it's still not
sufficient I would say in terms of
understanding well it's a clean
summarization of the mainstream
narrative
yeah but you have to learn that before
you break out yeah towards The Cutting
Edge yeah what is the actual process of
working with these cells and growing
them and incubating them and you know
it's kind of like a massive cooking
recipe so making sure your cells grow
and proliferate and then you're
sequencing them running experiments and
uh just how that works I think is kind
of like the source of truth of at the
end of the day what's really useful in
terms of creating therapies and so on
yeah I wonder in the future AI textbooks
will be because you know there's
Artificial Intelligence A Modern
Approach I actually haven't read if it's
come out the recent version
there's been a recent edition I also saw
there's a science of deep learning book
I'm waiting for textbooks that worth
recommending worth reading it's It's
tricky because it's like papers
and code code honestly papers are quite
good I especially like the appendix
appendix of any paper as well it's like
it's like the most detail it can have
it doesn't have to be cohesive or
connected to anything else you just
describe to me a very specific way you
solved a particular thing yeah many
times papers can be actually quite
readable not always but sometimes the
introduction in the abstract is readable
even for someone outside of the field uh
not this is not always true and
sometimes I think unfortunately
scientists use complex terms even when
it's not necessary I think that's
harmful I think there there's no reason
for that and papers sometimes are longer
than they need to be in this in the
parts that
don't matter yeah appendix would be long
but then the paper itself you know look
at Einstein make it simple
yeah but certainly I've come across
papers I would say in say like synthetic
biology or something that I thought were
quite readable for the abstract and the
introduction and then you're reading the
rest of it and you don't fully
understand but you kind of are getting a
gist and I think it's cool
what uh advice would you give to folks
interested in machine learning and
research but in General Life advice to a
young person High School
um Early College about how to have a
career they can be proud of or a life
they can be proud of
yeah I think I'm very hesitant to give
general advice I think it's really hard
I've mentioned like some of the stuff
I've mentioned is fairly General I think
like focus on just the amount of work
you're spending on like a thing
uh compare yourself only to yourself not
to others that's good I think those are
fairly General how do you pick the thing
uh you just have like a deep interest in
something uh or like try to like find
the argmax over like the things that
you're interested in argmax at that
moment and stick with it how do you not
get distracted and switch to another
thing uh you can if you like
um well if you do an argmax repeatedly
every week it doesn't converge it
doesn't it's a problem yeah you can like
low pass filter yourself uh in terms of
like what has consistently been true for
you
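Taking the argmax and low-pass-filter language above completely literally, purely as an illustration (every name and number here is made up): smooth the weekly signal and pick the argmax of the smoothed values rather than of any single week.

```python
# Playful, literal reading of the advice: low-pass filter (exponential moving
# average) your weekly interest levels, then argmax the smoothed scores.
weekly_energy = {
    "building neural nets": [7, 9, 6, 8, 9],
    "reading bio papers":   [9, 3, 4, 5, 4],
    "new shiny framework":  [2, 10, 2, 1, 2],   # one exciting week, then nothing
}

def low_pass(xs, alpha=0.3):
    smoothed = xs[0]
    for x in xs[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

scores = {k: low_pass(v) for k, v in weekly_energy.items()}
pick = max(scores, key=scores.get)   # argmax over the filtered signal
print(scores, "->", pick)
```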
um but yeah I definitely see how it can
be hard but I would say like you're
going to work the hardest on the thing
that you care about the most also a low
pass filter yourself and really
introspect in your past were the things
that gave you energy and what are the
things that took energy away from you
concrete examples and usually uh from
those concrete examples sometimes
patterns can emerge I like I like it when
things look like this when I'm these
positions so that's not necessarily the
field but the kind of stuff you're doing
in a particular field so for you it
seems like you were energized by
implementing stuff building actual
things yeah being low level learning and
then also uh communicating so that
others can go through the same
realizations and shortening that Gap
um because I usually have to do way too
much work to understand the thing and
then I'm like okay this is actually like
okay I think I get it and like why was
it so much work it should have been much
less work and that gives me a lot of
frustration and that's why I sometimes
go teach so aside from the teaching
you're doing now uh putting out videos
aside from a potential uh Godfather part
two
uh with the AGI at Tesla and Beyond uh
what does the future for Andrej Karpathy
hold have you figured that out yet or no
I mean uh as you see through the fog of
war that is all of our future
um do you do you start seeing
silhouettes of the what that possible
future could look like
the consistent thing I've been always
interested in for me at least is is AI
and um
uh that's probably where I'm spending my
rest of my life on because I just care
about a lot and I actually care about
like many other problems as well like
say aging which I basically view as
disease and uh
um I care about that as well but I don't
think it's a good idea to go after it
specifically I don't actually think that
humans will be able to come up with the
answer I think the correct thing to do
is to ignore those problems and you
solve Ai and then use that to solve
everything else and I think there's a
chance that this will work I think it's
a very high chance and uh that's kind of
like the the way I'm betting at least so
when you think about AI are you
interested in all kinds of applications
all kinds of domains and any domain you
focus on will allow you to get insights
to the big problem of AGI yeah for me
it's the ultimate meta problem I don't
want to work on any one specific problem
there's too many problems so how can you
work on all problems simultaneously you
solve The Meta problem which to me is
just intelligence and how do you
automate it are there cool small projects
like arxiv-sanity and so on that
you're thinking about that
the ML world can anticipate
there's some always like some fun side
projects yeah um arxiv-sanity is one
basically like there's way too many
arxiv papers how can I organize them and
recommend papers and so on uh I
transcribed all of your yeah podcasts
what did you learn from that experience
uh from transcribing the process of like
you like consuming audiobooks and and
podcasts and so on and here's the
process that achieves
um closer to human level performance and
annotation yeah well I definitely was
like surprised that uh transcription
with OpenAI's Whisper was working so
well compared to what I'm familiar with
from Siri and like a few other systems I
guess it works so well and
uh that's what gave me some energy to
like try it out and I thought it could
be fun to run on podcasts it's kind of
not obvious to me why whisper is so much
better compared to anything else because
I feel like there should be a lot of
incentive for a lot of companies to
produce transcription systems and that
they've done so over a long time whisper
is not a super exotic model it's a
Transformer it takes mel spectrograms
and you know just outputs tokens of text
it's not crazy uh the model and
everything has been around for a long
time
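For reference, running the open-source Whisper package on an episode looks roughly like this; the file name and model size are placeholders, and this is just a usage sketch, not a claim about how any particular transcript was produced.

```python
# Rough sketch of transcribing an episode with the openai-whisper package
# (pip install openai-whisper); file name and model size are placeholders.
import whisper

model = whisper.load_model("medium")              # sizes range from tiny to large
result = model.transcribe("lex_episode_333.mp3")  # mel spectrogram in, text tokens out

print(result["text"][:500])                       # the full transcript as one string
for seg in result["segments"][:3]:                # timestamped segments
    print(f'[{seg["start"]:.1f}s -> {seg["end"]:.1f}s] {seg["text"]}')
```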
I'm not actually 100 sure why yeah it's
not obvious to me either it makes me
feel like I'm missing something I'm
missing something yeah because there's a
huge even at Google and so on YouTube uh
transcription yeah
um yeah it's unclear but some of it is
also integrating into a bigger system
yeah that so the user interface how it's
deployed and all that kind of stuff
maybe running it as an independent thing
is much easier like an order of
magnitude easier than deploying to a
large integrated system like YouTube
transcription or
um anything like meetings like Zoom has
trans transcription that's kind of
crappy but creating uh interface where
it detects the different individual
speakers it's able to
um
display it in compelling ways Run in
real time all that kind of stuff maybe
that's difficult but that's the only
explanation I have because like um
I'm currently paying uh quite a bit for
human uh transcription human caption
right annotation and like it seems like
uh there's a huge incentive to automate
that yeah it's very confusing and I
think I mean I don't know if you looked
at some of the whisper transcripts but
they're quite good they're good and
especially in tricky cases yeah I've
seen
uh Whispers performance on like super
tricky cases and it does incredibly well
so I don't know a podcast is pretty
simple it's like high quality audio and
you're speaking usually pretty clearly
and so I don't know it uh I don't know
what open ai's plans are yeah either but
yeah there's always like fun fun
projects basically and stable diffusion
also is opening up a huge amount of
experimentation I would say in the
visual realm and generating
images and videos and even movies now
and so that's going to be pretty crazy
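As a concrete example of the kind of experimentation being described, generating an image from a text prompt with an open Stable Diffusion checkpoint via the Hugging Face diffusers library looks roughly like this; the model id and prompt are just examples, and a GPU is assumed.

```python
# Illustrative: text-to-image with an open Stable Diffusion checkpoint via the
# Hugging Face diffusers library; model id and prompt are example choices.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a solarpunk village, lush nature, quietly high-tech").images[0]
image.save("scene.png")
```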
uh that's going to that's going to
almost certainly work and it's going to
be really interesting when the cost of
content creation is going to fall to
zero you used to need a painter for a
few months to paint a thing and now it's
going to be speak to your phone to get
your video so if Hollywood will start
using that to generate scenes
which completely opens up yeah so you
can make a like a movie like Avatar
eventually for under a million dollars
much less maybe just by talking to your
phone I mean I know it sounds kind of
crazy
and then there'd be some voting
mechanism like how do you have a like
would there be a show on Netflix that's
generated completely uh automatedly
potentially yeah and what does it look
like also when you can just generate It
On Demand and it's uh and there's
Infinity of it yeah
oh man
all the synthetic content I mean it's
humbling because we we treat ourselves
as special for being able to generate
art and ideas and all that kind of stuff
if that can be done in an automated Way
by AI yeah I think it's fascinating to
me how these uh the predictions of AI
and what it's going to look like and
what it's going to be capable of are
completely inverted and wrong and the
Sci-Fi of the 50s and 60s was just like
totally not right they imagined AIs as
like super calculating theorem provers
and we're getting things that can talk
to you about emotions they can do art
it's just like weird are you excited
about that future just AIs like hybrid
systems heterogeneous systems of humans
and AIS talking about emotions Netflix
and chill with an AI system legit where
the Netflix thing you watch is also
generated by AI
I think it's uh it's going to be
interesting for sure and I think I'm
cautiously optimistic but it's not it's
not obvious
well the sad thing is your brain and
mine developed in a time where
um before Twitter before the before the
internet so I wonder people that are
born inside of it might have a different
experience
um like maybe you and I will still
resist it uh and the people born now
will not
well I do feel like humans are extremely
malleable yeah
and uh you're probably right
what is the meaning of life Andre
we we talked about sort of
the universe having a conversation with
us humans or with the systems we create
to try to answer for the universe for
the creator of the universe to notice us
we're trying to create systems that are
loud enough
just answer back
I don't know if that's the meaning of
life that's like meaning of life for
some people the first level answer I
would say is anyone can choose their own
meaning of life because we are conscious
entity and it's beautiful number one but
uh I do think that like a deeper meaning
of life if someone is interested is uh
or along the lines of like what the hell
is All This and like why and if you look
at the into fundamental physics and the
quantum field Theory and a standard
model they're like very complicated and
um
there's this like you know 19 free
parameters of our universe and
like what's going on with all this stuff
and why is it here and can I hack it can
I work with it is there a message for me
am I supposed to create a message
and so I think there's some fundamental
answers there but I think there's
actually even like you can't actually
really make a dent in those without more
time and so to me also there's a big
question around just getting more time
honestly
yeah that's kind of like what I think
about quite a bit as well so kind of the
ultimate
or at least first way to sneak up to the
why question is to try to escape
uh the system the universe yeah and then
for that you sort of uh backtrack and
say okay for that that's going to be
take a very long time so the why
question boils down from an engineering
perspective to how do we extend yeah I
think that's the question number one
practically speaking because you can't
uh you're not gonna calculate the answer
to the deeper questions in the time you
have and that could be extending your
own lifetime or extending just the
lifetime of human civilization of
whoever wants to not many people might
not want that but I think people who do
want that I think um I think it's
probably possible uh and I don't I don't
know that people
fully realize this I kind of feel like
people think of death as an
inevitability but at the end of the day
this is a physical system some things go
wrong uh it makes sense why things like
this happen evolutionarily speaking and
uh there's most certainly interventions
that uh that mitigate it that would be
interesting if death is eventually
looked at as
as a fascinating thing that used to
happen to humans I don't think it's
unlikely I think it's I think it's
likely
and it's up to our imagination to try to
predict what the world without death
looks like yeah it's hard to I think the
values will completely change
could be I don't I don't really buy all
these ideas that oh without that there's
no meaning there's nothing as
I don't intuitively buy all those
arguments I think there's plenty of
meaning plenty of things to learn
they're interesting exciting I want to
know I want to calculate uh I want to
improve the condition of
all the humans and organisms that are
alive yet the way we find meaning might
change we there is a lot of humans
probably including myself that finds
meaning in the finiteness of things but
that doesn't mean that's the only source
of meaning yeah I do think many people
will will go with that which I think is
great I love the idea that people can
just choose their own adventure like you
you are born as a conscious free entity
by default I'd like to think and um you
have your unalienable rights for Life uh
in the pursuit of happiness I don't know
if you have that in the nature the
landscape of happiness you can choose
your own adventure mostly and that's not
it's not fully true but I still am
pretty sure I'm an NPC but
um an NPC can't know it's an NPC
there could be different degrees and
levels of consciousness I don't think
there's a more beautiful way to end it
uh Andre you're an incredible person I'm
really honored you would talk with me
everything you've done for the machine
learning world for the AI world
to just inspire people to educate
millions of people it's been it's been
great and I can't wait to see what you
do next it's been an honor man thank you
so much for talking today awesome thank
you
thanks for listening to this
conversation with Andrej Karpathy to
support this podcast please check out
our sponsors in the description and now
let me leave you with some words from
Samuel Karlin
the purpose of models is not to fit the
data but to sharpen the questions
thanks for listening and hope to see you
next time