Marc Andreessen & Amjad Masad on “Good Enough” AI, AGI, and the End of Coding
By a16z
Summary
Key Takeaways
- **English is the new programming language**: The ultimate goal is to program in natural language, abstracting away syntax entirely. This mirrors historical advances like Grace Hopper's compiler, moving from machine code to English commands for broader accessibility. [03:50], [04:52]
- **AI agents are the new programmers**: AI agents are becoming the primary programmers, capable of complex tasks like setting up databases and writing tests. The user's role is evolving from direct coding to directing these agents. [09:52], [10:09]
- **Long-horizon reasoning breakthrough**: Recent advances, particularly through reinforcement learning, let AI agents maintain coherence and perform complex tasks for extended periods, overcoming previous limitations of short, error-prone reasoning chains. [13:06], [18:45]
- **Verifiable domains accelerate AI progress**: AI development is progressing fastest in domains with clear, verifiable answers like math and code. Softer domains like law and healthcare, lacking deterministic outcomes, see slower progress because verification is hard. [26:00], [30:15]
- **"Good enough" AI risks stalling AGI**: The current success of AI at economically productive tasks creates a "local maximum trap." This "good enough" AI may reduce the pressure to pursue true AGI, which requires more generalized, cross-domain learning. [51:15], [51:44]
Topics Covered
- English is the new programming language for everyone.
- Verification loops unlock long-horizon AI agent reasoning.
- AI progresses fastest in verifiable, concrete domains.
- Why does AI's 'magic' still feel disappointing?
- Is economic value a 'local maximum trap' for AGI?
Full Transcript
We're dealing with magic here that we I
think probably all would have thought
was impossible 5 years ago or certainly
10 years ago. This is the most amazing
technology ever and it's moving really
fast and yet we're still like really
disappointed. Like it's not moving fast
enough and like it's like maybe right on
the verge of stalling out. We should
both be like hyper excited but also on
the verge of like slitting our wrists
cuz like you know the gravy train is
coming to an end,
>> right?
>> It is faster but it's not at computer
speed, right? What we expect computer
speed to be. It's sort of like watching
a person work.
>> It's like watching John Carmack...
>> the world's... Okay, the world's
best programmer on a stimulant.
>> On a stimulant. Yeah, that's right.
>> So, let's start with um let's assume
that I'm a sort of a novice programmer.
So, maybe I'm a student um uh or maybe
I'm just somebody, you know, I took a
few coding classes and I've hacked
around a little bit or like I don't
know, I do Excel macros or something
like that, but I'm not, let's say, a
master craftsman at coding.
Um, and you know, somebody tells me
about Replit, and specifically AI
and Replit. Like, what's my
experience when I launch in with
what Replit is today with AI?
>> Yeah I I would um I I think the
experience of someone with no coding
experience or some coding experience is
largely the same when you go into
Replit. Okay.
>> The first thing we try to do is get all
the nonsense away from like setting up
development environment and all of that
stuff and just have you focus on your
idea. So what do you want to build? Do
you want to build a product? Do you want
to solve a problem? Do you want to do a
data visualization? So the prompt
box is really open for you. You can put
in anything there. So let's say you want
to, you know, build a startup. You have
an idea for a startup. I would I would
start with like a paragraph long kind of
description of what I want to build. Uh
the agents will read that. It will
>> you just type just type
>> standard English. Standard English. You
just type it in: "I want to build..." "I
want to sell crepes." "I
want to sell crepes online." So you just,
like, type in what you want.
>> It literally could be that, four
words or five words. Okay.
>> Or it could be if you're if you have a
programming language you prefer or stack
you prefer, you could do that. But we
actually prefer for you not to do
that, because we're going to pick the
best thing — we're going to classify
the best stack for that request. Right?
>> If it's a data app, we'll pick
Python and Streamlit or whatever. If it's like
a web app, we'll pick JavaScript and
Postgres and things like that. So you
just type that
>> or you can decide you can decide you can
say and I want to do it I know Python or
I'm learning Python in school and I want
to do it in Python.
>> That's right. The cool thing about
Replit is, you know, we've been around
for almost 10 years now and we built all
this infrastructure. Replit runs any
programming language. So if you're
comfortable with Python you can go in
and do that for sure.
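As a rough illustration of the request-to-stack classification described above, here is a toy sketch; the categories, keywords, and stack choices are illustrative assumptions, not Replit's actual logic.

```python
# Illustrative sketch only: a toy prompt-to-stack classifier in the spirit of
# what's described above. The categories and stacks are assumptions.
def classify_stack(prompt: str) -> dict:
    p = prompt.lower()
    if any(w in p for w in ("chart", "dashboard", "data", "visualization")):
        return {"kind": "data app", "language": "Python", "framework": "Streamlit"}
    # Default: a web app with a JavaScript front end and a Postgres database.
    return {"kind": "web app", "language": "JavaScript", "database": "Postgres"}

print(classify_stack("I want to sell crepes online"))
# -> {'kind': 'web app', 'language': 'JavaScript', 'database': 'Postgres'}
```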
>> Okay.
>> And then just again I know this is
obvious people have used it but like I'm
dealing in English.
>> Yes.
>> So okay go ahead.
>> Yes. You're fully in English. I mean,
you know, just a, you know, a little bit
of of sort of background here, like when
um when I I came here and pitched to you
like 10 years ago or like whatever 7
years ago,
>> right?
>> Uh what we were saying is we were
exactly describing this future is that
>> uh everyone would want to build
software right?
>> And the thing that's kind of getting in
in people's ways is all the uh what Fred
Brooks called the accidental complexity
of programming, right? They're like
essential complexity which is like how
do I bring my startup to market and how
do I build a business and all of that
accidental complexity is what package
manager do I use all of that stuff we've
been abstracting away that for so many
years so you can just um and the last
thing we had to abstract away is code
>> right
>> I had this realization last year which
is I think we you know built an amazing
platform but the business is not
performing and the reason the business
is not performing is that code is the
bottleneck like yes all the other stuff
is important to solve but syntax is
still an issue like you know syntax is
just an unnatural thing for people so
ultimately English is the programming
language
>> right
>> I I just does it work with other other
world languages other than English at
this point
>> yes you can write in Japanese and we
have a lot of users especially Japanese
that tends to be very
>> So does it support, these days — like,
does it support every language, or is it
still do you still have to do like
custom work to craft a new new language
>> No, most, you know, mainstream
languages that have like 100 million
plus people that speak them — AI is pretty
good at it.
>> Okay. Yeah.
>> Yeah. Wow.
>> So, uh I I I did a bit of a bit of
historical research recently for for
some reason. I just want to just
understand the moment we're in and
because it's such a special moment. It's
I think it's important to contextualize
it and I I I read this quote from
Grace Hopper. So, Grace Hopper invented
the compiler as you know. uh at the time
people were uh you know programming in
machine code and that's what programmers
do that's what the specialists do
>> yes
>> and she said you know specialists will
always be a specialist they have to
learn the underlying machinery of
computers but I want to get to a world
where people are programming in English.
That's what she said. That's before
Karpathy, right? That's like, you know, 75
years ago
>> Uh, and that's why she invented the
compiler, and in her mind, like, C
programming is English
>> right
>> uh But that, you know, that really
didn't uh that was just the start of it.
You had C and then you go higher level
Python and JavaScript. And I think it
we're at a moment where it's the next
step,
>> right?
>> Instead of typing syntax, you're
actually typing thoughts, you know,
which is what we ultimately want.
>> And the machine writes the code
>> and the machine writes the code,
>> right? Right.
>> Um yeah, I remember it. you're you're
probably not old enough uh to remember
but I I remember when when I was a kid
it was um you know there there were were
higher level languages you know by the
70s, like BASIC and so forth and
Fortran
>> and C and C but um uh there were still
you know you still would run into people
who were doing assembly programming
assembly language which by the way you
still do you know like game companies or
whatever still do assembly to to to get
>> and they were hating on the kids that
were doing basic. Oh well so so the
assembly people were hating the kids
doing basic but there were also older
coders who hated on the assembly
programmers for doing assembly and not
and not and not no no no not doing
direct machine code right not doing
direct zero-and-one machine code. Because —
so, for people who don't know, assembly language is
sort of this very low-level programming
language that sort of compiles to actual
actual machine code and if and if it's
it's it's incomprehensible gibberish to
most program even most programmers
>> you're writing in octal or something
>> you're writing like very very close to
the hardware but even still it's still a
language that compiles to zeros and ones
>> um whereas the actual real programmers
actually wrote in zeros and ones. And so
there there's always there's always this
tendency, you know, for the
pros to, you know, look down their
noses.
>> Yeah.
>> And say, you know, the new people are
being are being, you know, basically
sloppy. They don't understand what's
happening. You know, they don't really
understand the machine. And then, of
course, you know, with with the higher
level, with the higher level abstractions,
what they do is they democratize. The
absolute irony is I was part of the
JavaScript revolution. I was at Facebook
uh before starting Replit and we built
the modern JavaScript stack. We built
ReactJS and all the tooling around it
>> and we got a lot of hate from from the
programmers that you should type you
know vanilla JavaScript directly and
>> um I was like okay whatever, and
now that's mainstream, and
then those guys that built their careers
on the last wave we invented are hating
on this new wave and so just you know
people never change. Okay, got it. Okay,
so you you're typing English I want to
sell crepes online. I want to do this. I
want to have a t-shirt. Whatever the
business is. Okay. What what happens
then?
>> Yeah. And then, uh, Replit Agent will
show you what it understood. So it's
trying to build um a common
understanding between you and it and I
think there's a lot of things we can do
better there in terms of UI but for now
it'll show you a list of tasks.
>> It'll tell you I'm going to go set up a
database because you need to store your
data somewhere. Uh we need to set up
Shopify or Stripe because we need to
accept payments. Uh and then it shows
you this list and gives you two options
initially. Do you want to start with a
design so that we can iterate back and
forth to lock that design down, or do
you want to build a full thing?
>> Hey, if you want to build a full thing,
we'll go for 20, 30, 40 minutes.
>> Uh, and the agent will be like — the agent will
tell you — go here, install the app.
>> Uh, I'm going to go set up the database,
do the migrations, write the SQL, you
know, build the site. I'm going to also
test it. So this is a recent innovation
we did with Agent 3: after it
writes the software, it spins up a browser,
goes around and tests in the browser, and
then for any issue it kind of iterates, kind
of goes and fixes the code. So it'll spend
20, 30 minutes building that. It'll send
you a notification; it'll tell you the
app is ready. And so you can test it on
your phone. You can go back to your
computer. You'll see maybe you'll find a
bug or an issue, you'll describe it to
the agent, say, "Hey, it's not exactly
doing what I expected." Uh or if it's
perfect and and you're ready to go and
that's it. You know, 20 minutes. By the
way, there's a lot of examples where
people just get their idea in 20 30
minutes, which is amazing. Um you just
hit publish.
>> Mhm.
>> You hit you hit publish. Um
couple clicks, you'll be up in the
cloud. we'll set up a a virtual machine
in the cloud. The database is deployed.
Everything's done and now you have a
production database.
>> So, think about the steps needed just
two or 3 years ago in order to get to
that step. You have to set up your local
development environment. You have to
sign up for an AWS account. You have to
provision the databases, the virtual
machines, you have to create the entire
deployment pipeline. All of that is
done for you. And it just, you know, a
kid can do it, a lay person can do it.
If you're a programmer and uh you're
curious about what the agent did, the
cool thing about Replit, because we have
this history of being an IDE, is you can
peel the layers. You can open the file
tree and you can look at the files.
You can use Git, you can push it to
GitHub, you can connect it to your
editor if you want, you can open it in
Emacs. So the cool thing about Replit,
yes, it is a vibe coding platform that
abstracts away all the complexities, but
all the layers are there for you to look
at.
>> Right. So let's go let's go back to um
that was great, but let's go back to you
said it. It it gives you that the a the
agent gives you you you say I've got my
idea. You plug it in and it says it
gives you this list of things and then
you and then when you describe it you
said I'm going to do this. I'm going to
do that. The "I" there in that case was
the agent as opposed to the user. Yes.
>> And so the the agent lists the set of
things that it's going to do and then
the agent actually does those things.
>> Agent does those things. Yeah. That
that's a that's a that's a very
important point. when we did this shift,
we hadn't realized internally at Replit
how much the actual user stopped being
the human user and it's actually the
agent programmer,
>> right?
>> So, one really uh funny thing happened
is we had servers in Asia uh and we the
reason we had servers in Asia because we
wanted our Indian or you know Japanese
users to be to have a you know shorter
uh time to the servers. uh when we
launched the agent their experience got
significantly worse and we're like what
happened like it's supposed to be faster
well turns out it's worse it's because
the AIS are sitting in uh in United
States and so the the programmer is
actually in United States it's you're
sending the request to the programmer
and the programmer is interfacing with a
machine across the world and so yes
suddenly the agent is the programmer
okay so like the the ter ter you know
new terminology agent is a software
program that is basically using the rest
of the
as if it were a as if it were a human
user, but it's not. It's a it's a bot.
>> That's right. It has access to tools
such as write a file, edit a file,
delete a file, uh uh search the package
index, install a package, uh provision a
database, provision object
storage. It is a programmer that has the
tools and interface. It has a sort of an
interface
>> that is very similar to a human
programmer's.
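As a rough illustration of the tool interface just listed (file edits, package search and install, provisioning), here is a minimal hypothetical sketch; the tool names and shapes are assumptions for illustration, not Replit's actual API.

```python
# Hypothetical sketch of an agent "tool" registry like the one described
# above. Tool names and signatures are illustrative assumptions, not
# Replit's actual API.
from typing import Callable, Dict

def write_file(path: str, contents: str) -> str:
    with open(path, "w") as f:
        f.write(contents)
    return f"wrote {len(contents)} bytes to {path}"

def install_package(name: str) -> str:
    # In a real system this would shell out to a package manager.
    return f"pretended to install {name}"

def provision_database(kind: str = "postgres") -> str:
    return f"pretended to provision a {kind} database"

# The agent sees a registry of callable tools and picks among them,
# much like a human programmer choosing which command to run next.
TOOLS: Dict[str, Callable[..., str]] = {
    "write_file": write_file,
    "install_package": install_package,
    "provision_database": provision_database,
}

print(TOOLS["provision_database"]())
```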
>> And then, um, you know, we'll
talk more about how this all works, but a
debate inside the AI industry is with
these was kind of this you know this
idea now of having agents that do things
on your behalf and then go out you know
go go out and kind of accomplish
missions. Um there's this you know kind
of debate which is okay how like
obviously you know it's a big deal even
to have an AI agent that can do
relatively simple things to do complex
things of course is you know one of the
great technical challenges of the last
80 years you know to to do that and then
there's sort of this question of like
can the agent go out and run and operate
on its own for 5 minutes you know for
for 15 minutes for an hour for 8 hours
and and meaning like sort of like how
long does it maintain coherence like how
long does it actually like stay in full
control of it of its faculties and not
kind of spin out because at least the
early agents or the early AIs,
if if you set them off to do this, they
might be able to run for two or three
minutes, then they would they would
start to get confused and go down rabbit
holes and, you know, kind of kind of
spin out. Um, more recently, more
recently, um, uh, you know, we've seen
that that that that agents can run a lot
longer and and do more complex tasks.
Like, where are we on the curve of
agents being able to run for how long
and for what complexity tasks before
before they break?
>> That's that's absolutely the the I think
the main metric we're looking at. Even
back in 2023 — you know, we had the idea for
software agents, you know, four or five
years ago now — the problem every time we
attempted them was the problem of
coherence. You know, they'll go
on for a minute or two and then they'll
just, you know, compound errors
in a way that they just can't recover from.
>> Um,
>> and you can actually see it, right?
Because they actually they actually, if
you watch watch them operate, they get
increasingly confused and then, you
know, maybe even deranged. Yeah, they
get deranged and they go into
very weird areas and sometimes they
start speaking Chinese and doing really
weird things and um but I would say
sometime around last year we maybe
crossed the 3 four five minute mark
>> and it felt to us that okay we're on a
path where long re you know long horizon
reasoning is getting solved
>> uh and so we made we made a bet and I I
tell my team
>> so long horizon reasoning meaning
reasoning meaning like dealing in like
facts and logic
>> um in a in a sort of complex way and
then long horizon being over a long
period of time. Yes.
>> With many many steps to a reasoning
process.
>> Yeah, that's right. So if you think
about the way large language models work
is that they have a context. This
context is basically the memory all the
text all your prompt and also all the
internal talk that the AI is doing as
it's reasoning. So when the AI is
reasoning it's actually talking to
itself. It's like oh now I need to go
set up a database. Well, what what kind
of tool do I have? Oh, there's a tool
here that says Postgress. Okay, let me
try using that. Okay, I use that. I got
feedback. Let me look at the feedback
and read it. And it'll read the
feedback. And so that prompt
box or context is where the user
input, the environment input, and the
internal thoughts of the machine all
live. It's sort of like a program's
memory, an in-memory space. And so
reasoning over that was the challenge
for a long time. That's when AIs just
like went off track and now they're able
to kind of think through this entire
thing and and maintain coherence. And
there are now techniques around
compression of context. So context
length is still a problem,
right? So I would say LLMs today, you
know, they're marketed as a million-token
context length, which is like a million
words almost. In reality it's about
200,000, and then they start to struggle.
So we do a lot of, you know — we
compress the memory. So if
a portion of the memory is saying
that I'm getting all the logs from the
database, you can summarize, you know,
paragraphs of logs with one statement, or
"the database is set up, that's it," right? And
so every once in a while we'll compress
the context so that we make sure we
maintain coherence. So there's a lot
of innovation that happened outside of the
foundation models as well in order to
enable that long-context coherence.
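A minimal sketch of the kind of context compression described here; `summarize_with_llm` is a hypothetical helper standing in for an LLM call, and the token budget echoes the rough practical limit mentioned in the conversation.

```python
# Minimal sketch of context compression as described above: once the running
# transcript of messages gets too long, older entries are collapsed into a
# short summary so the agent keeps coherence. `summarize_with_llm` is a
# hypothetical stand-in for an LLM call.
TOKEN_BUDGET = 200_000

def count_tokens(text: str) -> int:
    # Crude approximation: ~1 token per word. Real systems use a tokenizer.
    return len(text.split())

def summarize_with_llm(messages: list[str]) -> str:
    # Hypothetical: ask a model to compress many messages into one statement,
    # e.g. pages of database logs become "database is set up and reachable".
    return "SUMMARY OF EARLIER WORK: " + " / ".join(m[:40] for m in messages)

def compress_context(messages: list[str]) -> list[str]:
    total = sum(count_tokens(m) for m in messages)
    if total <= TOKEN_BUDGET:
        return messages
    # Keep the most recent messages verbatim, summarize everything older.
    recent, older = messages[-10:], messages[:-10]
    return [summarize_with_llm(older)] + recent
```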
>> So what was the key technical
breakthrough in the foundation
models that made this possible, do you
think?
>> I think it's RL I think it's uh
reinforcement learning. So the way
pre-training works is you know uh they
uh pre-training is a uh the first step
of training a large language model. It
reads a piece of text. It covers the
last words and tries to guess it. That's
how it's trained. That doesn't really
imply long context reasoning. it it you
know it it turns out to be very very
effective. It can learn language that
way. But the reason we weren't able to
move past that limitation is that that
modality of training just wasn't good
enough. And what you want is you want a
type of problem solving over a uh over
long context. So what reinforcement
learning uh uh especially from code
execution gave us is the ability to for
the machine to for the LLM to roll out
what we call trajectories in AI. So
trajectory is a uh stepbystep reasoning
chain in order to reach a solution. So
uh the way uh as I understand
reinforcement learning works is they put
the LLM in a programming environment like
Replit and say, hey, here's a
codebase, here's a bug in the codebase,
and we want you to solve it. Um now the
human trainer already knows what the
solution would look like. So we have a
pull request that we have on GitHub so
we know exactly or we have a unit test
that we can run and verify the solution.
So what it does is it rolls out a lot of
different trajectories. They
sample the model, and a lot of
them will just go off track, but maybe
one of them will reach the solution by
solving the bug, and it reinforces on
that. So that that gets a reward and the
model gets trained that okay you know
this is how you solve these type of
problems. So that's how we're able to
extend these reasoning chains.
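A toy, self-contained sketch of the rollout-and-reinforce loop just described: sample candidate trajectories, score each with a verifier (standing in for the repo's unit tests), and reinforce the ones that pass. Everything here is illustrative, not a real training API.

```python
import random

class ToyModel:
    def __init__(self):
        self.preference_for_good_fix = 0.1  # crude stand-in for parameters

    def sample_trajectory(self, bug_report: str) -> str:
        # Occasionally propose the correct fix, more often a wrong one.
        good = random.random() < self.preference_for_good_fix
        return "correct_fix" if good else "wrong_fix"

    def reinforce(self, trajectory: str, reward: float) -> None:
        # Nudge the policy toward rewarded trajectories.
        if trajectory == "correct_fix":
            self.preference_for_good_fix = min(
                1.0, self.preference_for_good_fix + 0.1 * reward
            )

def verifier(trajectory: str) -> float:
    # Stand-in for running unit tests on the patched codebase.
    return 1.0 if trajectory == "correct_fix" else 0.0

model = ToyModel()
for step in range(100):                     # rollout-and-reinforce loop
    traj = model.sample_trajectory("off-by-one bug in pagination")
    model.reinforce(traj, verifier(traj))
print(model.preference_for_good_fix)        # drifts toward 1.0
```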
>> Got it. And — it's a two-part
question — how good
are the models now at long
reasoning, and, I would say, how do
we know? Like, how is that
established? Um
there is a nonprofit called METR
that has a benchmark to measure
how long a model runs while maintaining
coherence and doing useful things,
whether it's programming or other
benchmark tasks that they've done.
And they put up a paper, I think late
last year, that said every seven months
>> the minutes that a model can run
is doubling.
>> So you go from 2 minutes to you know 4
minutes in 7 months I think they vastly
underestimated that.
>> Is that right? Vastly it's doubling.
It's doubling more often than 7 months.
>> So with Agent 3, we measure that, you know,
very closely uh and we measure that in
real tasks from real users. So we're not
doing benchmarking. We're actually doing
AB tests and we're looking at the data
that how users are successful or not.
>> For us, the the absolute sign of success
is you made an app and you published it.
Because when you publish it, you're
paying extra money. You're saying this
app is economically useful. I'm going to
publish it. So that's as clear-cut as
possible. And so what we're seeing is in
agent one, the agent could run for 2
minutes
>> and then and then perhaps struggle.
Agent two came out in February, it ran
for 20 minutes. Agent 3 200 minutes.
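A quick back-of-the-envelope check on "vastly underestimated," assuming (as the conversation implies, though it is not stated exactly) roughly a year between Agent 1 and Agent 3:

```latex
% Agent 1 ran ~2 minutes, Agent 3 ~200 minutes: a 100x increase.
% Assuming roughly 12 months between them:
\[
\text{doubling time} \approx \frac{12\ \text{months}}{\log_2 100}
  \approx \frac{12}{6.64} \approx 1.8\ \text{months},
\]
% far faster than METR's reported 7-month doubling.
```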
>> Okay,
>> 200. Some users are pushing it to like
12 hours and things like that. I'm less
confident that it is as good when it
goes to those stratospheric lengths, but at like
a 2-3 hour timeline, it is really —
it's insanely good. And
the main innovation outside of the
models is a verification loop. Actually,
uh I remember reading um a research
paper from Nvidia. So what Nvidia did is
they're trying to uh write um GPU
kernels using DeepSeek, and that was
like perhaps 7 months ago when DeepSeek
came out, and what they found is that if
we add a verifier in the loop — if we can
run the kernel and verify it's working —
we're able to run DeepSeek for like 20
minutes and it it was generating
actually optimized kernels
>> and so I was like okay the next thing
for us obviously, as a sort of an
agent lab or, like, app-layer company.
We're not doing the foundation model
stuff, but we're doing a lot of research
on top of that. And so, okay, we know
that agents can run for 10 20 minutes
now or LLMs can stay coherent for
longer, but for you to push them to 200,
300 minutes, you need a verifier in the
loop. So, that's why we spend all our
time uh creating scaffolds to make it so
that the agent can spin up a browser and
do computer use style testing. So once
you put that in the middle, what's
happening is it works for 20 minutes, then it
spins up another agent, which spins up a
browser and tests the work of the previous
agent. So it's a multi-agent system,
>> and if it finds a bug, it
starts a new trajectory and says okay
good work let's summarize what you did
the last 20 minutes
>> now that be that plus what the bug that
we found that's a prompt for a new
trajectory right
>> so you stack those on each other and you
can go endlessly but
>> so it's like a mar like setting up a
marathon or like a relay race
>> as long as as long as each step is done
properly you could do in sort of an
infinite number of steps
>> That's right, that's right. You can always
compress the previous step into a
paragraph, and that becomes a prompt. So
it's an agent prompting the next
agent.
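A minimal sketch of that relay: build for a stretch, have a second agent verify the work, then compress the result plus any bugs found into the prompt for the next leg. The helper functions are hypothetical stand-ins, not Replit's actual implementation.

```python
def builder_agent(prompt: str) -> str:
    return f"work done for: {prompt}"     # stand-in for ~20 minutes of building

def verifier_agent(work: str) -> list[str]:
    return []                             # stand-in for browser testing; returns bugs found

def summarize(work: str, bugs: list[str]) -> str:
    return f"Previously: {work}. Known bugs: {bugs or 'none'}."

def relay(initial_prompt: str, max_legs: int = 10) -> str:
    prompt = initial_prompt
    for _ in range(max_legs):             # stack legs on each other as needed
        work = builder_agent(prompt)
        bugs = verifier_agent(work)
        if not bugs:
            return work                   # verified clean: done
        prompt = summarize(work, bugs)    # compressed state becomes the next prompt
    return work

print(relay("build an online crepe storefront"))
```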
>> Right. Right. Right. That's amazing. So
um and then when when an agent like when
a modern agent like running on modern
LLMs that are trained this way, when
it let's say it runs for 200 minutes
like when you watch the agent run is it
like running is it like processing
through like logic and tasks at the same
pace that like a human being is or
slower or faster? I
>> it's actually I would say it is faster
but not that much significantly faster.
It's not at computer speed, right? What
we expect computer speed to be.
>> It's like watching a per like if you
watch if you if it's describing what
it's doing, it's sort of like watching a
person work.
>> It's like watching John Carmack
work.
>> The world Okay. The world's
the world's best programmer.
>> Yeah.
>> The world's best programmer on a stim on
a stimulant.
>> On a stimulant. Yeah, that's right.
>> Working for you. Working for you.
>> Yeah. There. So, it's very fast and you
can see the uh file diffs running
through, but every once in a while it'll
stop and it'll start thinking. It'll show
you the reasoning. Yeah. It's like, I
did this and I did this. Am I on the
right track? It kind of really tries to
reflect right?
>> Uh and then it might review its work and
decide the next step or it might kick
into the testing agent or you know, so
so you're seeing it do all of that and
every once in a while it calls the tool
for example, it stops and says, well, we
ran into an issue. You know, Postgres
15 is not compatible with this,
you know, database ORM package that that
I have.
>> Um okay, this is a problem I haven't
seen before. I'm going to go search the
web. So, it has a web search tool. Go do
that. And so, it looks like a human
programmer right?
>> And it's really fascinating to watch.
It's one of my favorite things to do is
just to watch the tool chain and
reasoning chain and the testing chain.
And it's yeah it is like watching a
hyperproductive programmer
>> right so you know we're kind of getting
into here kind of the holy grail of AI
which is sort of you know generalized
reasoning um you know by the machine um
so uh you mentioned this a couple times
but this idea of a of a verification so
so just for folks on the listening to
podcast who maybe aren't in the details
let me try to describe this and see see
if I have it right so like a just a just
a large language model the way you would
have experienced it with,
like, ChatGPT out of the gate two years
ago or whatever, would have been: it's
incredible how fluid it
is at language. Um, it's incredible how
good it is at, like, writing Shakespearean
sonnets or rap lyrics. It's
amazing how good it is at human
conversation. But if you start to ask it,
like, problems that involve, like, rational
thinking or problem solving, all of a
sudden — like, math gave the
whole show away. And in the very beginning,
if you asked it very
basic math problems, you know, it
would not be able to do them.
>> That's right. Uh but then even when it
got better at those, if you started to
ask it to like, you know, it it could
maybe add two small numbers together,
but it couldn't add two large numbers
together. Or if it could add two large
numbers, it couldn't multiply them.
Yeah.
>> And it's just like, all right, this is
And then there was this
famous — the famous strawberry test, the
famous strawberry
test, which is how many Rs are in the
word strawberry.
>> That's right.
>> And there was this long period where it
it kept it would it would just guess
wrong. It would say there were only two
Rs in the word strawberry. And it turns
out there are three. Um, so, um, so it
it was this thing and so people were and
there was even this term that was being
used kind of the the slur that was being
used at the time was stochastic parrot.
>> Yeah,
>> I was thinking clanker.
>> Well, well, clanker is the is the new
slur. Clank clanker. Clanker is just the
full-on racial slur against AI as a
species. Um, but the technical critique
was so-called stochastic parrot — stochastic
means random. Uh, so sort of random
parrot, meaning basically that this
thing was sort of the large language
models were like a they were like a
mirage where they were like repeating
back to you things that they thought
that you wanted to hear but they didn't
>> in a way it's true in the in the pure
pre-training LLM world
>> right for the for the very basic layer
but then what happened is as you said
over the last year or something there
there was this layering in of of
reinforcement learning and then but the
key to
>> it's not new crucially it's like it's
alpha go right so
>> describe so describe that for a second.
Yeah. So we we had this breakthrough
before, in 2015 — the AlphaGo
breakthrough, I think 2015, 2016 — where it
was a merging of, sort of, you know —
you would know a lot better than
me — the old AI debate between the
connectionists, the people who
think neural networks are the true
way of doing AI, and the symbolic
systems people, I think, the people that
think that, you know, discrete reasoning,
facts, and knowledge bases, whatever, this
is the way to go. And so there
was a merging of these two worlds, where
the way AlphaGo worked is it had a
neural network, but it had a Monte Carlo
tree search algorithm on top of that. So
the neural network would
generate a list of
potential moves, and then you had a
more discrete algorithm sort those moves
and find the best based on just tree
search — based on just trying to verify,
again, this sort of a verifier in the
loop, trying to verify which move might
yield the best, based on a more classical
way of doing algorithms. Um, and so
that's a resurgence of of that movement
where we have this amazing generative uh
neural network that is the the LLM and
now let's layer on more discrete ways of
trying to verify whether it's doing the
right thing or not and let's put that in
a training loop and once you do that the
LLM will start gaining new capabilities
such as uh reasoning over math and code
and things like that.
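A toy illustration of the generate-then-verify pattern being described: a generative model proposes candidates and a discrete verifier scores them so the best one is kept. The move generator and scorer below are placeholders, not AlphaGo's actual policy network or tree search.

```python
import random

def propose_candidates(state: str, n: int = 8) -> list[str]:
    # Stand-in for a neural network sampling plausible moves or answers.
    return [f"{state}-candidate-{i}" for i in range(n)]

def verify(candidate: str) -> float:
    # Stand-in for a discrete check: tree search, a unit test, a proof checker.
    return random.random()

def best_verified_move(state: str) -> str:
    candidates = propose_candidates(state)
    return max(candidates, key=verify)  # keep the candidate the verifier likes best

print(best_verified_move("opening position"))
```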
>> Exactly. Right. Okay. And then that's
great. And then and then the the key
thing there though for for RL to work
for LLMs to reason, the key is that it be
a problem statement where there is a
defined and verifiable answer. That's
right. Is that right? And so and and and
you might think about this as like let's
give a bunch of examples like in
medicine this might be like um you know
a diagnosis that like a panel of human
doctors agrees with um or or or by the
way or a diagnosis that actually you
know solves the condition. Um in law
this would be a um you know a a argument
that in front of a jury actually results
in an acquittal or something like that.
Um in u math it's an equation that
actually solves properly. Uh in physics
it's a result that actually works in the
real world.
>> I don't know in civil engineering it's a
bridge that doesn't collapse. Right. So
so there's always some
test.
>> The thing is that the first two do not
work very well just yet. Like —
like, I would say, law and healthcare,
they're still a little too squishy a
little too soft it's unlike math or code
like the way that they're training on
math, they're using this sort of
provable programming language
called Lean for proofs, right? So you can
run a Lean statement, you can run
computer code uh perhaps you can run a
physics simulation or civil engineering
uh sort of physics simulation but you
can't run a diagnosis
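For example, a Lean statement is mechanically checkable — the proof checker either accepts it or it doesn't, which is exactly the clean true/false signal a training loop can use (a minimal illustrative example, not one from the conversation):

```lean
-- A minimal verifiable statement: Lean's checker either accepts this proof
-- or rejects it, giving a clean true/false training signal.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```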
>> okay So I would say the
>> but you could verify it with human
answers or or not.
>> Yeah. So that's more RLHF in a
way. So it is not, like, sort of
autonomous RL training — like, fully scalable,
autonomous — which is why coding is moving
faster than any other domain is because
we can we we can generate these problems
and verify them on the fly. But there's
two but with coding as anybody who's
coded knows there's coding there's two
tests which is one is does the code
compile
>> right
>> and then the other is does it produce
the right output and just because it
compiles doesn't mean it produces the
right output and I you tell me but
verifying that it's the correct output
is harder
>> Yeah, SWE-bench
is a collection of verified pull-request
end states. So it is
not just about compiling. So this
group of scientists — SWE-bench is the main
benchmark used to test whether AI is
good at software engineering tasks, and
we're almost saturating that. So last
year we were at like maybe 5%, early '24, or
less, and now we're at like 82% or something
like that, with Claude Sonnet 4.5; that's
state-of-the-art. And that's like a
really nice hill climb that's
happening right now. uh and basically
they went and looked on GitHub. They
found the the you know most complex
repositories. They found bug statements
that are very clear uh and they found
pull requests that actually solve those bug
statements with unit tests and
everything. So there is an existing
corpus on GitHub of tasks that that the
AIs can solve, and you can also
generate them. Those are not too hard to
to generate uh you know what's called
synthetic uh data. Uh uh but but you're
right it's not infinitely scalable um
because you you some human verifiers
still need to kind of look at the at the
task but maybe the foundation models
have found a way to have the synthetic
training go all the way
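A minimal sketch of the two-level check discussed above — does the patched code even compile (or import), versus does it actually produce the right output (unit tests pass) — using a unit test run as the verifier. The repo path and test command are illustrative assumptions.

```python
import subprocess

def compiles(repo_path: str) -> bool:
    # Byte-compiling every file catches syntax errors but says nothing about behavior.
    result = subprocess.run(
        ["python", "-m", "compileall", "-q", repo_path], capture_output=True
    )
    return result.returncode == 0

def tests_pass(repo_path: str) -> bool:
    # The stronger check: run the repo's unit tests (assumes pytest is set up).
    result = subprocess.run(
        ["python", "-m", "pytest", "-q", repo_path], capture_output=True
    )
    return result.returncode == 0

def verify_patch(repo_path: str) -> float:
    # Reward signal a training or agent loop could use.
    if not compiles(repo_path):
        return 0.0
    return 1.0 if tests_pass(repo_path) else 0.5
```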
>> right and then what's happening I think
I think because what's happening is the
foundation model companies are in some
cases they are hire they're actually
hiring human experts to generate new
training data. Yes.
>> So they're actually hiring
mathematicians and physicists and coders
to basically sit and you know they're
they're hiring they're they're hiring
human programmers putting them on the
cocaine. Yes.
>> Um, and giving them coffee, probably,
and having them actually write code and
then and then write code in a way where
there's a known result of the code
running, such that this RL loop can
be trained properly. That's right. And
then the other the other and then the
other thing these companies are doing is
as you said they're building systems
where the software itself generates the
training data, generates the tests,
generates the validated
results, and that's so-called synthetic
training data.
>> That's right. And but yeah, but but
again those work in the very hard
domains. It works to some extent in the
software domains
>> and I think there's some transfer
learning we can you can see the
reasoning work when it comes to you know
tools like deep research and things like
that, but we're not making as fast
progress in the softer domains.
>> So so say softer domains meaning like
domains in which it's harder harder or
even impossible to actually verify
correctness of of result in a sort of a
deterministic factual grounded
>> non-controversial way. Like if you have
a chronic disease, you could
have, you know, POTS or, you
know, EDS syndrome or whatever, and
they're all clusters, and
it's because it is a domain of
abstraction. It is not as concrete as
code and math and things like that. So I
think there's still a long ways to go
there.
>> Right. So sort of the more concrete the
problem — like, it's the concreteness of the
problem that is the key variable not the
difficulty of the problem. Would that be
a way to think about it?
>> Yeah. Yeah. I think the the uh
concreteness in a sense of can you get a
true or false ver verifiable
>> right but like in any domain in any
domain of human effort in which there's
a verifiable answer we should expect
extremely rapid progress.
>> Yes.
>> Right.
>> Yes. Absolutely. And I I think that's
what we're saying.
>> Right. And that and that for sure
includes math. That for sure includes
physics for sure includes chemistry. For
sure includes
>> large areas of code.
>> That's right.
>> Right. What what else does that include
do you think?
>> Bio like we're seeing with a protein
>> genomic. Yeah. Okay. Right.
>> Yeah. Yeah. Things like that. I think
some some areas of robotics, right? Um
there's a clear outcome, right?
>> Uh but but it's not that many. I mean,
surprisingly,
>> well, it depends.
>> Yeah, depends on your point of view.
That's some people might say that's a
lot. Um so, uh and then um you you
mentioned that we you mentioned the pace
of improvement. So, what would you
expect from the pace of improvement
going forward for this?
>> I I think we're we're ripping on coding.
Like I think I think it's just it's
going like I think it's going to be like
what we're working on with Agent
4 right now — um, is by next year we
think you're going to be sitting
in front of Replit and you're
shooting off multiple agents at a time.
You're like planning a new feature. Um
so I I want you know social network on
top of my storefront and another one is
like hey um refactor the database. Hey,
and you're running parallel agents.
So, you have five 10 agents kind of
working in the background and they're
merging the code and taking care of all
of that, but you also have a really nice
interface on top of that that you're
doing design and you're interacting with
AI in a more creative way. Uh maybe
using visuals and charts and things like
that. So, there's a multimodal angle of
that of that interaction. So I think you
know creating software is going to be
such an exciting
uh area and and and I think that the lay
person will be as good as what a
senior software engineer that works at
Google uh is today. So I think I think
that's happening very soon. Um but but
you know I don't see them and be curious
about your point of view but like my
experience between as as a sort of a you
know on the let's say healthcare side or
more you know write me an essay side or
more creative side haven't seen as much
of a rapid improvement as what we're
seeing in code. So so I think I think
code is going to go to the moon. Math is
probably as well some some you know
scientific domains bio things like that
those are are going to move really fast.
>> Yeah. So there's this there's this
there's this weird dynamic see if you
agree with this and Eric also curious
your point of view on this like there's
this weird dynamic that we have and we
have this in the office here a lot and I
also have this with, like, the leading
entrepreneurs a lot which is this thing
of like
>> like wow this is the most amazing
technology ever and it's moving really
fast and yet we're still like really
disappointed um and like it's not moving
fast enough and like it's like maybe
right on the verge of stalling out
>> and like you know we should both be like
hyper excited but also on the verge of
like slitting our wrists because like
you know the gravy train is coming to an
end,
>> right? And and I always wonder it's like
you know on the one hand it's like okay
like you know not all I don't know
ladders go to the moon like just because
something you know looks like it works
or you know doesn't mean it's going to
you know be able to you're going to be
able to scale it up and have it work you
know to the fullest extent. Um uh you
know so like it's important to like
recognize practical limits and not just
extrapolate everything to infinity. Um
on the other hand like you know we're
dealing with magic here that we I think
probably all would have thought was
impossible 5 years ago or certainly 10
years ago.
>> Like I I didn't you know look I I you
know I got my CS degree in the late
'80s, early '90s. I never — I didn't think
I would live to see any of this, right?
Like this is just amazing that this is
actually happening in in in my lifetime.
>> Um
>> but but there's a huge bet on AGI,
right? like whether it's the foundation
models uh I think you know now the
entire US economy is sort of a bet on
AGI and and there are crucial questions
to ask whether are we on track to AGI or
not because there are some ways that I
can tell you it doesn't seem like we're
on track to AGI because we uh because
there doesn't seem to be transfer
learning across these domains that are
that are, you know, significant, right? So
if we get a lot better at code
we're not immediately getting better at
like generalized reasoning we need to go
also you know get training data and
create RL environment for bio or
chemistry or physics or math or law or
so so and this this has been the sort of
point of discussion now in the AI
community after the Dwarkesh and Richard
Sutton uh interview where uh you know
Richard Sutton kind of poured this cold
water on the on the bitter lesson. So
everyone was using this uh essay that he
wrote called the bitter lesson. The idea
is that there are um infinitely scalable
ways of uh doing uh uh AI research and
and and and anytime you can pour more
compute and more data and go more
performance out you're just you know
that's the ultimate way of getting to
AGI and some people you know interpreted
that interview that perhaps he's
doubtful that we're even on a
bitter-lesson path here, and perhaps
the current training regime is actually
very much the opposite in which we we
are so dependent on human data and human
annotation and and all of that stuff. So
I think the I I agree with you. I mean
as a company we're we're excited about
where things are headed but but there's
there's a question of like are we on
track to AGI or not and be curious what
you think. So, and you know, Ilya, I
think — you know, Ilya Sutskever — makes
a specific form of this argument which
is basically like we're just literally
running out of training data. It's a
fossil fuel argument, right? Like, fundamentally
we've slurped all the data off the
internet that is where almost all the
data is at this point. There's a little
bit more data that's in like you know
private dark pool somewhere that we're
going to go get but like
>> we have it all and then right we're
we're in this business now trying to
generate new data but generating new
data is hard and expensive you know
compared to just like slurping things
off the internet. So
>> there are these arguments. Um you know
having said that you know you get into
definitional questions here really quick
which are kind of a rabbit hole but
having said that like you mentioned
transfer learning. So transfer learning
is the ability of the machine to right
to be an expert in one domain and then
and then generalize that into another
domain.
>> My answer to that is like have you met
people?
>> Um and how many people do you know are
able to do transfer learning?
>> Not many. Right. Well because there's
>> quite the opposite actually. The nerdier
they are in a certain domain the kind of
you know often they have blind spots. We
joke about how everyone's just [ __ ]
in one area or they make some like
massive mistake and and like don't trust
them on this but on this other topic you
know
>> right? Yeah. Well and this is a
well-known thing among like for example
public intellect. So this happens
there's actually been whole books
written about this on so-called public
intellectuals. So you get these people
who show up on TV and they're experts
and what happens is they're like an
expert in economics right and then they
show up on TV and they talk about
politics and they don't know anything
about politics right or they don't know
anything about like medicine or they
don't know anything about the law or
they don't know anything about
computers. You know, this is the Paul
Krugman talking about how the internet's
going to be no more significant than the
fax machine.
>> Facts.
>> He's a brilliant economist. He has no
idea what a how a computer works.
>> Is he a brilliant economist?
>> Well, at one at one point at one point
at one point, let's get even if even if
he's a brilliant Well, this is the thing
like what does that mean? Like should a
brilliant economist be able to
extrapolate, you know, the internet is
is a good question. But um but the point
being like even if he is a you know,
take any take anybody Oh, by the way, or
like, Einstein's actually
my favorite example. I think you'd agree
Einstein was a brilliant physicist.
>> He was like a he was he was a Stalinist.
Like he was just he was Yeah. He was a
socialist and he was a Stalinist and he
was like he thought like Stalin was
fantastic.
>> Out still.
>> Yeah. Okay. All right.
>> True socialism.
>> All right. All right. Einstein, you
know, I'll I'll
I'll take your word for it. But like
once he got into politics, he was just
like totally loopy or or you know, even
right or wrong. It's just he just
sounded like all of a sudden like an
undergraduate lunatic, like somebody in
a dorm room. Like he there was no
transfer learning from physics into
politics. like he he didn't listen right
or wrong he didn't there was no there
was clearly there was nothing new in his
political analysis. It was the same rote,
routine [ __ ] you get out of
>> you know yeah so so in a way the
argument you're making is like we maybe
already have human-level AI. I mean, perhaps
the definition of AGI is is is something
totally different is like above human
level that something that totally
generalizes across domains it's it's not
something that we've seen
>> Yeah, like — yeah, I was saying
we've — and you know, look, we
should shoot big, but we've
idealized a goal,
um that may be idealized in a way that
like number one it's just it it's it's
like so far beyond what people can do
that it's you're no longer it's no
longer relevant comparison to people and
and usually AGI is defined as you know
able to do everything better than a
person can
>> and it's like well okay so if doing
everything better than a person can it's
like if a person can't do any transfer
learning at all
>> right doing even a little little bit a
marginal bit might might actually be
better or it might not matter just
because no no human can do it and so
therefore you just you just stack up the
domains there's also this well-known
phenomenon in AI — you know,
typically this works the other way — which is
a phenomenon AI engineers
always complain about and scientists
always complain about which is the
definition of AI is always the next
thing that that the machine can't do and
so like the definition for of AI for a
long time was like can it beat humans at
chess
>> and then the minute it could beat humans
at chess that was no longer AI that was
just like oh that's just like boring
>> that's computer chess it became
>> computer chess it's just like boring and
now it's an app on your iPhone and
nobody nobody and nobody cares right and
it's immediately then
>> The Turing test was the test and then we
passed it and nobody
>> We blew right through it. This is a really big deal.
>> there was no celebration
>> there were no parties. That's exactly
right. For 80 years the
Turing test I mean they made a movie
about it like the whole thing that was
the thing and like we blew right through
it and nobody even registered it. Nobody
cares. It gets no credit for it. We're
just like ah it's still you know
complete piece of [ __ ] like
>> right and so there's this thing where so
the AI scientists are are are used to
complaining basically that they're that
they're they're being they're always
being judged against the next thing as
opposed to all the things they've
already they've already solved.
>> Um uh but but that's maybe the other
side of it which is they're also putting
out for themselves um an unreasonable
goal. an an unreasonable goal and then
doing this sort of self-flagellation
kind of along the way and and and I I
kind of wonder yeah I I wonder kind of
which way that cuts.
>> Yeah. Yeah. It's an interesting question
like I started thinking about this idea
of like it doesn't matter whether it's
truly AGI and the way I define AGI is
that you put in a AI system in any
environment and efficiently learns right
>> um you know it doesn't have to have that
much prior knowledge in order to kind of
learn something but also can transfer
that knowledge across different domains.
Um but you know we can get to like
functional AGI and what functional AGI
is is just yeah collect data on every uh
useful uh economic activity in uh in the
world today and train an LLM on top of
that or train the same foundation model
on top of that and and we we'll go we'll
target every sector economy and and you
can automate a big part of labor that
way. So I think I think yeah I think
we're on that track for sure.
>> Right. Um, you tweeted after GPT-5 came
out that you were feeling the
diminishing returns. Yeah. What were you
expecting and but and and what needs to
be done? Do we need another breakthrough
to get back to the pace of growth or
what are your thoughts there?
>> I mean this this whole discussion is is
sort of about that and and my feeling is
that, uh, you know, GPT-5 got good at
verifiable domains. It didn't feel that
much better at anything else. the more
human angle of it. It felt like it
regressed and like you had this uh sort
of uh Reddit pitchfork uh sort of uh
movement against against Sam and Open AI
because they felt like they lost a
friend. GPT-4o felt a lot more human and
closer, whereas GPT-5 felt a lot more
robotic, you know, very in its head kind
of trying to think through through
everything. And um and so I I I would
have just expected like when we went
from GPT-2 to 3, it was clear it was
getting a lot more human. It was a
lot closer to our experience. You
can feel like it actually gets
me — like there's something about it that
understands the world better. Similarly
3 to four to five didn't feel like it
was a better overall
being as it were. But is that is that is
that is that a is the question there
like is it emotionality? Is it partly
emotionality but but again partly like I
like to ask models like very
controversial uh things. Um can it
reason through
uh I don't know how deep we want to go
here but like um what happened with
World Trade Center 7,
>> right?
>> Sure.
>> It's an interesting question, right?
Like I'm not I'm not putting out a
theory, but like it's interesting like
how did it you know and and can it can
it think through controversial questions
>> in the same way that it can go think
through a coding problem and there
hasn't been any movement there like the
all the reasoning and all of that stuff
I haven't seen — and not just that, you know,
that's a cute example, but like, um, COVID,
right? Like, you know, the origins of COVID,
right?
>> You know, go — you know, dig up GPT-4 or
other models
and go to GPT5, you're not going to find
that much difference of okay, let's
reason together. Let's try to figure out
what were the origins of COVID, because it's
still an unanswered question, you know,
and I don't see them making progress in
that. I mean, you play a lot with them.
Do you feel like
>> I use it differently? I don't know,
maybe I have different expectations. Um,
I — the way I — my main use case
actually is sort of a PhD in
everything at my beck and call. Um, and
so I'm I'm trying to get it to explain
things to me more than I'm trying to
like, you know, have conversations with
it. Maybe maybe I'm just unusual with
that. But
>> and that that that gets back
>> well. So what I what I what I found
specifically is uh a combination of like
GPT-5 Pro plus deep reasoning, or like
Grok 4 Heavy — like, you know, the
highest end models um u like that um you
know they now basically generate 30 to
40 page you know essentially books on
demand on any topic. Um and so anytime I
get curious about something you just
take it maybe it's my version of it but
it's something like I don't like a good
here's a good example. Um, when
an advanced economy puts a tariff
on, you know, a raw
material or on a finished good like who
pays
>> you know is it is it the consumer is it
the is it the importer is it the
exporter or is it the producer and and
this actually a very complicated it
turns out very complicated question it's
a big big big thing that economists
study a lot and it's just like okay who
you know who pays and what I found like
for that kind of thing is it's
outstanding
>> well well but but it's outstanding at um
sort of going out of the web getting
information synthesizing it
>> correct it it gives me it gives me a
synthesized 20, 30, 40 pages — basically tops
out at 40 pages of PDF. Yeah.
>> Um uh but I can get I can get up to 40
pages of PDF but it's a completely
coherent and as far as I can tell for
everything I've cross-cheed a completely
like it like world class like if I hired
you know for a question like that if I
hired like a great you know econ
postdoc at Stanford who just like went
out and did that work like it would
maybe be that good.
>> Yeah. Um but then but then of course the
significance is it's like it's like you
know at least for this is true for many
domains, you know, kind of a PhD in
everything and so
>> but but this is synthesizing knowledge
not trying to create new knowledge.
>> Well but this this gets to the this sort
of you know of course the you get into
the angels dancing on the head of a pin
thing which is like what what you know
what's the difference how many how much
new knowledge ever actually is there
anyway? What do you actually expect from
people when you ask them questions? Um,
and so what what I'm looking for is
like, yes, explain this to me in like
the the the clearest, most
sophisticated, most complex, most like
complete way that it's possible for
somebody to, you know, for a real expert
to be able to to to explain things to
me.
>> Um, and that's what I use it for. And
again, as far as I can tell from the
crossing, like I'm getting, you know,
like almost like basically 100 out of
100, like I don't even think I've had an
issue in months where it's like had had
a problem in it.
>> And it's like, yeah, you can say, yeah,
it's synthesizing as opposed to creating new
information, but like it's
generating a 40-page — it's basically
generating a 40-page book.
>> That's amazing.
>> That's like incredibly like fluid. It's,
you know, it's it's it's it's you know,
the the logical coherence of the entire
like it's it's a great writing. Like if
if you if you evaluated an a a human
author on it, you would say, "Wow,
that's a great author." You know, do are
people who write books, you know,
creating new knowledge? Well, yeah.
Well, sort of, sort of not, because a lot of what
they're doing is building on everything
that came before them, synthesizing it in a
mind, but also, like, a book is a creative
accomplishment, right? And so,
>> yeah, one of the thing I'm I'm I'm I'm
interested in I'm hoping AI could help
us solve is just like how confusing the
information ecosystem is right now. You
know, everything feels like propaganda.
Like it doesn't feel like you're getting
real information from anywhere. So, I I
really want an AI that could help me
reason from first principles about
what's happening in the world for me to
actually get real information. and and
maybe that's an unreasonable sort of ask
of of the AI researchers, but but I
don't think we're we have made any
progress there. So maybe I'm over-focused.
Yeah, maybe I'm overly in my own
lane, or maybe I'm over-focused on
arguing with people as opposed to, um,
trying to get at the underlying
truth. But — well, here's the thing:
I do a lot with this is I just say like
take take a provocative point of view um
and then steel-man the position — take
your COVID thing. So I often
pair these: steel-man the position that
it was a lab leak um and steel man the
position that it was natural origins
>> Um, and again, is this creativity or not? I don't know. But what comes back is like 30 pages each of — wow, that is the most compelling case in the world I can imagine, with everything marshaled behind it, the argument structured in the most compelling way possible.
>> Part of the reason that started happening is because it stopped being taboo to talk about a human origin. When it was taboo,
>> the AIs would, you know, talk down to you — like, oh, you're a conspiracy theorist. And so, yes, there was a period of time — take something truly controversial, and they actually can't reason about it because of all the RLHF and all the limitations. And, as you know — I won't pick on specific ones here — there are certain big models that will still lecture you,
>> that you're a bad person for asking that question. But, you know, some of them are just really, really open now to being able to do these things.
>> Um, and then — okay, so basically, ultimately, what you're looking for — the ultimate thing — I don't think anybody has really defined this well, because, again, all the conventional definitions of AGI are basically comparing to people.
>> Yeah.
>> And the conventional explanations of AGI always struck me a lot like the debate around whether a self-driving car works or not: does a self-driving car work because it's a perfect driver, or does it work because it's better than the human driver? And "better than the human driver" I think is actually a real thing — just like with the chess thing and the Go thing. And then there's the "is it a perfect driver" question, which is obviously what the self-driving car companies are working toward. But then I think you're looking for something beyond the perfect driver. You're looking for the car that knows where to go.
>> So I'm of two minds, right? One mind is the sort of practical entrepreneur, right?
>> Uh, and I just have so many toys to play with, to build. Like, stop AI progress today and Replit will continue to get better for the next five years — there's so much we can do just on the app layer and the infrastructure layer. And I think the foundation models will continue to get better as well, so it's a very exciting time in our industry. Um, the other mind is more academic, because as a kid I was always interested in the nature of consciousness, the nature of intelligence. I was always interested in AI and reading the literature there, and I would point to the RL literature. So Richard Sutton — and there's another guy, a co-founder of DeepMind I think, Shane Legg — wrote a paper trying to define what AGI is. Um, and in there, I think, is the original and perhaps correct definition of AGI, which is efficient continual learning.
>> Okay. Like, if you truly want to build an artificial general intelligence that you can drop into any domain — you can drop it into a car without that much prior knowledge about cars and, within, you know, however long it takes a human to learn how to drive, within months, have it drive a car really well. You know, generalized skill acquisition, generalized understanding acquisition, generalized reasoning acquisition. And I think that's the thing that will truly change the world. That's the thing that would give us a better understanding of the human mind, of human consciousness, and that's the thing that will propel us to the next level of human civilization. On a civilizational level, that's a really deep question — but separately, there's an academic aspect of it that I'm really interested in.
>> So what odds — if we're on Kalshi today, what odds do we place on that?
>> I'm kind of bearish on a true AGI breakthrough, because—
>> what we built is so useful and economically valuable — so in a way—
>> Good enough. Good enough is the enemy. Yeah, yeah. Do you remember that essay? Um—
>> Worse is better.
>> Worse is better.
>> Worse is better. Worse is better. And—
>> So there's like a trap — there's like a local maximum trap. We're in a local maximum—
>> A local maximum trap, where — because it's good enough for so much economically productive work—
>> Yes.
>> it relieves the pressure in the system to create the generalized answer.
>> Yes. And then you have the weirdos like Rich Sutton and others that are still trying to go down that path, and maybe they'll succeed,
>> right? But there's enormous optimization energy behind the current thing we're hill climbing on — this local maximum.
>> Right, right, right. And the irony of it is, everybody's worried about, you know, the gazillions of dollars going into building out all this stuff — and so the most ironic thing in the world would be if the gazillions of dollars are going into the local maximum,
>> That's right.
>> as opposed to a counterfactual world in which they're going into solving the general problem.
>> But that's also potentially irrational — like, maybe the general problem is actually, you know, not solvable within our lifetimes. Who knows? Right. Um—
>> How much further do you think — do you think we've squeezed most of the juice out of LLMs in general, then? Or are there any other research directions that you're particularly excited about?
>> Well, that's the thing. I think the problem is there aren't that many. I think the breakthroughs in RL are incredibly exciting, but we've also known about them for over ten years now — where you marry generative systems with tree search and things like that. But there's a lot more to go there, and I think, again, the original minds behind reinforcement learning are trying to go down that path and try to kind of bootstrap intelligence from scratch. Carmack is going down that path, as far as I understand — you guys may be invested — but, you know, they're not trying to go down the LLM path. So there are people trying to do that, but I'm not seeing a lot of progress or outcome there — though I watch it kind of from afar.
Although, you know, for all we know, there's already a bot on X somewhere.
>> You know, you never know. It might not be a big announcement. It might just be, you know, one day there's just a bot on X that starts winning all the arguments.
>> Yeah, it could be—
>> or a coder somewhere, and all of a sudden it's—
>> generating incredible software. Um, okay. Let's spend our remaining minutes — let's talk about you. So, take us from the beginning with your life: how did you get from being born to being in Silicon Valley?
>> Okay, um—
>> In two minutes. Yeah—
>> I'm just joking. But—
>> Yeah, I got introduced to computers very early on. So — I was born in Amman, Jordan, and for whatever reason my dad, who was just a government engineer at the time, decided that computers were important, and he didn't have a lot of money, but he took from what he had and bought a computer. It was the first computer in our neighborhood — the first computer of anyone I knew. And one of my earliest memories — I was six years old — is just watching my dad unpack this machine, open up this huge manual, and kind of finger-type CD, LS, MKDIR. And I would, you know, be behind his shoulder, just watching him type these commands and seeing the machine respond and do exactly what he asked it to do. Um—
>> Popping Tylenol as your—
Exactly.
Autism activated.
>> Of course, you have to.
>> You have to.
>> Exactly. What kind of computer was it?
>> Uh, it was an IBM, as far as I remember. It was an IBM PC.
>> What year was this? Uh, 1993.
>> 1993. Okay, so it's DOS. So did it have Windows at that point, or—
>> No, it didn't have Windows.
>> Right before Windows. Right before Windows — but I think Windows had been out, but you would add—
>> It was an add-on; you wouldn't boot into it. So I think we bought the disks for Windows, and you had to kind of boot-load it, you know, from the disk, and it would open Windows and you could click around. It wasn't that interesting, cuz there wasn't a lot on it. So a lot of the time I just spent in DOS, writing batch files and opening games and messing around with that. Um,
but it wasn't until Visual Basic — so, after Windows 95 — that I started making real software, right?
>> Uh, and the first idea I had — I used to be a huge gamer, so I used to go to these LAN gaming cafés and play Counter-Strike. And I would go there and, you know, the whole place is full of computers, but they didn't use any software to run their business. It was just people running around writing down your machine number, how much time you spent on it, and how much you paid, and kind of tapping your shoulder like, hey, you need to pay a little more for that. And I asked them, why don't you just get a piece of software that lets me log in and have a timer or whatever? And they were like, yeah, we don't know how to do that. And I was like, okay, I think I know how to do that. So — I was like 12 or something — I spent like two years building that, then went out and tried to sell it, and was able to sell it, and was making so much money. I remember McDonald's opened in Jordan around the time I was 13, 14. I took my entire class to McDonald's. It was very expensive, but I was balling with all this money and I was showing off. Um, and so that was the first business that I created.
And then — at the time I had started learning about AI, you know, reading sci-fi and all of that stuff — when it came time to go to college, I didn't want to go into computer science, because I felt like coding was on its way to getting automated. I remember using these wizards. Do you remember wizards?
>> Yes.
>> Wizards were basically, like, extremely crude early bots that generate code. Yeah.
>> Yeah. And I remember you could, you know, type in a few things — here's my project, here's what it does, whatever — then click, click, click, and it would just scaffold a lot of code. I was like, oh, I think that's the future. Like, coding is such a—
>> it's almost—
>> yeah, it's solved. You know, why should I go into coding? I was like, okay, if AI can do the code, what should I do? Well, someone needs to build and maintain the computers, and so I went into computer engineering and did that for a while. But then I rediscovered my love for programming, reading programming essays on Lisp and things like that, and started messing around with Scheme and programming languages like that. Um,
but then I found it incredibly difficult to just learn different programming languages. I didn't have a laptop at the time. And so every time I wanted to learn Python or Java, I would go to the computer lab, download gigabytes of software, try to set it up, type a little bit of code, try to run it, run into a missing-DLL issue, or whatever. And I was like, man, this is so primitive. At the time — it was 2008, something like that — you know, we had Google Docs, we had Gmail; you could open the browser — partly thanks to you — and use software on the internet. And I thought the web is the ultimate software platform — everything should go on the web. Okay, so who's building an online development environment, right? And no one was, right? And it felt like I'd found a $100 bill on, you know, the floor of Grand Central Station. Like, surely someone should be building this — but no, no one was building this. And so I was like, okay, I'll try to build it.
>> And I got something done in like a couple hours, which was a text box. You type in some JavaScript, and there's a button that says eval. You click eval and it evaluates — it shows you the result in an alert box, right? Right. So one plus one — two. I was like, oh, I have a programming environment. I showed it to my friends, people started using it, I added a few additional things like saving the program. I was like, okay, all right, there's a real idea here. People love it.
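For concreteness, here is a minimal sketch, in TypeScript, of the kind of prototype being described — a text box, an eval button, and the result shown in an alert. The element setup and names are illustrative, not Replit's original code:

```typescript
// Minimal sketch of the prototype described above: a text box, an "eval"
// button, and the result of the typed JavaScript shown in an alert box.
const input = document.createElement("textarea");
input.placeholder = "Type some JavaScript, e.g. 1 + 1";

const button = document.createElement("button");
button.textContent = "eval";

button.addEventListener("click", () => {
  // eval() the typed code and show the result, e.g. "1 + 1" -> 2.
  // Fine for a toy REPL like this; unsafe for anything beyond it.
  const result = eval(input.value);
  alert(String(result));
});

document.body.append(input, button);
```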
And then, again, it took me two or three years to actually be able to build anything, because, you know, the browser can only run JavaScript. And it took a breakthrough at the time: Mozilla had a research project called Emscripten that allowed you to compile different programming languages, like C and C++, into JavaScript. And for the browser to be able to run something like Python, I needed to compile CPython to JavaScript. I was the first in the world to do it — I contributed to that project and built a lot of the scaffolding around it, and my friends and I compiled Python into JavaScript. And I was like, okay, we did it for Python; let's do it for Ruby, let's do it for Lua. And that's how the idea for Replit emerged: when you need a REPL, you should get it — you should Replit it. And a REPL is the most primitive programming environment possible. So I added all these programming languages, and again, all this time my friends were using it and excited about it.
And I was on GitHub at the time, and my standard thing when I make a piece of software is to open source it. And so I was open sourcing all the things I had spent years building — all this underlying infrastructure for running code in the browser —
>> and then it went viral — went viral on Hacker News — and it coincided with the MOOC era: massively open online courses. Udacity was coming online, Coursera, and most famously Codecademy, right?
So Codecademy was the first kind of website that allowed you to code in the browser interactively and learn how to code. And they built a lot of it on my software, which I was open sourcing all the way from Jordan. And so I remember seeing them on Hacker News going super viral, and I was like, "Hey, you know, I recognize this. What are you using?" So I left a Hacker News comment: "Oh, you're using my open source package." And so they reached out to me. They were like, "Hey, we'd like to hire you." I was like, "I'm not interested. I want to start a startup. I want to start this thing called Replit." And they were like, "Well, no, you know, you should come work with us. We can do the same stuff." And I kept saying no. I was like, "Okay, I'll contract with you." They were paying me $12 an hour. I was really excited about it, back in Amman.
>> Um, but to their credit, they came out to Jordan to recruit me and spent a few days there. And I, you know, kept saying no. And in the end, they gave me an offer I couldn't refuse. Um, and they got me an O-1 visa. Came to the United States.
>> That's when you moved. So when was the first — cuz you were born what year?
>> 1987.
>> '87. What was the first year you can remember where you had the idea that you might not live your life in Jordan — that you might actually move to the US?
>> Uh, when I watched Pirates of Silicon Valley.
>> Is that right? Okay. Got it. All right.
>> Uh, maybe '98 or '99. I don't know when it came out.
>> Okay. That might be a good place to—
>> Yeah.
>> Is it worth telling the hacker story? Because there's a version of the world where, if that had gone differently, maybe you wouldn't have gone to America.
>> Right, right. Yeah. So, in school I was programming the whole time. I just wanted to start businesses — I'm exploding with ideas all the time. And the reason Replit exists is because I have ideas all the time; I just want to go type on the computer and build them. Um, so I wasn't going to school — it was incredibly boring for me. And part of the reason why Replit has a mobile app today is because I always wanted to program under the desk, just to do things.
>> Um, and so at school they kept failing me for attendance. You know, I would get A's, but I just didn't show up, so they would fail me. And I felt it was incredibly unfair. And all my friends were graduating now — this was like 2011, and I'd been in college for six years when it should have been three or four. And I was incredibly depressed. I really wanted to be in Silicon Valley. And so I was like, "Oh, what if I changed my grades?"
>> There we go.
>> In the university database. And, um, so I went into my parents' basement and implemented polyphasic sleep. Are you familiar with that?
>> I am—
>> Leonardo da Vinci's polyphasic sleep. I didn't hear about it from Leonardo da Vinci; I heard it from Seinfeld, cuz there's an episode where Kramer goes on—
>> polyphasic sleep — what, 20 minutes every four hours? 20 minutes every 24 hours? And yes, this is somehow going to work well. And it—
>> Yeah. And hacking — if you've ever done anything like it—
>> As the meme goes: this has never worked for anybody else, but it might work for me.
>> Yes.
>> And a lot of what hacking is, is you're coming up with ideas for finding certain security holes, writing a script, and then running that script — and that script will take like 20, 30 minutes to run, so you take that 20, 30 minutes to sleep and go on. So I spent two weeks just going mad trying to hack into the university database, and finally I found a way — I found a SQL injection somewhere on the site, and I found a way to be able to edit the records. But I didn't want to risk it. So I went to my neighbor, who was going to the same school — I think to this day no one caught him — and I said, hey, I have this way to change grades; would you want to be my guinea pig? And I was honest about it — I said I wasn't going to do it on my own record first. Are you open to doing it?
He's like, "Yeah, yeah, yeah." They call this human trials.
This is how medicine works.
So we went and changed his grades, and he went and pulled his transcript, and, you know, the update wasn't there. Went back to the basement. Well, it turned out I had access to the slave database; I didn't have access to the master database.
>> So — found a way through the network, privilege escalation. It was an Oracle database that had a vulnerability, and then I found the real database, and then I just, you know, did it for myself. Changed the grades, went and pulled my transcript, and sure enough, it had actually changed. Went and bought the gown, went to the graduation parties, did all that. We're graduating.
Um, and then one day I'm at home — it's like maybe 6:00 or 7:00 p.m. — and, you know, the telephone at home rings. Ominous, ominous ring.
>> Santa.
Um, hello? And he's like, hey, this is the university registration system — and I knew the guy who ran it. He's like, look, you know, we're having this problem. The system's been down all day, and it keeps coming back to your record. There's an anomaly in your record: you have a passing grade, but you're also banned from the final exam of that subject. I was like, oh
[ __ ]. Well, it turns out the database is not normalized. Typically, when they ban you from an exam, the grade resets to 35 out of 100, but apparently there's also a boolean flag — and by the way, all the column names in the database are single letters; that was the hardest thing — security by obscurity.
>> Right.
>> And it turns out there's a flag that I didn't track. So when you go over on attendance — when you don't attend and they want to fail you — they ban you from the final exam. So I changed the grades, and that created an issue and brought down the system. So they were calling me, and I thought at the time, you know, I could potentially lie and it would become a huge issue, or I could just fess up. Yeah. So I said, hey, listen, look, um, yeah, I might know something about it. Let me come in tomorrow and talk to you about what happened. So I go in, and I open the door, and it's the deans of all the schools — computer science, computer engineering. They had all been working on it for days, because it's a very computer-heavy, you know, university, and it was a big problem,
>> and they're all really intrigued about what happened. And so I pull up a whiteboard and start explaining what I did, and everyone was engaged. I gave them a lecture, basically.
>> Your oral exam for your PhD. This is great.
>> They were really excited, and I think it was endearing to them. They were like, "Oh, wow, this is a very interesting problem."
>> Um, and then I was like, "Okay, great. Thank you." And they were like, "Hey, wait—
>> wait. We don't know what to do with you. Do we send you to jail? Do we—"
>> And, uh, they were like, "Hey, we have to escalate to the university president." And he was a great man, and I think he gave me a second chance in life. I went to him and I explained the situation. I said, I'm really frustrated. I need to graduate. I need to get on with my life. I've been here for six years, and I just can't sit in school through stuff I already know. I'm a really good programmer. And he gave me a Spider-Man line at the time: with great power comes great responsibility — and you have a great power. And it really affected me; I think he was right in that moment. And so he said, well, we're going to let you go, but you're going to have to help the system administrators secure the system
>> for the summer. I was like, happy to do it. And I show up, and all the programmers there hate me — hate my guts —
>> and they would lock me out. I would see them — they would be outside, I would knock on the door, and nobody would answer. It's like they didn't want to let me in. I tried to help them a little bit, but they weren't collaborative, so I was like, all right, whatever. And then it came time for me to actually graduate. It was the final project, and one of the computer science deans came to me and said, look, I need to call in a favor: I was a big part of the reason we let you go and didn't prosecute you, so I want you to work with me on the final project, and it's going to be around security and hacking. I was like, no, I'm done with that [ __ ] — I just want to build programming environments and things like that. And he's like, no, you have to do it. I was like, okay.
So I thought I'd do something more productive, and I wrote a security scanner — which I was very proud of — that crawls the different pages of a site and tries SQL injection and all sorts of things. And actually, my security scanner found another vulnerability in the system.
>> Amazing.
>> And so I went to the defense, and he's like, you need to run this security scanner live and show that there's a vulnerability. And I didn't understand what was going on at the time, but okay. So I gave the presentation about how the system works, and I was like, oh, let's run it. And it showed that there's a security vulnerability. Okay, let's try to get a shell. So the system automatically runs all the security stuff and gets you a shell. And then the other dean — it turned out he had been given the mandate to secure the system, and now I started to realize I'm a pawn in some kind of rivalry here — his face turned red and he's like, "No, it's impossible. You know, we secured the system. You're lying." I was like, "You're accusing me of lying? All right, what should we look up — your salary or your password? What do you want me to look up?" And he was like, "Yeah, look up my password." So I look up his password, and it's gibberish — it was encrypted. And he's like, "Oh, that's not my password. See, you're lying." I was like, "Well, there's a decrypt function that the programmers put in there." So I run decrypt, and it shows his password. It was something embarrassing — I forget what it was. And so he gets up really angry, shakes my hand, and leaves to change his password. So I was able to hack into the university another time. Luckily, I was able to graduate; I gave them the software, and they secured the system. But yeah, later on I would realize that, yeah, he wanted to embarrass the other guy, which was why I was in the middle.
>> Politics. Well, I think the moral of the story is: if you can successfully hack into your school's system and change your grade, you deserve the grade and you deserve to graduate.
>> I think so.
>> And just for any parents out there, any children out there — you can cite me as the moral authority on this.
One lesson, maybe, that I think is very relevant for the AI age: the traditional, more conformist path is paying less and less dividends, and I think, you know, kids coming up today should use all the tools available to discover and chart their own paths, cuz I feel like just listening to the traditional advice and doing the same things people have always done is not working out as well as we'd like.
>> Thanks for coming on the podcast. Thank
you, man. Fantastic.
Wow.
Wow. Wow.