
Marc Andreessen & Amjad Masad on “Good Enough” AI, AGI, and the End of Coding

By a16z

Summary

Key Takeaways

  • English is the New Programming Language: The ultimate goal is to program in natural language, abstracting away syntax entirely. This mirrors historical advancements like Grace Hopper's compiler, moving from machine code to English commands for broader accessibility. [03:50], [04:52]
  • AI Agents: The New Programmers: AI agents are becoming the primary programmers, capable of complex tasks like setting up databases and writing tests. This shift means the user's role is evolving from direct coding to directing these agents. [09:52], [10:09]
  • Long-Horizon Reasoning Breakthrough: Recent advancements, particularly through reinforcement learning, have enabled AI agents to maintain coherence and perform complex tasks for extended periods, overcoming previous limitations of short, error-prone reasoning chains. [13:06], [18:45]
  • Verifiable Domains Accelerate AI Progress: AI development is progressing fastest in domains with clear, verifiable answers like math and code. Softer domains like law and healthcare, lacking deterministic outcomes, see slower progress due to difficulties in verification. [26:00], [30:15]
  • "Good Enough" AI Risks Stalling AGI: The current success of AI in economically productive tasks creates a "local maximum trap." This "good enough" AI may reduce the pressure to pursue true AGI, which requires more generalized, cross-domain learning. [51:15], [51:44]

Topics Covered

  • English is the new programming language for everyone.
  • Verification loops unlock long-horizon AI agent reasoning.
  • AI progresses fastest in verifiable, concrete domains.
  • Why does AI's 'magic' still feel disappointing?
  • Is economic value a 'local maximum trap' for AGI?

Full Transcript

We're dealing with magic here that we I

think probably all would have thought

was impossible 5 years ago or certainly

10 years ago. This is the most amazing

technology ever and it's moving really

fast and yet we're still like really

disappointed. Like it's not moving fast

enough and like it's like maybe right on

the verge of stalling out. We should

both be like hyper excited but also on

the verge of like slitting our wrists

cuz like you know the gravy train is

coming to an end,

>> right?

>> It is faster but it's not at computer

speed, right? What we expect computer

speed to be. It's sort of like watching

a person work.

>> It's like watching John Carmack

>> the world's... okay, the world's best programmer on a stimulant.

>> On a stimulant. Yeah, that's right.

>> So, let's start with um let's assume

that I'm a sort of a novice programmer.

So, maybe I'm a student um uh or maybe

I'm just somebody, you know, I took a

few coding classes and I've hacked

around a little bit or like I don't

know, I do Excel macros or something

like that, but I'm like not less. I'm

not like a master craftsman at coding.

Um and you know people somebody tells me

about Replit, and specifically AI and Replit, like what's my experience when I launch in with what Replit is today with AI?

>> Yeah I I would um I I think the

experience of someone with no coding

experience or some coding experience is

largely the same when you go into

Replit. Okay.

>> The first thing we try to do is get all

the nonsense away from like setting up

development environment and all of that

stuff and just have you focus on your

idea. So what do you want to build? Do

you want to build a product? Do you want

to solve a problem? Do you want to do a

data visualization? So the prompt

box is really open for you. You can put

in anything there. So let's say you want

to, you know, build a startup. You have

an idea for a startup. I would I would

start with like a paragraph long kind of

description of what I want to build. Uh

the agents will read that. It will

>> you just type just type

>> standard English. Standard English. You

just type it in. I want to build a I

want to sell I want to sell crepes. I

want to sell crepes online. So you just

like type in I want to talk.

>> You can it literally could be that four

words or five words. Okay.

>> Or it could be if you're if you have a

programming language you prefer or stack

you prefer, you could do that. But we

actually prefer for you not to do that, because we're going to pick the

best thing for we're going to classify

the best stack for that request. Right?

>> If it's a data app, we'll pick Python and Streamlit or whatever. If it's a web app, we'll pick JavaScript and Postgres and things like that. So you just type that.
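As a rough sketch of what that classification step could look like (the model name, prompt, and JSON shape below are illustrative assumptions, not Replit's actual implementation):

```python
# Hypothetical sketch of the "pick the best stack for that request" step.
# The model, prompt, and output format are assumptions, not Replit's real code.
import json
from openai import OpenAI

client = OpenAI()

def classify_stack(idea: str) -> dict:
    """Map a plain-English app idea to a suggested stack."""
    prompt = (
        "Pick the best stack for this app idea. "
        'Answer with JSON only, e.g. {"kind": "web", "stack": ["JavaScript", "Postgres"]} '
        'or {"kind": "data", "stack": ["Python", "Streamlit"]}.\n'
        f"Idea: {idea}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns valid JSON; a production system would validate this.
    return json.loads(resp.choices[0].message.content)

# classify_stack("I want to sell crepes online")
# -> e.g. {"kind": "web", "stack": ["JavaScript", "Postgres"]}
```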

>> or you can decide you can decide you can

say and I want to do it I know Python or

I'm learning Python in school and I want

to do it in Python.

>> That's right. The cool thing about

Replit is, you know, we've been around for almost 10 years now and we built all this infrastructure. Replit runs any

programming language. So if you're

comfortable with Python you can go in

and do that for sure.

>> Okay.

>> And then just again I know this is

obvious people have used it but like I'm

dealing in English.

>> Yes.

>> So okay go ahead.

>> Yes. You're fully in English. I mean,

you know, just a, you know, a little bit

of of sort of background here, like when

um when I I came here and pitched to you

like 10 years ago or like whatever 7

years ago,

>> right?

>> Uh what we were saying is we were

exactly describing this future is that

>> uh everyone would want to build

software right?

>> And the thing that's kind of getting in

in people's ways is all the uh what Fred

Brooks called the accidental complexity

of programming, right? They're like

essential complexity which is like how

do I bring my startup to market and how

do I build a business and all of that

accidental complexity is what package

manager do I use all of that stuff we've

been abstracting away that for so many

years so you can just um and the last

thing we had to abstract away is code

>> right

>> I had this realization last year which

is I think we you know built an amazing

platform but the business is not

performing and the reason the business

is not performing is that code is the

bottleneck like yes all the other stuff

is important to solve but syntax is

still an issue like you know syntax is

just an unnatural thing for people so

ultimately English is the programming

language

>> right

>> I I just does it work with other other

world languages other than English at

this point

>> yes you can write in Japanese and we

have a lot of users especially Japanese

that tends to be very

>> so does it support these days like for a

does a support every language or is it

still do you still have to do like

custom work to craft a new new language

>> No, most, you know, mainstream language that has like 100 million

plus people that speak it. AI is pretty

good at it.

>> Okay. Yeah.

>> Yeah. Wow.

>> So, uh I I I did a bit of a bit of

historical research recently for for

some reason. I just want to just

understand the moment we're in and

because it's such a special moment. It's

I think it's important to contextualize

it and I I I read this quote from

Grace Hopper. So, Grace Hopper invented

the compiler as you know. uh at the time

people were uh you know programming in

machine code and that's what programmers

do that's what the specialists do

>> yes

>> and she said you know specialists will

always be a specialist they have to

learn the underlying machinery of

computers but I want to get to a world

where people are programming English

that's what she said that's before

Karpathy, right? That's like, you know, 75

years ago

>> uh and and and that's why I invented the

compiler and in her mind like C

programming is English

>> right

>> uh But that, you know, that really

didn't uh that was just the start of it.

You had C and then you go higher level

Python and JavaScript. And I think it

we're at a moment where it's the next

step,

>> right?

>> Instead of typing syntax, you're

actually typing thoughts, you know,

which is what we ultimately want.

>> And the machine writes the code

>> and the machine writes the code,

>> right? Right.

>> Um yeah, I remember it. you're you're

probably not old enough uh to remember

but I I remember when when I was a kid

it was um you know there there were were

higher level languages you know by the

70s, like BASIC and so forth, and Fortran

>> and C and C but um uh there were still

you know you still would run into people

who were doing assembly programming

assembly language which by the way you

still do you know like game companies or

whatever still do assembly to to to get

>> and they were hating on the kids that

were doing basic. Oh well so so the

assembly people were hating the kids

doing basic but there were also older

coders who hated on the assembly

programmers for doing assembly and not

no, no, not doing direct machine code, right, not doing direct zero-and-one machine code. Because assembly, so

people don't know assembly language is

sort of this very low-level programming

language that sort of compiles to actual

actual machine code and if and if it's

it's it's incomprehensible gibberish to

most program even most programmers

>> you're writing in octal or something

>> you're writing like very very close to

the hardware but even still it's still a

language that compiles to zeros and ones

>> um whereas the actual real programmers

actually wrote in zeros and ones. And so

there there's always there's always this

tendency, you know, for the pros to, you know, look down their nose.

>> Yeah.

>> And say, you know, the new people are

being are being, you know, basically

sloppy. They don't understand what's

happening. You know, they don't really

understand the machine. And then, of

course, you know, what the higher level abstractions do is they democratize. The

absolute irony is I was part of the

JavaScript revolution. I was at Facebook

uh before starting Replit and we built

the modern JavaScript stack. We built

ReactJS and all the tooling around it

>> and we got a lot of hate from from the

programmers that you should type you

know vanilla JavaScript directly and

>> um I was like okay whatever and then

that you and now that's mainstream and

then those guys that built their careers

on the last wave we invented are hating

on this new wave and so just you know

people never change. Okay, got it. Okay,

so you you're typing English I want to

sell crepes online. I want to do this. I

want to have a t-shirt. Whatever the

business is. Okay. What what happens

then?

>> Yeah. and then uh uh replet agent will

show you what it understood. So it's

trying to build um a common

understanding between you and it and I

think there's a lot of things we can do

better there in terms of UI but for now

it'll show you a list of tasks.

>> It'll tell you I'm going to go set up a

database because you need to store your

data somewhere. Uh we need to set up

Shopify or Stripe because we need to

accept payments. Uh and then it shows

you this list and gives you two options

initially. Do you want to start with a

design so that we can iterate back and

forth to lock that design down, or do

you want to build a full thing?

>> Hey, if you want to build a full thing,

we'll go for 20, 30, 40 minutes.

>> Uh, and the agent will

tell you go here, install the app.

>> Uh, I'm going to go set up the database,

do the migrations, write the SQL, you

know, build the site. I'm going to also

test it. So this is a recent innovation

we did with um Agent 3 is that after it

writes the software spins up a browser

goes around and tests in the browser and

then any issue it kind of iterates kind

of goes and fix the code. So it'll spend

20 30 minutes building that I'll send

you a notification it'll tell you the

app is ready. And so you can test it on

your phone. You can go back to your

computer. You'll see maybe you'll find a

bug or an issue, you'll describe it to

the agent, say, "Hey, it's not exactly

doing what I expected." Uh or if it's

perfect and and you're ready to go and

that's it. You know, 20 minutes. By the

way, there's a lot of examples where

people just get their idea in 20 30

minutes, which is amazing. Um you just

hit publish.

>> Mhm.

>> You hit you hit publish. Um

couple clicks, you'll be up in the

cloud. we'll set up a a virtual machine

in the cloud. The database is deployed.

Everything's done and now you have a

production database.

>> So, think about the steps needed just

two or 3 years ago in order to get to

that step. You have to set up your local

development environment. You have to

sign up for an AWS account. You have to

provision the databases, the virtual

machines, you have to create the entire

pip deployment pipeline. All of that is

done for you. And it just, you know, a

kid can do it, a lay person can do it.

If you're a programmer and uh you're

curious about what the agent did, the

cool thing about replet because we have

this history of being an IDE, you can

peel the layers. You can open the file

tree and you could look at the files.

You can open gits, you can push it to

GitHub, you can connect it to your

editor if you want, you can open it in

Emacs. So the cool thing about Replit,

yes, it is a vibe coding platform that

abstracts away all the complexities, but

all the layers are there for you to look

at.

>> Right. So let's go let's go back to um

that was great, but let's go back to you

said it. It it gives you that the a the

agent gives you you you say I've got my

idea. You plug it in and it says it

gives you this list of things and then

you and then when you describe it you

said I'm going to do this. I'm going to

do that. The "I" there in that case was

the agent as opposed to the user. Yes.

>> And so the the agent lists the set of

things that it's going to do and then

the agent actually does those things.

>> Agent does those things. Yeah. That

that's a that's a that's a very

important point. when we did this shift,

we hadn't realized internally at Replit

how much the actual user stopped being

the human user and it's actually the

agent programmer,

>> right?

>> So, one really uh funny thing happened

is we had servers in Asia uh and we the

reason we had servers in Asia because we

wanted our Indian or you know Japanese

users to be to have a you know shorter

uh time to the servers. uh when we

launched the agent their experience got

significantly worse and we're like what

happened like it's supposed to be faster

well turns out it's worse it's because

the AIS are sitting in uh in United

States and so the the programmer is

actually in United States it's you're

sending the request to the programmer

and the programmer is interfacing with a

machine across the world and so yes

suddenly the agent is the programmer

okay, so like the, you know, new terminology: an agent is a software program that is basically using the rest of the platform as if it were a human user, but it's not. It's a bot.

>> That's right. It has access to tools such as write a file, edit a file, delete a file, search the package index, install a package, provision a database, provision object storage. It is a programmer that has the tools and a sort of an interface

>> that is very similar to a human programmer.
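A minimal sketch of what such a tool interface might look like, in the JSON-schema style commonly used for LLM function calling (the tool names echo the ones just listed; the schemas and dispatcher are assumptions, not Replit's real tool set):

```python
# Illustrative tool list in the style used for LLM function calling.
# The tool names mirror the ones mentioned above; the exact schemas are assumptions.
AGENT_TOOLS = [
    {"name": "write_file",               "parameters": {"path": "string", "content": "string"}},
    {"name": "edit_file",                "parameters": {"path": "string", "patch": "string"}},
    {"name": "delete_file",              "parameters": {"path": "string"}},
    {"name": "search_package_index",     "parameters": {"query": "string"}},
    {"name": "install_package",          "parameters": {"name": "string"}},
    {"name": "provision_database",       "parameters": {"engine": "string"}},
    {"name": "provision_object_storage", "parameters": {"bucket": "string"}},
]

def dispatch(tool_call: dict) -> str:
    """The agent emits a tool call; the platform runs it and feeds the result
    (logs, errors, file contents) back into the agent's context as text."""
    name, args = tool_call["name"], tool_call["arguments"]
    ...  # route to the real implementation; a placeholder in this sketch
    return f"ran {name} with {args}"
```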

>> And then, um, you know, we'll talk more about how this all works, but a

debate inside the AI industry um is with

these was kind of this you know this

idea now of having agents that do things

on your behalf and then go out you know

go go out and kind of accomplish

missions. Um there's this you know kind

of debate which is okay how like

obviously you know it's a big deal even

to have an AI agent that can do

relatively simple things to do complex

things of course is you know one of the

great technical challenges of the last

80 years you know to to do that and then

there's sort of this question of like

can the agent go out and run and operate

on its own for 5 minutes you know for

for 15 minutes for an hour for 8 hours

and and meaning like sort of like how

long does it maintain coherence like how

long does it actually like stay in full

control of it of its faculties and not

kind of spin out because at least the

early early agents or the the early AIS,

if if you set them off to do this, they

might be able to run for two or three

minutes, then they would they would

start to get confused and go down rabbit

holes and, you know, kind of kind of

spin out. Um, more recently, more

recently, um, uh, you know, we've seen

that that that that agents can run a lot

longer and and do more complex tasks.

Like, where are we on the curve of

agents being able to run for how long

and for what complexity tasks before

before they break?

>> That's that's absolutely the the I think

the main metric we're looking at. even

back in 2023, you know, had the idea for

software agents, you know, four or five

years ago now. The problem every time we

attempt them, the the problem of

coherence, you know, they'll they'll go

on for a minute or two and then they'll

just, you know, they compound in errors

in a way that they just can't recover.

>> Um,

>> and you can actually see it, right?

Because they actually they actually, if

you watch watch them operate, they get

increasingly confused and then, you

know, maybe even deranged. Yeah, they

really get deranged and they go into

very weird areas and sometimes they

start speaking Chinese and doing really

weird things and um but I would say

sometime around last year we maybe

crossed the 3 four five minute mark

>> and it felt to us that okay we're on a

path where long re you know long horizon

reasoning is getting solved

>> uh and so we made we made a bet and I I

tell my team

>> so long horizon reasoning meaning

reasoning meaning like dealing in like

facts and logic

>> um in a in a sort of complex way and

then long horizon being over a long

period of time. Yes.

>> With many many steps to a reasoning

process.

>> Yeah, that's right. So if you think

about the way large language models work

is that they have a context. This

context is basically the memory all the

text all your prompt and also all the

internal talk that the AI is doing as

it's reasoning. So when the AI is

reasoning it's actually talking to

itself. It's like oh now I need to go

set up a database. Well, what what kind

of tool do I have? Oh, there's a tool

here that says Postgress. Okay, let me

try using that. Okay, I use that. I got

feedback. Let me look at the feedback

and read it. And it'll read the

feedback. And so the that that prompt

box or context is where both the user

input, the environment input, and the

internal thoughts of the machine are all

within. It's sort of like a program

memory in in memory space. And so

reasoning over that was the challenge

for a long time. That's when AIs just

like went off track and now they're able

to kind of think through this entire

thing and and maintain coherence. And

there's there's now techniques around uh

compression of context. So context length is still a problem, right? So I would say LLMs today, you

know, they're marketed as a million uh

token uh length, which is like a million

words almost. uh in reality it's about

200,000 and then they start to struggle.

So we do a lot of uh you know we stop we

compress the memory. So if a memory if

if a portion of the memory is saying

that I'm getting all the logs from the

database you can summarize you know

paragraphs of logs with one statement or

the database setup that's it right and

so every once in a while we'll compress the context so that we make sure we maintain coherence. So there's a lot of innovation that happened outside of the foundation models as well in order to enable that long-context coherence.
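A minimal sketch of that compression step, assuming hypothetical `count_tokens` and `summarize` helpers (the thresholds are illustrative, not Replit's actual values):

```python
# Sketch of periodic context compression: the context holds user input, tool
# output, and the model's own reasoning; when it grows past the window the
# model handles well, older entries are collapsed into a short summary.
MAX_TOKENS = 200_000   # the usable window mentioned above, not the marketed one
KEEP_RECENT = 50       # how many recent messages to keep verbatim (illustrative)

def count_tokens(messages: list[dict]) -> int: ...   # e.g. via a tokenizer
def summarize(messages: list[dict]) -> str: ...      # e.g. "Database set up; migrations ran cleanly."

def maybe_compress(context: list[dict]) -> list[dict]:
    if count_tokens(context) < MAX_TOKENS:
        return context
    old, recent = context[:-KEEP_RECENT], context[-KEEP_RECENT:]
    summary = {"role": "system",
               "content": "Summary of earlier work: " + summarize(old)}
    return [summary] + recent
```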

So what was the key technical breakthrough in the foundation models that made this possible, do you think?

>> I think it's RL I think it's uh

reinforcement learning. So the way

pre-training works is you know uh they

uh pre-training is a uh the first step

of training a large language model. It

reads a piece of text. It covers the

last word and tries to guess it. That's

how it's trained. That doesn't really

imply long context reasoning. it it you

know it it turns out to be very very

effective. It can learn language that

way. But the reason we weren't able to

move past that limitation is that that

modality of training just wasn't good

enough. And what you want is you want a

type of problem solving over a uh over

long context. So what reinforcement

learning uh uh especially from code

execution gave us is the ability to for

the machine to for the LLM to roll out

what we call trajectories in AI. So

trajectory is a uh stepbystep reasoning

chain in order to reach a solution. So

uh the way uh as I understand

reinforcement learning works is they put

the LLM in a programming environment like Replit and say, hey, here's a codebase, here's a bug in the codebase,

and we want you to solve it. Um now the

human trainer already knows what the

solution would look like. So we have a

pull request that we have on GitHub so

we know exactly or we have a unit test

that we can run and verify the solution.

So what it does is it rolls out a lot of

different trajectories. Those they

sample the model and maybe one of those

trajectories will reach and a lot of

them will just go go off off track but

one of them will reach the solution by

solving the bug and it reinforces on

that. So that that gets a reward and the

model gets trained that okay you know

this is how you solve these type of

problems. So that's how we're able to

extend these reasoning chains.
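Schematically, the loop described here looks something like the following; every helper is a placeholder rather than any lab's actual training code:

```python
# Schematic of RL from code execution as described: roll out many trajectories
# on a bug-fix task whose answer is already known (a reference pull request or
# unit test), and reward the ones whose patch makes the test pass.
def rollout(policy, task) -> dict: ...            # one step-by-step attempt; returns {"patch": ...}
def run_unit_tests(repo, patch) -> bool: ...      # the verifier: pass/fail
def reinforce(policy, trajectory, reward): ...    # policy-gradient-style update

def train_on_task(policy, task, n_rollouts: int = 16):
    for _ in range(n_rollouts):
        traj = rollout(policy, task)
        reward = 1.0 if run_unit_tests(task["repo"], traj["patch"]) else 0.0
        reinforce(policy, traj, reward)  # most rollouts go off track; the passing ones get reinforced
```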

>> Got it. and and how it's a two-part

question is how how how good how good

are the models now at long long long

reasoning and and I would say and how do

we know like how how is that

established? Um

there is a nonprofit called METR, um, that is measuring, that has a benchmark to measure uh

how long a model runs while maintaining

coherence and doing useful useful things

whether it's programming or other

benchmark tasks that they've done. uh

and they put up a paper I think uh late

last year that said every seven months

>> uh the the minutes that a model can run

is doubling.

>> So you go from 2 minutes to you know 4

minutes in 7 months I think they vastly

underestimated that.

>> Is that right? Vastly it's doubling.

It's doubling more often than 7 months.

>> So with Agent 3, we measure that, you know,

very closely uh and we measure that in

real tasks from real users. So we're not

doing benchmarking. We're actually doing

AB tests and we're looking at the data

that how users are successful or not.

>> For us, the the absolute sign of success

is you made an app and you published it.

Because when you publish it, you're

paying extra money. You're saying this

app is economically useful. I'm going to

publish it. So that's as clear-cut as

possible. And so what we're seeing is in

agent one, the agent could run for 2

minutes

>> and then and then perhaps struggle.

Agent two came out in February, it ran

for 20 minutes. Agent 3 200 minutes.
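Rough arithmetic behind the "vastly underestimated" point (the exact release dates aren't stated here, so the elapsed time is approximate):

$$\frac{200\ \text{min}}{2\ \text{min}} = 100 \approx 2^{6.6}, \qquad 6.6 \times 7\ \text{months} \approx 46\ \text{months at METR's rate.}$$

If the Agent 1 to Agent 3 jump actually happened over something on the order of a year, the effective doubling time for this workload is closer to two months than seven.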

>> Okay,

>> 200. Some users are pushing it to like

12 hours and things like that. I'm less

confident that it is as good and when it

goes to these stratospheres, but at like

2 3 hours timeline, it is really it's

it's it's it's insanely good. And and

the main innovation outside of the

models is a verification loop. Actually,

uh I remember reading um a research

paper from Nvidia. So what Nvidia did is

they're trying to uh write um GPU

kernels uh using DeepSeek, and that was like perhaps 7 months ago when DeepSeek

came out and what they found is that if

we add a verifier in the loop if we can

run the kernel and verify it's working

we're able to run DeepSeek for like 20

minutes and it it was generating

actually optimized kernels

>> and so I was like okay the next thing

for us obviously, as a sort of an agent lab or, like, applied AI company.

We're not doing the foundation model

stuff, but we're doing a lot of research

on top of that. And so, okay, we know

that agents can run for 10 20 minutes

now or LLMs can stay coherent for

longer, but for you to push them to 200,

300 minutes, you need a verifier in the

loop. So, that's why we spend all our

time uh creating scaffolds to make it so

that the agent can spin up a browser and

do computer use style testing. So once

you put that in the middle, what's

happening is it works for 20 minutes, then we spin up another agent, which spins up a browser and tests the work of the previous agent. So it's a multi-agent system

>> and if it finds a bug, it

starts a new trajectory and says okay

good work let's summarize what you did

the last 20 minutes

>> now that be that plus what the bug that

we found that's a prompt for a new

trajectory right

>> so you stack those on each other and you

can go endlessly but

>> so it's like a mar like setting up a

marathon or like a relay race

>> as long as as long as each step is done

properly you could do in sort of an

infinite number of steps

>> that's right that's right you can always

compress the previous step into a

paragraph And that becomes a prompt. So

it's it's an agent prompting the next

agent.
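A sketch of that relay, with every function a placeholder: each leg builds for a stretch, a testing agent verifies the work in a browser, and the compressed summary plus any bugs found becomes the prompt for the next trajectory.

```python
# Sketch of the multi-agent relay described above. Function names are placeholders.
def run_builder(prompt: str) -> dict: ...        # ~20 minutes of agent work; returns {"app": ...}
def run_browser_tests(app) -> list[str]: ...     # computer-use style testing; returns a list of bugs
def summarize_leg(work: dict) -> str: ...        # "good work, here's what you did" in a paragraph

def relay(initial_prompt: str, max_legs: int = 10):
    prompt = initial_prompt
    for _ in range(max_legs):
        work = run_builder(prompt)
        bugs = run_browser_tests(work["app"])
        if not bugs:
            return work                           # verified, nothing left to fix
        # stack the legs: compressed previous leg + the bug report = the next prompt
        prompt = summarize_leg(work) + "\nFix these issues: " + "; ".join(bugs)
    return work
```

Each leg starts from a short summary rather than the full transcript, which is why, in principle, the legs can be stacked indefinitely.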

>> Right. Right. Right. That's amazing. So

um and then when when an agent like when

a modern agent like running on modern

modern LLMs that are trained this way, when

it let's say it runs for 200 minutes

like when you watch the agent run is it

like running is it like processing

through like logic and tasks at the same

pace that like a human being is or

slower or faster? I

>> it's actually I would say it is faster

but not that much significantly faster.

It's not at computer speed, right? What

we expect computer speed to be.

>> It's like watching a per like if you

watch if you if it's describing what

it's doing, it's sort of like watching a

person work.

>> It's like watching John Carmack work.

>> The world's... okay, the world's best programmer.

>> Yeah.

>> The world's best programmer on a stim on

a stimulant.

>> On a stimulant. Yeah, that's right.

>> Working for you. Working for you.

>> Yeah. There. So, it's very fast and you

can see the uh file diffs running

through, but every once in a while it'll

stop and it'll start thinking. It'll show

you the reasoning. Yeah. It's like, I

did this and I did this. Am I on the

right track? It kind of really tries to

reflect right?

>> Uh and then it might review its work and

decide the next step or it might kick

into the testing agent or you know, so

so you're seeing it do all of that and

every once in a while it calls the tool

for example, it stops and says, well, we

ran into an issue. You know, Postgres

um 15 is not um compatible with this,

you know, database ORM package that that

I have.

>> Um okay, this is a problem I haven't

seen before. I'm going to go search the

web. So, it has a web search tool. Go do

that. And so, it looks like a human

programmer right?

>> And it's really fascinating to watch.

It's one of my favorite things to do is

just to watch the tool chain and

reasoning chain and the testing chain.

And it's yeah it is like watching a

hyperproductive programmer

>> right so you know we're kind of getting

into here kind of the holy grail of AI

which is sort of you know generalized

reasoning um you know by the machine um

so uh you mentioned this a couple times

but this idea of a of a verification so

so just for folks on the listening to

podcast who maybe aren't in the details

let me try to describe this and see see

if I have it right so like a just a just

a large language model the way you would

exper you would have experienced with

like ChatGPT out of the gate two years

ago or whatever would have been it's

like And it's incredible how fluid uh it

is at language. Um it's incredible how

good it is at like writing Shakespearean

sonnets or rap lyrics. It's it's

amazing how good it is at human

conversation. But if you start to ask it

like problems that involve like rational

thinking uh or problem solving all of a

sudden, like, math was the whole show. And in the very beginning, if you asked it very basic math problems, you know,

it would not be able to do them.

>> That's right. Uh but then even when it

got better at those, if you started to

ask it to like, you know, it it could

maybe add two small numbers together,

but it couldn't add two large numbers

together. Or if it could add two large

numbers, it couldn't multiply them.

Yeah.

>> And it's just like, all right, this is

And then there was this famous test, the famous strawberry test, which is how many Rs are in the

word strawberry.

>> That's right.

>> And there was this long period where it

it kept it would it would just guess

wrong. It would say there were only two

Rs in the word strawberry. And it turns

out there are three. Um, so, um, so it

it was this thing and so people were and

there was even this term that was being

used kind of the the slur that was being

used at the time was stochastic parrot.

>> Yeah,

>> I was thinking clanker.

>> Well, well, clanker is the is the new

slur. Clank clanker. Clanker is just the

full-on racial slur against AI as a

species. Um, but the technical critique

was the so-called stochastic parrot. Stochastic means random. Uh, so sort of random

parrot me meaning basically that this

thing was sort of the large language

models were like a they were like a

mirage where they were like repeating

back to you things that they thought

that you wanted to hear but they didn't

>> in a way it's true in the in the pure

pre-training LLM world

>> right for the for the very basic layer

but then what happened is as you said

over the last year or something there

there was this layering in of of

reinforcement learning and then but the

key to

>> it's not new crucially it's like it's

alpha go right so

>> describe so describe that for a second.

Yeah. So we we had this breakthrough

before in uh 2015, the AlphaGo breakthrough, I think 2015, 2016, where it is a merging of, sort of, you know, you would know a lot better than me, the old AI debate between the connectionists, the people who think neural networks are the true way of doing AI, and the symbolic systems people, the people who think that, you know, discrete reasoning, facts and knowledge bases, whatever, is the way to go. And so there

was a merging of these two worlds where

the way AlphaGo worked is it had a neural network, but it had a Monte Carlo tree search algorithm on top of that. So

the neural network would generate uh

would would like uh generate a list of

potential moves uh and then you had a

more discrete algorithm sort those moves

and find the best based on just uh tree

search based on just trying to verify

again this sort of a verifier in the

loop trying to verify which move might

yield the best based on more classical

way of doing algorithms. Um, and so that

that's a resurgence of of that movement

where we have this amazing generative uh

neural network that is the the LLM and

now let's layer on more discrete ways of

trying to verify whether it's doing the

right thing or not and let's put that in

a training loop and once you do that the

LLM will start gaining new capabilities

such as uh reasoning over math and code

and things like that.
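A highly simplified sketch of that "neural generator plus discrete verifier" pattern; this compresses AlphaGo's actual Monte Carlo tree search into a plain scoring loop, so treat it as an illustration only:

```python
# Highly simplified generator-plus-verifier pattern: the network proposes
# candidate moves, a search-based verifier scores them, and the best-scoring
# candidate is what gets played (and later trained on).
def propose_moves(net, position, k: int = 8) -> list: ...   # policy network's top-k candidates
def search_value(position, move) -> float: ...              # tree-search / rollout estimate of the move

def choose_move(net, position):
    candidates = propose_moves(net, position)
    return max(candidates, key=lambda m: search_value(position, m))
```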

>> Exactly. Right. Okay. And then that's

great. And then and then the the key

thing there though for for RL to work

for LLMs to reason, the key is that it be

a a problem statement that there is a

defined and verifiable answer. That's

right. Is that right? And so and and and

you might think about this as like let's

give a bunch of examples like in

medicine this might be like um you know

a diagnosis that like a panel of human

doctors agrees with um or or or by the

way or a diagnosis that actually you

know solves the condition. Um in law

this would be a um you know a a argument

that in front of a jury actually results

in an acquittal or something like that.

Um in u math it's an equation that

actually solves properly. Uh in physics

it's a result that actually works in the

real world.

>> I don't know in civil engineering it's a

bridge that doesn't collapse. Right. So

so so there there there's always some

some test is that the first two do not

work very well just yet. like the the

like I would say uh law and healthcare

they're still a little too squishy a

little too soft it's unlike math or code

like the way that they're training on

math, they're using this sort of provable language called Lean for proofs, right? So you can run a Lean statement, you can run

computer code uh perhaps you can run a

physics simulation or civil engineering

uh sort of physics simulation but you

can't run a diagnosis
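For a sense of what "you can run a Lean statement" means, here is a minimal machine-checkable example in Lean 4; the checker either accepts the proof or rejects it, which is exactly the kind of binary signal an RL loop can train against:

```lean
-- Lean either accepts this proof or reports an error: a deterministic verifier.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```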

>> okay So I would say the

>> but you could verify it with human

answers or or not.

>> Yeah. So that that's more RLHF in a

way. So it is not the like sort of

autonomous RL train like fully scalable

autonomous which is why coding is moving

faster than any other domain is because

we can we we can generate these problems

and verify them on the fly. But there's

two but with coding as anybody who's

coded knows there's coding there's two

tests which is one is does the code

compile

>> right

>> and then the other is does it produce

the right output and just because it

compiles doesn't mean it produces the

right output and I you tell me but

verifying that it's the correct output

is harder
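Those two checks can be sketched as separate gates; as the exchange notes, passing the first says nothing about the second. The build and test commands below are illustrative, assuming a Python project with pytest:

```python
# Two-stage check: (1) does it build/compile at all, (2) does it produce the
# right output under the project's tests? SWE-bench-style evaluation hinges on (2).
import subprocess

def compiles(repo_dir: str) -> bool:
    result = subprocess.run(["python", "-m", "compileall", "-q", repo_dir],
                            capture_output=True)
    return result.returncode == 0

def tests_pass(repo_dir: str) -> bool:
    result = subprocess.run(["pytest", repo_dir], capture_output=True)
    return result.returncode == 0

def verify(repo_dir: str) -> str:
    if not compiles(repo_dir):
        return "does not even compile"
    return "tests pass (correct output)" if tests_pass(repo_dir) else "compiles, but wrong output"
```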

>> Yeah, SWE-bench is a collection of verified pull-request end states. So it is not just about compiling. A group of scientists put it together; SWE-bench is the main

benchmark used to test whether AI is

good at software engineering tasks and

we're almost saturating that. So last

year we're at like maybe 5%, early '24, or less, and now we're like 82% or something like that with Claude Sonnet 4.5, that's

state-of-the-art and that's like a

really nice hill climb that's

happening right now. uh and basically

they went and looked on GitHub. They

found the the you know most complex

repositories. They found bug statements

that are very clear uh and they found

pull requests that actually solve those bug

statements with unit tests and

everything. So there is an existing

corpus on GitHub of tasks that that the

AIS can can solve and you can also

generate them. Those are not too hard to

to generate uh you know what's called

synthetic uh data. Uh uh but but you're

right it's not infinitely scalable um

because you you some human verifiers

still need to kind of look at the at the

task but maybe the foundation models

have found a way to have the synthetic

training go all the way

>> right and then what's happening I think

I think because what's happening is the

foundation model companies are in some

cases they are hire they're actually

hiring human experts to generate new

training data. Yes.

>> So they're actually hiring

mathematicians and physicists and coders

to basically sit and you know they're

they're hiring they're they're hiring

human programmers putting them on the

cocaine. Yes.

>> Um and having them probably coffee um uh

and having them actually write code and

then and then write code in a way where

there's a known result of the code

running such that this RL loop can

be trained properly. That's right. And

then the other the other and then the

other thing these companies are doing is

as you said they're building systems

where the software itself generates the

training data, generates the tests,

generates the valid the validated

results, and that's so-called synthetic

training data.

>> That's right. And but yeah, but but

again those work in the very hard

domains. It works to some extent in the

software domains

>> and I think there's some transfer

learning we can you can see the

reasoning work when it comes to you know

tools like deep research and things like

that, but we're not making as fast progress in the softer domains.

>> So so say softer domains meaning like

domains in which it's harder harder or

even impossible to actually verify

correctness of of result in a sort of a

deterministic factual grounded

>> non-controversial way. Like if you have

a a chronic disease, you could you could

have you know you have a POTS or uh you

know whatever EDS syndrome or and

they're all they're all clusters and

it's because it it is the domain of

abstraction. It is not as concrete as

code and math and things like that. So I

think there's still a long ways to go

there.

>> Right. So sort of the more concrete the

problem, like it's the concreteness of the

problem that is the key variable not the

difficulty of the problem. Would that be

a way to think about it?

>> Yeah. Yeah. I think the the uh

concreteness in a sense of can you get a

true or false ver verifiable

>> right but like in any domain in any

domain of human effort in which there's

a verifiable answer we should expect

extremely rapid progress.

>> Yes.

>> Right.

>> Yes. Absolutely. And I I think that's

what we're saying.

>> Right. And that and that for sure

includes math. That for sure includes

physics for sure includes chemistry. For

sure includes

>> large areas of code.

>> That's right.

>> Right. What what else does that include

do you think?

>> Bio like we're seeing with a protein

>> genomic. Yeah. Okay. Right.

>> Yeah. Yeah. Things like that. I think

some some areas of robotics, right? Um

there's a clear outcome, right?

>> Uh but but it's not that many. I mean,

surprisingly,

>> well, it depends.

>> Yeah, depends on your point of view.

That's some people might say that's a

lot. Um so, uh and then um you you

mentioned that we you mentioned the pace

of improvement. So, what would you

expect from the pace of improvement

going forward for this?

>> I I think we're we're ripping on coding.

Like I think I think it's just it's

going like I think it's going to be like

what we're working on with with agent

floor right now um is by by next year we

think you're going to be sitting in front of Replit and you're

shooting off multiple agents at a time.

You're like planning a new feature. Um

so I I want you know social network on

top of my storefront and another one is

like hey um refactor the database. Hey,

in and you're running parallel agents.

So, you have five 10 agents kind of

working in the background and they're

merging the code and taking care of all

of that, but you also have a really nice

interface on top of that that you're

doing design and you're interacting with

AI in a more creative way. Uh maybe

using visuals and charts and things like

that. So, there's a multimodal angle of

that of that interaction. So I think you

know creating software is going to be

such an exciting

uh area and and and I think that the lay

person will be as good as a what a

senior software engineer that works at

Google uh is today. So I think I think

that's happening very soon. Um but but

you know I don't see them and be curious

about your point of view but like my

experience between as as a sort of a you

know on the let's say healthcare side or

more you know write me an essay side or

more creative side haven't seen as much

of a rapid improvement as what we're

seeing in code. So so I think I think

code is going to go to the moon. Math is

probably as well some some you know

scientific domains bio things like that

those are are going to move really fast.

>> Yeah. So there's this there's this

there's this weird dynamic see if you

agree with this and Eric also curious

your point of view on this like there's

this weird dynamic that we have and we

have this in the office here a lot and I

also have this with like the leading of

entrepreneurs a lot which is this thing

of like

>> like wow this is the most amazing

technology ever and it's moving really

fast and yet we're still like really

disappointed um and like it's not moving

fast enough and like it's like maybe

right on the verge of stalling out

>> and like you know we should both be like

hyper excited but also on the verge of

like slitting our wrists because like

you know the gravy train is coming to an

end,

>> right? And and I always wonder it's like

you know on the one hand it's like okay

like you know not all I don't know

ladders go to the moon like just because

something you know looks like it works

or you know doesn't mean it's going to

you know be able to you're going to be

able to scale it up and have it work you

know to the fullest extent. Um uh you

know so like it's important to like

recognize practical limits and not just

extrapolate everything to infinity. Um

on the other hand like you know we're

dealing with magic here that we I think

probably all would have thought was

impossible 5 years ago or certainly 10

years ago.

>> Like I I didn't you know look I I you

know I got my CS degree in the late '

80s early 90s. I I never I didn't think

I would live to see any of this, right?

Like this is just amazing that this is

actually happening in in in my lifetime.

>> Um

>> but but there's a huge bet on AGI,

right? like whether it's the foundation

models uh I think you know now the

entire US economy is sort of a bet on

AGI and and there are crucial questions

to ask whether are we on track to AGI or

not because there are some ways that I

can tell you it doesn't seem like we're

on track to AGI because we uh because

there doesn't seem to be transfer

learning across these domains that are

that are, you know, significant, right? So

if we get a lot better at code

we're not immediately getting better at

like generalized reasoning we need to go

also you know get training data and

create RL environment for bio or

chemistry or physics or math or law or

so so and this this has been the sort of

point of discussion now in the AI

community after the Dwarkesh and Richard

Sutton uh interview where uh you know

Richard Sutton kind of poured this cold

water on the on the bitter lesson. So

everyone was using this uh essay that he

wrote called the bitter lesson. The idea

is that there are um infinitely scalable

ways of uh doing uh uh AI research and

and and and anytime you can pour more

compute and more data and go more

performance out you're just you know

that's the ultimate way of getting to

AGI and some people you know interpreted

that interview that perhaps he's

doubtful that even we're even on a on a

bitter lesson path here, and perhaps

the current training regime is actually

very much the opposite in which we we

are so dependent on human data and human

annotation and and all of that stuff. So

I think the I I agree with you. I mean

as a company we're we're excited about

where things are headed but but there's

there's a question of like are we on

track to AGI or not and be curious what

you think. So, so and you know Ilia I

think you know Ilya Sutskever makes a makes

a specific form of this argument which

is basically like we're just literally

running out of training data. It's a

fossil fuel argument right like if we

slurped all the training fundamentally

we've slurped all the data off the

internet that is where almost all the

data is at this point. There's a little

bit more data that's in like you know

private dark pool somewhere that we're

going to go get but like

>> we have it all and then right we're

we're in this business now trying to

generate new data but generating new

data is hard and expensive you know

compared to just like slurping things

off the internet. So

>> there are these arguments. Um you know

having said that you know you get into

definitional questions here really quick

which are kind of a rabbit hole but

having said that like you mentioned

transfer learning. So transfer learning

is the ability of the machine to right

to be an expert in one domain and then

and then generalize that into another

domain.

>> My answer to that is like have you met

people?

>> Um and how many people do you know are

able to do transfer learning?

>> Not many. Right. Well because there's

>> quite the opposite actually. The nerdier

they are in a certain domain the kind of

you know often they have blind spots. We

joke about how everyone's just [ __ ]

in one area or they make some like

massive mistake and and like don't trust

them on this but on this other topic you

know

>> right? Yeah. Well and this is a

well-known thing among like for example

public intellectuals. So this happens,

there's actually been whole books

written about this on so-called public

intellectuals. So you get these people

who show up on TV and they're experts

and what happens is they're like an

expert in economics right and then they

show up on TV and they talk about

politics and they don't know anything

about politics right or they don't know

anything about like medicine or they

don't know anything about the law or

they don't know anything about

computers. You know, this is the Paul

Krugman talking about how the internet's

going to be no more significant than the

fax machine.

>> Facts.

>> He's a brilliant economist. He has no

idea what a how a computer works.

>> Is he a brilliant economist?

>> Well, at one at one point at one point

at one point, let's get even if even if

he's a brilliant Well, this is the thing

like what does that mean? Like should a

brilliant economist be able to

extrapolate, you know, the internet is

is a good question. But um but the point

being like even if he is a you know,

take any take anybody Oh, by the way, or

like Ein like Einstein's like actually

my favorite example. I think you'd agree

Einstein was a brilliant physicist.

>> He was like a he was he was a Stalinist.

Like he was just he was Yeah. He was a

socialist and he was a Stalinist and he

was like he thought like Stalin was

fantastic.

>> Out still.

>> Yeah. Okay. All right.

>> True socialism.

>> All right. All right. Einstein, you

know, I'll I'll

I'll take your word for it. But like

once he got into politics, he was just

like totally loopy or or you know, even

right or wrong. It's just he just

sounded like all of a sudden like an

undergraduate lunatic, like somebody in

a dorm room. Like he there was no

transfer learning from physics into

politics. like he he didn't listen right

or wrong he didn't there was no there

was clearly there was nothing new in his

political analysis, it was the same rote,

routine [ __ ] you get out of

>> you know yeah so so in a way the

argument you're making is like we maybe

already a human level AI I mean perhaps

the definition of AGI is is is something

totally different is like above human

level that something that totally

generalizes across domains it's it's not

something that we've seen

>> Yeah, like, I was saying, and you know, look, we should shoot big, but we've idealized a goal

um that may be idealized in a way that

like number one it's just it it's it's

like so far beyond what people can do

that it's you're no longer it's no

longer relevant comparison to people and

and usually AGI is defined as you know

able to do everything better than a

person can

>> and it's like well okay so if doing

everything better than a person can it's

like if a person can't do any transfer

learning at all

>> right doing even a little little bit a

marginal bit might might actually be

better or it might not matter just

because no no human can do it and so

therefore you just you just stack up the

domains there's also this well-known

phenomenon in AI with you know t

typically this works the other way which

there's a phenomenon AI AI engineers

always complain about and scientists

always complain about which is the

definition of AI is always the next

thing that that the machine can't do and

so like the definition for of AI for a

long time was like can it beat humans at

chess

>> and then the minute it could beat humans

at chess that was no longer AI that was

just like oh that's just like boring

>> that's computer chess it became

>> computer chess it's just like boring and

now it's an app on your iPhone and

nobody nobody and nobody cares right and

it's immediately then

>> The Turing test was the test and then we

passed it and nobody

>> we blew this is a really big deal

>> there was no celebration

>> there was no parties That's exactly

right. There was no for 80 years the

Turing test I mean they made a movie

about it like the whole thing that was

the thing and like we blew right through

it and nobody even registered it. Nobody

cares. It gets no credit for it. We're

just like ah it's still you know

complete p piece of [ __ ] like

>> right and so there's this thing where so

the AI scientists are are are used to

complaining basically that they're that

they're they're being they're always

being judged against the next thing as

opposed to all the things they've

already they've already solved.

>> Um uh but but that's maybe the other

side of it which is they're also putting

out for themselves um an unreasonable

goal. an an unreasonable goal and then

doing this sort of self-flagellation

kind of along the way and and and I I

kind of wonder yeah I I wonder kind of

which way that cuts.

>> Yeah. Yeah. It's an interesting question

like I started thinking about this idea

of like it doesn't matter whether it's

truly AGI and the way I define AGI is

that you put in a AI system in any

environment and efficiently learns right

>> um you know it doesn't have to have that

much prior knowledge in order to kind of

learn something but also can transfer

that knowledge across different domains.

Um but you know we can get to like

functional AGI and what functional AGI

is is just yeah collect data on every uh

useful uh economic activity in uh in the

world today and train an LLM on top of

that or train the same foundation model

on top of that and and we we'll go we'll

target every sector economy and and you

can automate a big part of labor that

way. So I think I think yeah I think

we're on that track for sure.

>> Right. Um, you tweeted after GPT-5 came

out that you were feeling the

diminishing returns. Yeah. What were you

expecting and but and and what needs to

be done? Do we need another breakthrough

to get back to the pace of growth or

what are your thoughts there?

>> I mean this this whole discussion is is

sort of about that and and my feeling is

that uh you know GPT5 uh got good at

verifiable domains. It didn't feel that

much better at anything else. the more

human angle of it. It felt like it

regressed and like you had this uh sort

of uh Reddit pitchfork uh sort of uh

movement against against Sam and Open AI

because they felt like they lost a

friend. GPT-4o felt a lot more human and closer, whereas GPT-5 felt a lot more

robotic, you know, very in its head kind

of trying to think through through

everything. And um and so I I I would

have just expected like when we went

from GPT-2 to 3, it was clear it was

getting a lot more human. It was uh a

lot closer to our experience. It can you

can feel like it actually gets me, like there's something about it that

understands the world better. Similarly

3 to four to five didn't feel like it

was a better overall

being as it were. But is that is that is

that is that a is the question there

like is it emotionality? Is it partly

emotionality but but again partly like I

like to ask models like very

controversial uh things. Um can it

reason through

uh I don't know how deep we want to go

here but like um what happened with

World Trade Center 7,

>> right?

>> Sure.

>> It's an interesting question, right?

Like I'm not I'm not putting out a

theory, but like it's interesting like

how did it you know and and can it can

it think through controversial questions

>> in the same way that it can go think

through a coding problem and there

hasn't been any movement there like the

all the reasoning and all of that stuff

I haven't se and not just that you know

that's a cute example, but like, um, COVID, right? Like, you know, the origins of COVID,

right

>> you know go you know dig up GPT4 or

other models

and go to GPT5, you're not going to find

that much difference of okay, let's

reason together. Let's try to figure out

what were the origins of COVID, because it's

still an unanswered question, you know,

and I don't see them making progress in

that. I mean, you play a lot with them.

Do you feel like

>> I use it differently? I don't know,

maybe I have different expectations. Um,

I I'm I the way I my main use case

actually is sort of sort of PhD and

everything at my beck and call. Um, and

so I'm I'm trying to get it to explain

things to me more than I'm trying to

like, you know, have conversations with

it. Maybe maybe I'm just unusual with

that. But

>> and that that that gets back

>> well. So what I what I what I found

specifically is uh a combination of like

GPT5 Pro plus deep reasoning or like

Grok 4 Heavy, like the, you know, the

highest end models um u like that um you

know they now basically generate 30 to

40 page you know essentially books on

demand on any topic. Um and so anytime I

get curious about something you just

take it maybe it's my version of it but

it's something like I don't like a good

here's a good example. Um, when an advanced economy puts a tariff on a raw, you know, on a raw

material or on a finished good like who

pays

>> you know is it is it the consumer is it

the is it the importer is it the

exporter or is it the producer and and

this actually a very complicated it

turns out very complicated question it's

a big big big thing that economists

study a lot and it's just like okay who

you know who pays and what I found like

for that kind of thing is it's

outstanding

>> well well but but it's outstanding at um

sort of going out of the web getting

information synthesizing it

>> correct it it gives me it gives me a

synthesized 20, 30, 40 pages, basically tops out at 40 pages of PDF. Yeah.

>> Um uh but I can get I can get up to 40

pages of PDF but it's a completely

coherent and as far as I can tell for

everything I've cross-checked, a completely

like it like world class like if I hired

you know for a question like that if I

hired like a great you know econ

postdoc at Stanford who just like went

out and did that work like it would

maybe be that good.

>> Yeah. Um but then but then of course the

significance is it's like it's like you

know at least for this is true for many

domains you know kind of PhD and

everything and so

>> but but this is synthesizing knowledge

not trying to create new knowledge.

>> Well but this this gets to the this sort

of you know of course the you get into

the angels dancing on the head of a pin

thing which is like what what you know

what's the difference how many how much

new knowledge ever actually is there

anyway? What do you actually expect from

people when you ask them questions? Um,

and so what what I'm looking for is

like, yes, explain this to me in like

the the the clearest, most

sophisticated, most complex, most like

complete way that it's possible for

somebody to, you know, for a real expert

to be able to to to explain things to

me.

>> Um, and that's what I use it for. And

again, as far as I can tell from the

cross-checking, like I'm getting, you know,

like almost like basically 100 out of

100, like I don't even think I've had an

issue in months where it's like had had

a problem in it.

>> And it's like, yeah, you can say, yeah,

synthesizing isn't creating new information, but like, it's basically generating a 40-page book.

>> That's amazing.

>> It's incredibly fluid. The logical coherence of the entire thing, it's great writing. If you evaluated a human author on it, you would say, "Wow, that's a great author." Are people who write books creating new knowledge? Well, sort of not, because a lot of what they're doing is building on everything that came before them and synthesizing it in their mind. But also, a book is a creative accomplishment, right? And so...

>> Yeah, one of the things I'm interested in, that I'm hoping AI could help us solve, is just how confusing the information ecosystem is right now. Everything feels like propaganda. It doesn't feel like you're getting real information from anywhere. So I really want an AI that could help me reason from first principles about what's happening in the world, for me to actually get real information. And maybe that's an unreasonable ask of the AI researchers, but I don't think we've made any progress there. So maybe I'm over-focused.

Yeah, maybe I'm off in my own lane, or maybe I'm over-focused on arguing with people as opposed to trying to get at the underlying truth. But here's the thing I do a lot with this: I just say, take a provocative point of view and then steel-man the position. Take your COVID thing; I often pair these: steel-man the position that it was a lab leak, and steel-man the position that it was natural origins.

>> And again, is this creativity or not? I don't know. But what comes back is 30 pages each of, wow, that is the most compelling case I can imagine, with everything marshaled behind it and the argument structured in the most compelling way possible.
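
As a rough illustration of that prompting pattern (a minimal sketch using the OpenAI Node SDK; the model id, prompt wording, and the steelMan helper are placeholders for illustration, not what either speaker actually uses):

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

// Hypothetical helper: ask the model to steel-man one side of a contested question.
async function steelMan(position: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o", // placeholder model id; swap in whichever frontier model you use
    messages: [
      {
        role: "user",
        content:
          "Steel-man the following position as rigorously as you can, " +
          "marshaling the strongest evidence and structuring the argument clearly:\n\n" +
          position,
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}

async function main() {
  // Pair the two opposing steel-man essays, as described above.
  const [labLeak, natural] = await Promise.all([
    steelMan("COVID-19 originated from a laboratory leak."),
    steelMan("COVID-19 has a natural zoonotic origin."),
  ]);
  console.log(labLeak, "\n---\n", natural);
}

main();
```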

>> Part of the reason that started happening is because it stopped being taboo to talk about a human origin. When it was taboo,

>> the AIs would talk down to you, like, oh, you're a conspiracy theorist. So yes, there was a period of time like that. Take something truly controversial, and they actually can't reason about it, because of all the RLHF and all the limitations. And, you know, I won't pick on specific ones here, but there are certain big models that will still lecture you

>> that you're a bad person for asking that question. But some of them are just really, really open now to being able to do these things.

>> So basically, ultimately, what you're looking for, the ultimate thing, would be something that I don't think anybody has really defined well, because again, all the conventional definitions of AGI are basically comparing to people.

>> Yeah.

>> And the conventional explanations of AGI have always struck me a lot like the debate around whether a self-driving car works or not. Does a self-driving car work because it's a perfect driver, or does it work because it's better than the human driver? And better than the human driver, I think, is actually a real thing, just like with the chess thing and the Go thing. And then there's the question of whether it's a perfect driver, which is obviously what the self-driving car companies are working toward. But I think you're looking for something beyond the perfect driver. You're looking for the car that knows where to go.

>> So I'm of two minds, right? One mind is the sort of practical entrepreneur, right?

>> And I just have so many toys to play with and build with: stop AI progress today and Replit will continue to get better for the next five years. There's so much we can do just at the app layer and the infrastructure layer.

>> But I think the foundation models will continue to get better as well, so it's a very exciting time in our industry. The other mind is more academic, because since I was a kid I've been interested in the nature of consciousness, the nature of intelligence. I was always interested in AI and reading the literature there, and I would point to the RL literature. So Richard Sutton, and there's another guy, I think a co-founder of DeepMind, Shane Legg, wrote a paper trying to define what AGI is. And in there, I think, is the original and perhaps correct definition of AGI, which is efficient continual learning.

>> Okay. If you truly want to build an artificial general intelligence, you can drop it into any domain. You can drop it into a car without that much prior knowledge about cars and, within however long it takes a human to learn how to drive, within months, it's able to drive a car really well. Generalized skill acquisition, generalized understanding acquisition, generalized reasoning acquisition. And I think that's the thing that will truly change the world. That's the thing that would give us a better understanding of the human mind, of human consciousness, and that's the thing that will propel us to the next level of human civilization.

On a civilizational level, that's a really deep question. But separate from that, there's an academic aspect of it that I'm really interested in.

>> So what odds, if we're on Kalshi today, what odds do we place on that?

>> I'm kind of bearish on a true AGI breakthrough, because...

>> what we built is so useful and economically valuable, so in a way...

>> good enough. Good enough is the enemy. Yeah. Do you remember that essay?

>> Worse is better.

>> Worse is better.

>> Worse is better. Worse is better. And...

>> So there's a trap. There's a local maximum trap. We're in a local maximum...

>> A local maximum trap, because it's good enough for so much economically productive work.

>> Yes.

>> It relieves the pressure in the system to create the generalized answer.

>> Yes. And then you have the weirdos like Rich Sutton and others who are still trying to go down that path, and maybe they'll succeed.

>> Right. But there's enormous optimization energy behind the current thing, and we're hill-climbing on this local maximum.

>> Right. Right. And the irony of it is, everybody's worried about the gazillions of dollars going into building out all this stuff, and so the most ironic thing in the world would be if the gazillions of dollars are going into the local maximum.

>> That's right.

>> As opposed to a counterfactual world in which they're going into solving the general problem.

>> But it's also potentially irrational. Like maybe the general problem is actually, you know, not within our lifetimes. Who knows? Right.

>> How much further do you think... do you think we've squeezed most of the juice out of LLMs in general, then? Or are there other research directions that you're particularly excited about?

>> Well, that's the thing. I think the problem is there aren't that many. I think the breakthroughs in RL are incredibly exciting, but we've also known about them for over ten years now: you marry generative systems with tree search and things like that. But there's a lot more to go there, and I think the original minds behind reinforcement learning are trying to go down that path and bootstrap intelligence from scratch.
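
As a rough sketch of that "generative system plus tree search" idea (a toy illustration, not any lab's method: propose stands in for a generative policy sampling candidate next steps, and score stands in for a learned value function or verifier):

```typescript
// Toy illustration of marrying a generative proposer with tree search.
// The "task": reach a target number by composing +3 and *2 steps, starting from 1.

type Node = { value: number; trace: string; depth: number };

const TARGET = 40;

// Stand-in for a generative policy: enumerate candidate next steps from a state.
function propose(node: Node): Node[] {
  return [
    { value: node.value + 3, trace: node.trace + " +3", depth: node.depth + 1 },
    { value: node.value * 2, trace: node.trace + " *2", depth: node.depth + 1 },
  ];
}

// Stand-in for a value function / verifier: higher is better (closer to the target).
function score(node: Node): number {
  return -Math.abs(TARGET - node.value);
}

// Best-first tree search guided by the value function.
function treeSearch(root: Node, maxDepth: number): Node {
  const frontier: Node[] = [root];
  let best = root;
  while (frontier.length > 0) {
    frontier.sort((a, b) => score(b) - score(a)); // expand the most promising node first
    const node = frontier.shift()!;
    if (score(node) > score(best)) best = node;
    if (score(node) === 0 || node.depth >= maxDepth) continue; // solved, or out of budget
    frontier.push(...propose(node));
  }
  return best;
}

console.log(treeSearch({ value: 1, trace: "1", depth: 0 }, 8));
// Finds a trace like "1 *2 +3 *2 *2 *2" that reaches 40.
```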

Carmack is going down that path, as far as I understand. Carmack, you guys may be invested, but they're not trying to go down the LLM path. So there are people trying to do that, but I'm not seeing a lot of progress or outcomes there; I watch it kind of from afar. Although, you know, for all we know, there's already a bot on X somewhere.

>> You know, you never know. It might not be a big announcement. It might just be that one day there's a bot on X that starts winning all the arguments.

>> Yeah, it could be.

>> Or a user on Replit, and all of a sudden it's

>> generating incredible software. Okay, let's spend our remaining minutes, let's talk about you. Take us from the beginning: how did you get from being born to being in Silicon Valley?

>> Okay. Um...

>> In two minutes. Yeah...

>> I'm just joking. But...

>> Yeah, I got introduced to computers very early on. I was born in Amman, Jordan,

>> and for whatever reason my dad, who was just a government engineer at the time, decided that computers were important, and even though he didn't have a lot of money, he bought a computer. It was the first computer in our neighborhood, the first computer of anyone I knew. One of my earliest memories, I was six years old, is just watching my dad unpack this machine, open up this huge manual, and finger-type CD, LS, MKDIR. I would be behind his shoulder, watching him type these commands and seeing the machine respond and do exactly what he asked it to do.

>> Popping Tylenol as your...

Exactly.

Autism activated.

>> Of course, you have to.

>> You have to.

>> Exactly. What kind of computer was it?

>> It was an IBM, as far as I remember. An IBM PC.

>> What year was this?

>> Uh, 1993.

>> 1993. Okay, so it's DOS. Did it have Windows at that point, or...

>> No, it didn't have Windows.

>> Right before Windows. Right before Windows, but I think Windows had been out, but you would add...

>> It was an add-on. You wouldn't boot into it. I think we bought the disc for Windows, and you had to kind of boot-load it from the disk, and it would open Windows and you could click around. It wasn't that interesting because there wasn't a lot on it. So a lot of the time I just spent in DOS, writing batch files and opening games and messing around with that. But it wasn't until Visual Basic, so after Windows 95, that I started making real software, right?

>> And the first idea I had: I used to be a huge gamer, so I used to go to these LAN gaming cafés and play Counter-Strike. I would go there and, you know, the whole place is full of computers, but they didn't use any software to run their business. It was just people running around writing down your machine number, how much time you spent on it, and how much you paid, and tapping your shoulder like, hey, you need to pay a little more for that. And I asked them, why don't you just have a piece of software that lets me log in and has a timer or whatever? And they were like, yeah, we don't know how to do that. And I was like, okay, I think I know how to do that. So I spent, I was like 12 or something, I spent like two years building that, then went out and tried to sell it, and was able to sell it, and was making so much money. I remember McDonald's opened in Jordan around the time I was 13, 14. I took my entire class to McDonald's. It was very expensive, but I was balling with all this money and showing off. And so that was the first business that I created.

And around that time I started learning about AI, you know, reading sci-fi and all of that stuff. And when it came time to go to college, I didn't want to go into computer science, because I felt like coding was on its way to getting automated. I remember using these wizards. Do you remember wizards?

>> Yes.

>> Wizards were basically extremely crude early bots that generate code. Yeah.

>> Yeah. And I remember you could type in a few things, like here's my project, here's what it does, whatever, and then click, click, click, and it would scaffold a lot of code. I was like, oh, I think that's the future. Coding is such a...

>> It's almost...

>> Yeah, it's solved. You know, why should I go into coding? I was like, okay, if AI can do the code, what should I do? Well, someone needs to build and maintain the computers, and so I went into computer engineering and did that for a while. But then I rediscovered my love for programming, reading programming essays on Lisp and things like that, and started messing around with Scheme and programming languages like that.

But then I found it incredibly difficult to just learn different programming languages. I didn't have a laptop at the time. So every time I wanted to learn Python or Java, I would go to the computer lab, download gigabytes of software, try to set it up, type a little bit of code, try to run it, run into a missing-DLL issue, and I was like, man, this is so primitive. At the time, it was 2008 or so, we had Google Docs, we had Gmail, you could open the browser and, partly thanks to you, use software on the internet. And I thought the web is the ultimate software platform; everything should go on the web. Okay, who's building an online development environment? And... no one, right? It felt like I had found a $100 bill on the floor of Grand Central Station. Surely someone should be building this, but no one was building it. And so I was like, okay, I'll try to build it.

>> And I got something done in like a couple of hours, which was a text box: you type in some JavaScript, and there's a button that says eval. You click eval, it evaluates it, and it shows you the result in an alert box, right? So one plus one, two. I was like, oh, I have a programming environment. I showed it to my friends and people started using it. I added a few additional things, like saving the program, and I was like, okay, there's a real idea here. People loved it.
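
For a sense of how small that first version could have been, here is a minimal sketch of a type-some-JavaScript-and-eval page, written from the description above (an illustration only, not Replit's actual code):

```typescript
// Minimal sketch of a one-textbox "eval" environment, as described above.
// Runs in a browser; eval() executes whatever you type, so this is for illustration only.

const box = document.createElement("textarea");   // where you type some JavaScript
const button = document.createElement("button");  // the "eval" button
button.textContent = "eval";

button.addEventListener("click", () => {
  try {
    // Evaluate the typed code and show the result in an alert box, e.g. "1 + 1" -> 2.
    alert(String(eval(box.value)));
  } catch (err) {
    alert(String(err));
  }
});

document.body.append(box, button);
```

Saving a program, the next feature mentioned, would then just be a matter of persisting the textarea's contents somewhere.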

Then it took me another two or three years to actually be able to build anything, because the browser could only run JavaScript. And it took a breakthrough at the time: Mozilla had a research project called Emscripten that allowed you to compile different programming languages, like C and C++, into JavaScript. And for the browser to be able to run something like Python, I needed to compile CPython to JavaScript. I was the first in the world to do it: I contributed to that project and built a lot of the scaffolding around it, and my friends and I compiled Python into JavaScript. Then I was like, okay, we did it for Python, let's do it for Ruby, let's do it for Lua. And that's how the idea for Replit emerged: when you need a REPL, you should get it, you should "REPL it." A REPL is the most primitive programming environment possible. So I added all these programming languages, and all this time my friends were using it and excited about it.

And I was on GitHub at the time, and my standard thing when I make a piece of software is to open-source it. So I was open-sourcing all the things I had spent years building, this underlying infrastructure to be able to run code in the browser,

>> and then it went viral on Hacker News, and it coincided with the MOOC era, massive open online courses. Udacity was coming online, Coursera, and most famously Codecademy. Right.

>> So Codecademy was the first kind of website that allowed you to code in the browser interactively and learn how to code. And they built a lot of it on my software, which I was open-sourcing all the way from Jordan. I remember seeing them on Hacker News going super viral, and I was like, "Hey, I recognize this. What are you using?" So I left a Hacker News comment: "Oh, you're using my open-source package." And they reached out to me. They were like, "Hey, we'd like to hire you." I was like, "I'm not interested. I want to start a startup. I want to start this thing called Replit." And they were like, "Well, no, you should come work with us. We can do the same stuff." And I kept saying no. I was like, "Okay, I'll contract with you." They were paying me $12 an hour. I was really excited about it, back in Amman.

>> But, to their credit, they came out to Jordan to recruit me and spent a few days there. And I kept saying no. And in the end, they gave me an offer I couldn't refuse, and they got me an O-1 visa. I came to the United States.

>> That's when you moved. So when was the first... because you were born in what year?

>> 1987.

>> '87. What was the first year you can remember where you had the idea that you might not live your life in Jordan, that you might actually move to the US?

>> When I watched Pirates of Silicon Valley.

>> Is that right? Okay. Got it. All right.

>> Uh, maybe '98 or '99. I don't know when it came out.

>> Okay. That might be a good place to...

>> Yeah.

>> Is it worth telling the hacker story? Because there's a version of the world where, if that had gone differently, maybe you wouldn't have gone to America.

>> Right. Right. Yeah. So in school I was programming the whole time. I just want to start businesses; I'm exploding with ideas all the time. The reason Replit exists is because I have ideas all the time and I just want to go type on the computer and build them. So I wasn't going to school. It was incredibly boring for me. And part of the reason Replit has a mobile app today is because I always wanted to program under the desk, just to do things.

>> And so at school they kept failing me for attendance. I would get A's, but I just didn't show up, and so they would fail me. And I felt it was incredibly unfair. All my friends were graduating now; this was around 2011, and I had been in college for six years when it should have been three or four. I was incredibly depressed. I really wanted to be in Silicon Valley. And so I was like, "Oh, what if I changed my grades?"

>> There we go.

>> The university database. And so I went into my parents' basement and implemented polyphasic sleep. Are you familiar with that?

>> I am.

>> Leonardo da Vinci's polyphasic sleep. I didn't hear about it from Leonardo da Vinci, I heard it from Seinfeld, because there's an episode where Kramer goes on...

>> Polyphasic sleep. What, 20 minutes every four hours? And yes, this somehow is going to work well. And it...

>> Yeah. And hacking, if you've ever done anything...

>> As the meme goes, this has never worked for anybody else, but it might work for me.

>> Yes.

>> And a lot of what hacking is, is coming up with ideas for finding certain security holes, writing a script, and then running that script. That script will take like 20 or 30 minutes to run, so you take that 20 or 30 minutes to sleep and go on. So I spent two weeks just going mad trying to hack into the university database, and finally I found a way: I found a SQL injection somewhere on the site, and I found a way to be able to edit the records. But I didn't want to risk it. So I went to my neighbor, who was going to the same school. I think to this day no one caught him. I went to him and said, hey, I have this way to change grades, would you want to be my guinea pig? And I was honest about it. I was like, I'm not going to do it myself first. Are you open to doing it?

He's like, "Yeah, yeah, yeah." They call

his human trials.

This is how medicine works.

So we went and changed his grades, and he went and pulled his transcript, and the update wasn't there. So I went back to the basement. It turned out that I had access to the slave database; I didn't have access to the master database.

>> So I found a way through the network, privilege escalation. It was an Oracle database that had a vulnerability. I found the real database and then just did it for myself: changed the grades, went and pulled my transcript, and sure enough it had actually changed. Went and bought the gown, went to the graduation parties, did all that. We're graduating.

And then one day I'm at home, it's maybe 6:00 or 7:00 p.m., and the telephone at home rings.

Ominous ring.

>> Santa

Hello? And he's like, hey, this is the university registration system, and I knew the guy who ran it. He's like, look, we're having this problem. The system's been down all day and it keeps coming back to your record. There's an anomaly in your record where you have a passing grade but you're also banned from the final exam of that subject. I was like, oh [ __ ]. Well, it turns out the database was not normalized. Typically, when they ban you from an exam, the grade resets to 35 out of 100, but apparently there's also a boolean flag. And by the way, all the column names in the database were single letters; that was the hardest thing, security by obscurity.

>> right

>> And it turns out there's a flag that I didn't track: when you go over the attendance limit, when you don't attend and they want to fail you, they ban you from the final exam. So I changed the grades, and that created an inconsistency and brought down the system. So they were calling me, and I thought at the time, I could potentially lie and it'll be a huge issue, or I'll just fess up. So I said, hey, listen, yeah, I might know something about it; let me come in tomorrow and talk to you about what happened. So I go in and I open the door, and it's the deans of all the schools, computer science, computer engineering. They had all been working on it for days, because it's a very computer-heavy university and it was a real problem,

>> and they're all really intrigued about what happened. So I pull up a whiteboard and start explaining what I did, and everyone was engaged. I gave them a lecture, basically.

>> Your oral exam for your PhD. This is great.

>> They were really excited, and I think it was endearing to them. They were like, "Oh, wow, this is a very interesting problem."

>> And then I was like, "Okay, great. Thank you." And they were like, "Hey, wait...

>> wait. We don't know what to do with you. Do we send you to jail? Do we..."

>> And they were like, "Hey, we have to escalate this to the university president."

president." and and he he was a great

man and I think uh he gave me a second

chance in life and I went to him and I

uh you know I I explained the situation

I said like I'm really frustrated. I

need to graduate. I need to get on with

my life. I've been here for six years

and I just can't sit in in in school the

stuff I already know. I'm a really good

programmer. Uh and and he gave me a

Spider-Man line at the time. was like

with great power comes great

responsibility and you have a great

power and you know and it really

affected me and I think he he was right

at the moment and and so he said well

we're going to let you go but you're

going to have to help the system

administrators secure the system

>> for the summer. I was happy to do it. And I show up, and all the programmers there hate me, hate my guts,

>> and they would lock me out. I would see them outside, I would knock on the door, and nobody would answer; they didn't want to let me in. I tried to help them a little bit, but they weren't collaborative, so I was like, all right, whatever. And then it came time for me to actually graduate. It was the final project, and one of the computer science deans came to me and said, look, I need to call in a favor. I was a big part of the reason we let you go and didn't prosecute you, so I want you to work with me on the final project, and it's going to be around security and hacking. I was like, no, I'm done with that [ __ ], I just want to build programming environments and things like that. And he's like, no, you have to do it. I was like, okay.

>> So I thought I'd do something more productive. I wrote a security scanner, which I was very proud of, that crawls the different pages of a site and tries SQL injection and all sorts of things. And actually, my security scanner found another vulnerability in the system.
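
To make the idea concrete, here is a minimal sketch of the kind of error-based injection probe such a scanner automates (the URL, parameter, and error strings are made-up placeholders, not the scanner described here, and probing like this should only ever be pointed at systems you are authorized to test):

```typescript
// Minimal sketch of an error-based SQL injection probe (illustration only).
// It appends a classic payload to a query parameter and looks for database
// error strings leaking into the response.

const TARGET_URL = "https://example.edu/records?studentId="; // hypothetical endpoint
const PAYLOAD = "1'";                                         // a stray quote often surfaces SQL errors
const ERROR_SIGNATURES = ["SQL syntax", "ORA-", "unclosed quotation mark"];

async function probe(url: string): Promise<boolean> {
  const res = await fetch(url);
  const body = await res.text();
  // If a database error message leaks into the page, the parameter is likely injectable.
  return ERROR_SIGNATURES.some((sig) => body.includes(sig));
}

probe(TARGET_URL + encodeURIComponent(PAYLOAD)).then((vulnerable) => {
  console.log(vulnerable ? "parameter looks injectable" : "no obvious error leakage");
});
```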

>> Amazing.

>> And so I went to the defense, and he's like, you need to run this security scanner live and show that there's a vulnerability. I didn't understand what was going on at the time, but okay. So I gave the presentation about how the system works, and then: let's run it. And it showed that there was a security vulnerability. Okay, let's try to get a shell. So the scanner automatically runs all the exploit steps and gets you a shell. And then the other dean, who it turned out had been given the mandate to secure the system, and now I started to realize I'm a pawn in some kind of rivalry here, his face turned red and he's like, "No, it's impossible. We secured the system. You're lying."

I was like, "You're accusing me of lying? All right, what should we look up? Should we look up your salary, or your password? What do you want me to look up?" And he was like, "Yeah, look up my password." So I look up his password, and it's gibberish; it's encrypted. And he was like, "Oh, that's not my password. See, you're lying." I was like, "Well, there's a decrypt function that the programmers put in there." So I run decrypt and it shows his password. It was something embarrassing; I forget what it was. And he gets up really angry, shakes my hand, and leaves to change his password. So I was able to hack into the university another time.

Luckily, I was able to graduate. I gave them the software, and they secured the system. But yeah, later on I would realize that he wanted to embarrass the other guy, which was why I was in the middle.

>> Politics. Well, I think the moral of the story is: if you can successfully hack into your school's system and change your grade, you deserve the grade and you deserve to graduate.

>> I think so.

>> And just for any parents out there, any children out there: you can cite me as the moral authority on this.

One lesson, maybe, that I think is very relevant for the AI age: the traditional, more conformist path is paying less and less in dividends, and I think kids coming up today should use all the tools available to discover and chart their own paths, because just listening to the traditional advice and doing the same things people have always done is not working out as well as we'd like.

>> Thanks for coming on the podcast. Thank

you, man. Fantastic.

Wow.

Wow. Wow.
