John Schulman on dead ends, scaling RL, and building research institutions
By Cursor
Summary
## Key takeaways

- **ChatGPT Possible in 2018**: With full hindsight, a few talented people working for a year with a few GPU boxes could have built something at GPT-3.5 level back in 2018 or 2019, building on existing pre-training datasets. nanochat, written by one person in about half a year and running on one box, provides an upper bound. [02:05], [03:05]
- **Universe RL Dead End**: Universe aimed to joint-train on diverse RL environments like video games and web tasks to get a general agent, but it was a decade too early, unwieldy, and the models didn't generalize. A scoped-down version focusing on emulated video games proved more successful. [06:06], [07:34]
- **Value Functions Not Helping Now**: Value functions don't help much in current RL settings like RLHF and verifiable rewards with short-to-medium horizons, despite providing variance reduction in other tasks. They may make a comeback later. [17:02], [18:19]
- **Hands-On vs Hands-Off Managers**: Hands-on managers who code and give detailed technical feedback suit goal-oriented work with less experienced people, while hands-off managers acting as sounding boards fit exploratory research with experienced contributors. Both approaches can succeed depending on context. [10:32], [11:48]
- **Early OpenAI Exploratory**: Early OpenAI blended small rag-tag research projects by one to three people with bigger engineering efforts inspired by DeepMind's AlphaGo, and many projects failed as the norm. It was more like peacetime, with no clear scaling direction, unlike today's catch-up mode. [04:17], [15:47]
- **AI Accelerates Own Timelines**: While engineers consistently underestimate timelines by 2-3x, as seen in self-driving cars, AI's positive feedback loop accelerating its own development may lead to shorter AGI timelines than naive predictions suggest. [46:20], [47:28]
Topics Covered
- ChatGPT Possible in 2018 with Few People
- Universe Project Failed Due to Prematurity
- Hands-Off Management Fits Exploratory Research
- Avoid Catch-Up Mode to Preserve Exploration
- Co-Train Generators and Verifiers for Self-Improvement
Full Transcript
>> If the group of people that started OpenAI went back to 2015 or 2016 and wanted to speedrun building ChatGPT, how fast could they do it? What would be the bottlenecks to doing it even faster, and what moves would that group play that would be different from what actually happened?
>> Yeah, I think if you wanted to make ChatGPT with a lot less compute, you could, and we've seen things like nanoGPT that sort of do this. Sometimes it's easier to do something with more compute first, and then by adding more clever tricks you can do it with less compute. I also guess we could have scaled a lot faster, or it would have been possible to scale, if we had known the returns would be what they were.
>> I think if you wanted to do it a lot earlier, and you had the whole recipe in mind, you probably could have built it a lot earlier. You could put together a big cluster and pre-train a model, and then, given all the things we know now about post-training, you can effectively increase your compute a lot by doing post-training better. So even if it takes something like a GPT-3-level model to create a good few-shot-prompted chat model, if you're willing to do a lot of fine-tuning and construct the fine-tuning dataset in a clever way, you can get a much smaller model to be quite good.
>> How many people do you think it would have required, what year do you think it could have been done, and how many GPUs?
>> I mean, if we assume full hindsight, I think...
>> Full hindsight, yeah.
>> So nanochat was programmed by one person and runs on one box, and it probably took him about half a year to write. So that's at least an upper bound. Obviously this is on H100s, and earlier we would have had V100s or something. But I think if we had put together a few GPU boxes, you could have gotten something that was ChatGPT-3.5 level maybe back in 2018 or 2019 with a couple of people. I might be underestimating all the different parts of the stack, but I think if you had a few talented people working for a year or so with full hindsight, you'd get something. Actually, this is also building on pre-training datasets and scrapes that other people had done. So I haven't thought this through fully, but I'd say you could probably do something back in 2018 or 2019 with a few people that would get to GPT-3.5 level. And maybe in the future it will get even more extreme, and there will be a demoscene ChatGPT that's one file that scrapes the web, trains the whole thing, and does it all in a day of training.
>> Well, OpenAI is now one of the biggest companies in the world from a market capitalization standpoint, and among technology companies maybe also in capex investment. But I think it's easy to lose sight of how informal and rag-tag a group it was early on. I'm curious if you agree with that premise, that it really was a group that felt very scaled down and informal, where stuff felt much less weighty in 2016 and 2017. And then, to illustrate, help us fill in a picture of what early OpenAI looked like. What was one false start that the group worked on, a project that was a complete dead end, didn't work, and now doesn't really get talked about much in 2025?
>> Yeah, I'd say early on it was more rag-tag, maybe even a little bit like an academic group. There were a bunch of different research projects that people were working on, driven by their own taste, and people were working in groups of one, two, or three on some kind of research project that would turn into a paper or blog post. So I'd say the first couple of years of OpenAI had a lot of that flavor. There was also the idea of big projects, the idea that compared to academia we could go a lot further by doing serious engineering and putting bigger groups of people on a project. That idea was with us the whole time, and we were also influenced by DeepMind, who had pioneered this way of working to a large extent with projects like AlphaGo. So we had that in mind, but I'd say the company was a blend of these smaller research projects and bigger projects where the idea was to put together a bunch of researchers and engineers. And not all the projects were successful. Obviously a lot of research projects didn't go anywhere. Maybe the norm is even for a project not to turn out to be part of the main branch of...
>> Yeah, the tech tree.
>> Yeah. But I'd say some of the bigger projects might not have been the most successful.
There was an early project called Universe, where the idea was to create lots of different RL environments, build a whole dataset of them, and put them all together. The idea was that if you joint-trained on all of them, you would generalize to other things and get a general RL agent that was really good. We were going to collect lots of different video games and web navigation tasks and put them all together. And the funny thing is, I think it was a deeply correct idea, but it was just way too early, maybe even a decade too early, and there were a lot of prerequisites that were missing at the time. People got together and built this system and started doing experiments on it, but the whole system was very unwieldy and sort of bad for RL experiments, and since we were training models from scratch, these models didn't really generalize that well to anything else. So it ended up being unsuccessful at the time, and we ended up getting more mileage out of a scoped-down version of it. I ended up leading the team working on reinforcement learning research for a few years, and we were still working on these collections of video game environments. But instead of trying to create this big dataset of anything you could do in front of a computer, we focused on emulated video games, and that ended up being much more favorable to work with. So that was one of the unsuccessful projects. There were some other ones, like robotics, that ended up being somewhat of a dead end for the company, but also useful in the long run by building up capacity to do big engineering and research projects and training a lot of people to do this kind of work.
>> Do you remember what the biggest engineering projects looked like before 2020 for OpenAI, or just what the state of the research infrastructure was overall? Maybe there's a particular system that sticks out as being quite useful, or quite complicated, or one that caused lots of problems for researchers all the time.
>> I guess there were a handful of these bigger research projects, like the robotics project, and Dota was probably the earliest really successful big project with a lot of compute. These projects ended up being some combination of ML-systems work, where there was some big codebase and system built for the project, and then a bunch of research on RL in a certain regime.
>> And the underlying engineering projects there: would that be how you hook into Dota and actually take it over and programmatically control it, or the training infrastructure?
>> Yeah, there would be both. There's the environment infrastructure, how you hook into the software or build the training environment, and then there's a training system, which is ideally decoupled from that but usually not completely decoupled. That would involve things like large-scale rollouts and training in parallel, maybe async RL, and that sort of thing.
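As a rough illustration of the decoupled shape he describes (a toy sketch of our own, not a description of OpenAI's infrastructure), rollout workers and a learner can communicate only through a queue; `run_episode`, `env_fn`, `policy_snapshot_fn`, and `update_fn` are placeholders:

```python
import queue

rollout_queue: "queue.Queue[dict]" = queue.Queue(maxsize=1000)

def rollout_worker(env_fn, policy_snapshot_fn):
    """Actor loop: interacts with the environment and pushes trajectories."""
    env = env_fn()
    while True:
        policy = policy_snapshot_fn()          # possibly stale weights (async RL)
        trajectory = run_episode(env, policy)  # placeholder rollout function
        rollout_queue.put(trajectory)

def learner(update_fn, batch_size: int = 32):
    """Learner loop: consumes trajectories and updates the policy in parallel."""
    while True:
        batch = [rollout_queue.get() for _ in range(batch_size)]
        update_fn(batch)                       # placeholder gradient update

# e.g. run several rollout_worker threads/processes alongside one learner;
# the environment code and training code only share the queue interface,
# which is the "ideally decoupled, but usually not completely" boundary.
```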
>> How would you describe the ideal of the perfect research manager? It's this weird role, where more and more big science is being done in ML, there are bigger and bigger teams working together, which requires more coordination, and the people being managed, who need to work together, are, I assume, a big set of personalities. It's a new field, so there are probably not that many people with technical specialty who also want to go into management. In your experience, what works and what doesn't in the role of a research manager, or a manager of managers, and how would you describe the ideal for that role?
>> Yeah, it's a tough question, because I've seen people take very different approaches and be successful with very different approaches, and I also think the field is changing. So it could be that it's a non-stationary problem, and what might have worked seven or eight years ago is not the right approach now. I've seen one model where you have a small group and the person in charge is very hands-on, writing a lot of code themselves, reading all of the code from all their reports, and giving them very detailed technical feedback. I've also seen groups where there's a more hands-off manager who helps people by being a sounding board, giving them career-type advice as opposed to detailed technical advice, keeping people happy and motivated, and just letting people do their own thing. I think both of these modes of operation work in different places. If you're doing more exploratory research and you have fairly experienced people doing the individual contributor work, I think it makes sense to be more hands-off and just let people do their thing, and sometimes someone will discover something interesting and great. But if you're more goal-oriented, or if you have less experienced people, or if you're just trying to execute on more specific things, I think it might make more sense to have a model where the manager is more hands-on and gives more technical oversight.
>> The term "member of the technical staff": I assume OpenAI borrowed it from Bell Labs. I was actually lucky enough to talk to an early research executive at OpenAI once, and I asked him how inspired he was by past examples of successful industrial research labs, like Xerox PARC or Bell Labs, and he said not at all; at least for him, he didn't look to them for inspiration. What institutions were places that people at OpenAI talked about? Were there any groups people took inspiration from, or was it all figured out on the fly?
>> I'd say some people might have been inspired by past research labs, but in practice we probably drew more from the previous places we had worked. Most of us had some kind of lineage that involved grad school, or working at Google Brain or DeepMind. I'd say almost everyone had worked at Google at some point, so we were influenced by how they did things there. I remember some discussions where people would talk about The Making of the Atomic Bomb, so the Manhattan Project, and other institutions like that, but I don't remember a really deliberate effort to analyze the previous most successful research institutions and build on their strengths.
>> And then I guess grad school, early OpenAI, middle-to-late OpenAI, Anthropic, Thinking Machines, Google, maybe some subset of these: how would you characterize their differences, and what types of problems are those environments best suited to solve? The popular conception of each of these places is that Google has great engineering but has maybe historically been a little slow-moving, Anthropic is more safety-focused, and maybe Thinking Machines is a bit more product-focused than some of these other environments. But is there anything you think is under-discussed about the research environments you've been in, and what problem is each especially built to solve?
>> Well, that's a pretty broad question, and it's hard to talk about all these different places. Also, they've all changed over time.
>> Perhaps early OpenAI versus Thinking Machines.
>> Actually, I do see a lot of similarities, because there are several different things that people are working on in parallel, and we're still shaping the vision of the company, and the vision is going to emerge out of seeing these different projects take shape. It's also a different point in the history of the field. The field is moving really fast now, and there are other companies moving quickly, so there's some pressure to catch up to the current state of the art alongside whatever new things we want to do at Thinking Machines. Whereas in the early days of OpenAI, there was obviously DeepMind, but it wasn't like everyone was competing in some coherent direction. There wasn't a clear direction to go in. Maybe there was some idea that you wanted to scale up RL, make RL work better, and discover better architectures and so forth, but it wasn't like there was some axis that everyone was trying to scale on. So to some extent it was more like peacetime in the early days of OpenAI, and that led to a lot more exploratory work. I think a lot of companies that have started more recently are more compelled to be in catch-up mode for a while and first replicate the state of the art. Actually, I've been pretty aware of this, and I've definitely tried to make sure we're not just in catch-up mode, and that we're also building up a lot of muscle around doing exploratory research and exploring new ideas that aren't necessarily along the main path the rest of the field is going in. Because if you're just in catch-up mode, it's harder to build up that exploratory research muscle, and the culture, later. Building the right culture is hard to do later.
>> That makes sense.
>> Why aren't value functions popular in RL right now?
>> Yeah, I'd say they don't seem to help very much in the settings where people are doing RL right now: for example, RL from human feedback, and RL on these verifiable rewards with a fairly short time horizon. Or actually, I don't want to say we're working on short time horizons now, because if you're sampling tens of thousands of tokens, that's a pretty long time horizon. But on the current set of tasks people are doing RL on, for some reason value functions don't seem to help very much. And it's hard to know why. I'd say value functions help a lot: value functions give you variance reduction, that's their main purpose. For some reason you don't get that much variance reduction on this current set of tasks, whereas you do get a lot better variance reduction on some other tasks people have used for RL research. Why that's true, I couldn't say. I would expect value functions to make a comeback at some point, though.
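For readers who want the mechanics behind "value functions give you variance reduction," here is a minimal sketch, not from the conversation, of a REINFORCE-style policy-gradient loss with and without a learned value baseline; the function and variable names are illustrative rather than any particular library's API.

```python
import torch

def policy_gradient_loss(logprobs, returns, values=None):
    """Per-trajectory policy-gradient loss.

    logprobs: (T,) log-probabilities of the actions taken
    returns:  (T,) empirical reward-to-go at each step
    values:   (T,) optional value-function predictions V(s_t)

    Without `values`, each log-prob is weighted by the raw return (high variance).
    With `values`, it is weighted by the advantage A_t = R_t - V(s_t), which has
    the same expected gradient but lower variance when V predicts the return well.
    """
    if values is None:
        weights = returns                     # plain REINFORCE
    else:
        weights = returns - values.detach()   # baseline-subtracted advantage
    return -(weights * logprobs).mean()

def value_loss(values, returns):
    """Regression loss that trains V(s_t) to predict the return."""
    return torch.nn.functional.mse_loss(values, returns)
```

One hedged reading of his observation: in RLHF-style setups the whole sampled response often gets a single terminal reward, and simple baselines (such as the mean reward over samples for the same prompt) already remove much of the variance, leaving a learned per-token value function with less to add than in classic control tasks.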
>> What's your best guess at how we solve continual learning, and do you think LoRA will play a part in that?
>> Yeah, continual learning might mean several different things; there are several different kinds of learning. To use more of a psychological analogy, there's motor learning, there's episodic memory, and there's learning new knowledge, and procedural memory. So there are different kinds of learning that might benefit from different things. I'd expect in-context methods, or context management, to continue to get better, and long-context abilities to continue being important. I'd expect LoRA, or basically parameter fine-tuning, to stack on top of that and be better for some kinds of memory than others, especially ones that require a lot of capacity and absorbing a lot of knowledge. It's hard to say exactly.
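As a concrete reference point for what "LoRA, or parameter fine-tuning, stacking on top" means mechanically, here is a minimal sketch of a LoRA-style linear layer, where a frozen weight matrix is augmented with a trainable low-rank update; the class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = (W + (alpha/r) * B A) x."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)   # pre-trained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```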
>> Do you think we will need ideas other than putting the right things in the context window, plus a bit of parameter fine-tuning on top, to solve the problem of: we deploy these systems into the world and we want them to learn new things on the fly?
>> It's hard to say what we'll need, because if we keep scaling models and making them better, all of whatever metrics we write down will continue to improve. So it could be that even if we don't change any of our methodologies, and even if we don't do parameter fine-tuning, we'll eventually solve all these problems. Though it's also likely that there are some new ideas that will solve those same problems faster, and might give you a different scaling law, where you get either an increase in effective compute by a fixed multiplier or maybe a different slope of the scaling law if you use a different method. So I could see other methods giving you a much better scaling law, or faster, more effective continual learning. It's hard to say exactly which tasks will benefit the most from parameter fine-tuning, but I would expect it to help in a certain intermediate regime: I would expect in-context learning to help in a very short-horizon regime and be really hard to beat over a short time horizon, but I would expect weight updates to win over a longer time horizon.
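To make the "fixed multiplier versus different slope" distinction concrete, here is one hedged way to write it down, using a generic power-law form for loss versus compute; the notation is ours, not from the conversation.

```latex
% Baseline method: loss as a power law in compute C
L_{\mathrm{base}}(C) = A \, C^{-\alpha}

% Case 1: a fixed effective-compute multiplier k > 1
% (same slope on a log-log plot; the curve just shifts)
L_{\mathrm{mult}}(C) = A \, (kC)^{-\alpha}

% Case 2: a different exponent \alpha' > \alpha
% (different slope; the advantage grows as C increases)
L_{\mathrm{slope}}(C) = A' \, C^{-\alpha'}
```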
>> How worried are you about generalization being a blocker to having general AI working in useful ways across many areas of knowledge work? Are you worried that we'll do good pre-training work and that will only get us so far, and then we'll use RL, and RL will work for the domains and distributions we're RLing things on, but there won't be great transfer?
>> Yeah, it's hard to speak clearly about how well models generalize and how their sample efficiency compares to humans, because with in-context learning, models can have very good sample efficiency that's on par with humans or better. But then it seems like certain kinds of training require a lot more data than what it takes humans to learn the same thing. So there are some ways in which models are a lot more brittle than humans, but it's hard to give a really clean description of what those are. I think humans start to do better at longer time scales, because we have been optimized by evolution to operate over something like an 80-year time horizon, so there are a lot of self-correction mechanisms that occur. People aren't perfect at correcting their errors, but they're pretty good at it. And if you give people a goal and the motivation to pursue that goal, they will be very resourceful and try a lot of different things. Models can also be very persistent, in some cases more persistent, but they also tend to get stuck more easily when doing larger chunks of work. So it's hard to say whether this is just a temporary phenomenon and the time horizons of models are about to increase drastically, or whether it's some fundamental weakness that will take a really long time to get up to the human time horizon. And it's actually hard to figure that out, because if we're talking about a time horizon of decades, then obviously it takes decades to evaluate the models for this.
>> In a world where it becomes much more popular to co-train models together, where maybe you have a general setup with a generator trying to solve RL problems, and the way you assess rewards starts to involve models too, so you're maybe co-training judges and generators together: do you think any ideas from 2010s GANs, or any old ideas about training models together that have been forgotten, will be useful or important, or things we should take inspiration from? Or are you not sure?
>> Yeah, I think co-training generators and verifiers makes a lot of sense, because in theory you get some kind of self-improvement. If you have a model that's doing reasoning and instruction following as part of the verification process, and you're using that to provide learning signal to the generative model, then as the model gets better at reasoning and following instructions, it also becomes a better verifier, and you have somewhat of a virtuous cycle there. So I think that makes a lot of sense. I'm also pretty fond of ideas around multi-agent training, or games: setting up either a zero-sum game or a multiplayer game, where you can design the game so that the equilibrium is something very interesting. Games give you a lot of nice properties, like an automatic curriculum, because if you're a player in a game playing against copies of yourself, your opponents are getting better at the same time as you get better. There are also theoretical-CS-flavored reasons why setting up games might be a good idea. There are these complexity classes that are defined in terms of two-player zero-sum games, where you have a polynomial-time judge and two players, and the equilibrium of the game solves a really hard problem. So using a computationally cheap process, you can create an incentive such that the equilibrium involves solving a very hard problem. There was some alignment literature about this idea, in particular the idea of the debate game, which I think was pretty compelling. There's been some work on that idea, but I expect that kind of idea to become more and more important.
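As a rough illustration of the generator-verifier loop described above (a sketch under our own assumptions, not a description of any lab's pipeline), one iteration might look like this; the policy, verifier, and update functions are placeholders passed in by the caller.

```python
# Hypothetical single iteration of co-training a generator with a model-based verifier.
def cotrain_step(policy, verifier, prompts, policy_update, verifier_update, labeled_pairs):
    # 1. The generator proposes answers.
    responses = [policy.generate(p) for p in prompts]

    # 2. The verifier (itself a model doing reasoning / instruction following)
    #    scores each response; these scores become the RL reward for the generator.
    rewards = [verifier.score(p, r) for p, r in zip(prompts, responses)]

    # 3. Reinforce the generator toward responses the verifier prefers.
    policy_update(policy, prompts, responses, rewards)

    # 4. Improve the verifier too, e.g. on held-out (response, ground-truth) pairs,
    #    so that as shared reasoning ability improves, verification improves with it.
    verifier_update(verifier, labeled_pairs)

    return rewards
```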
>> How do you personally use AI?
>> I use AI for coding a lot. I use Cursor a lot, and Claude Code, and some other tools. I also have chat windows open with different models a lot, and I ask them questions multiple times a day.
>> Like the types of questions you'd go to Wikipedia for, or do you actually involve them in the research process at all?
>> I definitely use models for research a lot. If I have an idea now, I'll just fire off a bunch of questions to GPT-5 Pro and have it do a bunch of literature searches for me. Or sometimes, if I have a vague idea, I'll write a paragraph or two and tell the model to flesh it out a bit more. I find this quite helpful, and the literature search ability is extremely useful, because it used to take a lot longer to find relevant literature. This also extends to things like finding open source libraries; that's also much easier now. So I definitely use models a lot for finding relevant literature and iterating on ideas. I also use them a lot for giving feedback on writing. I'll often have to do most of the thinking myself, but then I'll usually use the chat models as my first round of feedback.
feedback. What does a day in your life look like when you're doing research? I
can imagine some people would guess that you know um to be as productive as you you know maybe working wall to wall staring at data talking with colleagues.
Uh there's also another version of the world where uh you know maybe you're someone who uh actually uh is working in very short concentrated bursts but is taking lots of time to walk around and
and think and generate ideas. Um but
yeah, what is a day in the life of of John Scholman look like when when doing research?
>> Actually, I come to coffee shops a lot. I like thinking at coffee shops, where there's a buzz of activity around and I can just sit with my coffee and a notebook, jot down some ideas, and remove distractions. I definitely do some of that when I'm in more of the idea formation phase of projects, or when I'm thinking more about what's the right thing to work on. At a certain point it becomes about execution, and at those times, if we're in a project that's more in execution mode, I'll be spending more time either coding, if I'm working on it myself, or reading docs and messages that other people have written, or looking at their plots and code. I spend a lot of time doing research advising now, and there I'm usually reviewing other people's work.
work. Do you think the skills someone would need to do effective research in 2019 or 2020 are the same as one needs now? Uh and in particular you wrote this
now? Uh and in particular you wrote this blog post in 2020 on how to do effective research. Um and curious if you have any
research. Um and curious if you have any updated recommendations for folks or if it's if broadly you think it stands the test of time.
>> Yeah. Thinking back to that blog post, I made a few points about different kinds of research, like goal-directed research versus more idea-driven research, keeping a research notebook, and building up your taste by reading a lot of papers. I think all of that advice mostly still holds, so I would still endorse it. What's changed? I'd say keeping a lab notebook is probably even more useful now that we have LLMs, because context is so important: if you want to get good feedback on what you're doing, you can probably paste your notebook into the LLM and get some feedback. Probably the biggest change I can think of is just that it's now really useful to figure out how to incorporate LLMs into your work. Actually, I haven't thought hard enough about what's the best way to accelerate your research, other than things that just generally accelerate your engineering. And it's a little bit non-obvious; the advice might be different for research than for other areas of software engineering, because I do think there's a lot of value in understanding every line of code, and in having something very simple where you understand every line of code, rather than having the model write large amounts of code that you've never read. So this style of AI-assisted coding, where you just have a model write the whole implementation for you, might work well in other domains, where you kind of just want to define the spec and have the model write something that seems to satisfy it. But I think for research there is a lot of value in knowing exactly what's going on in every line of code, and the people who have done the best work really have that understanding all the way down to the nuts and bolts.
>> Probably since 2012, but it feels like especially since 2020 and the advent of scaling laws, lots more researchers have entered the field of ML, both in academia and in industry. And, please disagree if you do, but it seems like the rate of consequential big-idea generation has been fairly constant. The story of ML feels like the actual progress has been huge in building useful systems, but to build these systems there are a few underlying big ideas, then lots of details filled in to make them work, and then a whole graveyard of ideas that just haven't worked. How would you explain consequential idea generation being constant even though the number of researchers in the field has 10xed or 100xed?
>> I feel like these questions about quantifying the rate of scientific progress are always a bit tricky, because you have to think about low-hanging fruit being picked, and it's also very hard to measure the rate of progress in the recent past, because you don't know yet which ideas are important. So I would be a little hesitant to conclude that the rate of progress is constant and hasn't accelerated even with the large number of people. I think if you look back to papers from the '70s, '80s, and '90s, the experimental rigor is lower. The standards have increased in certain ways, definitely for the level of experimental rigor, in terms of trying out baseline methods and doing a lot of different experiments on different tasks. In the old days you might have an RL paper that had some very elaborate set of ideas and just one experiment on a toy task that was very questionable, and that could be a seminal paper. And a lot of the mathematical ideas weren't very sophisticated either. So I wouldn't be surprised if the rate of idea generation has actually increased a lot, and standards, and the actual level of quality, have increased in certain ways as more people have entered the field.
>> Does that match your intuition?
>> I think that would be my intuition, yeah.
>> I also think there are a lot of problems with the academic publishing and reviewing system, and it's definitely frustrating to a lot of people who are part of that system. But I also think it's not terrible, because the field is driven by a lot of objective improvements that people are seeing in reality; the field is driven by real goals and real problems, and I think that grounds it to a great extent. So even though there are a lot of problems and there is a lot of fake research, overall the field seems to make progress.
>> And actually, as an aside, how do you think the academic publishing system compares to the internal coordination system of these large AI companies, with their Slack channels around ideas? Do you think there's anything to borrow from how a thousand-person research organization at one of these companies works that could be transported into open academia?
>> Oh yeah, that's really interesting. I'd say if you look at the presentation of internal results at one of these big research labs, it's better in some ways and worse in other ways than the publishing world. It's better in terms of having a higher accuracy of drawing real conclusions about things, like what improves pre-training. People have much better methodologies, and the experiments are much more driven by real consequences, as opposed to just getting a paper published. So the successful companies have gotten better at drawing accurate conclusions. On the other hand, no one is going to take the time to write a tech report at anything near the level of detail of something published externally; no one writes really detailed tech reports. So even though the overall level of accuracy of the claims is higher, the thoroughness of experiments is usually less in internal research; people aren't going to try as many baselines. A lot of academic papers have baselines that are nerfed in some way, so you can't really trust the results, but at least the best external work is actually quite thorough and does a lot of good baseline comparisons. So overall, the write-ups are much more detailed in the outside world, and there's more thoroughness in certain ways, but also less accuracy. At these institutions I've been interested in trying to improve the research writing culture, trying to get people to write more detailed tech reports that really go deeply into the science, as opposed to just doing the minimal thing needed to find the recipe improvement that's shippable. And it's definitely been a bit of a challenge, because the incentives of the companies usually push toward the shippable recipe improvement rather than building up a nice theory and doing really thorough science.
>> Yep.
>> How have the characters entering the field shifted in distribution from what things looked like in 2015, 2016, 2017? Were people back then on average more or less skilled, worse or better engineers, more creative? Is there a difference in the kind of people attracted to doing this work back then versus the set of people entering the field now?
>> Yeah, I'd say maybe people were a bit weirder back then, because now it's more obvious, and it's conventional wisdom, that AI is the most important thing going on, so it's going to attract a lot of people who have more conventional career paths and are less risk tolerant. So previously the people were a bit weirder; now there are more conventionally minded people. Overall it's hard to compare the talent distributions, but just given the numbers, I'd say the bar has gotten higher, because so many people are trying to get into the field and applying for all these roles. I'd also say engineering skill probably matters more now than it did before, as opposed to research taste and the ability to do exploratory research, because scaling things has driven so many improvements and there's so much low-hanging fruit just from scaling the simple ideas and executing on them well. Also, the field has gotten more mature, so you're usually not writing code from scratch in a Jupyter notebook anymore; you're building on someone else's codebase, and there's a lot of good infrastructure to build on. Since you're integrating a lot with other people's code and tools, people who have more of a software engineering background have more of an advantage now.
>> What do you think the future of RL research looks like broadly? Over the last 10 years, a bunch of topics came in and out of focus, and then the thing that's publicly worked well on LLMs has actually been fairly simple and pretty close to the set of ideas that worked in other domains. Do you think there's much more to do in RL research? And do you think the most capable RL LLM systems in a couple of years will look very different from some of the old ideas?
>> Yeah, like you said, I think some ideas go in and out of fashion. Sometimes they become fashionable too early and don't live up to their promises, but then they come back later. I expect that to happen a bit more. It's hard to say exactly what the important ideas will be, but I think offline RL is a pretty interesting set of ideas. And to some extent, what we're doing in the LLM world now is like what in robotics they call sim-to-real, where you build a bunch of simulated environments, you try to do RL on them at scale, and you try to have enough diversity that you'll generalize to the real world. Actually, I think sim-to-real is still yielding good results in robotics, so it's not like sim-to-real has been discredited there; I think it is a very effective technique. But I think there's also a lot of value in learning from the real world, so I expect that at some point we come back, in the LLM world, to figuring out how to learn from real deployment.
>> If some of the biggest AI labs did develop really powerful AI systems, to the point where it would be pretty important to coordinate, both with each other and with other societally important institutions like governments, how confident are you that they would coordinate well, if that were necessary for the future of AI to go well? And how worried are you that they wouldn't get along and wouldn't coordinate well?
>> I feel medium worried, or medium confident. I'd say there is a reasonable amount of commonality of viewpoint and vision between the leading AI labs, and there's also been some collaboration between the labs recently on safety things. There's definitely a little bit of bad blood between the personalities involved, so that might make it a little harder. But I could see it working out if it became more clear that this is the right thing to do.
>> So with this being a time when technology is getting better fast, there's lots of talk about predictions of how fast it's going to get better, and in particular lots of people talk about their estimate of when AGI is going to happen; it's a common topic. Most people, when they're talking about AGI, mean something like all knowledge work on computers being done by AIs instead of humans. One way I've tried to reason about this is that AGI seems like a big, complicated engineering and research project, and at least in my experience, engineers and researchers are kind of abysmal at estimating when they're going to finish much smaller projects. In particular, the systematic bias I've seen from engineers is that they will always assume they're going to finish something much earlier than they actually do, and the rule I apply is roughly a constant 3x factor on top of their predictions to get closer to the actual time it takes them. Do you think this is a reasonable critique of most people's AGI timelines, that they will underestimate how long it's going to take because researchers and engineers generally underestimate how long things take? And in your experience, have the researchers and engineers you've been around been good at estimating project timelines?
>> Yeah, I agree that there's a consistent bias to underestimate timelines, and maybe it's a 2x or 3x in the good case. Using that heuristic, I think it is reasonable to predict that AGI will be a little further out than people's timelines suggest, when they have a definite prediction for how it'll play out. We've seen this elsewhere; the most analogous problem is maybe self-driving cars, where it took longer than people expected to get to full autonomy, robotaxis, and everything. So I think that's a reasonable hypothesis. On the other hand, there is this positive feedback loop where AI accelerates its own development, and that's also probably going to defy intuition. People who are incorporating that effect are coming up with pretty short timelines, and I also think that is a somewhat compelling line of reasoning. There's a lot of uncertainty about how much uplift people get from AI, and whether there are bottlenecks around humans understanding what's going on, and so forth. So I wouldn't make a really confident prediction either way.
>> So you and Thinking Machines have released Tinker. What is it, and who is it for?
>> Tinker is a low-level fine-tuning API. It gives you a small set of low-level primitives to do training and sampling, which lets you express almost all the post-training algorithms you might want, while not having to worry about GPUs or accelerators, and not having to worry about a lot of the distributed systems issues you otherwise would. So it's trying to find something that we think is a good layer to abstract away, and to build that as a service. People don't usually use services for ML training, and the services that exist are much more high level, so it's kind of novel in that it's a service built around this lower-level primitive. The closest analogy would actually be the sampling APIs you can get from OpenAI and Anthropic and so forth, where you don't have to spin up your own GPU box to do sampling; you just make an API call from Python or JavaScript or whatever. Tinker lets you write a lot of training code by just writing some Python scripts and having that work, rather than having to worry about installing a bunch of stuff to run on GPUs.
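To give a feel for what "low-level training and sampling primitives behind an API" could look like in practice, here is a heavily hedged sketch of a pseudo-client; the class and method names are hypothetical illustrations of the idea described, not Tinker's actual API.

```python
# Illustrative pseudo-client: remote training and sampling calls, no GPU
# management on the user's side. Names are hypothetical, not Tinker's API.

class FineTuningClient:
    def forward_backward(self, batch, loss_fn: str):
        """Run a forward and backward pass on the service and accumulate gradients."""
        ...

    def optim_step(self, learning_rate: float):
        """Apply the accumulated gradients with the service-side optimizer."""
        ...

    def sample(self, prompts, max_tokens: int):
        """Generate completions from the current weights, e.g. for RL rollouts."""
        ...

def reinforce_epoch(client: FineTuningClient, prompts, reward_fn):
    """A minimal RL-style post-training loop expressed against those primitives."""
    samples = client.sample(prompts, max_tokens=256)
    rewards = [reward_fn(p, s) for p, s in zip(prompts, samples)]
    client.forward_backward(
        {"prompts": prompts, "samples": samples, "rewards": rewards},
        loss_fn="policy_gradient",
    )
    client.optim_step(learning_rate=1e-5)
```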
>> Is your ambition that the next Thinking Machines, started by some group of researchers, would actually just build on top of Tinker?
>> Yeah, I would hope that a lot of companies would be able to just build on Tinker instead of developing their own infrastructure, and build really sophisticated custom models on top of it. Oh yeah, and to answer the question you asked before about who it's for: right now it's for people who are pretty sophisticated in terms of their knowledge of ML and who want to use these low-level primitives. We shipped a lot of open source code that goes along with Tinker, so you don't have to write all the training algorithms yourself, but I think Tinker is best for people who do want to look at the details and dig into them. Over time, though, we're going to make it more and more user friendly and build a lot of tooling and higher-level components on top of Tinker, so that it becomes more of a full-stack thing, where you don't have to be an expert to use it and you can just come in with your understanding of the business problem you want to solve, or the spec of the model you want to build, and the software we ship will do that for you.
>> And then what should we expect from Thinking Machines in the next year or so? Anything you can share publicly?
>> You'll see some things coming out with our own models sometime next year. And with Tinker, expect us to keep improving it: adding a lot more models and functionality, like various kinds of multimodal input and output, and really scaling up the sizes of jobs you can do with Tinker.
>> Thank you, John. It's been fun.
>> Thanks for having me.