John Schulman on dead ends, scaling RL, and building research institutions
By Cursor
Summary
## Key takeaways

- **ChatGPT Possible in 2018**: With full hindsight, a few talented people working for a year with a few GPU boxes could have built something at GPT-3.5 level back in 2018 or 2019, building on existing pre-training datasets. nanochat, written by one person in about half a year and running on one box, provides an upper bound. [02:05], [03:05]
- **Universe RL Dead End**: Universe aimed to joint-train on diverse RL environments like video games and web tasks to get a general agent, but it was a decade too early, unwieldy, and the models didn't generalize. A scoped-down version focusing on emulated video games proved more successful. [06:06], [07:34]
- **Value Functions Not Helping Now**: Value functions don't help much in current RL settings like RLHF and verifiable rewards with short-to-medium horizons, despite providing variance reduction in other tasks. They may make a comeback later. [17:02], [18:19]
- **Hands-On vs Hands-Off Managers**: Hands-on managers who code and give detailed technical feedback suit goal-oriented work with less experienced people, while hands-off managers acting as sounding boards fit exploratory research with experienced contributors. Both approaches can succeed depending on context. [10:32], [11:48]
- **Early OpenAI Exploratory**: Early OpenAI blended small rag-tag research projects by one to three people with bigger engineering efforts inspired by DeepMind's AlphaGo, and many projects failed as the norm. It was more like peacetime, with no clear scaling direction, unlike today's catch-up mode. [04:17], [15:47]
- **AI Accelerates Own Timelines**: While engineers consistently underestimate timelines by 2-3x, as seen in self-driving cars, AI's positive feedback loop accelerating its own development may lead to shorter AGI timelines than naive predictions suggest. [46:20], [47:28]
Topics Covered
- ChatGPT Possible in 2018 with Few People
- Universe Project Failed Due to Prematurity
- Hands-Off Management Fits Exploratory Research
- Avoid Catch-Up Mode to Preserve Exploration
- Co-Train Generators and Verifiers for Self-Improvement
Full Transcript
>> If the group of people that started OpenAI went back to 2015 or 2016 and wanted to speedrun building ChatGPT, how fast could they do it? What would be the bottlenecks to doing it even faster, and what moves would that group play that would be different from what actually happened?
>> Yeah, I think if you wanted to make ChatGPT with a lot less compute, you could, and we've seen things like nanoGPT that sort of do this. Sometimes it's easier to do something with more compute first, and then by adding more clever tricks you can do it with less compute. I also guess we could have scaled a lot faster, or it would have been possible to scale, if we had known the returns would be what they were.
>> I think if you wanted to do it a lot earlier, and you had the whole recipe in mind, you probably could have built it a lot earlier. You could put together a big cluster and pre-train a model, and then, given all the things we know now about post-training, you can effectively increase your compute a lot by doing post-training better. So even if it takes something like a GPT-3-level model to create a good few-shot-prompted chat model, if you're willing to do a lot of fine-tuning and construct the fine-tuning dataset in a clever way, you can get a much smaller model to be quite good.
>> How many people do you think it would have required, what year do you think it could have been done, and how many GPUs?
>> I mean, if we assume full hindsight, I think...
>> Full hindsight, yeah.
>> So nanochat was programmed by one person and runs on one box, and it probably took him about half a year to write. So that's at least an upper bound. Obviously this is on H100s, and earlier we would have had V100s or something. But I think if we had put together a few GPU boxes, you could have gotten something that was ChatGPT-3.5 level maybe back in 2018 or 2019 with a couple of people. I might be underestimating all the different parts of the stack, but I think if you had a few talented people working for a year or so with full hindsight, you'd get something. Actually, this is also building on pre-training datasets and scrapes that other people had done. So I haven't thought this through fully, but I'd say you could probably do something back in 2018 or 2019 with a few people that would get to GPT-3.5 level. And maybe in the future it will get even more extreme, and there will be a demoscene ChatGPT that's one file that scrapes the web, trains the whole thing, and does it all in a day of training.
>> Well, OpenAI is now one of the biggest companies in the world from a market capitalization standpoint, and among technology companies maybe also in capex investment. But I think it's easy to lose sight of how informal and rag-tag a group it was early on. I'm curious if you agree with that premise, that it really was a group that felt very scaled down and informal, where stuff felt much less weighty in 2016 and 2017. And then, to illustrate, help us fill in a picture of what early OpenAI looked like. What was one false start that the group worked on, a project that was a complete dead end, didn't work, and now doesn't really get talked about much in 2025?
>> Yeah, I'd say early on it was more rag-tag, maybe even a little bit like an academic group. There were a bunch of different research projects that people were working on, driven by their own taste, and people were working in groups of one, two, or three on some kind of research project that would turn into a paper or blog post. So I'd say the first couple of years of OpenAI had a lot of that flavor. There was also the idea of big projects, the idea that compared to academia we could go a lot further by doing serious engineering and putting bigger groups of people on a project. That idea was with us the whole time, and we were also influenced by DeepMind, who had pioneered this way of working to a large extent with projects like AlphaGo. So we had that in mind, but I'd say the company was a blend of these smaller research projects and bigger projects where the idea was to put together a bunch of researchers and engineers. And not all the projects were successful. Obviously a lot of research projects didn't go anywhere. Maybe the norm is even for a project not to turn out to be part of the main branch of...
>> Yeah, the tech tree.
>> Yeah. But I'd say some of the bigger projects might not have been the most successful.
There was an early project called Universe, where the idea was to create lots of different RL environments, build a whole dataset of them, and put them all together. The idea was that if you joint-trained on all of them, you would generalize to other things and get a general RL agent that was really good. We were going to collect lots of different video games and web navigation tasks and put them all together. And the funny thing is, I think it was a deeply correct idea, but it was just way too early, maybe even a decade too early, and there were a lot of prerequisites that were missing at the time. People got together and built this system and started doing experiments on it, but the whole system was very unwieldy and sort of bad for RL experiments, and since we were training models from scratch, these models didn't really generalize that well to anything else. So it ended up being unsuccessful at the time, and we ended up getting more mileage out of a scoped-down version of it. I ended up leading the team working on reinforcement learning research for a few years, and we were still working on these collections of video game environments. But instead of trying to create this big dataset of anything you could do in front of a computer, we focused on emulated video games, and that ended up being much more favorable to work with. So that was one of the unsuccessful projects. There were some other ones, like robotics, that ended up being somewhat of a dead end for the company, but also useful in the long run by building up capacity to do big engineering and research projects and training a lot of people to do this kind of work.
>> Do you remember what the biggest engineering projects looked like before 2020 for OpenAI, or just what the state of the research infrastructure was overall? Maybe there's a particular system that sticks out as being quite useful, or quite complicated, or one that caused lots of problems for researchers all the time.
>> I guess there were a handful of these bigger research projects, like the robotics project, and Dota was probably the earliest really successful big project with a lot of compute. These projects ended up being some combination of ML-systems work, where there was some big codebase and system built for the project, and then a bunch of research on RL in a certain regime.
>> And the underlying engineering projects there: would that be how you hook into Dota and actually take it over and programmatically control it, or the training infrastructure?
>> Yeah, there would be both. There's the environment infrastructure, how you hook into the software or build the training environment, and then there's a training system, which is ideally decoupled from that but usually not completely decoupled. That would involve things like large-scale rollouts and training in parallel, maybe async RL, and that sort of thing.
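As a rough illustration of the decoupled shape he describes (a toy sketch of our own, not a description of OpenAI's infrastructure), rollout workers and a learner can communicate only through a queue; `run_episode`, `env_fn`, `policy_snapshot_fn`, and `update_fn` are placeholders:

```python
import queue

rollout_queue: "queue.Queue[dict]" = queue.Queue(maxsize=1000)

def rollout_worker(env_fn, policy_snapshot_fn):
    """Actor loop: interacts with the environment and pushes trajectories."""
    env = env_fn()
    while True:
        policy = policy_snapshot_fn()          # possibly stale weights (async RL)
        trajectory = run_episode(env, policy)  # placeholder rollout function
        rollout_queue.put(trajectory)

def learner(update_fn, batch_size: int = 32):
    """Learner loop: consumes trajectories and updates the policy in parallel."""
    while True:
        batch = [rollout_queue.get() for _ in range(batch_size)]
        update_fn(batch)                       # placeholder gradient update

# e.g. run several rollout_worker threads/processes alongside one learner;
# the environment code and training code only share the queue interface,
# which is the "ideally decoupled, but usually not completely" boundary.
```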
>> How would you describe the ideal of the perfect research manager? It's this weird role, where more and more big science is being done in ML, there are bigger and bigger teams working together, which requires more coordination, and the people being managed, who need to work together, are, I assume, a big set of personalities. It's a new field, so there are probably not that many people with technical specialty who also want to go into management. In your experience, what works and what doesn't in the role of a research manager, or a manager of managers, and how would you describe the ideal for that role?
>> Yeah, it's a tough question, because I've seen people take very different approaches and be successful with very different approaches, and I also think the field is changing. So it could be that it's a non-stationary problem, and what might have worked seven or eight years ago is not the right approach now. I've seen one model where you have a small group and the person in charge is very hands-on, writing a lot of code themselves, reading all of the code from all their reports, and giving them very detailed technical feedback. I've also seen groups where there's a more hands-off manager who helps people by being a sounding board, giving them career-type advice as opposed to detailed technical advice, keeping people happy and motivated, and just letting people do their own thing. I think both of these modes of operation work in different places. If you're doing more exploratory research and you have fairly experienced people doing the individual contributor work, I think it makes sense to be more hands-off and just let people do their thing, and sometimes someone will discover something interesting and great. But if you're more goal-oriented, or if you have less experienced people, or if you're just trying to execute on more specific things, I think it might make more sense to have a model where the manager is more hands-on and gives more technical oversight.
>> The term "member of the technical staff": I assume OpenAI borrowed it from Bell Labs. I was actually lucky enough to talk to an early research executive at OpenAI once, and I asked him how inspired he was by past examples of successful industrial research labs, like Xerox PARC or Bell Labs, and he said not at all; at least for him, he didn't look to them for inspiration. What institutions were places that people at OpenAI talked about? Were there any groups people took inspiration from, or was it all figured out on the fly?
>> I'd say some people might have been inspired by past research labs, but in practice we probably drew more from the previous places we had worked. Most of us had some kind of lineage that involved grad school, or working at Google Brain or DeepMind. I'd say almost everyone had worked at Google at some point, so we were influenced by how they did things there. I remember some discussions where people would talk about The Making of the Atomic Bomb, so the Manhattan Project, and other institutions like that, but I don't remember a really deliberate effort to analyze the previous most successful research institutions and build on their strengths.
>> And then I guess grad school, early OpenAI, middle-to-late OpenAI, Anthropic, Thinking Machines, Google, maybe some subset of these: how would you characterize their differences, and what types of problems are those environments best suited to solve? The popular conception of each of these places is that Google has great engineering but has maybe historically been a little slow-moving, Anthropic is more safety-focused, and maybe Thinking Machines is a bit more product-focused than some of these other environments. But is there anything you think is under-discussed about the research environments you've been in, and what problem is each especially built to solve?
>> Well, that's a pretty broad question, and it's hard to talk about all these different places. Also, they've all changed over time.
>> Perhaps early OpenAI versus Thinking Machines.
>> Actually, I do see a lot of similarities, because there are several different things that people are working on in parallel, and we're still shaping the vision of the company, and the vision is going to emerge out of seeing these different projects take shape. It's also a different point in the history of the field. The field is moving really fast now, and there are other companies moving quickly, so there's some pressure to catch up to the current state of the art alongside whatever new things we want to do at Thinking Machines. Whereas in the early days of OpenAI, there was obviously DeepMind, but it wasn't like everyone was competing in some coherent direction. There wasn't a clear direction to go in. Maybe there was some idea that you wanted to scale up RL, make RL work better, and discover better architectures and so forth, but it wasn't like there was some axis that everyone was trying to scale on. So to some extent it was more like peacetime in the early days of OpenAI, and that led to a lot more exploratory work. I think a lot of companies that have started more recently are more compelled to be in catch-up mode for a while and first replicate the state of the art. Actually, I've been pretty aware of this, and I've definitely tried to make sure we're not just in catch-up mode, and that we're also building up a lot of muscle around doing exploratory research and exploring new ideas that aren't necessarily along the main path the rest of the field is going in. Because if you're just in catch-up mode, it's harder to build up that exploratory research muscle, and the culture, later. Building the right culture is hard to do later.
>> That makes sense.
>> Why aren't value functions popular in RL right now?
>> Yeah, I'd say they don't seem to help very much in the settings where people are doing RL right now: for example, RL from human feedback, and RL on these verifiable rewards with a fairly short time horizon. Or actually, I don't want to say we're working on short time horizons now, because if you're sampling tens of thousands of tokens, that's a pretty long time horizon. But on the current set of tasks people are doing RL on, for some reason value functions don't seem to help very much. And it's hard to know why. I'd say value functions help a lot: value functions give you variance reduction, that's their main purpose. For some reason you don't get that much variance reduction on this current set of tasks, whereas you do get a lot better variance reduction on some other tasks people have used for RL research. Why that's true, I couldn't say. I would expect value functions to make a comeback at some point, though.
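For readers who want the mechanics behind "value functions give you variance reduction," here is a minimal sketch, not from the conversation, of a REINFORCE-style policy-gradient loss with and without a learned value baseline; the function and variable names are illustrative rather than any particular library's API.

```python
import torch

def policy_gradient_loss(logprobs, returns, values=None):
    """Per-trajectory policy-gradient loss.

    logprobs: (T,) log-probabilities of the actions taken
    returns:  (T,) empirical reward-to-go at each step
    values:   (T,) optional value-function predictions V(s_t)

    Without `values`, each log-prob is weighted by the raw return (high variance).
    With `values`, it is weighted by the advantage A_t = R_t - V(s_t), which has
    the same expected gradient but lower variance when V predicts the return well.
    """
    if values is None:
        weights = returns                     # plain REINFORCE
    else:
        weights = returns - values.detach()   # baseline-subtracted advantage
    return -(weights * logprobs).mean()

def value_loss(values, returns):
    """Regression loss that trains V(s_t) to predict the return."""
    return torch.nn.functional.mse_loss(values, returns)
```

One hedged reading of his observation: in RLHF-style setups the whole sampled response often gets a single terminal reward, and simple baselines (such as the mean reward over samples for the same prompt) already remove much of the variance, leaving a learned per-token value function with less to add than in classic control tasks.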
>> What's your best guess at how we solve continual learning, and do you think LoRA will play a part in that?
>> Yeah, continual learning might mean several different things; there are several different kinds of learning. To use more of a psychological analogy, there's motor learning, there's episodic memory, and there's learning new knowledge, and procedural memory. So there are different kinds of learning that might benefit from different things. I'd expect in-context methods, or context management, to continue to get better, and long-context abilities to continue being important. I'd expect LoRA, or basically parameter fine-tuning, to stack on top of that and be better for some kinds of memory than others, especially ones that require a lot of capacity and absorbing a lot of knowledge. It's hard to say exactly.
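As a concrete reference point for what "LoRA, or parameter fine-tuning, stacking on top" means mechanically, here is a minimal sketch of a LoRA-style linear layer, where a frozen weight matrix is augmented with a trainable low-rank update; the class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = (W + (alpha/r) * B A) x."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)   # pre-trained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```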
>> Do you think we will need ideas other than putting the right things in the context window, plus a bit of parameter fine-tuning on top, to solve the problem of: we deploy these systems into the world and we want them to learn new things on the fly?
>> It's hard to say what we'll need, because if we keep scaling models and making them better, all of whatever metrics we write down will continue to improve. So it could be that even if we don't change any of our methodologies, and even if we don't do parameter fine-tuning, we'll eventually solve all these problems. Though it's also likely that there are some new ideas that will solve those same problems faster, and might give you a different scaling law, where you get either an increase in effective compute by a fixed multiplier or maybe a different slope of the scaling law if you use a different method. So I could see other methods giving you a much better scaling law, or faster, more effective continual learning. It's hard to say exactly which tasks will benefit the most from parameter fine-tuning, but I would expect it to help in a certain intermediate regime: I would expect in-context learning to help in a very short-horizon regime and be really hard to beat over a short time horizon, but I would expect weight updates to win over a longer time horizon.
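To make the "fixed multiplier versus different slope" distinction concrete, here is one hedged way to write it down, using a generic power-law form for loss versus compute; the notation is ours, not from the conversation.

```latex
% Baseline method: loss as a power law in compute C
L_{\mathrm{base}}(C) = A \, C^{-\alpha}

% Case 1: a fixed effective-compute multiplier k > 1
% (same slope on a log-log plot; the curve just shifts)
L_{\mathrm{mult}}(C) = A \, (kC)^{-\alpha}

% Case 2: a different exponent \alpha' > \alpha
% (different slope; the advantage grows as C increases)
L_{\mathrm{slope}}(C) = A' \, C^{-\alpha'}
```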
>> How worried are you about generalization being a blocker to having general AI working in useful ways across many areas of knowledge work? Are you worried that we'll do good pre-training work and that will only get us so far, and then we'll use RL, and RL will work for the domains and distributions we're RLing things on, but there won't be great transfer?
>> Yeah, it's hard to speak clearly about how well models generalize and how their sample efficiency compares to humans, because with in-context learning, models can have very good sample efficiency that's on par with humans or better. But then it seems like certain kinds of training require a lot more data than what it takes humans to learn the same thing. So there are some ways in which models are a lot more brittle than humans, but it's hard to give a really clean description of what those are. I think humans start to do better at longer time scales, because we have been optimized by evolution to operate over something like an 80-year time horizon, so there are a lot of self-correction mechanisms that occur. People aren't perfect at correcting their errors, but they're pretty good at it. And if you give people a goal and the motivation to pursue that goal, they will be very resourceful and try a lot of different things. Models can also be very persistent, in some cases more persistent, but they also tend to get stuck more easily when doing larger chunks of work. So it's hard to say whether this is just a temporary phenomenon and the time horizons of models are about to increase drastically, or whether it's some fundamental weakness that will take a really long time to get up to the human time horizon. And it's actually hard to figure that out, because if we're talking about a time horizon of decades, then obviously it takes decades to evaluate the models for this.
>> In a world where it becomes much more popular to co-train models together, where maybe you have a general setup with a generator trying to solve RL problems, and the way you assess rewards starts to involve models too, so you're maybe co-training judges and generators together: do you think any ideas from 2010s GANs, or any old ideas about training models together that have been forgotten, will be useful or important, or things we should take inspiration from? Or are you not sure?
>> Yeah, I think co-training generators and verifiers makes a lot of sense, because in theory you get some kind of self-improvement. If you have a model that's doing reasoning and instruction following as part of the verification process, and you're using that to provide learning signal to the generative model, then as the model gets better at reasoning and following instructions, it also becomes a better verifier, and you have somewhat of a virtuous cycle there. So I think that makes a lot of sense. I'm also pretty fond of ideas around multi-agent training, or games: setting up either a zero-sum game or a multiplayer game, where you can design the game so that the equilibrium is something very interesting. Games give you a lot of nice properties, like an automatic curriculum, because if you're a player in a game playing against copies of yourself, your opponents are getting better at the same time as you get better. There are also theoretical-CS-flavored reasons why setting up games might be a good idea. There are these complexity classes that are defined in terms of two-player zero-sum games, where you have a polynomial-time judge and two players, and the equilibrium of the game solves a really hard problem. So using a computationally cheap process, you can create an incentive such that the equilibrium involves solving a very hard problem. There was some alignment literature about this idea, in particular the idea of the debate game, which I think was pretty compelling. There's been some work on that idea, but I expect that kind of idea to become more and more important.
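As a rough illustration of the generator-verifier loop described above (a sketch under our own assumptions, not a description of any lab's pipeline), one iteration might look like this; the policy, verifier, and update functions are placeholders passed in by the caller.

```python
# Hypothetical single iteration of co-training a generator with a model-based verifier.
def cotrain_step(policy, verifier, prompts, policy_update, verifier_update, labeled_pairs):
    # 1. The generator proposes answers.
    responses = [policy.generate(p) for p in prompts]

    # 2. The verifier (itself a model doing reasoning / instruction following)
    #    scores each response; these scores become the RL reward for the generator.
    rewards = [verifier.score(p, r) for p, r in zip(prompts, responses)]

    # 3. Reinforce the generator toward responses the verifier prefers.
    policy_update(policy, prompts, responses, rewards)

    # 4. Improve the verifier too, e.g. on held-out (response, ground-truth) pairs,
    #    so that as shared reasoning ability improves, verification improves with it.
    verifier_update(verifier, labeled_pairs)

    return rewards
```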
>> How do you personally use AI?
>> I use AI for coding a lot. I use Cursor a lot, and Claude Code, and some other tools. I also have chat windows open with different models a lot, and I ask them questions multiple times a day.
>> Like the types of questions you'd go to Wikipedia for, or do you actually involve them in the research process at all?
>> I definitely use models for research a lot. If I have an idea now, I'll just fire off a bunch of questions to GPT-5 Pro and have it do a bunch of literature searches for me. Or sometimes, if I have a vague idea, I'll write a paragraph or two and tell the model to flesh it out a bit more. I find this quite helpful, and the literature search ability is extremely useful, because it used to take a lot longer to find relevant literature. This also extends to things like finding open source libraries; that's also much easier now. So I definitely use models a lot for finding relevant literature and iterating on ideas. I also use them a lot for giving feedback on writing. I'll often have to do most of the thinking myself, but then I'll usually use the chat models as my first round of feedback.
feedback. What does a day in your life look like when you're doing research? I
can imagine some people would guess that you know um to be as productive as you you know maybe working wall to wall staring at data talking with colleagues.
Uh there's also another version of the world where uh you know maybe you're someone who uh actually uh is working in very short concentrated bursts but is taking lots of time to walk around and
and think and generate ideas. Um but
yeah, what is a day in the life of of John Scholman look like when when doing research?
>> Actually, I come to coffee shops a lot. I like thinking at coffee shops, where there's a buzz of activity around and I can just sit with my coffee and a notebook, jot down some ideas, and remove distractions. I definitely do some of that when I'm in more of the idea formation phase of projects, or when I'm thinking more about what's the right thing to work on. At a certain point it becomes about execution, and at those times, if we're in a project that's more in execution mode, I'll be spending more time either coding, if I'm working on it myself, or reading docs and messages that other people have written, or looking at their plots and code. I spend a lot of time doing research advising now, and there I'm usually reviewing other people's work.
work. Do you think the skills someone would need to do effective research in 2019 or 2020 are the same as one needs now? Uh and in particular you wrote this
now? Uh and in particular you wrote this blog post in 2020 on how to do effective research. Um and curious if you have any
research. Um and curious if you have any updated recommendations for folks or if it's if broadly you think it stands the test of time.
>> Yeah. Thinking back to that blog post, I made a few points about different kinds of research, like goal-directed research versus more idea-driven research, keeping a research notebook, and building up your taste by reading a lot of papers. I think all of that advice mostly still holds, so I would still endorse it. What's changed? I'd say keeping a lab notebook is probably even more useful now that we have LLMs, because context is so important: if you want to get good feedback on what you're doing, you can probably paste your notebook into the LLM and get some feedback. Probably the biggest change I can think of is just that it's now really useful to figure out how to incorporate LLMs into your work. Actually, I haven't thought hard enough about what's the best way to accelerate your research, other than things that just generally accelerate your engineering. And it's a little bit non-obvious; the advice might be different for research than for other areas of software engineering, because I do think there's a lot of value in understanding every line of code, and in having something very simple where you understand every line of code, rather than having the model write large amounts of code that you've never read. So this style of AI-assisted coding, where you just have a model write the whole implementation for you, might work well in other domains, where you kind of just want to define the spec and have the model write something that seems to satisfy it. But I think for research there is a lot of value in knowing exactly what's going on in every line of code, and the people who have done the best work really have that understanding all the way down to the nuts and bolts.
>> Probably since 2012, but it feels like especially since 2020 and the advent of scaling laws, lots more researchers have entered the field of ML, both in academia and in industry. And, please disagree if you do, but it seems like the rate of consequential big-idea generation has been fairly constant. The story of ML feels like the actual progress has been huge in building useful systems, but to build these systems there are a few underlying big ideas, then lots of details filled in to make them work, and then a whole graveyard of ideas that just haven't worked. How would you explain consequential idea generation being constant even though the number of researchers in the field has 10xed or 100xed?
>> I feel like these questions about quantifying the rate of scientific progress are always a bit tricky, because you have to think about low-hanging fruit being picked, and it's also very hard to measure the rate of progress in the recent past, because you don't know yet which ideas are important. So I would be a little hesitant to conclude that the rate of progress is constant and hasn't accelerated even with the large number of people. I think if you look back to papers from the '70s, '80s, and '90s, the experimental rigor is lower. The standards have increased in certain ways, definitely for the level of experimental rigor, in terms of trying out baseline methods and doing a lot of different experiments on different tasks. In the old days you might have an RL paper that had some very elaborate set of ideas and just one experiment on a toy task that was very questionable, and that could be a seminal paper. And a lot of the mathematical ideas weren't very sophisticated either. So I wouldn't be surprised if the rate of idea generation has actually increased a lot, and standards, and the actual level of quality, have increased in certain ways as more people have entered the field.
>> Does that match your intuition?
>> I think that would be my intuition, yeah.
>> I also think there are a lot of problems with the academic publishing and reviewing system, and it's definitely frustrating to a lot of people who are part of that system. But I also think it's not terrible, because the field is driven by a lot of objective improvements that people are seeing in reality; the field is driven by real goals and real problems, and I think that grounds it to a great extent. So even though there are a lot of problems and there is a lot of fake research, overall the field seems to make progress.
>> And actually, as an aside, how do you think the academic publishing system compares to the internal coordination system of these large AI companies, with their Slack channels around ideas? Do you think there's anything to borrow from how a thousand-person research organization at one of these companies works that could be transported into open academia?
>> Oh yeah, that's really interesting. I'd say if you look at the presentation of internal results at one of these big research labs, it's better in some ways and worse in other ways than the publishing world. It's better in terms of having a higher accuracy of drawing real conclusions about things, like what improves pre-training. People have much better methodologies, and the experiments are much more driven by real consequences, as opposed to just getting a paper published. So the successful companies have gotten better at drawing accurate conclusions. On the other hand, no one is going to take the time to write a tech report at anything near the level of detail of something published externally; no one writes really detailed tech reports. So even though the overall level of accuracy of the claims is higher, the thoroughness of experiments is usually less in internal research; people aren't going to try as many baselines. A lot of academic papers have baselines that are nerfed in some way, so you can't really trust the results, but at least the best external work is actually quite thorough and does a lot of good baseline comparisons. So overall, the write-ups are much more detailed in the outside world, and there's more thoroughness in certain ways, but also less accuracy. At these institutions I've been interested in trying to improve the research writing culture, trying to get people to write more detailed tech reports that really go deeply into the science, as opposed to just doing the minimal thing needed to find the recipe improvement that's shippable. And it's definitely been a bit of a challenge, because the incentives of the companies usually push toward the shippable recipe improvement rather than building up a nice theory and doing really thorough science.
>> Yep.
>> How have the characters entering the field shifted in distribution from what things looked like in 2015, 2016, 2017? Were people back then on average more or less skilled, worse or better engineers, more creative? Is there a difference in the kind of people attracted to doing this work back then versus the set of people entering the field now?
>> Yeah, I'd say maybe people were a bit weirder back then, because now it's more obvious, and it's conventional wisdom, that AI is the most important thing going on, so it's going to attract a lot of people who have more conventional career paths and are less risk tolerant. So previously the people were a bit weirder; now there are more conventionally minded people. Overall it's hard to compare the talent distributions, but just given the numbers, I'd say the bar has gotten higher, because so many people are trying to get into the field and applying for all these roles. I'd also say engineering skill probably matters more now than it did before, as opposed to research taste and the ability to do exploratory research, because scaling things has driven so many improvements and there's so much low-hanging fruit just from scaling the simple ideas and executing on them well. Also, the field has gotten more mature, so you're usually not writing code from scratch in a Jupyter notebook anymore; you're building on someone else's codebase, and there's a lot of good infrastructure to build on. Since you're integrating a lot with other people's code and tools, people who have more of a software engineering background have more of an advantage now.
>> What do you think the future of RL research looks like broadly? Over the last 10 years, a bunch of topics came in and out of focus, and then the thing that's publicly worked well on LLMs has actually been fairly simple and pretty close to the set of ideas that worked in other domains. Do you think there's much more to do in RL research? And do you think the most capable RL LLM systems in a couple of years will look very different from some of the old ideas?
>> Yeah, like you said, I think some ideas go in and out of fashion. Sometimes they become fashionable too early and don't live up to their promises, but then they come back later. I expect that to happen a bit more. It's hard to say exactly what the important ideas will be, but I think offline RL is a pretty interesting set of ideas. And to some extent, what we're doing in the LLM world now is like what in robotics they call sim-to-real, where you build a bunch of simulated environments, you try to do RL on them at scale, and you try to have enough diversity that you'll generalize to the real world. Actually, I think sim-to-real is still yielding good results in robotics, so it's not like sim-to-real has been discredited there; I think it is a very effective technique. But I think there's also a lot of value in learning from the real world, so I expect that at some point we come back, in the LLM world, to figuring out how to learn from real deployment.
>> If some of the biggest AI labs did develop really powerful AI systems, to the point where it would be pretty important to coordinate, both with each other and with other societally important institutions like governments, how confident are you that they would coordinate well, if that were necessary for the future of AI to go well? And how worried are you that they wouldn't get along and wouldn't coordinate well?
>> I feel medium worried, or medium confident. I'd say there is a reasonable amount of commonality of viewpoint and vision between the leading AI labs, and there's also been some collaboration between the labs recently on safety things. There's definitely a little bit of bad blood between the personalities involved, so that might make it a little harder. But I could see it working out if it became more clear that this is the right thing to do.
>> So with this being a time when technology is getting better fast, there's lots of talk about predictions of how fast it's going to get better, and in particular lots of people talk about their estimate of when AGI is going to happen; it's a common topic. Most people, when they're talking about AGI, mean something like all knowledge work on computers being done by AIs instead of humans. One way I've tried to reason about this is that AGI seems like a big, complicated engineering and research project, and at least in my experience, engineers and researchers are kind of abysmal at estimating when they're going to finish much smaller projects. In particular, the systematic bias I've seen from engineers is that they will always assume they're going to finish something much earlier than they actually do, and the rule I apply is roughly a constant 3x factor on top of their predictions to get closer to the actual time it takes them. Do you think this is a reasonable critique of most people's AGI timelines, that they will underestimate how long it's going to take because researchers and engineers generally underestimate how long things take? And in your experience, have the researchers and engineers you've been around been good at estimating project timelines?
>> Yeah, I agree that there's a consistent bias to underestimate timelines, and maybe it's a 2x or 3x in the good case. Using that heuristic, I think it is reasonable to predict that AGI will be a little further out than people's timelines suggest, when they have a definite prediction for how it'll play out. We've seen this elsewhere; the most analogous problem is maybe self-driving cars, where it took longer than people expected to get to full autonomy, robotaxis, and everything. So I think that's a reasonable hypothesis. On the other hand, there is this positive feedback loop where AI accelerates its own development, and that's also probably going to defy intuition. People who are incorporating that effect are coming up with pretty short timelines, and I also think that is a somewhat compelling line of reasoning. There's a lot of uncertainty about how much uplift people get from AI, and whether there are bottlenecks around humans understanding what's going on, and so forth. So I wouldn't make a really confident prediction either way.
>> So you and Thinking Machines have released Tinker. What is it, and who is it for?
>> Tinker is a low-level fine-tuning API. It gives you a small set of low-level primitives to do training and sampling, which lets you express almost all the post-training algorithms you might want, while not having to worry about GPUs or accelerators, and not having to worry about a lot of the distributed systems issues you otherwise would. So it's trying to find something that we think is a good layer to abstract away, and to build that as a service. People don't usually use services for ML training, and the services that exist are much more high level, so it's kind of novel in that it's a service built around this lower-level primitive. The closest analogy would actually be the sampling APIs you can get from OpenAI and Anthropic and so forth, where you don't have to spin up your own GPU box to do sampling; you just make an API call from Python or JavaScript or whatever. Tinker lets you write a lot of training code by just writing some Python scripts and having that work, rather than having to worry about installing a bunch of stuff to run on GPUs.
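To give a feel for what "low-level training and sampling primitives behind an API" could look like in practice, here is a heavily hedged sketch of a pseudo-client; the class and method names are hypothetical illustrations of the idea described, not Tinker's actual API.

```python
# Illustrative pseudo-client: remote training and sampling calls, no GPU
# management on the user's side. Names are hypothetical, not Tinker's API.

class FineTuningClient:
    def forward_backward(self, batch, loss_fn: str):
        """Run a forward and backward pass on the service and accumulate gradients."""
        ...

    def optim_step(self, learning_rate: float):
        """Apply the accumulated gradients with the service-side optimizer."""
        ...

    def sample(self, prompts, max_tokens: int):
        """Generate completions from the current weights, e.g. for RL rollouts."""
        ...

def reinforce_epoch(client: FineTuningClient, prompts, reward_fn):
    """A minimal RL-style post-training loop expressed against those primitives."""
    samples = client.sample(prompts, max_tokens=256)
    rewards = [reward_fn(p, s) for p, s in zip(prompts, samples)]
    client.forward_backward(
        {"prompts": prompts, "samples": samples, "rewards": rewards},
        loss_fn="policy_gradient",
    )
    client.optim_step(learning_rate=1e-5)
```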
>> Is your ambition that the next Thinking Machines, started by some group of researchers, would actually just build on top of Tinker?
>> Yeah, I would hope that a lot of companies would be able to just build on Tinker instead of developing their own infrastructure, and build really sophisticated custom models on top of it. Oh yeah, and to answer the question you asked before about who it's for: right now it's for people who are pretty sophisticated in terms of their knowledge of ML and who want to use these low-level primitives. We shipped a lot of open source code that goes along with Tinker, so you don't have to write all the training algorithms yourself, but I think Tinker is best for people who do want to look at the details and dig into them. Over time, though, we're going to make it more and more user friendly and build a lot of tooling and higher-level components on top of Tinker, so that it becomes more of a full-stack thing, where you don't have to be an expert to use it and you can just come in with your understanding of the business problem you want to solve, or the spec of the model you want to build, and the software we ship will do that for you.
>> And then what should we expect from Thinking Machines in the next year or so? Anything you can share publicly?
>> You'll see some things coming out with our own models sometime next year. And with Tinker, expect us to keep improving it: adding a lot more models and functionality, like various kinds of multimodal input and output, and really scaling up the sizes of jobs you can do with Tinker.
>> Thank you, John. It's been fun.
>> Thanks for having me.