OpenAI’s Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions
By Unsupervised Learning: Redpoint's AI Podcast
Summary
Topics Covered
- Math as AI's north-star benchmark
- AI proving novel insights in mathematics
- Chain-of-thought is alignment's new window
- Private reasoning preserves alignment insights
- Societies unprepared for automated intellectual work
Full Transcript
I definitely agree that continual learning is really the thing. It's really the thing that we're building. But I don't really think this is a problem that's ignored and off the path of what we're doing currently. I think it is what we're working toward.

What are the other research areas within alignment that you're paying attention to, or that you think are promising?

A lot of the longer-term challenge with alignment is about generalization. What are the values that the model falls back on?

What are the things that you need to figure out to be able to really make models work well in some of these other spaces?

I come back to this.
Jakub Pachocki is the chief scientist of OpenAI, and I think literally one of the most important people on the planet. Today on Unsupervised Learning, I got to ask him everything that I've been thinking about, and I know a bunch of people in the ecosystem have too. We talked a lot about model progress, what's required to make long-running agents work, as well as the really interesting work OpenAI has done in the AI-for-science world and the progress he sees there over the next years. We talked a lot about how companies should be thinking about model building in this moment, when they should be doing reinforcement learning, and how they should be thinking about the evolution of harnesses and the impact that will have. We hit on a lot of his really interesting research, including the work he's done around alignment and the work that OpenAI broadly has done around math competitions. And we also talked about this focusing moment at OpenAI, what it means for the research organization, and how he runs his team. It was just such an awesome opportunity to talk to someone who is driving so much of the change that has revolutionized this space and the world. I hope folks enjoy this wide-ranging conversation as much as I did.
I feel like you are the perfect person to talk to about all the questions everyone has in the ecosystem: what's happening with model progress; how companies should be building things based on what's happening with the models; and, at a societal level, the impact AI is going to have on science and broader society. You've been at the forefront of the space for pretty much every generation of improvement these past years, so I'm really excited to have you on the podcast.

Happy to be here.
I'll start with one of the juiciest things you've said. Four months ago, I think you and the OpenAI team talked about aiming for a system with research-level intern capabilities by September of this year, so coming up in about six months, and then a more fully automated AI researcher by March 2028. Checking in four months later, how are you feeling about those timelines?
Yeah, I think over the last months, the change that really happened is that we've seen this explosive growth of coding tools.

Yeah.

It's an understatement, even. We've definitely gotten to a place at OpenAI where we use Codex for the majority of actual coding. And so I think, for most people, the act of programming has changed quite a bit. I definitely see this as a signal that something here is on track. The other very interesting update over the last few months, to me, has been the progress on math research capabilities, and also the results we've seen in physics and other fields. I think this level of capability, this ability to provide insight, when combined with the ability to access infrastructure, the ability to use maybe more compute at test time, which is something Codex is using currently, and very strong improvement in general intelligence, which I also expect over the next couple of months... Yeah, it's something we're still very much planning for and very focused on.
And how do you know when you've gotten there? What's a workflow you might look at to say, okay, I think we've got these research-intern-level capabilities?

The way I would distinguish a research intern from a fully automated researcher is the span of time that we would have it work mostly autonomously, or the specificity of the task that has to be given. So I don't expect we'll have systems where you just tell them, oh, go improve your model capability, go solve alignment, and they will do it. Not this year. I think we might get there at some point. But for more specific technical ideas, like, I have this particular idea for how to improve the models, how to run this evaluation differently, I think we have the pieces; we mostly just need to put them together.

Karpathy released a pretty viral version of using some of these models to improve some of his own models, obviously way less complex than what you guys are building here, but did that feel generally in the spirit of what some of these tools might look like?
Yeah, I think it's in the spirit. I expect it to look like a pretty continual evolution from where Codex is now, towards a bit more autonomy, running for a longer time. But yeah, I think we'll see a lot of this sort of application. In general, we'll see more autonomous and higher-compute use of these models for different things.

You mentioned the math and physics side, and obviously you've had these really impressive breakthroughs in math on some interesting kinds of competition problems. I think for our listeners, it intuitively makes sense how progress in coding directly translates to something like helping with AI research. How does math and physics progress also tie into this?
The biggest role that focusing on these math benchmarks has played for us is as a general benchmark and a north star for how to improve this technology. Math is very measurable, right? It's much easier to tell whether you've actually solved a math problem than whether you've produced a good piece of software. And it can also get very hard: you can have things where it's very definite whether you've solved them, but they can be pretty much arbitrarily hard to actually solve. Up until not too long ago, my perspective was: okay, our models are able to solve simple enough problems, but they are not able to solve an IMO-level problem. So clearly there is a gap in the intelligence of these models that is very measurable, and it's very clear what we need to do. That has been our north star for reasoning models and so forth.

Now, of course, that is changing quite a bit, and we have reached the milestones we've been working towards: IMO gold level, solving IMO problem six, making forays into research-level mathematics. From this point, there definitely still is utility in continuing to measure progress this way, and there's definitely transfer from getting better at mathematical reasoning to getting better at AI research; a lot of our best researchers are mathematicians we're training, or come from other theoretical fields. But we are very much changing how we think about these north stars. We are very focused on making the next models we produce actually useful in the real world, especially for AI research, but also for other economically valuable activities and for other fields of science, maybe especially the more applied sciences. And the reason for this shift is that we believe the models are now capable enough, not as smart as people in all ways, but capable enough to actually materially change the economy, to change how things are done. And so, yeah, we feel a lot of urgency about that.
In the early days, picking a domain like math that is so hard to solve but easy to verify seems like the perfect place to get started, and code obviously shares a lot of attributes with that: possible to check and verify, and great for reinforcement learning. One question a lot of people are thinking about is: okay, we've seen reinforcement learning work incredibly well in these domains where you can verify results rather easily. For a lot of valuable tasks in the world, medicine, law, finance, there's some ability to do that, but certainly not to the same extent as math and code. So a lot of people are trying to figure out whether we're going to see similar improvements, since in code and math the rates of improvement have been so astronomical and shocking.
Yeah, I definitely expect so. An interesting duality that we think about a lot is that these more general tasks, the ones that are harder to evaluate, share a lot of commonalities with longer-horizon tasks. Because if you think about even a very well-specified math or coding problem, if it's something you need to work on for a year, then even though it's very clear what the criteria of success are in the long term, what to do on your first day of working on it is a pretty open-ended problem. So I believe these difficulties coincide, and they're very clearly the next frontier for how these systems develop. And we've definitely seen very encouraging signs, both in our ability to scale RL on these more general domains and in our ability to scale these efforts, and I think that holds a lot of promise.
In these other domains, it feels like one of the hardest things to know is just: what was success at a task, right? And you can imagine that whatever problems you face in code and math, on short-term tasks and then longer-term tasks, will be amplified in the spaces outside of those, where a short-term legal or medical task may be harder to run thousands of iterations on and figure out whether it was done correctly, and the longer-term tasks are even harder. I'm curious how you even conceptualize that research challenge. What are the things you need to figure out to really make models work well in some of these other spaces?
Yeah, I come back to this reality of: how do we make the models work for a very long time, and how do we teach them to evaluate partial progress? If you look at where that sort of progress on longer horizons is coming from, even outside of RL, as the models become more consistent from pure supervision in pre-training, they gain some idea of what a good partial artifact looks like. So I think even if we weren't scaling RL very meaningfully, we would see an elongation of these horizons over time. It's definitely a research challenge to figure out how to leverage the new ideas from RL and so forth and apply them to general domains. But I'm quite optimistic about that.
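To make this contrast concrete, here is a minimal sketch, in Python, of the two reward regimes being discussed: an exact, cheap verifier of the kind math and code enable, versus a rubric-graded score for partial progress on fuzzier, longer-horizon work. The `judge` callable and the rubric format are illustrative assumptions, not any real OpenAI interface.

```python
# Illustrative sketch only: a verifiable reward vs. a rubric-graded reward
# for partial progress. `judge` is a hypothetical callable (e.g., a grader
# model) returning a score in [0, 1].

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Math/code-style reward: success is cheap and unambiguous to check."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def partial_progress_reward(artifact: str, rubric: list[str], judge) -> float:
    """Fuzzy-domain reward: grade intermediate work against a rubric,
    standing in for 'is this a good partial artifact?'"""
    scores = [judge(f"Criterion: {c}\n\nArtifact:\n{artifact}") for c in rubric]
    return sum(scores) / len(scores)  # mean rubric satisfaction in [0, 1]
```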
Yeah, and it's interesting. It sounds like part of your mental model is the models themselves being able to check progress at some cadence that is reliable enough, at least from the outside. It's not totally clear whether we've seen generalization in RL yet. You clearly seem to have techniques that really optimize models around whatever we choose to focus on, but it almost feels like an older-school version of ML: one thing at a time. Would you agree with that characterization, and how do you see this current climate?

Well, we are buying a lot of compute, right? Because, I mean, we still believe in the bitter lesson; we believe it more than ever, to some degree. We've seen new techniques, and I think new ways to scale, but that is the lens through which we've been viewing things.
Yeah, I think there is a certain amount of complexity that everyone needs to grapple with, because we're no longer purely building a brain in the sky that's completely isolated from the real world. If you actually want this model to do medical research, if you want it to cure cancer, at some point it needs to learn about the real world in a meaningful way, maybe conduct some experiments and learn from their results. And for that, you need to figure out how to actually connect it. That is going to involve something that goes in the direction you described, but I don't think it goes counter to finding and scaling the simple algorithms that we've been developing.

I talk to a lot of companies, and one of the main questions everyone seems to be asking these days is: should we be doing our own reinforcement learning? Take an open-source model; we have some data on a task that people do; we have evals because we know our domain pretty well. Is this something that makes sense for us to do, or should we just wait for the models to continue to get better at some of these things? What advice would you give the many builders that listen to the podcast as they think through the extent to which they invest on the reinforcement learning side?

Reinforcement learning definitely can be a very data-efficient
way to really improve the model at some sort of task. But there is a much more data-efficient way of learning that we know of, which is learning in context. This is maybe the most fundamental way that people teach these models: you just prompt them with examples, with instructions for what you want. I expect that kind of learning to get much better over time. And so I think it really matters that the models can adapt to your context, that they can adapt to the kinds of tasks you care about. So I think that will be very important. I'm not sure that replicating the kind of current RL pipeline is going to be the right way to go about it. But yeah, it's definitely a problem that we're thinking about.
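As a concrete illustration of the in-context alternative being described here, a minimal sketch: the same curated examples and instructions a team might have collected for RL fine-tuning are instead packed directly into the prompt. Everything below is a hypothetical illustration, not a specific API.

```python
def build_few_shot_prompt(instructions: str,
                          examples: list[tuple[str, str]],
                          new_input: str) -> str:
    """Teach a task in context: instructions plus worked examples,
    instead of gradient updates on your own copy of a model."""
    parts = [instructions]
    for task_input, ideal_output in examples:
        parts.append(f"Input: {task_input}\nOutput: {ideal_output}")
    parts.append(f"Input: {new_input}\nOutput:")  # the model completes this
    return "\n\n".join(parts)
```

The design point: as base models improve, this kind of adaptation improves for free, with no training pipeline to maintain.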
Yeah. So it's almost like you still have to do the work: figure out the evals that matter, gather the data and the examples, but it may just turn out that in the future you're far better off feeding that in as context than trying to do anything on your own model.

Yeah, I think that's quite plausible.

And obviously people have seen the success of tools like Codex, which I know you've been a key part of, and wondered: do we need to build our own harnesses, our own ways of using these things, for our own domains, whether it's legal or finance or healthcare? Or do we just take the harnesses that the large models ship with and use them with the context that we have? Any thoughts around that?

The implementation of the harness shouldn't really be a limitation for a very long time. I think we'll be able to get much more general harnesses that people can use for all sorts of other domains. I mean, I think Codex is pretty good, actually, if you try using it for things beyond coding.
That's so interesting. A much more general harness being something that's almost adaptive to, or just works across, whatever specific set of tools you have in your domain, or whatever specific set of things you want to expose to the model.

Yeah. I mean, I think it's also worth thinking about what the ultimate interface is that we want to interact with the model through. The models give us some harness-agnosticism, right? They can build their own UIs. They can do things that people would find very time-consuming. But I definitely think there is also just a lot of space to enable the models to access the current interfaces that we use for people. So I think we want to have, for example, AIs on Slack that are plugged into our context, able to learn from it, able to leverage these existing things. So there is some meeting in the middle here, but long term, I believe that by default the AI should meet you where you are, and if not, that should be because it has new abilities, not because it has limitations.
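A "much more general harness," in the sense discussed here, might be as simple as one tool-use loop that is indifferent to whether the registered tools are a compiler, a legal search index, or Slack. The sketch below is a hypothetical minimal version; `model` is assumed to be any callable that maps a transcript to the next action string.

```python
from typing import Callable

def run_agent(model: Callable[[str], str],
              tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 20) -> str:
    """Domain-agnostic loop: the harness stays the same; only the
    registered tools change per domain."""
    history = [f"Task: {task}", f"Tools: {', '.join(tools)}"]
    for _ in range(max_steps):
        action = model("\n".join(history))   # e.g. "search foo" or "FINAL: ..."
        if action.startswith("FINAL:"):      # model signals completion
            return action.removeprefix("FINAL:").strip()
        name, _, arg = action.partition(" ")
        result = tools[name](arg) if name in tools else f"unknown tool: {name}"
        history.append(f"{action}\n-> {result}")
    return "stopped: step budget exhausted"
```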
Yeah, it's an interesting point that today these harnesses are so bespoke to certain environments, but over time, as you add more and more skills and tools and models can navigate across those effectively, there will just be a general one, like the way humans have. That makes a tremendous amount of sense. I'm curious: obviously, I'm sure you see crazy stuff on the research side every day at this point. What are the milestones that are still meaningful to you, where you'd think it would be pretty crazy if you did a run one day and saw X or Y? What are the things you're paying most attention to?
Yeah. At this point it really is about research: can the model discover new things, can it execute on a longer-horizon research problem?

It's almost like looking for some sort of insight where you're like, oh, if someone on my team had come up with that, I would have been pretty intrigued.

Yeah. We've actually had some minor, but I think quite impactful, ideas come even from GPT 5.2 Pro, which we're using internally. But I think it's still very, very small compared to where I expect it to be.
Yeah, it seems almost inevitable that these models are going to get better; they will be used in research, and in science more generally. You're one of the first people interacting directly with these models as research partners, almost, at this stage. Is there anything you've learned about the right way to do that, or about what a research organization might look like as these models continue to get better?

Yeah, I think we're definitely at a transition point where the short-term, immediate quality of the model is about to be quite a determining factor in the pace of our research progress, because the models are going to drive a lot of it. And that definitely requires rewiring some intuitions about how to run a research organization. Normally you try not to be too focused on immediate quality; you try to be much more focused on the longer term. I think we have a lot of very exciting stuff queued up that we're working towards, but I feel a lot of urgency to actually execute on it and to use these advances in model intelligence to accelerate research on AI, and especially on AI alignment.

Yeah, it's such a fascinating point, because I've heard you talk before about running a research organization, and in the past it was about giving people the space to pursue things that weren't directly about a month or two of progress, asking instead what ideas are really going to drive things forward. But it makes total sense that we're in a time now where you're like, look, everything we do will be so much better if we just focus on this in the short term and make it better. It must be fascinating to navigate that alongside these maybe further-off research ideas while running an organization.
Yeah, it's definitely something we spend a lot of time on with Mark nowadays.
Right now you have a ton of compute as a company, but you have great scaling laws on the pre-training side, great scaling on the RL side, and probably lots of experiments going on that have nothing to do with either of those vectors but are interesting new directions. How do you even think about allocating compute across all of this?

Yeah, it can get very complicated, because there are so many things that we need to do. One discipline we've started keeping is that we try to make sure we explicitly budget a large chunk of our compute to the most scalable methods, to the things that we believe are most responsible for driving general model intelligence, even if it's not the most efficient allocation of compute at all times. Because if you're allocating so much compute to one experiment, or one set of experiments, there are so many things you could accelerate a little bit with that compute elsewhere. But with all the interesting and important things that we're doing, I think it would be very easy to fragment all of it and not really end up doing the things that we believe are most important. You definitely want to understand the empirical evidence. You definitely want to make sure your evaluations are in order and the experimental rigor is there. And then you also want to apply some regularization based on: do we understand this method? Do we actually expect it will scale? Do we expect this is something you can actually build on in the future, or is this a one-off? And based on that, determine the priority.
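Read literally, that discipline could look something like the toy sketch below: reserve a fixed floor for the most scalable methods first, then split the remainder pro-rata across smaller experiments. The numbers and names are invented for illustration; nothing here reflects OpenAI's actual allocations.

```python
SCALABLE_FLOOR = 0.6  # invented fraction reserved for the most scalable methods

def allocate_compute(total_gpu_hours: float,
                     requests: dict[str, float]) -> dict[str, float]:
    """Budget the floor first, then split what's left pro-rata."""
    budget = {"scalable_methods": total_gpu_hours * SCALABLE_FLOOR}
    remaining = total_gpu_hours - budget["scalable_methods"]
    total_ask = sum(requests.values()) or 1.0
    for project, ask in requests.items():
        budget[project] = remaining * ask / total_ask  # smaller experiments
    return budget
```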
Yeah, it's so interesting. You probably find all the ways you know you could improve things, but they feel a little off to the side of where you think the overall arc of progress is, so you end up leaving some of these low-hanging fruits to some extent, because really the most important thing is finding the future direction, and then the scaling within that, and devoting compute toward it. We talked about Codex a lot, and the success of coding, and it feels like last year was the year of just incredible hill-climbing on coding. Obviously Codex has been a super successful product in many ways, but Anthropic was kind of first to this market; Claude Code was a dominant product there. Reflecting on that, what do you make of the success Anthropic has had in this space?

Yeah, I think it's a matter of really focusing your product
direction on where you believe the next application of the technology is. If you look at the prioritization we've had on our product side, we have been working on coding products, but they have been a secondary thing compared to our main priorities. And the interesting thing is that that was not very reflective of the priorities of the research organization within OpenAI. We've had this explosive success of ChatGPT, and ChatGPT, I think, has changed quite a bit and is going to evolve quite a bit, but as it was in '23, it is this particular product that is definitely quite aligned with our vision of where AI is going, yet not really representative of everything the technology enables. And so the majority of our work in research has been focused on that future thing, and I think it has increasingly decoupled from our short-term product strategies. I'm very confident about the things we've been building on the research and model-intelligence side. A lot of our reprioritization and increased focus on the product side is about actually getting to deploy those models, and the belief that they are the thing that really matters now.
Yeah. And now the whole company's priority is so locked in and focused on this, and you've seen just incredible improvement in Codex in recent months. For all the developers that listen to the podcast, it's almost hard to comprehend what the world looks like as these models keep hill-climbing on longer and longer tasks. What do you think will look different in their lives, or how will they be using Codex in three to six months? I realize three months and six months are very different timelines in this world, but take whichever point in between you'd like.

I would expect just a gradual increase in the level of autonomy you feel comfortable affording the model: the vagueness of description it can work with, the level of supervision it needs. I think we're not very far from models that can work autonomously for a couple of days, maybe use quite a bit more compute than they're using now, and produce much higher-quality artifacts on their own.
Do you have a gut instinct on the old question of whether you need that software engineering skill set to supervise these models running for a few days, or does it turn out that at some point, once they can run for a while, anybody can use coding agents and supervise them to some sort of output?

I think for a lot of outputs you already don't need much experience. The distinction I would still draw, between an intern and a really autonomous researcher or software engineer, is that if you want to build something bigger, you probably still want to apply supervision. You still want to have an overarching vision; you want to recognize which building blocks fit in and which don't. But yeah, I definitely expect that desired skill set to shift quite a bit over time, towards this more general vision-setting.
On the research side, maybe a month ago, it felt like all anyone could talk about was continual learning; it was in the Zeitgeist. There are all these new labs starting up to focus on continual learning; some folks left OpenAI to go focus on it. I think maybe part of what's behind that is a belief that RL alone either won't get us there or will get us to some level of very inefficient scaling, and that it's different from the way humans learn. I've even heard you say before that RL today is still very different from the way humans learn. What's your take on that whole movement?
Yeah, I am a little bit confused by it, because in my mind, the whole excitement that we've had, even if you look at the title of the GPT-3 paper, is that this class of models is actually capable of continual learning: it's capable of learning to learn in context. That has been the driving force behind the excitement to scale these GPT models further. That has been the premise for why we really need to teach them, with RL, to learn in context more efficiently. And so I definitely agree that continual learning is really the thing, right? It's really the thing that we're building. But I don't really think this is a problem that's ignored and off the path of what we're doing currently. I think it is what we're working towards.
Yeah. So in your mind, the single best path to get there is to continue to scale pre-training and RL?

I think that is how we've made the most progress on this problem so far. There definitely are more ideas, more steps, and I think a lot of improvements will also just come from scale.

We have a lot of folks listening who have been able to do a lot of simpler things with these models, and then they try some of these more complex, call it hundred-step or longer-term, tasks, and they're like, oh, the models don't work for this yet. On the inside you constantly feel this improvement, but for them it feels like night and day away from being able to do this much longer thing. How do you articulate to them the set of things that need to be true for these much longer tasks to happen? Is it around checking in more often, as you were talking about before? I feel like there's this belief among the research community that all of these tasks will be solved in the next year or two, and in the wild a lot of people maybe aren't totally grokking that improvement line we've been seeing.
Yeah. I think a lot of that prediction comes from just looking at historical improvement lines, but I think increasingly we can roughly see the shape here. I do think a lot of this is about the models just becoming intelligent enough to recognize whether they're making progress. Some of this is the very pragmatic work of: can the models actually access all the context, all the files, all the infrastructure they need to do the work you want them to do? I remember in the past, when we were discussing the roadmap we're taking with RL, I viewed teaching the model to reason with its own tokens as the priority, and then of course we'd need it to use tools and the environment; at some point we'd definitely need to teach it to see; at some point we'd need to teach it to use a physical body. I think we're now well into the stage where it really needs to interact with the environment and it really needs to see, and someday soon we'll really have to care about robots.
Yeah. A lot of the time when I hear people complain that a model can't do X or Y, it's literally just because they haven't connected it to systems or fed enough context into it. Actually, I do wonder: if context were universally applicable and able to flow into these things, a lot of these problems would just be solved with today's models.

I want to talk about some of the AI-for-science work you guys have been doing. The coding stuff is something everyone feels very viscerally; in every company, people are using these tools and getting tons of productivity. On the math side, not all of us competed in IMO competitions or necessarily have as intuitive a feel for some of these breakthroughs. One really interesting thing you did was some compelling work around First Proof, right? And these are very different problems from traditional competition math. I wonder if you could speak a little to that, because it's a space our listeners might be less familiar with, including the implications of models being able to do pretty cool work here.
Yeah. I was very excited about the First Proof challenge. This particular one is kind of a benchmark: a couple of respected mathematicians and theoretical computer scientists releasing problems that they believe are representative of their day-to-day work but haven't been published anywhere, so that we can really have our models take a crack at them. We were so excited about this challenge, but it was dropped without any advance warning, with a week-long deadline to actually execute. We had a very exciting model training at the time, and so one of the people in charge of training, James Lee, started prompting that model by hand, and actually seeing, oh, it's actually solving these problems, was a really fascinating thing to watch. One of these problems is actually from a domain I did my PhD in, and seeing the model come up with ideas which I would be quite proud to come up with in a week or two, seeing it come up with them in an hour or so, that was a very weird feeling. I think the last time I felt like that was watching our Dota bot play very interesting Dota games indefinitely, where it feels like there's some sort of magic happening, because interesting things should not be indefinite. And so seeing that happen for math, for something that I believe is actually quite representative of, or a precursor to, a lot of the work that we're doing and a lot of the work that really matters in the world, definitely really increased my feeling of urgency.
One thing that's fascinating, too, is the idea that you're training these models, you throw these problems in, and nobody knows how good they will be at solving them. It must be fascinating to see, in a space you know so well and have spent so much time in, that the previous generation of models probably wouldn't have been able to do that, and you wouldn't necessarily have even thought this was the benchmark to watch, but it just generally shows the general-purpose capabilities and improvements of the models.

I mean, it is at a stage where we needed to seek out experts in the particular domains to be able to tell us whether these particular proofs are correct or not. But it's still much easier to tell whether you've actually made progress than for something like even coding: sure, competitive programming you can evaluate, but most programming is not competitive programming; it's about whether the abstractions are right, whether you're handling all the cases.
I feel like there was this common criticism maybe a year ago, and I don't know if it's as widespread now, that these models are just pattern matchers, and that if you really want AI for science, you're not going to get new ideas or entirely novel things out of pattern matching. It feels like we continue to chip away at that narrative. Are we getting closer to fundamentally disproving it?

I believe so, yeah. I think kind of on schedule, we're starting to see minor advancements, not huge things: a small idea here or there, and maybe some bigger papers in collaboration with scientists. But, you know, was AlphaZero a pattern matcher? Was AlphaGo? Our Dota bots? They did come up with new strategies for the respective games.

Yeah. It's funny that there are counterexamples to it all the way back to 2016, 2017.

Right. And you can always point to flaws in that, which I think is interesting: AlphaGo can be beaten with some strategy; our Dota bots could have been beaten with some strategy. I think there will be a lot of deficiencies in these models for a while. But I also think they are able to discover new things, because they have a lot of these capabilities. It's taken a couple of years to go from these very tiny game environments to this much more general scientific research; it required going through a decent approximation of all human knowledge in the meantime, and learning all the human languages and so forth. But I think the basic principle is very similar.
Yeah, it's funny. When you guys had these First Proof results, I remember the organizers commenting on the AI solutions, saying they felt like 19th-century mathematics: brute-force, computation-heavy approaches rather than elegant modern techniques. I'm not sure whether that's a feature or a bug of the way these models work, but hearing that, does it concern you? Excite you?

It doesn't concern me. I think it's expected, and actually, for at least one of the problems, I thought ours produced a pretty nice proof that was quite a bit shorter than the intended one. But in general, you would expect that these models can produce so much more reasoning in a short time than a person can, just in terms of the raw number of tokens or thoughts. I don't expect this to be a long-term feature.

It feels like there's so much momentum behind AI for science right now, and you mentioned that at some point you do have to connect these models to the physical world. You guys released some cool stuff with
GKO and some of the other things you've been experimenting with, and I'm sure you've thought a lot about AI for a bunch of different areas of science. As you've dug into this, have you developed any intuition, thinking about three years from now, for the spaces of science where there's going to be crazy progress versus the ones that might prove a little more resistant to immediate change?

You know, a tempting answer would be that it's really about which things require manual work, where the models are not quite plugged into the ecosystem. But the different laboratories will also evolve pretty quickly to adopt these new technologies within those STEM fields.

Obviously, there's a question of whether it's an LLM with access to the physical world, or whether, as has happened, companies get started specifically around these domains, like an Isomorphic in biology, or a Periodic in material sciences, or a Physical Intelligence in robotics. What's your gut instinct on the extent to which it makes sense to pursue some of these things independently, with different model architectures, versus all within the context of one place?
Yeah, I think it's similar to my answer about the UI for Codex: I would build around the capabilities of the technology and not so much around its limitations. If you have something that can suddenly design a huge number of interesting chemical or biological experiments, then yeah, it makes sense to build labs that enable that. And if we did get to a place where the model is very capable of designing high-quality experiments, it also makes sense to have it work with humans in the loop. We shouldn't think of it as: either you automate fully, or you have this fun thing using some tools on the side. We will get to a world where it's just very natural to be collaborating with AI scientists that are working hard on a problem.
Yeah, it's so interesting. It's almost two different visions. In one world, you just train a model to basically run these end-to-end tasks and be the automated biologist or chemist or whatever it is; in another, you're really building tools to propose, run, and work in tandem with a bunch of human researchers.

I wouldn't necessarily categorize it that way. Of course they are tools in some sense, but I think we will get to a point where they're driving a lot of the design and ideation for the whole process.

Yeah, with an LLM architecture, but being able to figure out the right kinds of experiments to run, and then actually design them.

Yeah. When it comes to different architectures: for sure, natural language reasoning, the kind of thing we're prioritizing, gives you a lot of generality. But there are things where you want to train a different model. Even if you want to create a very good G model, I don't think large language models are the most efficient way to go about it, although they might result in the best model eventually. And I think it's similar for protein folding or other tasks of that kind.

Yeah. So you think it makes sense to have some independent efforts around that, but that will end up being paired with a core, really good researcher large language model that is helping drive a bunch of this.
Yeah.

I want to also make sure we talk about AI safety, because that's an area where you've done a lot of really pioneering work, and I'm not sure all our listeners will be familiar with it. You actually did some really interesting work across the labs focused on chain-of-thought monitoring. Maybe to start, tell us a little bit about that work and what you found.
Yeah. This is a realization we had around the time we first saw reasoning models of the current crop. We realized, okay, this works, and we were thinking a lot about what it means; we figured the world probably really changes over the next, I don't know, year or two or three. We were thinking about what this means for safety and for our ability to understand what these models are doing. And we realized that, because of the way we train these models, we don't supervise the reasoning process directly. It's not like ChatGPT, which is trained to be polite and nice.

And it always tells me I have great ideas.

Well, that's a separate issue. But even assuming it's aligned exactly the way we would want it to be, which is definitely not sycophantic, there are still some things it's not going to reveal about its motivations at times, because maybe it would be unsafe, or maybe it would be unkind, or maybe because it's actually not aligned the way we think but wants to hide that. The way we train the reasoning models, the chain of thought doesn't have any of that. It's not optimized to be any particular way, because it's not directly graded; it's only graded in how it relates to producing a high-quality output. And we realized this is actually a very powerful paradigm for being able to interpret what the model is doing.

It's actually not a very different idea from mechanistic interpretability. In mechanistic interpretability, the idea is, again, that you have this model, you have these activations that are not directly supervised to predict any label; they're indirectly supervised, but the model has never been trained with any sort of inspection of these activations, and so the activations might reveal something about its inner workings. The big advantage of the chains of thought is that by default they are in English, so it's much easier to understand what is going on, especially as the concepts get more advanced. The other interesting thing is, we were just talking about how we believe in a future where these models work for a very long time, autonomously, and so there is much more of this reasoning. So if this is a big axis along which the capability of these models increases, our ability to supervise them will scale commensurately.
This really comes down to the principle that you're not supposed to supervise the chain of thought. This is actually why, when we were originally releasing the preview model, we made the decision to hide the chains of thought. For me, that was the primary motivation; it was the reason I didn't really even want to consider releasing it in different ways. There definitely was a bit of internal discussion about this, but the reason I felt very strongly that we should just hide it was this. Then there was this other concern, which I didn't initially think about but I think was also very valid, that the model was going to be distilled to some extent, and that's definitely also been a big factor here.

But I do think allowing the models some sort of private space matters. And by the way, why do I think it's important that we don't show this chain of thought in the product? If I'm saying the important thing is not to supervise it during training: well, if we established a paradigm where you just show the chains of thought in the product, eventually you kind of have to train them, for the same reasons you have to train whatever models you ship. And I just think we might not always want to see the chain of thought the model used to get to a response. It'll be useful to some extent, and we are trying to capture most of that value, either with chain-of-thought summaries, which I think are a bit of a stopgap, or, as the longer-term solution, having the model actually talk to you in real time, which the latest versions of Codex and of the reasoning GPT models kind of do, but which I think will get much better.

But yeah, I think there's something very exciting here about just not having the training signal fight against us. Because if you want to be able to understand what the model does in the long term, but you're scaling a method that goes directly against that, you're probably not going to have a good time. That's the other side of the bitter lesson. And so this decoupling is an idea that gives me a lot of hope for our ability to at least understand how these models' motivations and generalization evolve as they get better, as they work for longer. I don't think it's a complete solution to AI alignment by a long shot. I think it's just another tool in our toolbox. But I am hopeful that, by building our toolbox with technical tools like this, we can continue chipping away at the fundamental problems here.
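A rough sketch of what chain-of-thought monitoring can look like in practice: a separate monitor inspects the un-supervised reasoning trace and produces a report for researchers. The crucial property, per the discussion above, is that the verdict is observability data, never a training reward, so the trace is not optimized to look clean. The `monitor_model` callable and flag phrases below are hypothetical.

```python
SUSPICIOUS_PHRASES = ["the user must not find out", "hide this step"]  # invented examples

def monitor_chain_of_thought(reasoning_trace: str, monitor_model=None) -> dict:
    """Inspect a reasoning trace without ever feeding the result back
    into training, preserving the trace as a window into the model."""
    report = {"keyword_flags": [p for p in SUSPICIOUS_PHRASES
                                if p in reasoning_trace.lower()]}
    if monitor_model is not None:
        # A second model grades the trace; its verdict is logged for
        # human review, not used as a gradient signal.
        report["monitor_verdict"] = monitor_model(
            "Does this reasoning pursue a hidden objective?\n\n" + reasoning_trace)
    return report
```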
Yeah. It seems like, over the medium term, it's something that's going to be incredibly helpful, though probably not the catch-all solution for long-term alignment.

Yeah, I think it's a tool that's actually very useful for building understanding of long-term alignment. For example, there has been this very exciting work, from a collaboration with other labs, on model scheming, where they investigate, depending on what environment you put the model in and how you train it, whether it is prone to start having hidden objectives that it pursues. What enables that whole line of work is chain-of-thought monitoring: this notion that you can actually inspect what the model's motivations are. And from there, the mitigations might take us in a completely different direction. Maybe the right way is changing the pre-training data of the model, or maybe it's something like the inoculation prompting from Anthropic. I think those are very interesting ideas, but having this ability to understand is very helpful for evaluating them.

Yeah, it's almost foundational for any further area of research. What are the other research areas within alignment that you're paying attention to, or that you think are promising areas to focus on?

Yeah, I think a lot of the longer-term challenge
with alignment is about generalization. We can train our models to do well, or at least, mostly, to some extent, we can control their behavior on the things that are in distribution, the things we train for. But the worrisome questions are: what happens when a model is asked to do something very different, or finds itself in a very different situation, or is much smarter than it ever was before, with all these capabilities we haven't really thought about how to train for? So the study of this longer-term value alignment is really a study of generalization: what are the values that the model falls back on? One line of research I'm very excited about here, and something we're investing in quite a bit, is understanding how that generalization falls back onto the pre-training data. I think there's quite a lot there.

I guess over the last six months, have your concerns around alignment increased or decreased? Where are we trending overall with this work?

I'll speak to the longer-term challenges of alignment, or what happens when you have very smart models. The way my thinking about the problem has evolved
over the past few years has definitely gone from, oh, this is a very nebulous problem that is very hard to even grapple with or define, to, oh, I think we can actually make progress on it through very concrete technical solutions and technical insights. This is why we've really been viewing alignment as just a core part of research, making sure that we are designing our reasoning models with this in mind, and that we are conducting our alignment research with these reasoning models in mind, and so forth. So my general belief that there's a research path here that actually gets us to an extremely happy world has increased quite a lot. At the same time, my timelines to very capable models have definitely decreased a lot. I think we're not that far. Again, I don't think these will be models that are smarter than us in all ways, but I think they will be models that are very transformative. And so I'm quite optimistic that we can keep a good grip on how we're doing on the alignment problem, on how to roughly evaluate the risks of our models, or the problems with them.
you know, but I do think we have to be, you know, as an industry as really prepared to like take trade-offs and, you know, and possibly, you know, slow down development uh um depending on what we see. It
It's already interesting to see a lot of this work happening across the major labs. The fact that you did this in collaboration with, I think, Anthropic and DeepMind: has that just come up organically, or is there a lot of alignment talk between the major players, given the three of you are really at the forefront of all this?

There's definitely some. I mean, there's definitely shared interest in these topics. Yeah.
I want to shift a little bit to going inside OpenAI. I feel like there's probably no company in the world people have been more interested in over the last two or three years, and particularly in what it's like to run a research organization there. We talked a little bit about this previously, but you've said before that an important part of your job is giving researchers the comfort and space to almost be cave dwellers, right, and think about what the models will look like in a few years. And, as we were alluding to earlier, we're also in a time where there's a massive competitive race, and certainly everyone's going really gung-ho on these coding models. I'm wondering: how do you actually operationalize that balance today, and have you changed anything in your thinking about the right way to oversee this organization?

I focus on just high-quality experiments: recognizing whether we're actually making progress, being honest with ourselves, and promoting honesty about the results. I don't think that has changed. And even though our work will evolve a lot, I believe we still have quite a lot of work left to do, so I don't think it's "oh, we need to wrap up all our projects very quickly." So I don't think those fundamentals change. What does change is the level of urgency to really bring some of the things we think are most promising to fruition.
And then obviously there have been some very public internal moments at OpenAI over the years, and you've been here a long time. As you reflect back, what were some of the difficult decisions that maybe were 51/49 but really defined the company? As you think back over the movie of the last seven or eight years of your life, what are the key moments that stick out to you?
Well, yeah. There have certainly been a number of dramatic moments like that. But I think the ways the company underwent the most change were not really snap changes or snap decisions, but more just shifts in how it operates, right? I'd say OpenAI has gone through a couple of phases. When I joined at the start of 2017, it very much felt like an academic lab pursuing a lot of different ideas, not so scaling-pilled in practice. And I think the first big change came with GPT: we moved to okay, we're actually going to have to buy big computers, we're actually going to have to scale things, we're going to have to develop the science of scaling, and we'll have to develop the infrastructure for it. That started the second phase: okay, now we're scaling. We're still going to pursue a lot of these basic research ideas, but we're going to evaluate them on whether they're scalable.
Then there was this interesting period I talked about earlier, right, where you have ChatGPT as this big thing. I mean, I thought it would look a little bit different. I was actually pleasantly surprised that text models were the first thing: I thought we'd be in a world where the video-style uses of generative AI were the first big thing to take off, and we'd have to trade that off against pursuing the longer-term text-based research. But we definitely anticipated that this sort of tension would arise, where you have a thing that's popular now, but you believe it's going to evolve quite a lot before you get to where you're going. I think that's the phase we've been in for a while. And now we believe we're starting to be in a phase where we're actually deploying AGI, or at least deploying models that are very economically transformative.
No, it certainly seems that way.
Well, I guess we always like to end interviews with a standard set of quick-fire questions, which is basically me stuffing in all the overly broad questions I couldn't fit anywhere else. So, if you'll shamelessly indulge me, to kick it off: what's one thing you've changed your mind on in the AI world in the last year?

Yeah, I think it's really starting to reconcile this tension: the AI that you build is ultimately something that affects the world, but until you get pretty close, it's a pretty theoretical thing that you're just training and developing algorithms for. So, recognizing that okay, now we really need to make a lot of progress on, and focus on, how we're actually deploying this technology in the world. This is definitely something I've been thinking about a lot lately.
Yeah, it's so interesting. Outside of chat, it was almost more abstract research hill-climbing, with some usage in the real world, and then in this last year we've obviously seen it trickle in, primarily via coding agents, in a pretty massive way.
Yeah, I believe it's going in the same direction as the coding models: it's actually going to be something very useful, something that's a meaningful part of people's lives.

When you say going the same way, do you mean executing longer-term tasks, or more like...?

I feel that's part of it, right, but also just coming to be a dependable, trustworthy assistant or companion.

Yeah, it's amazing to watch the way younger people use ChatGPT. I'd argue it's already pretty much there for a lot of folks in high school and college, who seem increasingly comfortable using it. And I wouldn't be a shameless podcaster if I didn't ask a top researcher for timelines on a few things. I think the stuff outside the core LLM world is particularly interesting, and there's a lot of buzz around robotics these days. Obviously it's hard to pinpoint a moment when robotics quote-unquote works, but do you have any timelines in mind, whether it's finding scaling laws or some sort of ChatGPT-esque moment for robotics?
Yeah, I definitely think there are very promising algorithmic ideas there that I believe are going to work, and that are not too dissimilar from the current space of ideas. So I'm quite optimistic about timelines there, although I do think they're longer than for the kind of virtual AI.
Obviously, since you're always thinking about the next frontier for what these models can do, I'm sure you think a lot about the impact on society as a whole. Given this pace of continued model improvement, what's maybe one thing you think we're under-thinking right now as a society in terms of the impact of these models?

Yeah, I think getting to a point where so much intellectual work can be automated comes with pretty big problems that I don't think have obvious solutions. One natural one is the question of jobs and the concentration of wealth, and I suspect this requires real policymaker involvement. I've heard some optimistic takes on how this resolves, but I think at a fundamental level it does seem like some things that used to be very valuable, used to cost a lot, and used to provide something, can now be done pretty cheaply. In the long term that should be a good thing, but it can happen quite quickly.
And there's a related question: if you actually have an automated research laboratory, an automated company that can do so many things, it can be controlled by a very small number of people, right? It can do a lot. And this gets even more crazy when you have robots, but you don't need robots. Figuring out what the governance of such things looks like, what these organizations that are so powerful and yet maybe made of only a couple of people are, how to think about them: I think that's a new question we have to grapple with as a society.

Speaking of other new questions, one thing that's very top of mind for me: I recently had a kid, and I've been thinking a lot about what his life is going to look like in ten years. You're really close to this stuff; how has your work on AI changed the way you think about how this next generation should be raised?

A task for all of us, right, is to build the AI, and build the world, in a way where at the end of the day humans have the agency, humans set the direction. And maybe a lot of the technical challenges that we cherish right now, things we really need to do in order to make progress, will become more of a pastime, and the challenges will be more about figuring out what things are important, what things we should go do. I think in that world people can end up with more things to do, and definitely more exciting things to do. And I think you still want to have some understanding of the technology, all the kind of basic education, however you want to acquire it, for the sake of being able to think about these problems.
Well, this has been fascinating, man. I really appreciate you sitting down and talking about so many different things. I want to make sure to leave the last word to you: anything you want to point our listeners to, whether it's research you're doing, products you're excited about, or really anything you'd like to plug, the floor is yours. I'm sure there are tons of threads people will want to pull out of this conversation.
I think the set of problems we just discussed, and also the questions around alignment and monitorability, are growing to be very urgent challenges. And I don't think they're challenges only for AI researchers; they're challenges for policymakers, but also just things we have to think through as a society. I'm happy to see some discourse starting to arise, and I think we need more of it.
Yeah. Well, I feel like I could talk to you for hours more, but I'd be doing the world a great disservice by keeping you from your actual work of continuing to improve these models. Thank you so much for doing this. This was a ton of fun.
Thank you.

I'm Jacob Efron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses and the world. As I hope is clear, I have a ton of fun doing this. It's a nights-and-weekends project in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast and sharing it with friends. It's really what ultimately makes this whole thing work.
And so, please consider doing that. And
thank you so much for your support and listening. We'll see you next episode.