OpenAI: How AI is reshaping the craft of building software - The Pragmatic Summit

By The Pragmatic Engineer

Summary

Topics Covered

From Tool to Teammate in Six Months
The Bottleneck Keeps Moving: Solve Coding, Face Review
Codex Runs QA on Itself While You Sleep
Stop Counting Tokens. Start Hiring AI Teammates
Code Will Become Invisible Infrastructure

Full Transcript

So [music] what a time to be alive, VJ.

Yes.

Can you answer a question that a lot of us are asking: what is happening inside OpenAI right now? [laughter]

More specifically, when it comes to building software: how engineers are doing stuff and how the whole thing is changing.

I'm glad you clarified that, because lots is happening. I've been there for about six months, and one of the things I've learned is that there is so much to learn from the kind of research that's happening in the company. Just projecting out what the possibilities are is mind-blowing. So I'll tell you this: the way we write software has fundamentally changed. It's changed so dramatically, and even in the last six months I've seen us go from Codex as a tool, to an extension, to an agent, and now to a teammate. I fully expect engineers to name their agents now and call them their teammates, and this is happening so fast.

I was looking through some of the leaderboards of the people who are using Codex internally, and some of the engineers routinely hit hundreds of billions of tokens every week. And this is not just one agent. Last week we released Codex Box internally, which is a way for us to reserve dev boxes on the server and fire off prompts, and it's doing the work while you're on your laptop orchestrating all of this. People shut down their laptop, go to a meeting, come back, and all of the work has been done. So this is happening in parallel. This is how fundamentally software has changed internally at OpenAI, and I can't wait for all of this to spread through the center of Silicon Valley and then expand further and further. In a few months, I think this will be the norm. Everybody is going to be developing software this way, and that's pretty cool.

So if I took myself back six months, even a year, and heard you say this, I would think, oh, it's like a magical fairy tale, you're making half of this up. However, a lot of us are actually using it. I'm using it. I'm seeing what's happening. And I've been talking with engineers inside of OpenAI. I love talking with engineers because they have hashtag no filter. This is part of the secret of why The Pragmatic Engineer works. I talk with engineers who don't have this thing called media training, and they just [laughter] tell me how it is. Inside of OpenAI, one thing that was pretty comforting to me, I'll be honest, is that not all engineers are writing 100% of their code with Codex. They're all using it a lot more, but there are levels there. One team that is absolutely on the cutting edge, though, and again I've talked with a bunch of engineers, is the Codex team, and they're even ahead of others inside OpenAI. So Tibo, you're leading the Codex team. Can you tell me how the Codex team works today, and what the typical workflow of an engineer is like right now, as of yesterday or this morning?

Right. It's a fast-evolving situation. The thing that's delightful about how the Codex team operates is that they're constantly reinventing how they're working, almost on a week-to-week basis. The thing we go after is that we identify every single little bottleneck, and the bottlenecks keep shifting. It used to be code generation, then it moved to code review, and now it's very much: how do we understand user needs faster? How do we triage tickets? How do we figure out what everyone is saying on Twitter, Reddit, all the important surfaces, and synthesize that into a strategy? Everyone is trying to leverage agents for that to the very best effect. An interesting thing is that the other day, for the first time in a negotiation, someone who was trying to join the Codex team asked me, "How much compute am I going to get to build products at OpenAI?" I was like, huh, that's an interesting question. I mean, we do have a lot of compute, but I hadn't really thought about a compute envelope per employee. Usually that's more reserved for researchers who are actually training really phenomenal models. So I think there's a shift there, where people realize you can hyper-leverage yourself in all sorts of novel ways. And if you do have great taste and great ideas, and you know how to build software, it's like, what a time to be alive, really. It's just incredible what you can do.

And taking a little step back outside of the Codex team: VJ, you have a lot of visibility inside of OpenAI. How is the work of a software engineer, or should I say a product engineer, changing? OpenAI has always hired software engineers who are very clearly product engineers. How is their work changing? How are things morphing with product, or are they not morphing?

Fundamentally, we're still building products for humans to use. So there's a lot of product intuition that comes into play. I've been messing around with Codex, thanks to the new app that makes it even more accessible for everyone to start coding. Even then, in a lot of cases we have to imagine the product that we have to build and ship, and that's where it starts, and then you have to constantly tweak it to get it to the right place. I don't think that's going to change as long as we continue to build software for humans. I mean, at some point in the future we may build software for agents, but then maybe the agents will become the product engineers or product managers at that point. But I think the velocity makes it a lot more appealing and compelling, and actually more fun, to be honest. I was coding on a plane, and at that time I didn't have access to the dev boxes, so you kind of keep the laptop open, and when the flight attendant comes over, it's like, "You have to shut down your laptop." No, no, I don't want the agent to stop, so I keep it slightly open and then put it down.

Everyone just runs around with their laptop half closed right now. It's like, yeah, what are we doing?

I actually think it's more fun now building software, because the gratification cycle is so much shorter, and it's really cool to see the product that you're building, test it, verify it, and then go back to Codex.

And as engineers, since we're engineers: what are new, different, or weird engineering practices that you're now starting to see that start to make sense, weird as they are?

It used to be that you had difficult technical trade-offs, and you would do a design doc and discuss it all, and then maybe you'd ask, oh, what are the other viable alternatives, and then you'd discard those. A delightful thing is that now I see people explore multiple different implementations, all in parallel, and then we can actually zoom in on the one that we prove to work better. The other thing is I also see roles blur. Our designers are shipping more code than engineers were shipping six months ago. And that's also because the models have become sufficiently good that the code they're producing is actually code that we would want to merge just as is.

Do you see anything else in the broader OpenAI?

I have noticed... I don't know, do you all remember the command line for every one of the command-line tools you use? I was talking to Tibo: his team edits video files, and if you know ffmpeg, I don't think anyone remembers the command lines. Codex is such a great tool for that: okay, I want to do this, and then it crafts the command line and goes and executes it. So those are new ways that we're seeing people use Codex specifically. I also think that we've now moved on from just coding to code reviews and security reviews, and then, as Tibo said, we're going to find more bottlenecks. Once you solve coding, for example, you've just made every engineer five times more productive. What's going to happen is there's going to be more code being written, which means that code reviews will become the bottleneck, and then after code reviews, integrations and deployment, CI/CD, will become the bottleneck. So we're going to have to constantly go solve the next set of problems, which is really exciting, actually.

Then Tibo, one really interesting thing when we talked about what you're doing at Codex, which I've never heard before, is these overnight runs and the self-testing. Can you tell us about that? Because that is net new.

Yeah, I think it's easy to get stuck in, oh, this is autocomplete on steroids, and it's just going to implement a little feature, and sure, it will get done in 10 minutes. But what we're seeing is that the model is much, much more capable, actually, if you give it a very large task. It's capable of running for multiple hours. So we've assembled the environment and the skills so that Codex can fully autonomously test itself. We run this overnight, so it basically performs QA in a loop and flags regressions. The other thing is I keep talking to this researcher on the team who's actually training the models, and he's like, "Every time I think I'm more capable than Codex, I figure out I'm wrong, and I just didn't prompt it right or hadn't set it up in the right way." This is both exciting and a little bit depressing at the same time, because he's like, oh, now it's just training a model fully independently and writing a little PDF report at the end with its own insights and findings. Then we take that, find the most promising things to iterate on, and put that back into Codex. So these are very, very long-running tasks and achievements, and it's incredible to see a model do this independently.

Yeah. And one more thing we talked about that felt to me a bit like sci-fi: you said that sometimes the Codex team has meetings about Codex and issues that you have, and you told me something interesting, that people get together in a meeting room and then fire off Codex threads to diagnose stuff with Codex. Can you tell us a little bit about how that's playing out? Because that is really a loop on itself.

Yeah, there are two big things that we do there. We have this weekly analytics review where we go over feature adoption, retention, and we analyze our funnel. We always start the meeting with questions we have that are just not answered in our dashboards, or that we haven't looked into, where we're just like, oh, this looks interesting. And then our data analyst is like, okay, let's fire off a little Codex thread in the background. It will come back in 20 minutes, we'll have the answer, and we can talk about it in the last 10 minutes of the meeting. We do that for five or six questions that people have in the room, and it's this magical experience where you have these little consultants working for us in the background. The other thing is, whenever we get paged on call, Codex is there helping figure out what went wrong and what the fastest path to recovery is. There it just feels so much accelerated, in how much information we can gather and how quickly we can solve for things.

So this is one, and it's absolutely accelerating, right, and we see it elsewhere as well. One big question that keeps coming back across the industry is: what about new grads? What about junior engineers? When I was talking with the head of engineering at OpenAI, he was saying something interesting: that you are hiring early-career engineers. This is great to hear. Can you talk about how it's going and what you're seeing with them? How well founded are the fears that things are not great for juniors because now seniors can just use AI agents, and how are they getting up to speed?

We are hiring a lot of new grad folks straight from college. This year we also have a pretty robust internship program. I truly believe that the new software engineers who are being created are going to be AI native. They're going to know these tools in a native way, and they're going to be able to leverage our AI tools from day one. I think giving them the opportunity is going to be critical and important, and growing them in this kind of environment is going to be amazing. I can't wait to see this. So this summer is our first batch of new grads coming into OpenAI, and I'm really excited for that. It's going to be about 100 people or so. And then I want to continue growing our internship program within OpenAI. So yeah, this is going to be a really, really cool thing to witness in this age.

And then Tibo, how are you onboarding people to the Codex team specifically? Even within OpenAI, my sense is that the Codex team is maybe a few weeks or months ahead in how you're working. When someone new, either from the outside or even from OpenAI, comes in, how do they get up to speed on how the team works?

So I run the team as a very flat organization. I have 33 direct reports on the team, and they just run around and do cool things, and I don't want to be the bottleneck. I think this is one of the things where, as leads, it's very tempting to not change organizational structure fast enough for how quickly people can actually build. A single person being the bottleneck on every single decision is obviously not going to work anymore. But the first thing that people get introduced to, obviously, is Codex itself. Codex is responsible for the onboarding: you ask Codex questions, you navigate the codebase, understand what other people are doing, you receive daily reports. But then the people who are responsible for the onboarding, the culture, and how we build are also the people who most recently onboarded onto the team. And speaking of new grads, I have this phenomenal new grad who joined the team 6 months ago, and he's absolutely crushing it. That was a little bit of a surprise, but I understood: this person has sort of unbounded energy, much more than I do, and is just super, super quick. I think my brain is probably already in decline. This person, Ahmed, his brain is at absolute peak. Just a phenomenal person, and he's been so successful on the team, and that's been really delightful to see.

Now, playing a bit of devil's advocate [clears throat]: a lot of us more experienced folks, who have seen new grads grow into really successful professionals, have seen that, at least up to now, foundations were so important. So what do you think will happen if we have new grads whose foundations are in using AI coding, and who probably skipped the stuff that we did for 10, 20, or more years? Are they building the right foundations, or are we even asking the right question here?

Foundations remain super important, right? We take great care in designing the overall codebase, taking care of the overall architecture. We do code review, as you said. We don't fully rely on Codex writing everything and just close our eyes and think this is going to be fine. We have the very best engineers working on this as well. But I find new grads are able to absorb that, and if you have the right structure for your codebase and you set the right guardrails, they're incredibly productive. So I think it's just about the environment that you're setting up, and thinking ahead of time about how this codebase is going to evolve.

And how is the role of, starting with software engineers, changing compared to even six or eight months ago? What does a software engineer do? If you had to explain to a new joiner who asks, "Hey, VJ, what am I going to do day-to-day?", what are they going to do?

Yeah, so on the idea of foundations: foundations will never go out of fashion. That is always going to be important, no matter what. I think we're all here because we have strong foundations that brought us here. And then in terms of the role of a software engineer, it's changed quite a bit. I may be dating myself: in 25 years in the industry I've seen so many paradigm shifts. I actually worked on developer tools at Microsoft, and wrote the editor for Visual Studio and language services. So the first time I saw IntelliSense, that was a really cool moment, where you could type, hit the dot, and then the options showed up.

Yeah. But do you remember? I was joining the industry around that time, and the devs around me were saying, "You're not a developer if you use IntelliSense."

Yes. [laughter] And I've seen those. This was probably before my time, when people probably said, okay, if you're not writing assembly, you're not a good software engineer. And then C++, and then the abstractions kept going up and up, and then people used to complain about JavaScript. Remember those days? I don't think those things actually matter. The point is that as long as you have strong foundations, as long as you have product intuition, know what you're building, and are able to go up and down the stack to solve problems, those are going to be the more important things. I don't think that'll ever go out of fashion. I feel like that is always going to be the case.

We're here among mostly engineers and engineering leaders, but let's spare a thought for product managers and designers. How do you see their roles changing, especially now that both engineers and they can build features a lot faster? How does that change their roles? Are we getting closer, or do they still have a distinct role, from what you see?

I go back to this: as long as we're building products for humans to use, we will need human designers, we will need human product managers. I don't know that there is a substitute for product sense or design sense. Those things will evolve, they will get even more productive, with even more abstractions, and we will continue to evolve that. They're getting more and more productive, if anything. Product managers are writing code, designers are writing code. They're taking their designs into prototypes, into production, and validating them before they come to engineers. So I think those roles are already getting a lot more productive. Product managers are also using Codex for building PowerPoint slides, and we have Excel plugins, so it's all around. It's not just engineers; everyone around is getting more productive.

One cool thing that you're doing inside OpenAI, which I've heard about, is this internal knowledge sharing, this show and tell, where teams show what they do. Can you tell us how you came up with it and how you're actually doing the mechanics? And can you tell us some cool things that you've seen teams show, and maybe other teams adopt?

Yeah, it's interesting, because we're discovering the technology and evolving it, and we're co-evolving with it as well. Just as all of you are discovering, hey, this is what AI can do for me, and this is what it means for the organization, or this is what it means for my project, we're also discovering it pretty much at the same time. As soon as we have something that feels like it's starting to work, we ship it to the world, right? So we have a very small amount of time where we actually have more of the crystal ball than all of you. And it's super important that good ideas diffuse very fast through the organization. We use Slack, and the Codex channel and hot tips are two channels that are super, super active. We organize regular hackathons and show and tells. We just try to diffuse novel ways of working with AI as fast as possible, and it's a highly creative time. I think there's no one true way to use this stuff. It's very much still in discovery.

And then we have this phenomenal product manager on Codex, Alexander Embiricos. He's the single product manager for the entire Codex team, and he hyper-leverages himself with the help of Codex. The other day he organized this bug bash. It was an hour; people were going through features that we were about to ship, and then he sent Codex to collect feedback from everyone. This ended up in a Notion doc, and then he dispatched Codex to file bug reports and feature improvement tickets into Linear, assign them to everyone, and then follow up with everyone on how it was going. So he's becoming a 10x, 50x program manager just by leveraging AI as well. And I think it's important, again going back to the bottlenecks: your product manager cannot become the bottleneck, so you need to look at it in a principled way.

One thing I'll add: I've been to these demo days, and we've seen a whole bunch of these projects being demoed. I remember going to these hackathons and looking at the demos. One thing I'm noticing is that the depth of these demos has been consistently going up. It's not just a surface-level "here's what is possible." Some of these demos are actually "here's what's possible, but I've also taken care of all of these corner cases," and it's actually a very usable product. So the depth, day by day, of all of these products that people are building, even just to show off some of the capabilities, is definitely going up and getting deeper.

One kind of disclaimer that we need to add is that inside OpenAI everyone has access to unlimited tokens. There's no cost, and people are laughing because it's kind of a big deal, right? In the outside world, if you will, cost is still a problem. You get the max subscription, and when it runs out you're now on credits. Some people are cool with it, especially founders, but people often ask questions with this in mind: a lot of places are constrained by cost, just for practical purposes. What suggestions and tactics would you have for folks who are inspired by how the team at OpenAI works, but have these constraints, these handcuffs, to work with?

Cost is something that we constantly think about. One, obviously, is that we want to make our models more and more capable and offer that to our users. I also believe that at some point the thinking will shift, because you should imagine that you now have a teammate that is working for you 24/7. You can send instructions to your teammate, you can assign Linear tasks or Jira tasks to your teammate, and you should fully expect your teammate to be capable of taking care of those things. The question then becomes how much you will pay this teammate, not necessarily how many tokens you are going to use. If you start to measure in terms of productivity, with every engineer having a team of four or five of these teammates, then it starts to make a lot more sense. Now, you should hold us responsible for making these agents capable enough to treat them as teammates. And that's what we're working on.

Yeah. I think it's also useful to think about how it displaces costs across the company. There are things you can do now that are actually very cheap to do, like doing marketing research, or going over the entirety of your feature backlog and figuring out which items are the ones you can trivially implement. Before, you would have needed to allocate maybe 15 engineers to go and look through that backlog, and now it's almost free. Obviously not everyone can provide the perk of unlimited inference to their employees, but I do think limiting it prematurely is a risk as well. We're at the very, very early stages of how well leveraged people can get. So I would definitely be saying, hey, the best people at your company: give them very comfortable, large amounts of inference.

Reflecting on the pace of change: we know it's fast, and it's getting really, really fast. It feels like that. But taking a step back to your times before OpenAI, and VJ, you've been in this business for a long time, more than 25 years. Looking back, what was a time when change also felt fast? Did we see anything somewhat comparable in the past?

I don't think I've ever seen anything like this. Looking back over the 25 years: I've seen the dot-com bubble burst, and that was during my college time, and I remember Y2K, I remember the mobile revolution, and I was actually part of the social network revolution. This one feels very different. This one is happening at a massive scale, and it's also happening very fast. With the speed at which this is happening, some of these charts don't make sense. So I do think this is something very, very special and unique, and it's also cool to be living in this period.

Now, as a closing question: it changes fast, but the two of you have been at OpenAI for quite some time now, so I'm going to ask you to make an honest prediction. In two years' time, what do you think software engineering will look like, and what will engineering management look like, just knowing what you know?

[laughter] Obviously two years is way too long of a time frame. [laughter] Six months from now, the thing I feel very confident saying is that we will get maybe another order of magnitude on speed, and that will change things again. The other thing that we will get working is large multi-agent networks that can collaborate on very, very big goals. For example, it should be within the realm of feasible, along the same theme as what Cursor demonstrated, to say, "Hey, rebuild a browser from scratch, just go," and then 24 hours later you have this thing that was built, with two million lines of code. It's pretty much intractable to understand what is actually happening under the hood. So there, I think what we'll start seeing is that we will set guardrails around what is getting built, so that you don't actually have to look at the code anymore. You can either prove that it's correct in some way, or it is constrained in a way where it is secure, and you can just look at the inputs and outputs. Then code will become abstracted away, and it will all become about what the actual challenges are and the properties of the system.

Software has been increasing in abstraction, which makes it easier for us to go build massive amounts of product with very little code. Over the years that abstraction has increased, and I feel like we're in a time frame where that abstraction is increasing and the rate of change has also increased quite rapidly. At some point I worry, and I'll say this right here, because any sufficiently complex or sophisticated system becomes harder to debug, and so you rely on symptoms to debug these things. I think in a few years we'll get to the point where software is so complex, with so many layers in it, that we get really good at identifying issues by looking at symptoms, and our tools are going to get really good at that too. I think that will be a unique ability for software developers to pick up.

Well, VJ, I want to add something about what the future will look like. I think very much you will just be able to call your assistant and check on the work as well. You will have one dedicated personal assistant that is able to represent the work of all the AI agents that are doing things for you productively behind the scenes, instead of having to monitor and check in with a hundred or 200 individual little agents. I think that's something we'll see fairly quickly, including this year.

Yeah. Well, thanks so much to VJ and Tibo for giving us a peek at what is actually happening inside and how your teams are working, which feels like it is weeks or months, or sometimes longer, ahead of the curve, but it is happening. And also what we might or might not see in this really exciting time. Thank you so much.

Thank you. Thank you. [music]

[applause]
