
OpenAI's Codex: This Model Is So Fast It Changes How You Code

By Every

Topics Covered

  • Super Bowl Ad Signals Mass Builder Inspiration
  • GUI Trumps TUI for Agent Supervision
  • Automations Turn Agents into Silent Teammates
  • Insane Speed Reshapes Coding Flow
  • Verification Bottleneck Limits Agent Autonomy

Full Transcript

The first time I showed it to someone, they were like, no way, this is like a fake demo. Like, this cannot be this fast. This will change everything, especially because it's not yet the fastest that we can actually get it to be.

My experience was trying the app. I didn't really want to go back to a terminal. What I realized is actually GUIs are great; IDEs are just the problem. There's something that's a GUI for programming that's not an IDE. And it seems like you're figuring that out, but I don't even know what that's called.

It's called a Codex app.

And I want to take a second away from the episode to tell you about Granola. Granola is an AI note taker for your meetings, and I use it pretty much every day. That may sound a little bit weird or a little bit creepy, like, transcribe all your meetings? Well, for me, it's actually kind of indispensable as a leader. Every is about 20 people now, and it's really important to me that I understand how decisions get made, how I'm showing up in meetings, and how I can help my team the best way I can. Granola acts a little bit like a leadership log for me, so I can see how I've done in meetings, what situations came up in a particular week, and how I can do better next time. If you're trying to improve as a leader and scale your company, try Granola as your AI-powered notepad for meetings. Head to granola.ai/every, code EVERY, to get three months free. And now, back to the episode.

Thibault, Andrew, welcome to the show.

Hey, thanks for having us. Thanks for having us.

Great, great to get to chat with you.

So for people who don't know, Thibault, you are the head of Codex at OpenAI, and Andrew, you are a member of the technical staff on the Codex app at OpenAI.

And you are the people of the moment. They just ran a Super Bowl commercial about Codex. OpenAI did. How are you feeling?

Yeah, that Super Bowl ad was quite surprising, wasn't it?

It really was. I think the core thing, and the reason, the place I want to start this conversation, is it feels like that is a strategic shift. You would expect OpenAI to have run a ChatGPT commercial during the Super Bowl. And maybe not, especially if you looked at Codex's positioning three or four months ago, for professional engineers, maybe not have run an ad targeted at a much broader audience. It felt like for a long time there was this divide where Codex was for professional engineers, and if you want to do vibe coding, you do that in the ChatGPT app. And it seems like that has shifted a lot over the last month or two. Can you tell me about that?

Yeah. Well, we can talk about last week, right? So last week on Monday, we released the Codex app. Immediately we saw a ton of

downloads, more than a million downloads in the first week. And then we knew that we were releasing an extremely strong model, 5.3 Codex, on Thursday. That just made it very visible that we're here to put incredible experiences out there. We're very committed to Codex. And also, agents are really starting to work and be able to create these things, even if you're a little bit less technical. I think the app really showed that: it's much more inviting for people to just try it and run multiple agents, with our models being very, very good at allowing for multitasking and being reliable for long-running sessions. So it allows you to create a lot more.

So it just felt that maybe we can inspire more people to build and then show that agents are here, right? It's coming, it's going to be mainstream. Why don't you try and create something new and inspire people? That felt like the right thing that we wanted to reinforce.

Yeah, while we were designing and developing the app, one of our internal mandates to ourselves the whole time was that we had to make something that we loved to use and that we used for all of our work. And if we couldn't do that, then we weren't going to put this out. And this was back when we started. And I think

that we surprised ourselves a lot with how fun it was, especially as we started to build this app before we started to build agent skills. And then once we paired them together, it became this really rich, interactive experience where you could open the browser or connect to these various services. And so all of a sudden we started to feel this really connected, interactive experience and wanted to share it. I kind of see the ad as a love letter to builders, right? I have never seen a Linux CD in a Super Bowl ad. And

so, you know, that was really cool to watch.

What was the impact of the ad?

We're still to measure that. We'll see how it plays out over the long term. But we saw a giant surge of traffic, remarkably quickly after 4 p.m. PST, when it aired, and our systems were under heavy load. It felt kind of weird to me that people were watching the Super Bowl and then going and installing the app and trying it out right there and then. But that's what happened. And a lot of people reached out saying they were really inspired by it and just wanted to build afterwards, which is what we're aiming for as well.

Thibault, I still want to talk a little bit about the strategic shifts. So the Codex app moving from, or

Codex in general, moving from something that is really for professional developers to something that has a broader audience, and maybe moving some of the vibe coding from ChatGPT into the Codex app. Tell me about that.

I don't think we're trying to move vibe coding from ChatGPT into the Codex app. We're very

much... two things are happening. One, we're pushing the frontier on professional software development. 5.3

Codex beats every single other model on the top benchmarks for coding. So it is a very, very capable model. And at its speed and cost, it is a top performer. I think the second thing is the app does make things more accessible. And so it does appeal to

a wider audience. But internally, we're also seeing the app is very much used within research, within our own team; the entire Codex team uses the app. It

makes people more productive. So it's very much leaning into how we think agents are best used, the patterns that we were seeing that are making people very productive here at the company and outside, and then just going all in on that. It does happen at the same time also. It's like, hey, delegation is finally here. It works, and it's much more accessible, and we're going to try and see how we can package that and actually ship this to a much, much wider audience. But that

might not be the Codex app.

You use that all day. It's like you just build in there.

99% of the code that I write is using the Codex app.

Same. I mean, I live in there now. Yeah. Okay. Well, that's actually really interesting.

I definitely want to talk about the app in particular, but I want to go back to the thing you just said. Maybe, if I'm reading you right, you're kind of like: we're pushing the frontier, and we're seeing lots of people who are maybe broader than just senior engineers using this. However,

the overall idea of who is doing what in which app, maybe you haven't totally figured out yet. And it's not as clean a line as, no more vibe coding in ChatGPT, you really do that in Codex. It's like

you can do it in both, but we haven't figured out exactly which thing you're going to do where.

Yeah. I think Codex is the most powerful experience out there. So you should be fairly technical, so that you understand, hey, code is actually getting written and it's going to get executed on your machine; by default it's executed in the sandbox. But you should probably be able to

read code in order to use Codex to its fullest. We will

bring a similar experience to ChatGPT at some point, which will have different properties in terms of the sandbox and how concepts are represented. Maybe we won't be showing, hey, this scary terminal command is running and you should probably approve it.

Of course you shouldn't do that to someone who is not technical. And Codex is really there to appeal to all coders, builders, people who are either technical themselves or technical-adjacent, like data science, these kinds of things.

Yeah. And if you use the Codex app for any amount of time, you can see the inspirations from chat.

The layout's very similar. We auto-name your conversations. We've got contextual actions, but it's pretty clean. The composer looks very similar. And you'll see some of that inspiration back in chat for other types of things. But we still believe that when we set out to make something that was for the professional software developer and for us, it deserved a dedicated experience that could really showcase the power of the models and the way that the models could change the development lifecycle.

And so we made something very tailored to that. And we've had a lot of success internally with research teams, with product teams. And so, you know, we'll look beyond, but I think we're really happy with where we've ended up on the tailored approach to this.

Can you tell me about the decision to invest in a GUI over a TUI? I feel like

TUIs are so hot right now. And obviously you have one for Codex already, and you could have said, okay, we're going to double down and just make the terminal experience even better than it is now and really invest in that. Making a GUI is a little bit of a counterintuitive or counter-narrative

thing to do. So tell me about that decision process.

I think it wasn't counterintuitive.

It's more that maybe it's not mainstream. And so we experiment with a lot of different approaches. I very much consider that we're still in the experimentation phase. And,

you know, we're responsible primarily for two things. One is building the most powerful entity out there that's capable of coding. Increasingly, this will become a multi-agent system, and it will become more and more capable, and you will have to figure out how to steer and supervise its outcome and its behavior. That's one thing that we're building. And then we're also building how you even interact with this.

What is the optimal way to have visibility into what this very capable entity, or system of entities, is doing? How do you steer them? How do you supervise them? And so we're very much still experimenting with what that is. Sure, you can do it in the TUI, but

at some point it starts to feel very limiting, especially on multimodal: the models can draw little diagrams and generate images, or you can talk to it using voice. Maybe you have many of them going in parallel and you start to lose track. So we felt like we needed

to start experimenting with something else. And it was only when we saw it become super, super popular internally that we were like, we have to ship this externally. This

has come to a point where it's too good to just keep to ourselves.

I mean, that was the journey that you went through; you were not building in the app. When did you start building in the app?

That was

actually fairly quickly, when the app was building itself. Yeah, that was pretty quickly. And

yeah, because I was starting with the TUI and with the IDE extension. And I

think that my goal personally was how can I get to fully building the app on the app as fast as possible? It's really easy when building this stuff to slip into the mode of like, oh, this will be good for somebody. Somebody will

love this. They will love this, right? So we really wanted to get quickly to like, I want to be able to build the app on the app. I want

it to be able to run itself with skills. I want it to click around on the app that it spawned, and I want this to be part of my workflow as soon as possible. I still use the TUI sometimes when I want to fire off something quick, but I think that there is something about the flexibility of controlling UI and being able to have some panes be persistent and

others be ephemeral. We shipped voice with the app so you can prompt with voice. We have mermaid diagrams in the app. We have

full image rendering. So all of those things, I think, are the tip of the iceberg of what we wanna do with a dedicated UI. And it's pretty simple, and it's simple intentionally, but I think we're gonna do a lot with dynamic stuff there.

I mean, yeah, the ceiling is just much higher.

Yeah, it's interesting.

My experience was trying the app. I didn't really wanna go back to a terminal, and I had been coding mostly in Claude Code and some Codex in the terminal for several months before that. And I think what I realized is actually GUIs are great. IDEs are just the problem. And

there's something that's a GUI for programming that's not an IDE. And it seems like you're kind of figuring that out, but I don't even know what that's called.

It's called a Codex app.

You know, there was a moment during the development of this where everybody and their mother was forking the same IDE. And

we kind of looked at each other and we were like, hey, should we have done a fork of VS Code as well? Very seriously. I remember exactly which day it was. And I think, I don't know if I would say that IDEs are the problem. But I go back to the truck analogy sometimes

with them, which is that like I will open an IDE here and there. Like

I opened one today. It was something very specific that I wanted to do that I don't even remember what it was. But then I closed it and I went back to using the Codex app. And I think that there is something there with like the Codex app being a great daily driver. And like occasionally you need an IDE or occasionally you need like a really complex terminal setup, but that this should

be our home base. It should be our command center for the agents that are running and a place that you can come back to and track all this stuff.

And, you know, there were a lot of design decisions around like, do we allow free form panels like an IDE? And we kind of came to the conclusion that a lot of what these models are great at is knowing what is needed in the moment for what type of task. And so we wanted to have kind of more full control over what was able to show at what point. And you can

see that in plan mode, where you're not necessarily getting a composer. You're getting a really quick way to answer questions. And you've got your plan, and you can edit your plan. And I think we only want to do more with that as we go.

It seems like you were surprised that you didn't want to go back to the TUI after.

I was. Yeah.

Were you like a... Greg did an interview, and Greg was like, I am a TUI power user, I thought I would never leave the terminal. Greg lives in Emacs.

I was a TUI power user for like six months, starting with when Claude Code first got really good. And I was like, holy shit, this is so much better than being in Cursor or Windsurf or whatever. And now I feel like I speedran my TUI era and I'm back in GUIs. I'm kind of flipping back and forth right now, but I sort of see the light where,

it just, especially if you have a bunch of them going at once, the affordances of a GUI just make it much nicer.

Yeah, and there's a lot more to come there. And it was a very intentional thing for us. We sort of see that agents will act, and are already acting, on much more than code. And so they need to be a companion to every single app and every single thing that you can do on your computer.

We integrate with Linear, Slack, and of course you also need to be able to read the code and produce code, but maybe it can do a deploy through Vercel as well. Are you gonna do all these things from your IDE? That would feel very odd. And so it's this command center for your agent. We optimize the entire experience

around the idea that you have a very capable, intelligent entity that you're controlling, steering, and supervising. And you never need to go in there and do the things yourself. The thing is very capable of being delegated to. I think when you accept that that is what we're headed towards, and with 5.2 Codex it just feels like we're getting almost there, right? Then you're like, well, it's the same with you, right? When I talk to you about a feature idea or something, you go and you get inspired and you go and do it. I don't suddenly jump into your IDE and just go and implement it.

You could.

Yeah, I mean, I think you would find it disturbing, right? It's like,

I mean, that's the way that everyone will work with agents. You just talk to them.

How has your workflow changed with 5.3 Codex versus 5.2?

I was surprised at how much faster it was, and I had to adjust. I had been optimizing a lot more for long-running, multitasking work, and I had an expectation of, okay, this type of task will take 10, 15 minutes, I'm going to kick off four different things and then come back. So now I'm able to do a little bit less multitasking and be more in the flow. That felt really good. And then it just feels very satisfying as well to kick off automations with it, using skills. It's a more

generally capable model. It's less super-focused on code, right? And

so I find it much more reliable going through Twitter replies and summarizing the important themes, or filing bugs in Linear, and then coming back to that and using automations so that things run daily. It feels much more robust at these things. But you're really the super power user here,

Andrew. The kind of stuff he does... I have very vanilla usage of Codex compared to Andrew.

No, I mean, well said.

I had a series that I had intentions to run for a while, and I only ran it for three days on X/Twitter, which was

that I was setting up a prompt to basically add a feature to the Codex app, some random, non-shippable feature. I had this long prompt about the quality bar that we had to hit. And

once I switched it to 5.3 Codex, the results got actually much more interesting. Like

we did a Subway Surfers panel on the right; that was one of them. A

little Minecraft UI for the subagents was another one that we did that, I don't know, maybe... maybe we'll ship it.

I was like, get back to work.

Why do we have Minecraft in the Codex app now?

Yeah, but we're going to explore. No, I mean, 5.3 Codex, it's neat. It's fast.

It's capable. It's multimodal. Thibault says you have a lot of cool use cases. What are the more interesting ways that you're using the Codex app that maybe people should try but haven't thought of yet?

Andrew came up with automations. And I think that shifts the way you're thinking about these things, when they can just hop into the background on a specific trigger at a specific time.

And then you can sort of program it yourself.

Yeah, they're using that a lot. There are a lot of things that I use the app for that are a little bit outside of just coding features. I use it to keep my PRs mergeable, with automations. And so

it'll resolve merge conflicts. It'll keep them updated. It will fix like build issues. So

that basically like as soon as they're ready to go, like they're ready to go.

There's no, oh hey, somebody merged a big thing and there's a conflict now.

So I do that.

So at what point does the automation trigger? Because I thought automations trigger on a certain time schedule, but it sounds like there are other triggers I didn't know about.

Yeah, we're looking at a lot of things. I have it right now just on a time schedule, and I use our GitHub skill and some internal skills for our CI

and that runs hourly or every two hours and kind of just cleans everything up.

I see. So if there are any changes on main, it looks through any PRs and makes sure that they're all up to date, so that whenever you're ready to go, it's never blocked. That's actually good. I like that.

Yeah. It's actually really helpful. It's surprisingly helpful.
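An upkeep automation like this can be approximated outside the app with the Codex CLI's non-interactive mode and an ordinary scheduler. A minimal sketch, assuming the open-source `codex exec` command; the prompt text is my own invention, not the actual automation's instructions:

```python
import subprocess

# Hypothetical prompt; the real automation's skills and instructions are internal.
UPKEEP_PROMPT = (
    "Look at my open PRs. If main has moved, rebase them, "
    "resolve merge conflicts, and fix any build failures."
)

def run_upkeep_once():
    """One upkeep pass via the Codex CLI's non-interactive mode.
    Schedule this hourly (cron, launchd, etc.) to mirror the
    hourly clean-everything-up automation described here."""
    return subprocess.run(["codex", "exec", UPKEEP_PROMPT], check=False)
```

The app's built-in automations wrap the trigger and the skill wiring for you; this sketch only shows the shape of the loop.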

I have one that, every day at like 9 a.m., sends me all of the contributions that have been merged to the Codex app over the last day. It'll do a nice report of who merged what, and I have it group it by theme. So I can be like, all right, three people worked on this part of the composer, two people worked on automations, here's what happened,

so that I can at least be knowledgeable of what's happening, because things get chaotic right before launch.

One automation I have, I run it multiple times a day, and it's: pick a random file and find and fix a subtle bug. And then it's kind of funny, because it

actually does pick a random file. So it will run Python, like rand. And then

it will find a random file. And then it will start from there. And so

it's like every time it sort of explores a new one. Has it caught anything?

Oh yeah. We catch, often, latent bugs that are not actually triggering on the critical path, but they're actually bugs. And then it's trivial to fix it, merge it.

It takes very little time. And it's a thing that I would have never found myself. It found an issue in constraint sampling the other day.
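The "pick a random file" step that automation starts with is easy to sketch. Something like this, where the extension list and repo root are my assumptions, would give the agent its random starting point before the find-and-fix prompt runs:

```python
import random
from pathlib import Path

def pick_random_source_file(repo_root=".", exts=(".py", ".ts", ".rs")):
    """Return one source file chosen uniformly at random, skipping .git,
    as a starting point for a 'find and fix a subtle bug' pass."""
    files = [p for p in Path(repo_root).rglob("*")
             if p.suffix in exts and ".git" not in p.parts]
    return random.choice(files) if files else None
```

Randomizing the entry point is what makes repeated runs explore different corners of the codebase instead of re-auditing the same hot files.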

Yeah, that's really cool. Do you have other automations that are worth sharing?

Let's

see. I feel like I have 60 that are running at all times. Some for

testing and some for real. Some of the members on the team really like this one that looks at the PRs that you've done in the past day or so and quietly cleans up any bugs you shipped. It looks at a few of the observability platforms and tries to basically ship a fix before

anyone's noticed that you shipped a bug.

That's cool.

One that's not coding-related is marketing research. It runs daily, with a specific skill to do deep marketing research, which I've tuned over time. It goes and searches the web for any new things that have come up in terms of how users are perceiving and talking about Codex. And then I just receive that little report, and it always makes for an interesting read. We could just go on; these are just examples that we rely on, and they run. Yeah.

Do you have any particular skills that you guys like that are beyond the normal kind of, you know, I have a GitHub skill and that kind of stuff. I

love Andrew's yeet skill. It just takes the change, does the commit, writes the PR draft, puts it in draft, and publishes a PR with a title and body.

Yeah, it's very satisfying. It just does everything. That one definitely makes people productive.

What are the top used ones for you?

ImageGen is a cool one, for both silly automation purposes, like, hey, make me an image that characterizes my last day of work... not my last day of work, my previous day of work.
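As described, the yeet skill chains ordinary git and GitHub CLI steps: commit, push, open a draft PR with a generated title and body. A rough sketch of that chain, under the assumption that it shells out to `git` and the standard `gh pr create --draft` call (the real skill's implementation isn't public):

```python
import subprocess

def yeet(branch, title, body):
    """Commit the working tree, push the branch, and publish a draft PR,
    mirroring the one-shot commit-to-PR chain described above."""
    steps = [
        ["git", "checkout", "-b", branch],
        ["git", "add", "-A"],
        ["git", "commit", "-m", title],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--draft", "--title", title, "--body", body],
    ]
    for cmd in steps:
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

In the actual skill the agent writes the title and body itself from the diff; here they are plain parameters.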

Yes, yes, yes, Andrew.

The ImageGen skill was actually really cool. I used the Codex app to make a book for my daughters. I put together this prompt teaching it about a script that I wanted written: 24 pages, here are my daughters' ages, here's where we've lived in the past (we were in Boston, moved to New York, and then moved over here). We went through that, I agreed on the script, and then

I said, all right, now it's time to use the ImageGen skill. For every page in the book, based on the script, it prompted for the image, then put them all together and used the PDF skill to produce the book's PDF, and then I printed it. And so we've got a super custom book that I read to my kids, and it's really cool.

It's just this awesome thing when

you can combine the intelligence of the agent with working in a programmatic way, by using skills, and combine them in novel ways. And yeah, I think the PDF and ImageGen one is a common combo that we see.

It feels

like the Codex model has obviously gotten faster, which makes it much more usable. And

it also feels a little more empathetic, like it has a little more emotional intelligence, but it still has a little bit of that does-exactly-what-you-say thing, in a way that can be annoying. How are you guys thinking about how you shape the way the model feels, and which way you're pushing it?

It's something that we obsess over. So we

definitely want the model to excel at coding and be really good at instruction following.

At the same time, when we optimize a little bit too much in that direction, it can over-index on specific words or misunderstand the intent in ways that humans wouldn't. Sometimes I will just have a typo, and then the typo actually finds its way into the file. And

I'm like, obviously, I didn't mean the typo; I meant the name of this class. So that's something that we're definitely continuing to push on. But the thing that we're pushing on the most right now is really efficiency, speed, and then also what we now refer to as personalities. How supportive is it? And we understand that not everybody has the same preferences there. The previous default was definitely a

super blunt, pragmatic personality. Now we've also introduced a more supportive, friendly personality, and you can just pick between those. And I think for things that don't have a universal, accepted thing that everybody should just use, we're probably going to introduce some way for you to just make it your own. You should feel like you have

your own little personal Codex that works in exactly the way that you want it to work.

Do you use the friendly or the pragmatic one?

Pragmatic.

Pragmatic, yeah.

Okay, I'll say it's pragmatic, yeah. Interesting. I think you guys recently put out a model that is so fucking fast. I was testing it before it came out, and I was just like, I can't really keep up with this thing. So I'm curious how that changes how

you think about what is now possible with coding with a model like this, and also the affordances that you need in order to manage models that are so quick effectively.

Yeah. The first time we used this model in the app, we had kind of that same thing happen, where all of a sudden there was just this wall of text and we were at the bottom of the scroll, and we were immediately like, all right, we need to smooth this thing out coming in. And so we actually do slow it down ever so

slightly, just so that you can see the words come in a little bit smoother.

That's so funny. It's like a really funny problem.

But this thing has been super fun. And I think what I'm most excited about is what sort of capabilities we can start to add to the app that are really, really dynamic, that we couldn't with a model that wasn't this fast. So yes, this model is going to allow you to iterate really, really quickly, but it also opens up

a lot of new opportunities to how you code and how you interact with the Codex app. The first time I showed the very first

Codex app. The first time I showed the very first prototype when we hooked everything up and obviously the model is powered by Cerebrus and we've talked about the partnership there and we're very excited to put the first model that we're serving through that you know, out there, like it's,

it's, you know, obviously like still like very early. It's literally the first time we hook it all up and we're just like so excited that we want to share it. But the first time I showed it to someone, they were

it. But the first time I showed it to someone, they were like, no way, this is like a fake, a fake demo. It's like, you know, this is not real. Like this can not be this fast. And then they tried like a few prompts. They were just like, It's just like, oh, I literally cannot

keep up. It's like, this is insane. And yeah, I think this will change

keep up. It's like, this is insane. And yeah, I think this will change everything, especially because it's not yet the fastest that we can actually get it to be. With the preview, we're putting it out quite early. We're actually going to

be. With the preview, we're putting it out quite early. We're actually going to layer a number of optimizations on top of it, which should be able to make it maybe two to three X faster than the experience that you have experienced. that's

going to change things. And we're thinking about this also from a point of view of like delegation, you know, like we think this model has a huge role to play as part of like a system of like, you know, multi-agent systems and as a way to like speed up, you know, maybe the slower, more intelligent agent as well. So we're going to be experimenting in that way. And do you

well. So we're going to be experimenting in that way. And do you expect the same hardware speed ups on like the more intelligent agents to come out soon? So a lot of the things that we worked

soon? So a lot of the things that we worked on were interesting, sort of like distributed systems and like infra problems that we uncovered because we were able to sample from the model at unprecedented speeds, right? And then if you're getting tokens back this fast, you need to

go and like optimize the entire set of bottlenecks that you sort of like uncover on the critical path of serving. All of those benefit, the current, they benefit like GPT-5.3 codex and like, you know, all future models. And there's one thing that we've been doing as well, which we're, I'm sure we're gonna put in like a more detailed blog post at some point, which is we wrote the entire

series stack to be based on like web sockets and like a persistent connection and to do things like a lot more incrementally and like statefully. And that decreases like the overall latency, like, you know, across all models. Like we haven't shipped it by default yet, but it's, you know, It is something that we are making the default for this new super fast model. And then we're also gonna enable on the other

models. And it makes things, it decreases overall turn latency by something like

models. And it makes things, it decreases overall turn latency by something like 30, 40%. We can look into the exact numbers.

30, 40%. We can look into the exact numbers.
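
The arithmetic behind a persistent connection's win is easy to sketch. Below is a toy model in Python: each per-request HTTP turn pays a fresh connection-setup cost, while a persistent WebSocket pays it once and amortizes it. The handshake and per-turn serving costs here are made-up assumptions for illustration, not measured Codex numbers.

```python
# Back-of-envelope model of why a persistent connection lowers turn latency.
# All cost numbers are illustrative assumptions, not measured figures.

TCP_TLS_HANDSHAKE_MS = 120  # new-connection setup cost per request (assumed)
SERVER_WORK_MS = 300        # model/sampling time per turn (assumed)

def turn_latency_http(turns: int) -> int:
    """Each turn pays connection setup plus server work."""
    return turns * (TCP_TLS_HANDSHAKE_MS + SERVER_WORK_MS)

def turn_latency_websocket(turns: int) -> int:
    """One handshake up front; every later turn reuses the open socket."""
    return TCP_TLS_HANDSHAKE_MS + turns * SERVER_WORK_MS

turns = 10
http_ms = turn_latency_http(turns)
ws_ms = turn_latency_websocket(turns)
print(f"per-request HTTP: {http_ms} ms, persistent WebSocket: {ws_ms} ms")
print(f"saving: {1 - ws_ms / http_ms:.0%}")
```

The point of the toy model is only that the fixed per-connection overhead stops scaling with the number of turns; the real stack's savings also come from incremental, stateful serving.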

Yeah. What are the most surprising things you've seen using the model internally, in terms of what a speedup like this enables?

It just allows you to be super in the flow. You're almost sculpting the experience, or the code, in real time. It's a very different feel. It's very unsettling at first, and then once you get into it, it's very hard to go back to any other model. That's the feedback we've seen, and that's what I've felt myself. It takes about five minutes to adapt, and then you're like, okay, this is how I'm going to use this thing.

Yeah. I also don't think we've poked at the full extent of what we could do with it.

Yeah, it's true. It's very early. We haven't had it for very long. Someone on the team, Channing, was just showing, oh yeah, it's so fast it can actually play Pong. Not very well, but the model is able to react to things almost in real time, right?

You start to see how it might replace some deterministic steps. We have in the Codex app a set of Git actions, right? And as everybody knows with Git, certain configurations or certain states you can be in make it really hard to run those without a ton of error handling, error messages, and guidance. It's really hard to create a good Git experience, which is why nobody ever has. But if you have a model that's almost as fast as running those scripts, then you can imagine a world where these things turn into skills or something like that, and your operations run a little bit differently, with some intelligence, and without the latency you have today when you're asking it to go track something down in the codebase. You can kind of vaguely gesture and be like, hey, send this up, and have that be fast enough to sit behind a button.

What I'm very excited about is when it comes together with something we shipped with 5.3 Codex, this thing we call mid-turn steering: you start with your prompt, it gets to work, and then you send another prompt while it's still working, and it adapts in real time. It will receive that message, acknowledge it, and then continue its work. If you start to think about what this would look like with voice, and with a model as fast as the one we just shipped, that's a whole other experience we would be very excited to bring, hopefully very quickly. Because you can easily interrupt.

Yeah, if you're just talking, engaging with natural language, doing the mid-turn steers, and then the implementation happens almost instantly because of the speed, it becomes a very pleasant thing to use. Right now you can emulate it with voice dictation and mid-turn steering and watch the model implement, and it's a very cool thing. I think we're going to have a step change in that experience when we really polish it.

If speed as a bottleneck is close to being solved, what do you think is the next bottleneck? What is the next limit on making the thing you want?

The bottleneck that is very apparent is how fast you can verify that things are correct. We can generate code faster than ever before. We can implement entire features. I saw that just based on a description of the Codex app, if you synthesize screenshots into a plan, the models are very much capable of reproducing 95% of the features and rebuilding the app from scratch. Now, is it going to be bug-free? Is everything implemented to perfection, in the same way the actual app is? That still takes a lot of time: for a human to go and click and verify, make sure the designs are consistent, that there are no bugs here or there, that when you click that button in the settings panel, it actually does the thing you expect. I think verification definitely becomes a bottleneck.

We have people on the team complain: there's too much code to review. That's what we're trying to solve for.

I mean, you complain about that. I complain about that. There's so much code to review now, both on your own machine and from another peer. We're going to have to figure that out.

Yeah. You're already reviewing the code the first time, because the agent is just presenting it to you, and then you have to review the code produced by your peers. There are these two rounds of reviews.

Yeah, this is something we're working on. A lot of us still do have to review code, and we're taking a look at what that experience should look like with the model involved. We've got a review mode in the Codex app that works really nicely and annotates your diffs on the side with findings and stylistic things. Lots to do.

One thing I'm also excited about with making the models faster, and this one we just put out, which is mind-blowingly fast: you can imagine using it to understand code, understand features, help you with code review, help you understand the code that's up there.

It's much more pleasant, because this is something that you want to do. You want to be there, in the flow. It's something that has to be synchronous. It's not something that you delegate. You cannot delegate understanding, right? You're trying to understand something, and so speed there is a real advantage. It helps offset the fact that models are producing more and more code: speed helps you understand that code faster as well.

Yeah. I've definitely found this already with this new model: end-to-end testing is faster. Because if you're having it do end-to-end testing, like manual integration testing, often there's a toast that pops up for a second, and if the model's not fast, it's not going to catch it. It seems like it's better for that, because the cycle times are much, much shorter.

And I definitely find this too: I can produce so much code, but when I see a PR come in, or when I make a PR, my first question is, is there evidence that you've actually tested this and it actually works? Not just unit tests; you've gone through it end to end. How do you handle this?

I mean, I've seen a lot of peers that I have the same question about. It's so easy to code things now, right?

Yeah. We have gotten the Codex app to be pretty good, through some skills we have, at running itself, clicking around, screenshotting itself for evidence, and uploading that to the PR. There's a lot that's pretty interesting there, especially when we make this more async, or when the models get really fast at this stuff. I don't know exactly what it looks like yet, but there is a lot there around: hey, here's a bug fix, this is exactly what it looked like when it was happening, and here's exactly what it looks like now with the same exact click path. Maybe that's the turning point where code review becomes less important, because you can verify that part instead, so you have to do less through the code as a proxy. But there's definitely more to explore there.
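
To make the evidence idea concrete, here is a hedged sketch of how agent-captured screenshots and a click path might be packaged into a PR comment. The screenshot capture itself (driving the app and clicking through it) is assumed to happen elsewhere; everything here, including the names and the comment format, is a hypothetical illustration, not the Codex app's actual skill.

```python
# Sketch: formatting agent-captured UI evidence into a markdown PR comment.
# All names and the layout are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class UIEvidence:
    click_path: list[str]    # e.g. ["Settings", "Notifications", "Save"]
    before_screenshot: str   # path captured while the bug still reproduced
    after_screenshot: str    # same click path, after the fix
    notes: str = ""

def pr_comment(ev: UIEvidence) -> str:
    """Render a markdown comment a reviewer can check at a glance."""
    steps = " -> ".join(ev.click_path)
    lines = [
        "### UI verification evidence",
        f"Click path: `{steps}`",
        f"Before: ![before]({ev.before_screenshot})",
        f"After: ![after]({ev.after_screenshot})",
    ]
    if ev.notes:
        lines.append(f"Notes: {ev.notes}")
    return "\n".join(lines)

ev = UIEvidence(
    click_path=["Settings", "Notifications", "Save"],
    before_screenshot="evidence/before.png",
    after_screenshot="evidence/after.png",
    notes="Toast now appears and persists for 3s.",
)
print(pr_comment(ev))
```

The design point is the one from the conversation: a reviewer who can see the same click path before and after the fix can verify behavior directly, instead of inferring it through the diff.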

Last couple of questions. I'm curious, what have you guys learned from Anthropic and Claude Code, and how do you think about your positioning in the market versus them? How do you think about the differences?

I think they were first to put something out there, and that was interesting to us, because we had been working on similar ideas for a bit. But at the time our models were not quite ready: they were not reliable on long-horizon tasks, they were not able to do reliable tool calls and stay on topic. As soon as we started to really invest in that, especially with GPT-5, we were like, okay, the models are there, and we know how to make them even better. 5.2 brought even better long context, long-horizon reliability, and context understanding. And what we were seeing is that Anthropic was, to us, losing a little bit of steam when it came to the model.

We were in this fortunate position where the way we run Codex is: we've got product, we've got engineering, but we've also got research, and we all work together, sit together, and solve problems together. It's a highly creative space where at times we decide to solve problems in the product, in the harness, but at times we're also like, hey, how can we actually improve the model? Let's talk about it and ideate together. And then research will come and be like, hey, we've got this breakthrough that we're sitting on; would this be something we can ship? And then we just get excited about that.

One of the examples: we had a lot of complaints about compaction. Whenever you would hit compaction, people would complain that it was losing too much context. So we solved that end to end. We decided to do end-to-end RL training: introduce compaction within research and make the model itself very familiar with the concept of compaction, so it produces optimal compactions, sort of delegating to itself across time. Once we had solved it at the model level, the harness problem became so much easier, because it was just: let the model do it, and it's going to be very reliable.
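
For contrast, here is a rough sketch of what naive harness-level compaction looks like, the kind of approach the model-level RL training described above replaces. `summarize` is a hypothetical stub standing in for a model call, and the whitespace token count is a crude proxy for a real tokenizer; none of this reflects Codex's actual implementation.

```python
# Naive harness-side compaction sketch: once the transcript exceeds a token
# budget, fold the oldest turns into a single summary entry.

def count_tokens(text: str) -> int:
    # Crude whitespace proxy for a real tokenizer.
    return len(text.split())

def summarize(turns: list[str]) -> str:
    # Hypothetical stub standing in for a model call that compresses context.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Return history unchanged if under budget; otherwise fold the oldest
    turns into one summary and keep the most recent turns verbatim."""
    total = sum(count_tokens(t) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = ["turn one " * 50, "turn two " * 50, "fix the bug", "run the tests"]
print(compact(history, budget=60))
```

The complaint quoted above ("it's losing too much context") is exactly the failure mode of this kind of fixed heuristic: a one-shot summary throws away whatever the heuristic didn't know to keep, which is why training the model to manage its own compaction works better.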

Through that collaboration, the momentum has been very strong, and we're able to improve models and ship roughly on a weekly-to-monthly cadence. And then we took a bit of a different bet and a different approach with the Codex app, which turned out to be an awesome thing to try, instead of forcing ourselves to cram everything into the TUI. It was a great challenge, right? You were like, let's build an app; where do I even get started? And then you just got obsessed by it.

It's hard not to.

Yeah. How was it to build something that was quite contrarian, I suppose?

Yeah, I remember you and I talking early on about whether or not we'd ship this. We were like, we don't know if we'll ship this; we'll try it out and see if we can get to something that we love. I remember saying, let's get some PMF internally. Let's get everybody at OpenAI to want to use this thing without being forced to use it. Let's see if we can do it. And we did. It was adopted very quickly. The minute it was barely usable, the research folks put dev boxes on it, which was this crazy hack at the time. But now they use it for everything.

Yeah, including in training 5.3 Codex. I feel really good about having hit the point where almost everyone technical at the company uses Codex, but the people who use it the most are actually the people building Codex and building the models. So we're just able to improve things at crazy speeds, and there are no signs of it slowing down.

Amazing. Well, I'm excited for what you ship next. Thank you guys for your time. I really appreciate it.

Thank you. Thank you for having us. Thanks.

Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat, craving more. It's not just a show; it's a journey into the future, with Dan Shipper as the captain of the spaceship. So do yourself a favor: hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say: Dan, I'm absolutely, hopelessly in love with you.
