
Automatic code reviews with OpenAI Codex

By OpenAI

Summary

## Key takeaways

- **Codex automates GitHub code reviews**: Codex can be enabled in your repository settings to automatically review new pull requests, with findings flagged for attention. [01:23]
- **Codex goes beyond static analysis**: Codex code review is not limited to static analysis; it can access the entire repository, track dependencies, run tests, and even write code to test hypotheses. [02:37], [03:19]
- **Training for high-signal, low-noise reviews**: The models are trained to prioritize catching bugs that matter and that people are willing to fix, aiming for high precision and a lower rate of incorrect comments. [03:45], [04:02]
- **Internal use prevents critical issues**: OpenAI uses Codex for code review internally, where it has prevented critical training-run bugs that could have delayed model releases. [04:21], [04:31]
- **Customizable reviews with AGENTS.md**: Codex code review can be customized with guidelines from AGENTS.md or custom user instructions to focus on particular areas or ignore certain problems. [05:57], [06:27]
- **CLI review catches bugs locally**: Codex code review is now available in the CLI, letting developers review their local changes before submitting them to GitHub and catch bugs early. [07:04], [07:35]

Topics Covered

  • AI code review is essential for scaling engineering.
  • Codex code review uses tools and repository access.
  • Codex is trained to find critical bugs with high precision.
  • Codex prevents critical issues and helps engineers contribute to unfamiliar codebases.
  • Codex CLI offers local code review before PR submission.

Full Transcript


>> Hey everyone, I'm Roma

>> and I'm Maya.

>> Codex needs to do two things really well to be an effective coding teammate. First, it needs to work with all your tools, and second, it needs to plug into all of your team workflows. Code review is one of the most important workflows for any engineering team, and we want to help there as well. With GPT-5 and now GPT-5-Codex, we trained these models specifically to find bugs and investigate issues. Maya, why don't you tell us more?

>> Sure. I work on the alignment team within OpenAI research, and code review is a very important research project for us. As AI capabilities grow and we have more and more powerful coding agents, we are producing a lot of code, and human verification is becoming the bottleneck. So we need to make sure that we're also training powerful models to help humans with verification, and that our verification abilities are scaling as fast as AI capabilities. That's why we trained code review models. We could have done it just as a research exercise internally, but it's actually way more useful to have it interact with our OpenAI codebase and also to release it externally.

>> Well, that's very much aligned with our philosophy at OpenAI of iterative deployment and making sure the technology is in contact with reality. So, I'd love to enable code review. Can you show us how to do it?

>> Absolutely. So the first thing you have to do is go into your Codex web settings and just enable it.

>> So that's all it takes. And just so I understand: going forward, any PR submitted to that repo will be automatically reviewed by Codex?

>> Exactly. So for example, here I quickly coded up a simple extension. I created a pull request, and it's marked as ready for review. The code review agent will pick it up and start reviewing it.

>> In the meantime, there's a little eyes emoji over here.

>> Exactly. Why don't you show us one where Codex has already completed the review?

>> So, here's another one. It looks a bit different because here I triggered the review while the PR was still in the draft stage and I didn't yet want real humans to look at it.

>> So, that's an interesting technique. You manually tag Codex with a message, and it sounds like in this case you also wanted to give it specific instructions, right?

>> Yeah. So you can always comment "@codex review this PR" and add some extra information, for example something that would help the agent understand the PR, or you can instruct it to focus on some specific area.
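For illustration, a review-trigger comment on a pull request might look something like this; only the "@codex review" mention comes from the video, and the focus instructions are invented as an example:

```
@codex review this PR. Context: it adds a small VS Code extension.
Please focus on the activation logic and ignore the generated lockfile changes.
```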

>> Here's what I love about Codex code review, and what I want people to understand: it's not just static analysis. The model has access to tools; it's able to check its own work and to run tests and commands. Can you show us what happened behind the scenes for this PR?

>> Yeah, so one important thing to note is that the model doesn't only look at the diff. It has access to the whole repository, it can track down dependencies, and it can try to understand the change in the broader codebase, which is very important when you're working on very complex code where you're not the only contributor and you don't understand it in its entirety.

>> And here it sounds like you can also review all of the logs that led to that particular review, right?

>> Yeah. So here the model was not only browsing through the codebase; it actually decided to form a hypothesis, write some Python code to test that hypothesis, check whether it's actually correct, and show it on a couple of examples.

>> That's really cool. And I'm curious, how was this model trained? Did we do anything specific to make it so good at catching bugs?

>> Yeah, so it is part of the Codex training, but we added some specific tasks in which we wanted to really prioritize catching bugs that actually matter and that people would be willing to fix in real life. We also made sure that we aimed for a very high precision rate. When we evaluated it against the previous generation of models, we showed that it has a much lower rate of incorrect comments. But to be fair, the most important evaluation is just people using it in practice: it finding real bugs while not being annoying by outputting too many comments.

>> Yeah. So, you showed us a couple of examples with gpt-oss. Now, I'd love to touch more on how we use it internally at OpenAI.

>> Yeah, we've already used it for a while, and we have a lot of examples of it really saving us from critical issues. It has saved us from important training-run bugs that could have delayed important model releases, and from configuration issues that would not normally be visible from the diff alone. It also allows us to contribute more confidently to codebases that we're not fully familiar with. For example, here we have Alex, our Codex PM, contributing to the VS Code extension and implementing a change where the code review model decided that it's not the correct way to implement it.

>> Right. It sounds like it caught a bug here around removing a React prop along with the CSS.

>> Yeah, it was not really possible to spot this bug without an understanding of the broader code here.

>> But then I love Alex's follow-up as well: "@codex, fair enough, fix it up." You can keep the conversation going and ask Codex to take it over.

>> Yeah. After the code review bot produces a finding, you can always ask Codex to trigger a new task and actually fix it, which is a very nice workflow.
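Following the example in the video, the hand-off can be just another comment mention on the same review thread, which kicks off a new Codex task; the phrasing below mirrors Alex's comment quoted above rather than any required syntax:

```
@codex fair enough, fix it up.
```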

>> Yeah, that's really amazing. And it's really the vision of having Codex at any point, on any surface, as a teammate that's able to do work for you. What I love about Codex code review in this case is that you may have amazing engineers on your team, but they may not have many hours to spend not just looking at a diff but running every possible scenario in their head. And here you have Codex that's able to do that on its own time. One thing I wanted to touch on as well is that we now have AGENTS.md, this open format for coding agents, including Codex, to follow specific instructions. Have you used AGENTS.md in your case to give Codex code review some extra instructions?

>> We actually trained the model specifically to be steerable, either with custom user instructions or by looking for AGENTS.md within the codebase. So if you add some custom code review guidelines to your package, that helps the model understand the codebase, but also any requirements about what it should pay special attention to in code review, or what kinds of problems to ignore and not bother your developers about.
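For example, a repository's AGENTS.md could carry a short review section along these lines; the file name and the idea of review guidelines come from the video, while the specific rules below are invented for illustration:

```
## Code review guidelines

- Prioritize correctness issues: data races, unchecked errors, and off-by-one bugs.
- Flag any change to the training config schema; those need a second human reviewer.
- Ignore import ordering and other style nits; the CI linters already cover them.
```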

>> That's awesome.

>> You can even add custom instructions about the style it responds in. For example, since I needed a little bit more validation, I wanted Codex to tell me, every time I make a bug, that I'm still an amazing programmer.

>> That's awesome. And you are. What we've been looking at here so far has been Codex reviewing code in the cloud, but just recently we also introduced review in the Codex CLI. So, you can have a review in the terminal as well.

>> So, very often you want to review the code before it even makes it to GitHub. For that purpose, the CLI is perfect, and it can also execute more things locally on your computer. You can just use /review to tell the model to review your current changes. And we're going to improve this feature a lot over the next weeks.
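As a minimal sketch of that local workflow, assuming the Codex CLI is already installed and you are inside your repository (the session follows the flow described in the video):

```
# start an interactive Codex session in your working copy
$ codex

# then, inside the session, ask Codex to review your current local changes
/review
```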

>> So it's basically making sure all of the changes you're introducing locally don't have bugs before you submit them in a PR.

>> Exactly.

>> Amazing. Well, there you have it: Codex code review. With Codex, you now have an amazing coding agent to pair with locally in the terminal or in your code editor, and an awesome teammate to send tasks to in the cloud at any time. You also have the ability for Codex to review your changes locally, or your PRs on GitHub, before your co-workers even get to them. It's going to help you catch bugs before they hit production. And we can't wait to see how that's going to accelerate your team and ultimately help you ship better and safer products. Thanks for watching. Back to work.

>> Indeed.
