Build beautiful frontends with OpenAI Codex
By OpenAI
Summary
## Key takeaways

- **Codex: Your AI Teammate for Coding**: Codex functions as an AI teammate accessible across various platforms like the CLI, IDE extension, and cloud, enabling users to send tasks from the web or mobile phones. [00:06], [00:14]
- **Multimodal Vision for Self-Correction**: Codex's multimodal capabilities allow it to not only understand visual input but also to check its own work visually, enabling tighter iteration loops. [00:19], [00:45]
- **Whiteboard to Code with Visual Input**: Users can whiteboard design ideas, take a photo, and send it to Codex via a prompt to redesign app screens, incorporating elements like 3D globes and destination details. [01:16], [01:51]
- **Data Visualization for Persuasion**: Codex can process open data, like New York City taxi information, to build dashboards and visualizations, serving as a powerful tool for presenting data to win over stakeholders. [04:15], [04:51]
- **Iterative Design from Sketch to App**: Codex supports a spectrum of design fidelity, from thin napkin sketches to specific component screenshots, allowing it to fill in details or match exact visual requirements. [05:13], [05:20]
- **Automated Responsiveness Checks**: Codex can generate multiple screenshots to verify responsiveness across different resolutions, including desktop and mobile views, and can be prompted to check designs in various modes like dark mode. [06:44], [07:05]
Topics Covered
- How AI visually self-corrects front-end code iterations.
- Visualize complex data with throwaway web apps.
- Turn napkin sketches into interactive apps instantly.
- Automated visual testing ensures responsive, consistent design.
- Multimodal AI expands beyond web to mobile, desktop.
Full Transcript
Hey everyone, I'm Roman. Codex is your AI teammate that you can pair with everywhere you code, whether it's on your computer with Codex CLI or the IDE extension, or Codex cloud, which you can send tasks to anytime from the web or your mobile phone. But one superpower we really wanted to zoom in on today is its multimodal capabilities. It's even more magical when the model not only has vision understanding but also the ability to visually check its own work.
Today I'm joined by Channing who helped
train the model to tell us more. Hi, I'm
Channing from the research team for
Codex. One of the big things we've been
focusing on is trying to give the model
more tools to leverage its multimodal
capability to just be a better software
engineer. In the same way that I might
check my own work and make sure that
things visually look the way I expect
them to, we want the model to be able to do that in a tight iteration loop.
>> That's awesome. Why don't we kind of go
through an example to see how it works?
>> Sure. We have one of the demo apps we showed at a previous Dev Day. It has one screen to discover destinations and one assistant where I can just type in some place. Voilà. Paris, what would you like to know about?
>> So pretty awesome, right? But I think we can do better with Codex now. We have the ability to take it to the next level. So I thought, why don't we whiteboard together how we could make this home screen more engaging?
>> Sure. You could have like a globe in 3D.
>> Okay.
>> And as the user kind of spins the globe,
they could see the pins to explore. They
could also navigate left and right.
>> Sure. And the extra details for each of
the cities, too. Right.
>> Right. Like Tokyo. Mhm. And like all of
the details that might be like useful
for the destination.
>> Sure.
>> I think that that sounds like a good
place to start, right?
>> We should be able to just take a quick photo of the app as we sketch it up here. We'll just go into ChatGPT.
>> Yeah. Add a Codex task. Put in the photo, and then you should be able to just describe what you want.
>> Redesign the home screen of Wanderlust to show a 3D spinning globe on the left and details on the destination on the right. The user should be able to fluidly navigate across the globe. When they click on a pin, they should see the destination. And you can also map the left and right arrows of the keyboard. Boom. I'll just send that prompt to Codex.
>> Perfect. I think it should be able to do
that.
>> Perfect. Also, while we're sending that
task off, we could add an extra screen
to the app. Maybe we call it something
like the travel log. Okay.
>> Kind of like a dashboard of their stats and all that.
>> Uh, we could do like a full checklist if you wanted to get all the continents.
>> Oh, great. I love this. Continents
checklist.
>> Could do like bottles of wine drunk
personally.
>> That's a great one. Okay. Bottles of
wine.
>> Sure.
>> Uh maybe photos taken as well.
>> Okay.
>> Maybe we'll come up with a pie chart or
whatever makes sense.
>> All right. Add one more screen to the
app called travel log. It's like a
dashboard of fun and interesting stats
for the user. Make sure the app is
responsive on mobile and make sure the
design is also consistent with
everything else. All right. And I just
send this to Codex
>> and we'll let it work.
>> So while this is working, I was thinking, could you maybe walk us through an example or two that you've already used Codex's multimodal capabilities for?
>> Sure. Love to.
>> Let's do it. How I would usually start is I would make a change; I'd ask Codex to do it. It would, you know, most of the time do the right thing, sometimes better than I expected it to. Sometimes I want to make some tweaks, and I would take a screenshot and maybe send that back in. Some of the other people who are doing, you know, front-end development full-time, as opposed to as a side thing, are using the Playwright MCP. So by actually giving it that tool, in the same way the cloud has the ability to open a browser and look at the web app, they're able to run it in their loop and have the model take a look at the application as they're working on it and check its own work.
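For readers who want to reproduce the loop Channing describes, here is a minimal TypeScript sketch of the kind of check the Playwright MCP makes possible: open the running app in a headless browser and capture a full-page screenshot that the model (or you) can inspect. The dev-server URL and output filename are assumptions for illustration, not details from the demo.

```typescript
// visual-check.ts -- a minimal sketch of the "check your own work" loop,
// assuming a dev server is already running at http://localhost:3000 (hypothetical).
import { chromium } from 'playwright';

async function captureHomeScreen(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Open the running web app, just as the agent's browser tool would.
  await page.goto('http://localhost:3000');

  // Capture the full page, not only the above-the-fold area,
  // so layout problems further down are still visible.
  await page.screenshot({ path: 'home-desktop.png', fullPage: true });

  await browser.close();
}

captureHomeScreen().catch((err) => {
  console.error(err);
  process.exit(1);
});
```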
>> So when you use Codex CLI locally, for instance, you can use that, and Codex cloud would do the same here for the task we just sent.
>> Exactly. And in the cloud we give it a set of tools that are hopefully expressive and flexible enough that it can accomplish its goals, and if you do the same thing through the MCP, it can do the same.
>> Yeah. It's really amazing how we can harness those agentic coding capabilities, right? Whether it's in a cloud container or locally with these tools. Do you have an example, for instance, that you tinkered with that we'd be curious to see?
>> Yeah. So, a lot of times you're trying to win someone over, and one of the best ways to do that is to present a lot of data. It would be great to work through stuff on a whiteboard and build up graphs, but it's hard to present the data that way. And so a use case a lot of people have been doing recently is they have Codex churn through the data, or dive deep into a very complicated codebase, and present some visualizations or break things down, sometimes by building just a throwaway web application. It's just a single page; they don't even need to keep the artifact around, but you might share the screenshot with someone.
>> That's such a fascinating use case.
>> And so actually, before we started this, I gave the model a bunch of open data. New York City actually publishes taxi cab information, so information about rides around the city. I loaded just the data into a container and let the model chew on that to try to build a dashboard. And it actually turned out some pretty interesting stuff, you know, at least different theming
>> and different ways to structure and present the data.
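As a rough illustration of the "throwaway web application" idea, here is a small TypeScript sketch of a single-page chart. It assumes the taxi data has already been aggregated into a rides-per-hour JSON file; the file name, element id, and choice of Chart.js are hypothetical, not details from the demo.

```typescript
// dashboard.ts -- a throwaway single-page dashboard sketch, assuming the NYC
// taxi data was pre-aggregated into trips-by-hour.json (hypothetical file).
import Chart from 'chart.js/auto';

type HourBucket = { hour: number; trips: number };

async function renderTripsByHour(): Promise<void> {
  const response = await fetch('./trips-by-hour.json');
  const buckets: HourBucket[] = await response.json();

  // Assumes the page has <canvas id="trips-chart"></canvas>.
  const canvas = document.getElementById('trips-chart') as HTMLCanvasElement;

  // One bar per hour of the day: a quick way to eyeball demand patterns.
  new Chart(canvas, {
    type: 'bar',
    data: {
      labels: buckets.map((b) => `${b.hour}:00`),
      datasets: [{ label: 'Taxi trips', data: buckets.map((b) => b.trips) }],
    },
  });
}

renderTripsByHour();
```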
>> Mhm. It allows you to add basically exactly as much fidelity as you want. So, like when we were over at the whiteboard, you can do a very thin sketch and have the model fill in the details. Or, if you want to take a screenshot of something very specific, like "I want my component to look this way," you can feed that in and it'll try its best to accomplish that.
>> Yeah. You can go all the way from a napkin sketch to a Figma mock of the application.
>> Mhm.
>> I don't have to check this out. I don't have to run it locally. I can just take a look at the top level at which design I like, and then I can iterate from there.
>> Why don't we go back now and check the two tasks that we sent from the whiteboard?
>> So I guess this first one is where we asked it to do the 3D globe, and it looks like it did that for us. Oh wow.
>> So looking through the log, yeah, it pulled in Three.js, so it's actually animating this, building it all out. It has a texture for how the globe should look.
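The transcript doesn't show the generated code, but a minimal sketch of the kind of Three.js globe described here (assuming a placeholder texture path, and leaving out the clickable destination pins) might look something like this:

```typescript
// globe.ts -- a minimal spinning-globe sketch in Three.js, illustrative only;
// not the code Codex actually produced. Texture path is a placeholder.
import * as THREE from 'three';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.z = 3;

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Sphere with an earth texture, lit so the surface reads as a globe.
const texture = new THREE.TextureLoader().load('/textures/earth.jpg');
const globe = new THREE.Mesh(
  new THREE.SphereGeometry(1, 64, 64),
  new THREE.MeshStandardMaterial({ map: texture })
);
scene.add(globe);
scene.add(new THREE.AmbientLight(0xffffff, 0.6));
const sun = new THREE.DirectionalLight(0xffffff, 1);
sun.position.set(5, 3, 5);
scene.add(sun);

// Left/right arrow keys nudge the globe, as described in the prompt.
window.addEventListener('keydown', (event) => {
  if (event.key === 'ArrowLeft') globe.rotation.y -= 0.2;
  if (event.key === 'ArrowRight') globe.rotation.y += 0.2;
});

// Slow idle spin; pin placement and click handling are left out of this sketch.
function animate(): void {
  requestAnimationFrame(animate);
  globe.rotation.y += 0.002;
  renderer.render(scene, camera);
}
animate();
```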
>> I mean, it looks promising to me, but
I'm really curious to see if the
animations actually work. Do you want to
check out that PR locally?
>> Let's... yeah. So, we create a PR.
>> Yep.
>> And then we should be able to just check it out in the terminal and start the dev server.
>> Oh my god.
>> All right.
>> That's incredible. Oh, the globe is spinning exactly like we said. It even decided to put a tooltip on top
>> to tell you how to explore.
>> Uh, that's amazing. And if you click on
one,
>> damn, it works. And it even has the
button to open the assistant.
>> Why don't we now check the travel log screen that we wanted to add and see what Codex came up with.
>> So, it gave us a couple options to look
at.
>> Well, that one is really good. Oh,
>> it really matches the design of the app.
What about the mobile view?
>> So we asked to make sure that things were responsive, and you can see it actually took two screenshots here. It took one at a common desktop resolution and the other at a common mobile view. And what's nice is it captured the full page.
>> Yeah.
>> Even if it's not in that, you know, above-the-fold section, you can still see if there's an error or some, you know, overlap.
>> And I'm sure that can apply to anything, right? If I were to say make sure it works in dark mode, it would take as many screenshots as I want.
>> Yeah. Internally, people have done exactly that. They've run their component through light mode, dark mode, responsive, different sizes. Like, our design team has different stops they want to make sure everything looks good at, and so you can actually make that part of your prompt: that you want to see all these things and make sure they're checked before it opens a PR.
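A rough sketch of that kind of prompt-driven check, expressed as a standalone Playwright script: capture the new screen at a couple of viewport "stops" in both light and dark mode. The URL, route, viewport sizes, and file names are placeholder assumptions, not the design team's actual stops.

```typescript
// responsive-check.ts -- sketch of a multi-viewport, light/dark screenshot pass,
// assuming the app runs at http://localhost:3000 (hypothetical URL and sizes).
import { chromium } from 'playwright';

const stops = [
  { name: 'desktop', width: 1440, height: 900 },
  { name: 'mobile', width: 390, height: 844 },
];

async function captureAllStops(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/travel-log');

  for (const stop of stops) {
    await page.setViewportSize({ width: stop.width, height: stop.height });

    for (const colorScheme of ['light', 'dark'] as const) {
      // Emulate the OS color scheme so prefers-color-scheme styles apply.
      await page.emulateMedia({ colorScheme });
      await page.screenshot({
        path: `travel-log-${stop.name}-${colorScheme}.png`,
        fullPage: true, // capture below the fold too, to catch overlap errors
      });
    }
  }

  await browser.close();
}

captureAllStops().catch((err) => {
  console.error(err);
  process.exit(1);
});
```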
>> What are some other things that you're excited about with multimodality, and maybe things that you're thinking about?
>> Yeah, one of the first tools we gave it is a browser tool, and so its ability to look at a website and a web application. I think there are a lot of other multimodal software development tasks where being able to check your own work in a tight iterative loop will be an important thing. So I think we're trying to look at how to do mobile engineering, and even desktop applications. Web was really kind of a proof of concept to make sure we got the loop working.
>> Thanks, Channing.
>> No problem. This was a quick tour of the multimodal capabilities in Codex. We know models perform better when they can check their own work, and previously we could only do that with backend code. But now, by harnessing the multimodal and agentic capabilities of GPT-5-Codex, we've also unlocked that for front-end coding. We hope this gives you some ideas for how you can pair with Codex as a creative partner. To get started, go to chatgpt.com/codex.
Thanks for watching.