ChatGPT Atlas isn’t just a Chrome wrapper
By Theo - t3․gg
Summary
## Key takeaways
- **Atlas: More Than a Chromium Wrapper**: OpenAI's Atlas browser, while built on Chromium, is significantly more complex, featuring a new 'OWL' architecture that separates the Chromium process from the main application. [00:15], [02:56]
- **Native Frameworks Create Platform Lock-in**: Atlas's heavy reliance on Apple's native UI frameworks like SwiftUI, AppKit, and Metal for its user interface makes porting to other platforms, especially Windows, extremely challenging and likely a major reason for project difficulties. [04:57], [05:07]
- **IPC Complexity and Security Concerns**: The extensive use of IPC (Inter-Process Communication) via Mojo, with custom Swift and TypeScript bindings, introduces significant complexity and potential security vulnerabilities by exposing more client-side capabilities. [09:17], [13:10]
- **Input Event Handling is a Maintenance Nightmare**: The intricate process of translating macOS events to web input events and back, especially when pages don't handle them, creates a complex and fragile system prone to maintenance issues and potential breakage. [21:37], [23:45]
- **Agent Mode Compositing Challenges**: For agentic browsing, Atlas must composite UI elements like dropdowns that render outside tab bounds into a single screen image for the AI model, requiring reverse-engineering of macOS rendering to create synthetic views. [27:47], [28:08]
- **Depot Boosts Build Times Dramatically**: Helium, a Chromium-based browser, reduced its build times from over 40 hours to under seven by using Depot, significantly improving development velocity and deployment to multiple platforms. [02:02], [02:07]
Topics Covered
- Does Swift UI doom Atlas's Windows future?
- Decoupling Chromium creates IPC binding trauma.
- Will OpenAI's Atlas architecture become a maintenance nightmare?
- Agent mode hides terrifying rendering and input complexities?
Full Transcript
It feels like everyone's doing their own
Chromium wrappers nowadays. From my
search engine DuckDuckGo to Microsoft with
Edge to Perplexity with whatever the
heck's going on with Comet to my friends
over at imput.net building Helium. But
what about OpenAI? Atlas is definitely a
Chromium wrapper, right? Well, it is. And
it's also something much scarier. I just
read and dove deep on OWL, which is
their new architecture for the
ChatGPT-based browser Atlas. I filmed a different
intro a couple minutes ago before I had
read this article.
I saw demons.
This was a terrifying read. It's really
cool that they shared all of this
architecture, but being that I've spent
as much time in Electron and Chromium as
I have in the past, this one scared me.
So, if you like big technical deep dives
and or watching me spiral into chaos and
depression, you're probably going to
enjoy this video a lot.
This has everything you don't want. From
chaotic rewrites of things that
shouldn't be touched to re-translating
events in many different directions from
higher level to lower level and back
again to me just getting stressed and
angry.
This one was a journey and I hope you
enjoy it as much as I didn't. All of
that said, I'm going to need therapy
after this one. So, let's do a quick
word from today's sponsor so I can
afford it. Are you tired of waiting on
builds? Then you'll love today's
sponsor, Depot. These guys will take
your builds, your CI, and everything
else that you're dealing with on GitHub
actions and make it way, way faster,
especially those Docker builds.
Previously, I was talking about this
based on the testimonials that they had
and the cool things I'd looked into when
I played with it on the side. But now, I
have an actual story to tell you guys.
As you all know, I switch browsers a
good bit, but I've settled on two.
You're seeing one of them right here,
Zen, but the one I use for my day-to-day
tends to be Helium, because I need a
browser that has Chromium stuff.
Chromium just makes my life much easier.
Helium is built by two kids that are
working their butts off to make a better
browser. That is a fork of Chromium.
That is not easy to maintain, much less
build. And before they made the switch
to Depot, their builds were 40 plus
hours long. Now they're under seven for
all platforms. And they were also able
to roll out to Mac and Linux much
smoother as a result of moving to Depot.
So it's got to be super expensive,
right? It actually ends up being way
cheaper. Not only is Depot going to make
your actions literally five to 50 times
faster in some scenarios, they're also
going to make the build cheaper in the
end. It's one of those things that's
just such a no-brainer. I don't know why
people aren't trying it. You can take my
word for it or read all the other
stories from how PostHog cut their
build times by 55x, how Jane cut their
GitHub action costs in half and
increased throughput by 25%, or the
crazy stories I just told about Helium.
Your engineers are too expensive to be
sitting around waiting for builds all
day. Solve them now at soy./ /do. Oh
boy, it's been a bit since I got to dig
deep on Chromium stuff. I'm actually
really excited for this. So, uh, let's
dive in. How we built OWL, the new
architecture behind our ChatGPT-based
browser, Atlas. I like saying ChatGPT-based
and not Chromium-based, but sure,
we'll go with it.
Last week, we launched ChatGPT Atlas,
which is a new way to browse the web
with ChatGPT by your side. In addition
to being a fully featured web browser,
Atlas offers a glimpse into the future.
A world where you can bring ChatGPT with
you across the internet to ask
questions, make suggestions, and
complete tasks for you. In this post, we
unpack one of the most complex
engineering aspects of the product. How
we turn ChatGPT into a browser that gets
more useful as you go. You mean how you
added ChatGPT to Chromium? Sure. But uh
I think the end side will be worth it.
As cringe as this intro is, making
ChatGPT a true co-pilot for the web meant
reimagining the entire architecture of a
browser, separating Atlas from the
Chromium runtime. This entailed
developing a new way of integrating
Chromium that allows us to deliver on
our product goals: instant startup,
responsiveness even as you open more
tabs, and creating a strong foundation
for agentic use cases. Shaping the
foundation, Chromium was a natural
building block. It provides a
state-of-the-art web engine with a
robust security model, established
performance credentials, and peerless
web compatibility. Furthermore, it's
developed by a global community that
continuously improves it. It's a common
go-to for modern desktop web browsers. I
like that they said it's developed by a
global community and are intentionally
just like not mentioning Google anywhere
in here. Very fun. Like Google is the
main entity driving this development.
There are people all around the world
contributing to Chromium, but 98% of the
work is coming from Google employees.
The amount of public free work that
Google is paying for for Chromium is
something they deserve a little more
respect for. I get everybody thinks that
Google's trying to own the web. They
don't have to make Chromium open source.
They don't have to contribute millions
of dollars a month to its development,
but they do because they want the web to
win as much as they want to win
themselves. Rethinking the browser
experience with a sidebar for chatting.
Our talented design team has ambitious
goals for our new user experience,
including rich animations and visual
effects for features like agent mode.
This required our engineering team to
leverage the most modern native
frameworks for our UI: SwiftUI, AppKit,
and Metal, instead of simply reskinning
the open source Chromium UX. Fun. I like
that all of the modern native frameworks
for UI are Apple here. Is it still Apple
only? Pretty sure it's Mac only, right?
Yeah. Download for macOS. It's still
Mac only. Fun. Knowing that they built
this so heavily with SwiftUI,
AppKit, and Metal, I wish them luck.
SwiftUI barely works on macOS. I
cannot fathom trying to get that working
well on Windows. I honestly think the
big bet that The Browser Company made on
trying to make SwiftUI work for the
browser is a big part of why the Windows
project failed and probably a big part
of why they've lost motivation to
maintain Arc entirely. SwiftUI should
have been much better than it is. I've
talked to a lot of iOS people and
there's a varied opinion on SwiftUI:
a lot of people who love it and a lot more
that want to love it and are just burnt
by all of the churn and changes and
performance issues. You're getting a lot
of the problems that you get from React
Native when you move to SwiftUI, which
is a crazy thing to say because it was
meant to kill React Native, but uh it is
what it is. But they still are using
AppKit as well in parallel, which is
kind of crazy. And then Metal's called
out, which is the lower level rendering
engine, similar to like a DirectX type
thing for Macs and iOS. This is a weird
set of things to mention here, but yeah,
that means they're building a lot of
native code for their apps and it means
that these things are going to be much
harder to get onto other platforms. So,
if you're sitting around here waiting
for Atlas to come to Windows, good luck.
I'm sure they're running a Codex job in
the background trying to port all this
code over to random .NET [ __ ] And I'm
sure that's going about as well as you
would expect. But yeah, I am very happy
with Helium right now, which is my
Chromium-based browser by my friends over
at imput.net. Very, very nice browser.
They're customizing the hell out of
Chromium, but they're not doing this
with Swift and SwiftUI. They're doing
this with C and C++ the way the original
Chromium codebase does. Well, to be more
accurate, they're doing it with patch
files. It does look like they're
automating a lot of the patching now
with Python, which is cool. But a
significant portion of this codebase is
patch files. That's how they're doing
the work. So they're not adding new
languages or tech in. They are patching
the existing stuff which is very cool
and they're doing a great job with it.
Helium has been incredibly
well-maintained. Highly recommend it if
you're looking for a Chromium-based
browser. Yeah. Anyways, they had other
goals like fast startup times and
supporting hundreds of tabs without
penalizing performance. These goals were
challenging to achieve with Chromium out
of the box which is opinionated about
many details from the boot sequence,
threading models, and tab models. We
considered making substantial changes
here, but we wanted to keep our set of
patches against Chromium targeted so
that we could quickly integrate new
versions. Interesting. So they're trying
almost to like keep the Chromium part
separate and light on patches. So when
that changes, the amount of like
integrations they're doing that has to
change is less. They're definitely
building this so that maintenance isn't
that hard if they decide to not fully
support it in the future or if they want
to iterate in different directions with
it. The more they patch Chromium, the
more they have to maintain the Chromium
patches and the Chromium instance within
this project. Makes a lot of sense to
try and keep those separate, but it also
results in the chaos of building your
own separate UI on top of it with a
handful of shims to break out. It'll be
interesting to see how maintainable this
is over time. I would be surprised if
they didn't poach somebody from Arc or
The Browser Company to help with figuring
out the architecture here. To ensure
development velocity was maximally
accelerated, we needed to come up with a
different way to integrate and drive the
Chromium runtime. A litmus test for our
technical investment was not only that
it would enable faster experimentation,
iteration, and delivery of new features,
but it would also enable us to maintain
a core part of OpenAI's engineering
culture, shipping on day one. Every new
engineer makes and merges a small change in
the afternoon of their first day. We
needed to make sure this was possible,
even though Chromium can take hours to
check out and build. Yeah, Chromium is
not a fast build process. Our solution,
OWL. Our answer to these challenges was
to build a new architecture layer that
we call OWL, the OpenAI web layer. The
OWL is our integration of Chromium,
which entails running Chromium's browser
process outside of the main Atlas app
process. Oh boy.
Oh boy. I have a lot of trauma around
the Electron and Chromium IPC binding
layer. Oh god.
One of my old projects, my first like
viral GitHub thing is this template I
made called Yerba, which was my best
attempt to try and make a full stack
type-safe binding layer between
Electron, like the Electron system code,
and the actual client code. So you can
call things between the two reasonably
easily. I put a lot of work into this
project and haven't touched it now for
many years. But I've also heard people
still using this as a reference or even
a basis for their own projects using
Electron because getting that full stack
type safety was obnoxious.
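To make the "full stack type safety" idea concrete, here is a minimal TypeScript sketch. It is not the code from the template described here: the channel names, the `IpcBridge` class, and the in-memory transport are all hypothetical stand-ins for a real Electron IPC layer. The only point is that a single channel map can type both the handler side and the caller side.

```typescript
// Hypothetical channel map: each channel name maps to its argument and result types.
type Channels = {
  readFile: { args: { path: string }; result: string };
  getVersion: { args: {}; result: number };
};

type Handler<K extends keyof Channels> = (
  args: Channels[K]["args"]
) => Channels[K]["result"];
type UntypedHandler = (args: unknown) => unknown;

// In-memory stand-in for the process boundary. A real app would serialize
// messages and send them between the system process and the client code.
class IpcBridge {
  private handlers: { [channel: string]: UntypedHandler } = {};

  // "System" side: register the implementation for one channel.
  handle<K extends keyof Channels>(channel: K, handler: Handler<K>): void {
    this.handlers[channel] = handler as unknown as UntypedHandler;
  }

  // "Client" side: the argument and return types follow from the channel name.
  invoke<K extends keyof Channels>(
    channel: K,
    args: Channels[K]["args"]
  ): Channels[K]["result"] {
    const handler = this.handlers[channel];
    if (!handler) throw new Error("no handler registered for " + channel);
    return handler(args) as Channels[K]["result"];
  }
}

const bridge = new IpcBridge();
bridge.handle("getVersion", () => 42);
bridge.handle("readFile", ({ path }) => "contents of " + path);

const version = bridge.invoke("getVersion", {});
const fileText = bridge.invoke("readFile", { path: "/tmp/a.txt" });
// bridge.invoke("readFile", { path: 123 }); // rejected at compile time
```

The casts inside the class are where the pain lives: the compiler cannot see across the process boundary, so the types are only as trustworthy as the shared channel map.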
I did this by grabbing this Electron
file from the Electron side that is the
type definitions that I bind to window
so you have access to them to try my
best to force it to be type safe. Not
easy or fun. I just remember this
project being really annoying to build
and maintain. Regardless, I've been
through it with these things.
Apparently, egoist, who's one of my
favorite devs, made a much much more
fleshed out version of this with tipc,
which is like tRPC, but for the IPC
bindings in Electron. That's really
cool.
Also has not been maintained for a
while. Yeah, we've both been through it
with Electron. So, I yeah, I get trauma
as soon as I see this part of this
diagram.
Anyways, think of it like this. Chromium
revolutionized browsers by moving tabs
into separate processes. We're taking
that idea further by moving Chromium
itself out of the main application
process and into an isolated service
layer. This shift unlocks a cascade of
benefits. It is a simpler modern app.
Atlas is built almost entirely in SwiftUI
and AppKit. One language, one tech
stack, one clean codebase.
Oh boy. Oh boy. This is never coming to
Windows. God. If they're specifically
trying to have one clean codebase and
then they're building all this [ __ ] with
SwiftUI and AppKit, this is never
coming to Windows ever. And if it is,
it's not going to move anymore. Oh boy.
There's also faster startups because the
Chromium engine can boot asynchronously
in the background and Atlas doesn't have to
wait. Pixels will hit the screen nearly
instantly. Let's try. I'm pressing enter
now.
That was not that fast.
For comparison, I'm going to close
Helium and I'm pressing enter now.
It was faster. Cool. Yeah, I'm going to
call [ __ ] on that one. I'm sure it's
faster than what they were doing before,
which is booting their app, then booting
Chromium. But yeah, next piece is that
it's isolated from the jank and crashes.
Chromium is a powerful and complex web
engine. If its main thread hangs, Atlas
doesn't. If it crashes, Atlas will stay
up. What are they doing to Chromium
where the main thread is crashing? I
need wukko's opinion on things. I wish he
was here. I'll bug him later. Next piece
is that there are fewer merge headaches
because we're not building on as much of
the Chromium open source UI. Our diff
against upstream Chromium is much
smaller and easier to maintain. It's not
slowing down wukko that much, but I get
it. And then faster iteration. Most
engineers never need to build Chromium
locally. OWL ships internally as a
pre-built binary. So Atlas builds take
minutes instead of hours. Okay, that
part is fair. If you're not touching the
patch files, then you can have a cached
entity that is this right side of the
diagram. So you don't have to rebuild
that when you're playing with this part.
That part makes a lot of sense and I can
see why that'd be a lot faster. That
said, good luck once that SwiftUI
project gets big. SwiftUI's compile
times are known for being really good
and reasonable. Ah, good luck. So how
does this actually work? At a high
level, the Atlas browser is the OWL
client and the Chromium browser process
is the OWL host. They communicate over
IPC, specifically Mojo.
Oh man, I haven't been this deep in
Chromium stuff in a minute. Oh god, this
hurts. If you weren't familiar, Mojo is
one of the ways that you can do IPC
inside of Chromium to get things in and
out of the like backend processes.
Backend means a very different thing
here, but you get the idea. It's the
part that is running in the C++ native
layer. And you expose these by making a
.mojom file which has an interface
definition.
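As a rough illustration of what bindings for such an interface have to do, here is a TypeScript sketch of a client-side proxy and host-side receiver. This is not Mojo's real API and not OpenAI's bindings: the `Pingable` shape, the `PingRequest`/`PingResponse` message types, and the direct function call standing in for the message pipe are all assumptions made for the example.

```typescript
// Hypothetical TypeScript shape for an interface with one method
// that replies with a number.
interface Pingable {
  ping(callback: (random: number) => void): void;
}

// Wire messages that would cross the process boundary.
type PingRequest = { id: number; method: "ping" };
type PingResponse = { id: number; random: number };

// Host side: owns the real implementation and answers requests.
class PingableReceiver {
  constructor(private impl: Pingable) {}

  dispatch(req: PingRequest, reply: (res: PingResponse) => void): void {
    this.impl.ping((random) => reply({ id: req.id, random }));
  }
}

// Client side: looks like a Pingable, but only sends messages.
class PingableRemote implements Pingable {
  private nextId = 1;
  private pending: { [id: number]: (random: number) => void } = {};

  constructor(private send: (req: PingRequest) => void) {}

  ping(callback: (random: number) => void): void {
    const id = this.nextId++;
    this.pending[id] = callback;
    this.send({ id, method: "ping" });
  }

  // Called when a response arrives; matches it to the waiting caller by id.
  onResponse(res: PingResponse): void {
    const callback = this.pending[res.id];
    if (!callback) return; // unknown or duplicate response
    delete this.pending[res.id];
    callback(res.random);
  }
}

// Wire both ends together; a direct call stands in for the pipe.
const receiver = new PingableReceiver({ ping: (cb) => cb(4) });
const remote: PingableRemote = new PingableRemote((req) =>
  receiver.dispatch(req, (res) => remote.onResponse(res))
);

let received = -1;
remote.ping((random) => {
  received = random;
});
```

Every method on every exposed interface needs this request/response plumbing, which is why generated bindings (and the header files that go with them) exist at all.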
Getting header file vibes. This is this
is a world I am so thankful to be out of
at this point. Yeah. See the pingable.h.
This is the header file that goes
alongside the .mojom file in order to
bind things together and make this work.
God, I hate all of this. I am so happy
to be free of this world. So these are
all communicating over Mojo which is
Chromium's message passing system. They
wrote custom Swift and even TypeScript
bindings for Mojo. So the Swift app can
call host side interfaces directly. Yep.
I would be curious to see how the
TypeScript definitions there work.
That's got to be painful. The OWL client
library exposes a simple public Swift
API which abstracts several key concepts
exposed by the host service layer. The
session, which allows you to configure
and control the host globally. The
profile, which lets you manage the
browser state for specific user
profiles. The web view, which allows for
controlling and embedding individual web
contents, like render, input, navigate, zoom,
etc. The web content renderer, which allows
you to forward input events into
Chromium's rendering pipeline and
receive feedback. So if somebody clicks
something inside of your like SwiftUI
layer and you want to pass that down to
one of the renderer processes, that lets
you do that. And then the layer host and
client layers, which exchange compositing
information between the UI and Chromium.
God, I this was not as vibe coded as I
thought. Terrifying.
I do have a a friend and active chatter
hanging out here, NMG, saying there's a
whole lot that they wish they could say
but can't. Let me drop my suspicions
about the potential security holes that
my friends in chat may or may not have
found already. I suspect that by
exposing many more things over Mojo, in
particular via TypeScript, that there
are now things that are exposed in the
client layer, as in the client that is
rendering your JavaScript, that might
not have been intended, which would
hypothetically allow for someone from
JavaScript, to execute things on that
native host layer by calling the
bindings. Am I on to something here,
NMG? Yes, no, or can't say are all good
answers. Okay, it's not that special.
Okay, I'm thinking I'm going 200 IQ
for something that's 50 IQ in the space.
Good to know. Anyways, more diagramming
of how this is broken up. We have the
web view in Swift and the Mojo IPC
bindings handle things like navigate,
stop, reload, all going to the OWL
process, which is the web view host in
C++. And this is how they go back and
forth between them. Fun. There's also a
wide range of service endpoints for
managing high-level features like bookmarks,
downloads, extensions, and autofill.
They had to reimplement everything for
this then, huh? I thought this was a
much more minimal Chromium wrapper than
it is. Yeah, just looking at this now,
it's silly, but like the way the tab bar
works, this isn't a Chromium tab bar. I
I fell for this. I thought this was much
more minimal a Chromium wrapper than it
is. Yeah, that the way the like open tab
moves is something that no one would
bother doing with the C++ code. It's
also why the settings page is entirely
different. This broken out dialogue for
settings is not something that would
exist in the standard Chromium browser.
Very interesting. Very interesting. They
might have put more work in here than we
thought before. Oh yeah, that is such a
SwiftUI text box. That is such a SwiftUI
text box. The cursor being
right at the start of the filler text
that wasn't removed properly because
that's not a default behavior for inputs
in SwiftUI for some [ __ ] reason.
Yeah, this is SwiftUI for sure. Cool.
So, how do they actually get the pixels
across the boundary? Because that's the
interesting thing. If the web view host
OWL process, like the actual Chromium
part, is doing the rendering of the page,
how do they get that into the SwiftUI
layer at all? There's no way they're
just IPCing up all the pixels and
data, right? Please tell me they're not
doing that. Web views, which share a
mutually exclusive presentation space in
the client app, are swapped in and out
of a shared compositing container. For
example, a browser window often has a
single shared container visible.
Selecting a tab in the tab strip swaps
that tab's web view into the container.
On the Chromium side, this container
corresponds to a gfx::AcceleratedWidget,
which is ultimately backed
by a CALayer. We expose that layer's
context ID to the client, where an NSView
embeds it using the private CALayerHost
API. Okay, so they're passing
the identifier for this like container
context up to the view where it can be
rendered natively-ish.
Interesting.
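A toy TypeScript model of that swap, to show why no pixel data needs to cross the IPC boundary. The `WebViewHandle` and `CompositingContainer` names are invented for this sketch, and the numeric context IDs are placeholders; the real client is Swift and embeds the layer through a private macOS API.

```typescript
// Hypothetical model: the host hands the client an opaque context ID per web view.
type ContextId = number;

interface WebViewHandle {
  tabId: string;
  contextId: ContextId; // assumed to arrive from the host over IPC
}

// One shared container per window; only one web view is attached at a time.
class CompositingContainer {
  private attached: WebViewHandle | null = null;

  // Swap the given web view in, returning whichever one was swapped out.
  show(view: WebViewHandle): WebViewHandle | null {
    const previous = this.attached;
    this.attached = view;
    return previous;
  }

  // The ID a native view would embed. Only this identifier moves between
  // processes; the rendered content stays on the host side.
  visibleContextId(): ContextId | null {
    return this.attached ? this.attached.contextId : null;
  }
}

const container = new CompositingContainer();
const tabA: WebViewHandle = { tabId: "a", contextId: 101 };
const tabB: WebViewHandle = { tabId: "b", contextId: 202 };

container.show(tabA);
const swappedOut = container.show(tabB); // selecting tab B swaps tab A out
```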
Oh god, select dropdowns render in a
separate layer. Oh god,
their solution for that is interesting.
They don't have a content::WebContents,
but they do have a content::RenderWidgetHostView
with their own
accelerated widget. So the same
delegated rendering model applies. God.
But now they have to like conditionally
when you open up a select tag on a page
change which parts are being rendered
through this pipe. Apparently they're
using a private API for this. Definitely
not a thing that can go wrong. Oh god.
Oh god. I I have a feeling that this
approach they have taken is going to
cause more problems for maintenance over
time rather than less because if any
changes happen to how this accelerated
widget pipe works or how they hoist
things through this uh like the there's
a layer in the browser now I forgot the
term for it. There we go. That that's
the thing I was thinking of and of
course Jay is the writer here. Yeah, the
top layer is this new layer they're
putting in as part of full screen API to
handle things like dialogue support and
all of that higher up above. So you
don't need all of these crazy packages
to manage tool tips that don't break due
to overflow rules. It's a very annoying
layer and that's a change that they have
been working on for a while. That
article is 2022 and these things are
still rolling out. Chromium doesn't
really care about breaking changes in
how things work internally because
they'll eat it. They'll just pay the
cost and rewrite entire layers of
Chromium and most people will never
notice. Atlas is absolutely going to
notice because if they change how that
top layer works, good luck with your
bindings. OWL internally keeps view
geometry in sync with the Chromium side.
So the GPU compositor can be updated
accordingly and can always produce layer
contents of the correct size and device
scale. The fact that they're even
calling this out suggests that it didn't
do that before and different device
scales and text rendering was breaking
things. And once again, I will say good
luck making this work on Windows where
the text scaling is more a vague
suggestion than an actual spec. Oh god,
I do not envy whatever engineers have
been tasked with making this work in
Windows somehow because I know there's a
team in there working on it that's
convinced they can do it and is
suffering immensely. We also reuse this
technique to selectively project
elements of Chromium's own native views
UI into Atlas. This is also useful for
bootstrapping features like permission
prompts quickly without building
replacements from scratch in SwiftUI.
That's fun that they have to project
Chrome views in for certain things. This
technique borrows heavily from
Chromium's existing infrastructure for
installable web apps on macOS. Oh man,
not the the PWA and installable Chromium
app stuff. They're not even maintaining
that anymore. Google's fully deprecated
installable Chrome apps outside of
Chrome OS. So, whatever stuff you're
borrowing here is unmaintained and is
going to cause problems in the near
future, almost certainly. God, I wish
them luck with that. This is a
very complex project that's going to
have a lot of maintenance issues in the
future if I understand anything about
Chrome. Let's talk about how they handle
input events. This is going to scare me.
Cracking and forwarding. The Chromium UI
translates platform events like a macOS
NSEvent. So like things happening in
the mouse layer for example, they'll
translate those into Blink's WebInputEvent
model before they get forwarded to
the renderers. So the renderers in
Chromium don't know anything about how
macOS handles inputs. They have a layer
above that does that before it gets
passed over. But since OWL runs Chromium
in a hidden process, they have to do
that translation themselves in the Swift
client library and then forward the
translated events down to Chromium. Oh
god. I
The scientists were so enthralled by
whether or not they could that they
never stopped to ask how [ __ ] hellish
will this be to maintain and port to
Windows. God, can you see how red my
face is getting? This is stressing me
out. I I don't know if you guys know
this, but I built a big Electron app
when I was at Twitch. We were doing
Twitch Studio. We were trying to build
an OBS alternative from scratch in
Electron. I've seen the skeletons in
these closets. They are horrifying.
There are far too many. Like they are
choosing to make this harder than we
did. And we were trying to build a video
rasterization pipeline with JavaScript
bindings. I I do not envy any of the
people who have to work on this. I I
respect you, but I do not envy you. God,
this hurts.
every single event that can happen in
this higher level. Be it a mouse event,
a key, a copy paste, a screen reader
request. Does this even work with screen
readers? I don't have one set up on this
machine right now. I usually keep a
machine that's set up with like a ton of
accessibility tools just for testing my
own services. I don't have one right now
cuz I wiped my other machine to give it
to an employee. I would be very
impressed if this doesn't break
accessibility immensely. Once they have
this forwarding layer built, they follow
the same life cycle that real input
events would normally follow for web
content. That includes having events
returned back to the client whenever a
page indicates that it didn't handle the
event. When this happens, we resynthesize
an NSEvent and give the rest of the app
a chance to handle the input.
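The round trip described here can be sketched in a few lines of TypeScript. The event shapes below are heavily simplified and hypothetical (the real NSEvent and WebInputEvent carry far more state), and the real translation lives in Swift; this only shows the translate, forward, and resynthesize-on-rejection flow.

```typescript
// Simplified, hypothetical event shapes for illustration only.
interface NativeEvent {
  kind: "leftMouseDown" | "rightMouseDown" | "keyDown";
  x: number;
  y: number;
}

interface WebInputEvent {
  type: "mouseDown" | "keyDown";
  button?: "left" | "right";
  x: number;
  y: number;
}

// Step 1: the client translates the platform event itself.
function toWebInputEvent(e: NativeEvent): WebInputEvent {
  if (e.kind === "keyDown") return { type: "keyDown", x: e.x, y: e.y };
  return {
    type: "mouseDown",
    button: e.kind === "leftMouseDown" ? "left" : "right",
    x: e.x,
    y: e.y,
  };
}

// Step 3: rebuild a native-style event from a web event the page rejected.
function resynthesize(e: WebInputEvent): NativeEvent {
  if (e.type === "keyDown") return { kind: "keyDown", x: e.x, y: e.y };
  return {
    kind: e.button === "right" ? "rightMouseDown" : "leftMouseDown",
    x: e.x,
    y: e.y,
  };
}

// Step 2: forward to the page; if it reports "not handled", hand a
// resynthesized native event back to the rest of the app.
function route(
  native: NativeEvent,
  pageHandles: (e: WebInputEvent) => boolean,
  appFallback: (e: NativeEvent) => void
): "page" | "app" {
  const webEvent = toWebInputEvent(native);
  if (pageHandles(webEvent)) return "page";
  appFallback(resynthesize(webEvent));
  return "app";
}

// A page that only handles left clicks.
const leftClicksOnly = (e: WebInputEvent) =>
  e.type === "mouseDown" && e.button === "left";
const fallbackLog: NativeEvent[] = [];
const recordFallback = (e: NativeEvent) => {
  fallbackLog.push(e);
};

const first = route({ kind: "leftMouseDown", x: 10, y: 20 }, leftClicksOnly, recordFallback);
const second = route({ kind: "rightMouseDown", x: 10, y: 20 }, leftClicksOnly, recordFallback);
```

Anything the two translation functions fail to carry through is lost on the way back, which is exactly the maintenance risk being described.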
Are you kidding?
Are you kidding? Okay. Um,
let's let's diagram what they just
described there because this hurts me.
So let's say you have a macOS app
which obviously is a white square and
then you click something in it. Now when
this happens in the macOS app, the
macOS app processes this as an NSEvent.
What's actually happening if we were to
split this up a bit more is macOS
itself is the layer that receives the
click and effectively translates this IO,
this behavior, this thing, into an
NSEvent click. So the IO thing that
occurred, which is the click that
happened and was processed by macOS,
gets turned into this NSEvent that gets
sent to your macOS app, in this case in
Swift, could be whatever. You get the
idea. We have this event, but this type
of event isn't something that the
Chromium process can handle. I'm also
going to make these solid because
they're actual things. And this process
we're talking about here is more uh I I
don't know how to describe it, but I
think that the spaced out lines here,
the dashes will do a good enough job of
it for me. I know that there are events
that are much deeper in this pipeline
NMG, but we never translate down. We
only translate up, which is what makes
this terrifying. Apparently, the screen
reader does seem to work, which is good.
But yeah, where this gets scary isn't
the layering here where the NSEvent
gets translated by that macOS app in
the Swift code over to be a web event.
Web input event to be specific. Where
things get scary is if this web input
event does not have what it's supposed
to. So if you're clicking on something
that's in the app and the browser
process refuses it because you're
clicking somewhere where there isn't
anything. If I have some text selected
and I right click on it, the web page
might have a hijack for right click and
it has to handle that. The web page
might also almost certainly not have
that. So now we have to do that event.
So if it's a left click and the page
handles it, cool, the page handled it.
If it's something else and the page
doesn't handle it, we need to figure out
what we were supposed to do, which means
we have to send it back to our app as a
re-translated NSEvent. So, what ends up
happening if you were to do a right-click
on a page that doesn't have a binding
for it is we send the click event to the
macOS app, the event is translated into
a web input event that is sent to the
web page. And if the web page doesn't
respond, then there is code there now. I
don't even know where that [ __ ] code
would live at this point that sends back
a hey, we didn't use this which forces
it to be re-translated back to an NSEvent
because they never sent the NSEvent
down in the first place. They just sent
the web input event. So they have to
recreate a fake synthetic NSEvent once
it has that and it isn't hitting an
event in the browser. That's Jesus
[ __ ] Christ.
When the event is returned, they have to
resynthesize an NSEvent and give the
rest of the app a chance to handle the
input. So, so to be even more fun here
with this diagram,
this re-translated event isn't re-translated
yet. It's a rejected event and once that
rejected event comes in,
it gets piped back in as a re-translated
NSEvent.
God,
I I don't envy the work that they have
to do for all this. Oh god, agent mode.
I've been so distracted by the Chromium
side, I didn't even think about what
happens when you let the agents control
things, too, and how absurdly complex
those bindings must be, and how all of
those are certainly written in Swift,
which is going to make it impossible to
run anywhere else. Oh boy. Atlas's
agentic browsing features pose some
unique challenges for our approaches to
rendering, input event forwarding, and
data storage. Our computer use model
expects a single image of the screen as
an input. But some UI elements like a
select dropdown render outside of the
tab's bounds in separate windows. In
agent mode, we have to composite those
pop-ups back into the main page image at
the corresponding and correct
coordinates so the model sees the full
context in one frame. Do you understand
what that means? They had to reverse
engineer how macOS renders these layers
to fake it in a screenshottable layer so
that it will make a synthetic image of
what the user would be seeing so that
the computer use model can go do the
right thing.
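Stripped of the macOS specifics, the compositing step is a blit: draw each popup over a copy of the page image at the popup's coordinates. The TypeScript sketch below uses tiny grayscale arrays as stand-ins for real frames. The `Frame` and `Popup` types are hypothetical, and clipping popups to the page bounds is an assumption of this example, not something the article specifies.

```typescript
// A tiny grayscale "image" in row-major order. Real frames would be
// GPU surfaces or bitmaps.
interface Frame {
  width: number;
  height: number;
  pixels: number[];
}

// A popup rendered separately; x and y give its top-left corner in
// page coordinates.
interface Popup {
  frame: Frame;
  x: number;
  y: number;
}

function makeFrame(width: number, height: number, fill: number): Frame {
  const pixels: number[] = [];
  for (let i = 0; i < width * height; i++) pixels.push(fill);
  return { width, height, pixels };
}

// Draw each popup over a copy of the page. Pixels that would land outside
// the page are skipped (an assumption made for this sketch).
function compositeForModel(page: Frame, popups: Popup[]): Frame {
  const out: Frame = {
    width: page.width,
    height: page.height,
    pixels: page.pixels.slice(),
  };
  for (const popup of popups) {
    for (let py = 0; py < popup.frame.height; py++) {
      for (let px = 0; px < popup.frame.width; px++) {
        const x = popup.x + px;
        const y = popup.y + py;
        if (x < 0 || y < 0 || x >= out.width || y >= out.height) continue;
        out.pixels[y * out.width + x] =
          popup.frame.pixels[py * popup.frame.width + px];
      }
    }
  }
  return out;
}

// A 4x3 page with a 2x2 dropdown hanging off its right edge.
const pageFrame = makeFrame(4, 3, 0);
const dropdown: Popup = { frame: makeFrame(2, 2, 9), x: 3, y: 1 };
const composed = compositeForModel(pageFrame, [dropdown]);
```

The hard part in practice is not the blit but getting the popup's position, scale, and stacking order to match what the user would actually see.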
[ __ ] Christ.
For input, we apply the same principle.
Agent-generated events are routed
directly to the renderer, never through
the privileged browser layer. That
preserves the sandbox boundary even
under automated control. For example, we
don't want this class of event to
synthesize keyboard shortcuts that make
the browser do things unrelated to the
web content being shown. That kind of
makes sense because again the events
only become like OS or browser events if
the Chromium process rejects them. So
they just have a short circuit here
where that request comes back that
checks, like, did an agent do this?
And if the agent did this, it won't
forward it back to a macOS command, but
if the agent didn't do it, then it will.
And I am sure there are no security
holes in that implementation whatsoever.
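Here is a TypeScript sketch of the short circuit as guessed at here. To be clear, this models the guess, not OpenAI's confirmed implementation; the article only says agent events go directly to the renderer and never through the privileged browser layer. The `InputSource` tag, `RoutedKeyEvent` shape, and shortcut table are all hypothetical.

```typescript
type InputSource = "user" | "agent";

interface RoutedKeyEvent {
  source: InputSource;
  key: string;
  meta: boolean; // whether the command modifier is held
}

type Outcome = "handled-by-page" | "handled-by-app" | "dropped";

function routeKeyEvent(
  event: RoutedKeyEvent,
  pageHandles: (e: RoutedKeyEvent) => boolean,
  appShortcuts: { [key: string]: () => void }
): Outcome {
  if (pageHandles(event)) return "handled-by-page";

  // Unhandled agent events stop here. They are never turned back into
  // app-level shortcuts.
  if (event.source === "agent") return "dropped";

  const shortcut = event.meta ? appShortcuts[event.key] : undefined;
  if (!shortcut) return "dropped";
  shortcut();
  return "handled-by-app";
}

let tabsClosed = 0;
const shortcuts: { [key: string]: () => void } = {
  w: () => {
    tabsClosed++;
  },
};
const pageIgnoresEverything = () => false;

const fromUser = routeKeyEvent(
  { source: "user", key: "w", meta: true },
  pageIgnoresEverything,
  shortcuts
);
const fromAgent = routeKeyEvent(
  { source: "agent", key: "w", meta: true },
  pageIgnoresEverything,
  shortcuts
);
```

In this model the whole sandbox guarantee rests on the `source` tag being set correctly and never being forgeable, which is the kind of detail the sarcasm here is pointing at.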
God debugging this must be absolutely
hellish. I see you like complexity so I
added complexity to your complexity.
Agent browsing can also run in an
ephemeral logged-out context. Instead of
sharing the user's existing incognito
profile, which could leak state, we use
Chromium's StoragePartition
infrastructure to spin up isolated
in-memory stores. Each agent session
starts fresh, and when it ends, all
cookies and site data are discarded. You
can run multiple logged-out agent
sessions, each one in its own browser
tab, and each fully isolated from the
other. Yeah, that's not that hard or
scary. It's one of the least scary
things so far. Let's see the wrap-up
here. A new way to use the web. None of
this would be possible without the
global Chromium community and their
incredible work building a foundation
for the modern web. OWL builds on that
foundation in a new way, decoupling the
engine from the app, blending a
world-class web platform with modern
native frameworks, and unlocking a
faster and more flexible architecture by
rethinking how a browser holds Chromium.
We're creating space for new kinds of
experiences, smoother startups, richer
UI, tighter integrations with the rest
of the OS, with macOS, and a development
loop that moves at the speed of ideas.
If that sounds like your kind of challenge,
check out the openings to work on Atlas
for their software engineers for Atlas,
iOS, and more. Oh boy,
this was quite a read. I am pumped that
they went this in-depth on all of these
things, even if it's giving me
existential dread and anxiety thinking
about all of the things that they've had
to do here. God, I don't miss working in
Chromium on these levels at all.
I've said all I have to here. Chromium
makes me scared. I don't want to talk
about this any more than I have to, so
I'm going to go. Thank you as always,
nerds.