
ChatGPT Atlas isn’t just a Chrome wrapper

By Theo - t3.gg

Summary

## Key takeaways

- **Atlas: More Than a Chromium Wrapper**: OpenAI's Atlas browser, while built on Chromium, is significantly more complex, featuring a new 'OWL' architecture that separates the Chromium process from the main application. [00:15], [02:56]
- **Native Frameworks Create Platform Lock-in**: Atlas's heavy reliance on Apple's native UI frameworks (SwiftUI, AppKit, and Metal) makes porting to other platforms, especially Windows, extremely challenging; Theo suspects a similar SwiftUI bet is a big part of why The Browser Company's Windows effort struggled. [04:57], [05:07]
- **IPC Complexity and Security Concerns**: The extensive use of IPC (Inter-Process Communication) via Mojo, with custom Swift and TypeScript bindings, introduces significant complexity and potential security vulnerabilities by exposing more client-side capabilities. [09:17], [13:10]
- **Input Event Handling is a Maintenance Nightmare**: Translating macOS events into web input events and back again, whenever a page doesn't handle them, creates a complex, fragile system prone to maintenance issues and breakage. [21:37], [23:45]
- **Agent Mode Compositing Challenges**: For agentic browsing, Atlas must composite UI elements that render outside tab bounds, like dropdowns, into a single screen image for the AI model, which Theo reads as reverse-engineering macOS rendering to produce a synthetic image. [27:47], [28:08]
- **Depot Boosts Build Times Dramatically**: Helium, a Chromium-based browser, reduced its build times from over 40 hours to under seven by using Depot, significantly improving development velocity and deployment to multiple platforms. [02:02], [02:07]

Topics Covered

  • Does SwiftUI doom Atlas's Windows future?
  • Decoupling Chromium creates IPC binding trauma.
  • Will OpenAI's Atlas architecture become a maintenance nightmare?
  • Agent mode hides terrifying rendering and input complexities?

Full Transcript

It feels like everyone's doing their own Chromium wrappers nowadays. From my search engine, DuckDuckGo, to Microsoft with Edge, to Perplexity with whatever the heck's going on with Comet, to my friends over at imput.net building Helium. But what about OpenAI? Atlas is definitely a Chromium wrapper, right? Well, it is. And it's also something much scarier. I just read and dove deep on OWL, which is their new architecture for the ChatGPT-based browser Atlas. I filmed a different intro a couple minutes ago, before I had read this article.

I saw demons.

This was a terrifying read. It's really cool that they shared all of this architecture, but given how much time I've spent in Electron and Chromium in the past, this one scared me. So, if you like big technical deep dives and/or watching me spiral into chaos and depression, you're probably going to enjoy this video a lot.

This has everything you don't want: from chaotic rewrites of things that shouldn't be touched, to re-translating events in many different directions, from higher level to lower level and back again, to me just getting stressed and angry.

This one was a journey, and I hope you enjoy it as much as I didn't. All of that said, I'm going to need therapy after this one, so let's do a quick word from today's sponsor so I can afford it. Are you tired of waiting on builds? Then you'll love today's sponsor, Depot. These guys will take your builds, your CI, and everything else you're dealing with on GitHub Actions and make it way, way faster, especially those Docker builds.

Previously, I was talking about this based on the testimonials they had and the cool things I'd looked into when I played with it on the side. But now, I have an actual story to tell you guys. As you all know, I switch browsers a good bit, but I've settled on two. You're seeing one of them right here, Zen, but the one I use day-to-day tends to be Helium, because I need a browser that has Chromium stuff. Chromium just makes my life much easier. Helium is built by two kids who are working their butts off to make a better browser. It's a fork of Chromium, which is not easy to maintain, much less build. Before they made the switch to Depot, their builds were 40-plus hours long. Now they're under seven for all platforms. They were also able to roll out to Mac and Linux much more smoothly as a result of moving to Depot.

So it's got to be super expensive, right? It actually ends up being way cheaper. Not only is Depot going to make your actions literally five to 50 times faster in some scenarios, it's also going to make the build cheaper in the end. It's one of those things that's just such a no-brainer; I don't know why people aren't trying it. You can take my word for it, or read all the other stories: how PostHog cut their build times by 55x, how Jane cut their GitHub Actions costs in half and increased throughput by 25%, or the crazy stories I just told about Helium.

Your engineers are too expensive to be sitting around waiting for builds all day. Solve them now at soy./ /do.

Oh boy, it's been a bit since I got to dig deep on Chromium stuff. I'm actually really excited for this. So, uh, let's dive in: How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas. I like saying ChatGPT-based and not Chromium-based, but sure, we'll go with it.

Last week, we launched ChatGPT Atlas, which is a new way to browse the web with ChatGPT by your side. In addition to being a fully featured web browser, Atlas offers a glimpse into the future: a world where you can bring ChatGPT with you across the internet to ask questions, make suggestions, and complete tasks for you. In this post, we unpack one of the most complex engineering aspects of the product: how we turned ChatGPT into a browser that gets more useful as you go. You mean how you added ChatGPT to Chromium? Sure. But, uh, I think the insights will be worth it.

As cringe as this intro is: making ChatGPT a true copilot for the web meant reimagining the entire architecture of a browser, separating Atlas from the Chromium runtime. This entailed developing a new way of integrating Chromium that allows us to deliver on our product goals: instant startup, responsiveness even as you open more tabs, and a strong foundation for agentic use cases. Shaping the foundation: Chromium was a natural building block. It provides a state-of-the-art web engine with a robust security model, established performance credentials, and peerless web compatibility. Furthermore, it's developed by a global community that continuously improves it. It's a common go-to for modern desktop web browsers. I like that they said it's developed by a global community and are intentionally just not mentioning Google anywhere in here. Very fun. Google is the main entity driving this development.

There are people all around the world contributing to Chromium, but 98% of the work is coming from Google employees. The amount of public, free work that Google is paying for on Chromium is something they deserve a little more respect for. I get that everybody thinks Google's trying to own the web. They don't have to make Chromium open source. They don't have to contribute millions of dollars a month to its development, but they do, because they want the web to win as much as they want to win themselves. Rethinking the browser experience with a sidebar for chatting: our talented design team has ambitious goals for our new user experience, including rich animations and visual effects for features like agent mode.

This required our engineering team to leverage the most modern native frameworks for our UI (SwiftUI, AppKit, and Metal) instead of simply reskinning the open-source Chromium UX. Fun. I like that all of the modern native frameworks for UI here are Apple's. Is it still Apple-only? Pretty sure it's Mac-only, right? Yeah, "Download for macOS." It's still Mac-only. Fun. Knowing that they built this so heavily with SwiftUI, AppKit, and Metal, I wish them luck.

SwiftUI barely works on macOS. I cannot fathom trying to get that working well on Windows. I honestly think the big bet The Browser Company made on trying to make SwiftUI work for the browser is a big part of why the Windows project failed, and probably a big part of why they've lost motivation to maintain Arc entirely. SwiftUI should have been much better than it is. I talk to a lot of iOS people, and opinions on SwiftUI vary: a lot of people love it, and a lot more want to love it and are just burnt by all of the churn, the changes, and the performance issues. You get a lot of the problems you get from React Native when you move to SwiftUI, which is a crazy thing to say, because it was meant to kill React Native, but, uh, it is what it is. But they're still using AppKit in parallel as well, which is kind of crazy. And then Metal's called out, which is the lower-level graphics API, similar to DirectX, for Macs and iOS. This is a weird set of things to mention here, but yeah, it means they're building a lot of native code for their apps, and it means these things are going to be much harder to get onto other platforms. So, if you're sitting around waiting for Atlas to come to Windows, good luck.

I'm sure they're running a Codex job in the background trying to port all this code over to random .NET [ __ ], and I'm sure that's going about as well as you would expect. But yeah, I am very happy with Helium right now, which is my Chromium-based browser by my friends over at imput.net. Very, very nice browser.

They're customizing the hell out of Chromium, but they're not doing it with Swift and SwiftUI. They're doing it with C and C++, the way the original Chromium codebase does. Well, to be more accurate, they're doing it with patch files. It does look like they're automating a lot of the patching now with Python, which is cool. But a significant portion of this codebase is patch files; that's how they're doing the work. So they're not adding new languages or tech in. They're patching the existing stuff, which is very cool, and they're doing a great job with it.

Helium has been incredibly well-maintained. Highly recommend it if you're looking for a Chromium-based browser. Anyways, they had other goals, like fast startup times and supporting hundreds of tabs without penalizing performance. These goals were challenging to achieve with Chromium out of the box, which is opinionated about many details, from the boot sequence to threading models and tab models. We considered making substantial changes here, but we wanted to keep our set of patches against Chromium targeted so that we could quickly integrate new versions. Interesting. So they're trying to keep the Chromium part separate and light on patches, so that when Chromium changes, less of their integration work has to change with it. They're definitely building this so that maintenance isn't that hard if they decide not to fully support it in the future, or if they want to iterate in different directions with it. The more they patch Chromium, the more they have to maintain those patches and the Chromium instance within this project. It makes a lot of sense to try and keep those separate, but it also results in the chaos of building your own separate UI on top of it, with a handful of shims to break out. It'll be interesting to see how maintainable this is over time. I would be surprised if they didn't poach somebody from Arc, or The Browser Company, to help figure out the architecture here.

To ensure development velocity was maximally accelerated, we needed to come up with a different way to integrate and drive the Chromium runtime. A litmus test for our technical investment was not only that it would enable faster experimentation, iteration, and delivery of new features, but also that it would let us maintain a core part of OpenAI's engineering culture: shipping on day one. Every new engineer merges a small change in the afternoon of their first day. We needed to make sure this was possible, even though Chromium can take hours to check out and build. Yeah, Chromium is not a fast build process. Our solution: OWL. Our answer to these challenges was to build a new architecture layer that we call OWL, the OpenAI Web Layer. OWL is our integration of Chromium, which entails running Chromium's browser process outside of the main Atlas app process. Oh boy.

Oh boy. I have a lot of trauma around the Electron and Chromium IPC binding layer. Oh god.

One of my old projects, my first viral GitHub thing, is a template I made called Yerba, which was my best attempt at a full-stack, type-safe binding layer between Electron (the Electron system code) and the actual client code, so you can call things between the two reasonably easily. I put a lot of work into this project and haven't touched it in many years. But I've heard of people still using it as a reference, or even as a basis for their own Electron projects, because getting that full-stack type safety was obnoxious.

I did this by grabbing a file from the Electron side containing the type definitions that I bind to the window object, so you have access to them, to try my best to force it to be type-safe. Not easy or fun. I just remember this project being really annoying to build and maintain. Regardless, I've been through it with these things.

Apparently egoist, who's one of my favorite devs, made a much, much more fleshed-out version of this with tipc, which is like tRPC but for the IPC bindings in Electron. That's really cool.

It also hasn't been maintained for a while. Yeah, we've both been through it with Electron. So yeah, I get trauma as soon as I see this part of the diagram.

Anyways: think of it like this. Chromium revolutionized browsers by moving tabs into separate processes. We're taking that idea further by moving Chromium itself out of the main application process and into an isolated service layer. The shift unlocks a cascade of benefits. First, a simpler, modern app: Atlas is built almost entirely in SwiftUI and AppKit. One language, one tech stack, one clean codebase.

Oh boy. Oh boy. This is never coming to Windows. God. If they're specifically trying to have one clean codebase and then building all this [ __ ] with SwiftUI and AppKit, this is never coming to Windows, ever. And if it does, it's not going to move anymore. Oh boy.

There are also faster startups, because the Chromium engine can boot asynchronously in the background and Atlas doesn't have to wait. Pixels hit the screen nearly instantly. Let's try. I'm pressing enter now.

That was not that fast.

For comparison, I'm going to close Helium, and I'm pressing enter now.

It was faster. Cool. Yeah, I'm going to call [ __ ] on that one. I'm sure it's faster than what they were doing before, which was booting their app, then booting Chromium. But yeah, the next piece is that it's isolated from the jank and crashes.

Chromium is a powerful and complex web engine. If its main thread hangs, Atlas doesn't. If it crashes, Atlas stays up. What are they doing to Chromium that the main thread is crashing? I need wukko's opinion on this. I wish he was here; I'll bug him later. The next piece is fewer merge headaches: because we're not building on as much of the Chromium open-source UI, our diff against upstream Chromium is much smaller and easier to maintain. It's not slowing wukko down that much, but I get it. And then faster iteration: most engineers never need to build Chromium locally. OWL ships internally as a prebuilt binary, so Atlas builds take minutes instead of hours. Okay, that part is fair. If you're not touching the patch files, you can have a cached build of the right side of this diagram, so you don't have to rebuild it when you're playing with the other side.
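That caching argument can be made concrete. The following is a minimal TypeScript sketch, not OpenAI's build setup: it assumes the prebuilt OWL host binary is fully determined by the upstream Chromium revision plus the patch files applied to it, and keys a binary cache on a hash of those inputs. `PatchFile`, `owlCacheKey`, and `needsChromiumRebuild` are hypothetical names, and FNV-1a is used only to keep the example dependency-free; a real build cache would use a cryptographic hash.

```typescript
// Hypothetical sketch: decide whether the Chromium (OWL host) side must be rebuilt.
// Assumption: the host binary depends only on the Chromium revision and the patches.

interface PatchFile {
  path: string;
  contents: string;
}

// 32-bit FNV-1a over UTF-16 code units, continuing from `seed`.
// Chosen only to keep the sketch self-contained; it is not collision-resistant.
function fnv1a(input: string, seed: number = 0x811c9dc5): number {
  let hash = seed >>> 0;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Derive a cache key from the revision and every patch. Patches are sorted by
// path so the key does not depend on listing order, and a NUL separator is
// hashed before each field so field boundaries are part of the hashed input.
function owlCacheKey(chromiumRevision: string, patches: PatchFile[]): string {
  const sorted = [...patches].sort((a, b) =>
    a.path < b.path ? -1 : a.path > b.path ? 1 : 0,
  );
  let hash = fnv1a(chromiumRevision);
  for (const patch of sorted) {
    hash = fnv1a("\u0000" + patch.path, hash);
    hash = fnv1a("\u0000" + patch.contents, hash);
  }
  return `${chromiumRevision}-${hash.toString(16).padStart(8, "0")}`;
}

// Engineers who only touch the app side hit the cache and skip the slow build.
function needsChromiumRebuild(key: string, prebuilt: ReadonlySet<string>): boolean {
  return !prebuilt.has(key);
}
```

Editing only app-side files leaves the key unchanged, so `needsChromiumRebuild` returns false and the cached host binary is reused; touching any patch produces a new key and forces the slow path.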

That part makes a lot of sense, and I can see why that'd be a lot faster. That said, good luck once that SwiftUI project gets big. SwiftUI's compile times are known for being really good and reasonable. Ah, good luck. So how does this actually work? At a high level, the Atlas browser is the OWL client and the Chromium browser process is the OWL host. They communicate over IPC, specifically Mojo.

Oh man, I haven't been this deep in Chromium stuff in a minute. Oh god, this hurts. If you weren't familiar, Mojo is one of the ways you can do IPC inside of Chromium to get things in and out of the backend processes. Backend means a very different thing here, but you get the idea: it's the part running in the C++ native layer. And you expose these by making a .mojom file, which has an interface definition.

Getting header file vibes. This is a world I am so thankful to be out of at this point. Yeah, see the pingable.h: this is the header file that goes alongside the .mojom file in order to bind things together and make this work. God, I hate all of this. I am so happy to be free of this world. So these are all communicating over Mojo, which is Chromium's message-passing system. They wrote custom Swift and even TypeScript bindings for Mojo, so the Swift app can call host-side interfaces directly. Yep. I would be curious to see how the TypeScript definitions there work.
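The article doesn't show those generated bindings, so here is a hypothetical TypeScript sketch of the general shape of Mojo-style typed IPC rather than OWL's actual bindings. An interface is declared once; the client gets a "remote" proxy that turns method calls into serialized messages, and the host binds a "receiver" that dispatches those messages to a real implementation. `MessagePipe`, `createRemote`, `bindReceiver`, and `WebViewHost` are invented names (the navigate, stop, and reload methods mirror the calls the article routes to the web view host), and JSON over an in-process pipe stands in for Mojo's real wire format and process boundary.

```typescript
// Hypothetical sketch of Mojo-style typed IPC; not Mojo's or OWL's real API.

// The "interface definition": roughly what a .mojom file would declare.
interface WebViewHost {
  navigate(url: string): Promise<boolean>;
  stop(): Promise<void>;
  reload(): Promise<void>;
}

interface CallMessage { id: number; method: string; args: unknown[] }
interface ReplyMessage { id: number; ok: boolean; result?: unknown; error?: string }

// Toy in-process pipe. Messages travel as JSON strings to mimic a process
// boundary: only plain data crosses it, never object references.
class MessagePipe {
  onCall: (json: string) => void = () => {};
  onReply: (json: string) => void = () => {};
  sendCall(msg: CallMessage): void { this.onCall(JSON.stringify(msg)); }
  sendReply(msg: ReplyMessage): void { this.onReply(JSON.stringify(msg)); }
}

// Client side: a proxy whose methods serialize calls and await the matching reply.
function createRemote<T extends object>(pipe: MessagePipe): T {
  let nextId = 1;
  const pending = new Map<number, { resolve: (v: unknown) => void; reject: (e: Error) => void }>();
  pipe.onReply = (json) => {
    const reply = JSON.parse(json) as ReplyMessage;
    const waiter = pending.get(reply.id);
    if (!waiter) return;
    pending.delete(reply.id);
    if (reply.ok) waiter.resolve(reply.result);
    else waiter.reject(new Error(reply.error));
  };
  return new Proxy({} as T, {
    get: (_target, method) => (...args: unknown[]) =>
      new Promise<unknown>((resolve, reject) => {
        const id = nextId++;
        pending.set(id, { resolve, reject });
        pipe.sendCall({ id, method: String(method), args });
      }),
  });
}

// Host side: decode each call and dispatch it to the real implementation.
function bindReceiver<T extends object>(pipe: MessagePipe, impl: T): void {
  const table = impl as unknown as Record<string, unknown>;
  pipe.onCall = (json) => {
    const msg = JSON.parse(json) as CallMessage;
    const fn = table[msg.method];
    if (typeof fn !== "function") {
      pipe.sendReply({ id: msg.id, ok: false, error: `unknown method: ${msg.method}` });
      return;
    }
    Promise.resolve()
      .then(() => fn.apply(impl, msg.args))
      .then(
        (result) => pipe.sendReply({ id: msg.id, ok: true, result }),
        (err) => pipe.sendReply({ id: msg.id, ok: false, error: String(err) }),
      );
  };
}

// Example wiring with a fake host-side implementation.
class FakeWebViewHost implements WebViewHost {
  url = "about:blank";
  async navigate(url: string): Promise<boolean> {
    this.url = url;
    return url.startsWith("https://");
  }
  async stop(): Promise<void> {}
  async reload(): Promise<void> {}
}

const pipe = new MessagePipe();
const host = new FakeWebViewHost();
bindReceiver(pipe, host);
const remote = createRemote<WebViewHost>(pipe);
```

Client code only ever sees the `WebViewHost` type, but every call still crosses a serialization boundary, so the host receives plain data and has to treat it like any other untrusted input.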

That's got to be painful. The OWL client library exposes a simple public Swift API, which abstracts several key concepts exposed by the host service layer: the session, which allows you to configure and control the host globally; the profile, which lets you manage the browser state for specific user profiles; the web view, which allows for controlling and embedding individual web contents (render, input, navigate, zoom, etc.); and the web content renderer, which allows you to forward input events into Chromium's rendering pipeline and receive feedback. So if somebody clicks something inside your SwiftUI layer and you want to pass that down to one of the renderer processes, that lets you do it. And then there are the layer host and layer client, which exchange compositing information between the UI and Chromium.
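To make the relationships between those concepts concrete, here is a hypothetical TypeScript model of the first three: a session that owns profiles, and profiles that own web views. OWL's real client API is in Swift and the article doesn't give its method names, so `Session`, `Profile`, `WebView`, and every method and field below are invented for illustration; input forwarding and compositing are left out.

```typescript
// Hypothetical object model for the OWL client concepts; all names are invented.

// One embedded page: tracks what the host is showing and at what zoom.
class WebView {
  url = "about:blank";
  zoom = 1.0;
  navigate(url: string): void {
    this.url = url;
  }
  setZoom(level: number): void {
    this.zoom = level;
  }
}

// Browser state for one user profile; each web view belongs to exactly one profile.
class Profile {
  readonly name: string;
  private readonly views: WebView[] = [];
  constructor(name: string) {
    this.name = name;
  }
  createWebView(): WebView {
    const view = new WebView();
    this.views.push(view);
    return view;
  }
  get webViewCount(): number {
    return this.views.length;
  }
}

// Global control of the host, plus the registry of profiles (created on first use).
class Session {
  private readonly profiles = new Map<string, Profile>();
  profile(name: string): Profile {
    let existing = this.profiles.get(name);
    if (!existing) {
      existing = new Profile(name);
      this.profiles.set(name, existing);
    }
    return existing;
  }
}
```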

God, this was not as vibe-coded as I thought. Terrifying.

I do have a friend and active chatter hanging out here, NMG, saying there's a whole lot they wish they could say but can't. Let me drop my suspicions about the potential security holes that my friends in chat may or may not have found already. I suspect that by exposing many more things over Mojo, in particular via TypeScript, there are now things exposed in the client layer (as in, the client that is rendering your JavaScript) that might not have been intended, which would hypothetically allow someone, from JavaScript, to execute things on that native host layer by calling the bindings. Am I on to something here, NMG? Yes, no, or can't say are all good answers. Okay, it's not that special.

Okay, I'm thinking I'm going 200 IQ for somebody that's 50 IQ in the space. Good to know. Anyways, more diagramming of how this is broken up. We have the web view in Swift, and the Mojo IPC bindings handle things like navigate, stop, and reload, all going to the OWL process, which is the web view host in C++. That's how they go back and forth between them. Fun. There's also a wide range of service endpoints for managing high-level features like bookmarks, downloads, extensions, and autofill.

They had to reimplement everything for this, then, huh? I thought this was a much more minimal Chromium wrapper than it is. Yeah, looking at this now, it's silly, but the way the tab bar works: this isn't a Chromium tab bar. I fell for this. The way the open tab moves is something no one would bother doing with the C++ code. It's also why the settings page is entirely different. This broken-out dialog for settings is not something that would exist in the standard Chromium browser.

Very interesting. Very interesting. They might have put more work in here than we thought. Oh yeah, that is such a SwiftUI text box. That is such a SwiftUI text box: the cursor sitting right at the start of the placeholder text that wasn't removed properly, because that's not a default behavior for inputs in SwiftUI, for some [ __ ] reason. Yeah, this is SwiftUI for sure. Cool.

So, how do they actually get the pixels across the boundary? Because that's the interesting thing. If the web view host OWL process, the actual Chromium part, is doing the rendering of the page, how do they get that into the SwiftUI layer at all? There's no way they're just IPCing up all the pixels and data, right? Please tell me they're not doing that. Web views, which share a mutually exclusive presentation space in the client app, are swapped in and out of a shared compositing container. For example, a browser window often has a single shared container visible. Selecting a tab in the tab strip swaps that tab's web view into the container.

On the Chromium side, this container corresponds to a gfx::AcceleratedWidget, which is ultimately backed by a CALayer. We expose that layer's context ID to the client, where an NSView embeds it using the private CALayerHost API. Okay, so they're passing the identifier for this container context up to the view, where it can be rendered natively-ish. Interesting.
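A rough way to picture this is that the client never receives pixels at all, only a reference to a layer the host keeps rendering into. The TypeScript sketch below models that hypothetically: the host assigns each web view a numeric context ID (standing in for the CALayer context ID), and the window's single shared container embeds whichever ID belongs to the selected tab (standing in for the private CALayerHost embedding). `LayerRegistry`, `CompositingContainer`, and their methods are invented names, and both sides live in one process here purely for illustration.

```typescript
// Hypothetical model of delegated compositing by context ID; names are invented.

// Host side (Chromium process in the real design): each web view renders into
// a layer that other processes can reference by a numeric context ID.
class LayerRegistry {
  private nextContextId = 1;
  private readonly byTab = new Map<string, number>();

  createWebView(tabId: string): number {
    const contextId = this.nextContextId++;
    this.byTab.set(tabId, contextId);
    return contextId;
  }

  contextIdFor(tabId: string): number | undefined {
    return this.byTab.get(tabId);
  }
}

// Client side (app process in the real design): one shared container per window.
// Web views are mutually exclusive in it, so selecting a tab only swaps which
// remote layer the container embeds; no pixel data is copied here.
class CompositingContainer {
  private readonly registry: LayerRegistry;
  private embedded: number | null = null;

  constructor(registry: LayerRegistry) {
    this.registry = registry;
  }

  selectTab(tabId: string): void {
    const contextId = this.registry.contextIdFor(tabId);
    if (contextId === undefined) throw new Error(`no web view for tab ${tabId}`);
    this.embedded = contextId; // stand-in for embedding via a layer-host view
  }

  get embeddedContextId(): number | null {
    return this.embedded;
  }
}
```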

Oh god, select dropdowns render in a separate layer. Oh god.

Their solution for that is interesting. They don't have a content::WebContents, but they do have a content::RenderWidgetHostView with its own accelerated widget, so the same delegated rendering model applies. God.

But now, when you open a select tag on a page, they have to conditionally change which parts are being rendered through this pipe. Apparently they're using a private API for this. Definitely not a thing that can go wrong. Oh god.

Oh god. I have a feeling that the approach they've taken is going to cause more maintenance problems over time rather than fewer, because any change to how this accelerated widget pipe works, or to how they hoist things through, uh, there's a layer in the browser now, I forgot the term for it. There we go: the top layer. That's the thing I was thinking of, and of course Jhey is the writer here. Yeah, the top layer is this new layer they're putting in as part of the Fullscreen API to handle things like dialog support and everything else that sits higher up, so you don't need all of these crazy packages to manage tooltips that don't break due to overflow rules. It's a very annoying layer, and it's a change they've been working on for a while. That article is from 2022 and these things are still rolling out. Chromium doesn't really care about breaking changes in how things work internally, because they'll eat it. They'll just pay the cost and rewrite entire layers of Chromium, and most people will never notice. Atlas is absolutely going to notice, because if they change how that top layer works, good luck with your bindings. OWL internally keeps view geometry in sync with the Chromium side.

So the GPU compositor can be updated accordingly and can always produce layer contents of the correct size and device scale. The fact that they're even calling this out suggests it didn't do that before, and that different device scales and text rendering were breaking things. And once again, I will say good luck making this work on Windows, where text scaling is more a vague suggestion than an actual spec. Oh god, I do not envy whatever engineers have been tasked with making this work on Windows somehow, because I know there's a team in there working on it that's convinced they can do it and is suffering immensely. We also reuse this technique to selectively project elements of Chromium's own native Views UI into Atlas. This is also useful for bootstrapping features like permission prompts quickly, without building replacements from scratch in SwiftUI.

That's fun, that they have to project Chrome views in for certain things. This technique borrows heavily from Chromium's existing infrastructure for installable web apps on macOS. Oh man, not the PWA and installable Chromium app stuff. They're not even maintaining that anymore. Google has fully deprecated installable Chrome apps outside of ChromeOS. So whatever you're borrowing here is unmaintained and is almost certainly going to cause problems in the near future. God, I wish them luck with that. This is a very complex project that's going to have a lot of maintenance issues in the future, if I understand anything about Chrome. Let's talk about how they handle input events. This is going to scare me.

Cracking and forwarding. The Chromium UI translates platform events, like a macOS NSEvent (things happening at the mouse layer, for example), into Blink's WebInputEvent model before they get forwarded to the renderers. So the renderers in Chromium don't know anything about how macOS handles input; a layer above them does that before the event gets passed over. But since OWL runs Chromium in a hidden process, they have to do that translation themselves in the Swift client library and then forward the translated events down to Chromium. Oh god.

The scientists were so enthralled by whether or not they could that they never stopped to ask how [ __ ] hellish this will be to maintain and port to Windows. God, can you see how red my face is getting? This is stressing me out. I don't know if you guys know this, but I built a big Electron app when I was at Twitch. We were doing Twitch Studio; we were trying to build an OBS alternative from scratch in Electron. I've seen the skeletons in these closets. They are horrifying.

There are far too many. They are choosing to make this harder than we did, and we were trying to build a video rasterization pipeline with JavaScript bindings. I do not envy any of the people who have to work on this. I respect you, but I do not envy you. God, this hurts.

Every single event that can happen at this higher level, be it a mouse event, a key, a copy-paste, a screen reader request... Does this even work with screen readers? I don't have one set up on this machine right now. I usually keep a machine set up with a ton of accessibility tools just for testing my own services, but I don't have one right now because I wiped my other machine to give it to an employee. I would be very impressed if this doesn't break accessibility immensely. Once they have this forwarding layer built, they follow the same life cycle that real input events would normally follow for web content. That includes having events returned to the client whenever a page indicates that it didn't handle the event. When this happens, we resynthesize an NSEvent and give the rest of the app a chance to handle the input.
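The round trip described above can be sketched as follows. This is a hypothetical TypeScript model, not OWL's Swift code: `PlatformEvent` stands in for NSEvent, `WebEvent` stands in for Blink's WebInputEvent (with simplified DOM-style type names rather than Blink's real enum), and `toWebEvent`, `resynthesize`, and `forwardInput` are invented names. The detail it captures from the article is that only the translated web event is sent to Chromium, so when the page declines it, a new synthetic platform event has to be built for the rest of the app.

```typescript
// Hypothetical model of input forwarding with re-synthesis; all names are invented.

// Stand-in for a macOS NSEvent as the app receives it.
interface PlatformEvent {
  kind: "leftMouseDown" | "rightMouseDown" | "keyDown";
  x: number;
  y: number;
  key?: string;
  synthetic: boolean; // true when rebuilt after the page declined the event
}

// Stand-in for Blink's WebInputEvent: the only shape the renderer understands.
interface WebEvent {
  type: "mousedown" | "contextmenu" | "keydown";
  x: number;
  y: number;
  key?: string;
}

// "Cracking": translate the platform event into the renderer's event model.
function toWebEvent(e: PlatformEvent): WebEvent {
  switch (e.kind) {
    case "leftMouseDown":
      return { type: "mousedown", x: e.x, y: e.y };
    case "rightMouseDown":
      return { type: "contextmenu", x: e.x, y: e.y };
    case "keyDown":
      return { type: "keydown", x: e.x, y: e.y, key: e.key };
  }
}

// The original platform event never crossed the process boundary, so a new
// one is synthesized from the web event that came back unhandled.
function resynthesize(e: WebEvent): PlatformEvent {
  const kind =
    e.type === "mousedown" ? "leftMouseDown" : e.type === "contextmenu" ? "rightMouseDown" : "keyDown";
  return { kind, x: e.x, y: e.y, key: e.key, synthetic: true };
}

// The renderer returns true if the page handled the event.
type Renderer = (e: WebEvent) => boolean;
type AppFallback = (e: PlatformEvent) => void;

function forwardInput(e: PlatformEvent, renderer: Renderer, app: AppFallback): "page" | "app" {
  const webEvent = toWebEvent(e);
  if (renderer(webEvent)) return "page";
  app(resynthesize(webEvent)); // give the rest of the app a chance to handle it
  return "app";
}
```

A right-click on a page with no context-menu handler therefore goes platform event, then web event, then back to a freshly synthesized platform event, which is exactly the double translation the commentary is worried about.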

Are you kidding?

Are you kidding? Okay. Um, let's diagram what they just described there, because this hurts me.

So let's say you have a macOS app, which obviously is a white square, and then you click something in it. When this happens, the macOS app processes it as an NSEvent.

What's actually happening, if we split this up a bit more, is that macOS itself is the layer that receives the click and effectively translates this I/O behavior into an NSEvent click. So the I/O thing that occurred, the click that happened and was processed by macOS, gets turned into this NSEvent that gets sent to your macOS app, in Swift in this case, but it could be whatever. You get the idea. We have this event, but this type of event isn't something the Chromium process can handle. I'm also going to make these solid, because they're actual things, and this process we're talking about here is more, uh, I don't know how to describe it, but I think the dashed lines will do a good enough job of it for me. I know there are events much deeper in this pipeline, NMG, but we never translate down. We only translate up, which is what makes this terrifying. Apparently the screen reader does seem to work, which is good.

But yeah, where this gets scary isn't the layering here, where the NSEvent gets translated by the macOS app in the Swift code into a web event (a WebInputEvent, to be specific). Where things get scary is when this WebInputEvent doesn't get handled. Say you're clicking on something in the app and the browser process refuses it because you're clicking somewhere where there isn't anything. Or if I have some text selected and I right-click on it, the web page might hijack right-click, and then it has to handle that. But the web page almost certainly won't. So now we have to handle that event ourselves.

So if it's a left click and the page handles it, cool, the page handled it. If it's something else and the page doesn't handle it, we need to figure out what we were supposed to do, which means we have to send it back to our app as a re-translated NSEvent. So what ends up happening if you right-click on a page that doesn't have a binding for it is this: we send the click event to the macOS app, the event is translated into a WebInputEvent that is sent to the web page, and if the web page doesn't respond, there's now code (I don't even know where that [ __ ] code would live at this point) that sends back a "hey, we didn't use this," which forces it to be re-translated back into an NSEvent, because they never sent the NSEvent down in the first place. They only sent the WebInputEvent. So once the event comes back without being handled in the browser, they have to recreate a fake, synthetic NSEvent. Jesus [ __ ] Christ.

When the event is returned, they have to resynthesize an NSEvent and give the rest of the app a chance to handle the input. So, to make this diagram even more fun: this re-translated event isn't re-translated yet. It's a rejected event, and once that rejected event comes in, it gets piped back in as a re-translated NSEvent.

God, I don't envy the work they have to do for all this. Oh god, agent mode.

I've been so distracted by the Chromium side that I didn't even think about what happens when you let the agents control things too, how absurdly complex those bindings must be, and how all of them are certainly written in Swift, which is going to make it impossible to run anywhere else. Oh boy. Atlas's agentic browsing features pose some unique challenges for our approaches to rendering, input event forwarding, and data storage. Our computer use model expects a single image of the screen as input, but some UI elements, like a select dropdown, render outside of the tab's bounds in separate windows. In agent mode, we have to composite those pop-ups back into the main page image at the correct coordinates so the model sees the full context in one frame. Do you understand what that means? They had to reverse engineer how macOS renders these layers and fake it in a screenshottable layer, producing a synthetic image of what the user would be seeing, so that the computer use model can go do the right thing.
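As a rough illustration of that compositing step, here is a hypothetical TypeScript sketch that pastes pop-up bitmaps (such as an open select menu) into a copy of the page bitmap, converting each pop-up's screen position into page coordinates. It assumes simple row-major bitmaps with one packed 32-bit value per pixel, opaque pop-ups drawn in order, and clipping of anything outside the page; the article doesn't describe how Atlas captures or positions these layers, and `Bitmap`, `Popup`, `createBitmap`, and `compositeForModel` are invented names.

```typescript
// Hypothetical sketch: flatten out-of-tab pop-ups into one frame for a vision model.

interface Bitmap {
  width: number;
  height: number;
  pixels: Uint32Array; // row-major, one packed RGBA value per pixel
}

interface Popup {
  bitmap: Bitmap;
  screenX: number; // top-left corner in screen coordinates
  screenY: number;
}

function createBitmap(width: number, height: number, fill = 0): Bitmap {
  return { width, height, pixels: new Uint32Array(width * height).fill(fill) };
}

// Copy each pop-up onto a copy of the page image. Screen coordinates become
// page coordinates by subtracting the page's own screen origin. Pop-ups are
// treated as opaque, and later ones draw over earlier ones.
function compositeForModel(page: Bitmap, pageScreenX: number, pageScreenY: number, popups: Popup[]): Bitmap {
  const out: Bitmap = { width: page.width, height: page.height, pixels: page.pixels.slice() };
  for (const popup of popups) {
    const offsetX = popup.screenX - pageScreenX;
    const offsetY = popup.screenY - pageScreenY;
    for (let y = 0; y < popup.bitmap.height; y++) {
      const destY = y + offsetY;
      if (destY < 0 || destY >= out.height) continue; // clip rows outside the page
      for (let x = 0; x < popup.bitmap.width; x++) {
        const destX = x + offsetX;
        if (destX < 0 || destX >= out.width) continue; // clip columns outside the page
        out.pixels[destY * out.width + destX] = popup.bitmap.pixels[y * popup.bitmap.width + x];
      }
    }
  }
  return out;
}
```

Clipping is the simplest choice for a sketch; a real pipeline might instead grow the frame so that a dropdown hanging past the tab's edge stays fully visible to the model.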

[ __ ] Christ.

For input, we apply the same principle. Agent-generated events are routed directly to the renderer, never through the privileged browser layer. That preserves the sandbox boundary even under automated control. For example, we don't want this class of event to synthesize keyboard shortcuts that make the browser do things unrelated to the web content being shown. That kind of makes sense, because again, the events only become OS or browser events if the Chromium process rejects them. So they just have a short circuit here: when that request comes back, it checks whether an agent did this. If the agent did, it won't forward it back as a macOS command; if the agent didn't, it will. And I am sure there are no security holes in that implementation whatsoever.
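That short circuit, as described in the commentary, can be sketched like this. It's a hypothetical TypeScript model of the rule, not OWL's implementation: events carry a source tag, and when the page declines an event, only user input is allowed to fall back to browser-level handling such as keyboard shortcuts, while agent input is dropped. The article itself phrases the guarantee as agent events being routed directly to the renderer and never through the privileged browser layer; `InputSource`, `TaggedEvent`, `routeInput`, and the callback names are invented.

```typescript
// Hypothetical sketch of confining agent input to web content; names are invented.

type InputSource = "user" | "agent";

interface TaggedEvent {
  source: InputSource;
  type: string; // e.g. "keydown:Cmd+W"
}

interface RouteResult {
  handledByPage: boolean;
  escalatedToBrowser: boolean;
}

function routeInput(
  event: TaggedEvent,
  pageHandles: (type: string) => boolean,
  browserFallback: (event: TaggedEvent) => void,
): RouteResult {
  if (pageHandles(event.type)) {
    return { handledByPage: true, escalatedToBrowser: false };
  }
  // Unhandled agent input stops here: it must never trigger browser-level
  // actions (closing tabs, opening settings) unrelated to the page content.
  if (event.source === "agent") {
    return { handledByPage: false, escalatedToBrowser: false };
  }
  // Unhandled user input gets the normal fallback path.
  browserFallback(event);
  return { handledByPage: false, escalatedToBrowser: true };
}
```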

God, debugging this must be absolutely hellish. I see you like complexity, so I added complexity to your complexity.

Agentic browsing can also run in an ephemeral, logged-out context. Instead of sharing the user's existing incognito profile, which could leak state, we use Chromium's storage partition infrastructure to spin up isolated in-memory stores. Each agent session starts fresh, and when it ends, all cookies and site data are discarded. You can run multiple logged-out agent sessions, each one in its own browser tab, and each fully isolated from the others. Yeah, that's not that hard or scary. It's one of the least scary things so far. Let's see the wrap-up here. A new way to use the web: none of this would be possible without the global Chromium community and their incredible work building a foundation for the modern web. OWL builds on that foundation in a new way, decoupling the engine from the app, blending a world-class web platform with modern native frameworks, and unlocking a faster, more flexible architecture by rethinking how a browser holds Chromium. We're creating space for new kinds of experiences: smoother startups, richer UI, tighter integration with the rest of the OS (macOS), and a development loop that moves at the speed of ideas.

If that sounds like your kind of challenge, check out the openings for software engineers on Atlas, iOS, and more. Oh boy, this was quite a read. I am pumped that they went this in-depth on all of these things, even if it's giving me existential dread and anxiety thinking about everything they've had to do here. God, I don't miss working in Chromium at these levels at all.

I've said all I have to say here. Chromium makes me scared. I don't want to talk about this any more than I have to, so I'm going to go. Thank you as always, nerds.
