From Arc to Dia: Lessons learned building AI Browsers – Samir Mody, The Browser Company of New York
By AI Engineer
Summary
Topics Covered
- Embed AI Tools in Product
- JePA Optimizes Prompts Sample-Efficiently
- Model Behavior Evolves like Product Design
- Security Emerges from Product Design
- AI Shifts Demand Company Evolution
Full Transcript
[music] My name is Samir and I'm the head of AI engineering at the browser company of New York. And today I'm going to talk a
New York. And today I'm going to talk a little bit about how we transitioned from building ARC to DIA and the lessons we learned in building an AI browser.
But first, a little about the browser company.
So we started with a mission to rethink how people use the internet. At its
core, we believe that the browser is one of the most important pieces of software in your life and it wasn't getting the attention it deserved. Simply put, the
way we've used a browser has changed over the last couple decades, but the browser itself hadn't. And think about this. We we started this company in
this. We we started this company in 2019. Um, and so this is a screen cap of
2019. Um, and so this is a screen cap of Josh, our CEO, sharing a little bit about our idea on the internet a few years ago, which we endearingly called
the internet computer. So our mission has been to build a browser that reflects how people use the internet today and how we think the browser
should be used tomorrow.
So through years of discovery, trial and error, and some ups and downs, we shipped our first browser, Arc, in 2022.
It was a browser we felt was an improvement over the browsers of that time. It made the internet more
time. It made the internet more personal, more organized, and to us, a little more delightful with a little more craft.
And it was a browser that was loved by many. It still is by millions, many of
many. It still is by millions, many of whom are probably in this audience today. I've gotten a lot of questions
today. I've gotten a lot of questions about Arc today. Um, and uh, it's great, but um, if we took a step back, we felt that ARC was still just an incremental
improvement over the browsers of that time. And it didn't really hit the
time. And it didn't really hit the vision that we set out to create. And
so, uh, we kept building and then in 2022, we got access to LLMs like the GPT models. And so, we started like we
models. And so, we started like we always do with prototyping. We started
trying new ideas um and eventually shipped a few of them in ARC. But what
started as a you know a basic exploration turned into a fully formed thesis. In the beginning of 2024 uh our
thesis. In the beginning of 2024 uh our company put out what we called act 2 a video on YouTube where we shared that thesis that we believe that AI is going
to transform how people use the internet and in turn fundamentally change the browser itself. And so with that we
browser itself. And so with that we started building again but this time we built a new browser with AI speed and security in mind and from the ground up.
And later and sorry earlier this year we shipped DIA our AI native browser.
It allows you to have an assistant alongside you in all the work you do in the browser. It gets to know you,
the browser. It gets to know you, personalizes, helps you get work done with your tabs, and effectively get more work done through the apps you use. And
while it hasn't achieved our vision yet, we fully believe it's well on the way, too.
So, it is not easy to build a product.
You all know that. Let alone two, the latter of which an AI native one. We've
had a lot of years of iteration, trial and error and through that we've learned a lot and I'm going to just talk about a few of those things uh here today.
The first I want to talk about is optimizing your tools and process for faster iteration. From the beginning,
faster iteration. From the beginning, browser company has believed that we're not going to win unless we build the tools, the process, the platform, and the mindset to iterate, build, ship, and
learn faster than everyone else. And
that of course holds true today but the form it takes with AI and an AI native product has changed.
So even as a small company where are we investing in tooling these days? First
is prototyping for AI product features.
Second is building and running evals.
Third is collecting data for training and for eval uh last but definitely not least automation for hill climbing.
So let's start with tools. Initially uh
as we always do, we built some tools.
The first was a very rudimentary uh prompt editor and it was only in dev builds. What did what did this mean for
builds. What did what did this mean for us? Well, it meant a few things. One,
us? Well, it meant a few things. One,
limited access as only engineers were able to access this. Two, slow iteration speeds. And three, none of your personal
speeds. And three, none of your personal context. And as you all know with an AI
context. And as you all know with an AI product, the context is what matters. It
what gives you the feel for whether a product is good or not.
So we evolved and since then we built all of our tools into our product, the product that we as a company internally use every day. And that includes the prompts, the tools, the context, the
models, every parameter. Um, which has not only allowed us to 10x our speed of ideating, iterating and refining our products. But it has also widened the
products. But it has also widened the number of people who can access and iterate on our products themselves. from
our CEO to our newest hire can ideate and create a new product in DIA and also refine an existing one all with their full context.
And this holds true with all of our major product protocols. We have tools for optimizing our memory knowledge graph which all of us use and we have tools for creating iterating on our
computer use mechanism. We actually
tried tens of different types of computer use strategies before landing on one before even building it into the product itself.
And I'll say and I'll end this part with uh it actually is a lot of fun. People
don't talk about that a lot but uh actually building these tools into our product has enabled so much creativity.
It has enabled our PMs, our designers, uh customer service and strategy and ops to try out new ideas that are tailored to their use cases. And that ultimately is what we're trying to do.
The next thing I want to talk about is how we evolve and optimize our prompts through a mechanism called Jeba. This
for us is very nent but an important learning nevertheless.
How we heel climb and refine our AI products is just as important as ideulating them in the first place. So
we're investing in mechanisms to help with this to enable faster hill climbing and one of those being Jeepa. And this
is based on a paper from earlier this year from a few smart folks.
So the key motivation here is simple.
It's a sample efficient way to improve a complex LLM system without having to leverage RL or other fine-tuning techniques. And for us as a small
techniques. And for us as a small company, that's hugely critical.
And how it works is you're able to seed the system with a set of prompts, then execute it across a set of tasks and score them. Then leverage a mechanism
score them. Then leverage a mechanism called PA selection to select the best ones. And then leverage an LLM on top of
ones. And then leverage an LLM on top of that to reflect on what went well and what didn't and then generate new prompts and then repeat with the key innovations here being around that reflective prompt mutation technique.
the selection process which allows you to explore more of the space of prompting rather than one avenue and the ability to tune text and not weights.
And here's a modest uh example of this at work for us. You know, you can provide it a very simple uh a simple simple prompt and run it through JPA and it's able to optimize it uh along the
metrics and scoring mechanisms that we uh created to refine that prompt.
And so if I take a step back and talk about kind of how we build uh for certain types of features, I would buck it into a couple different phases. The
first is that prototyping and ideation phase where we have widened the breadth of number of ideas at the top of the funnel um and lower the threshold on who can build them and how. And so we try
out a bunch of ideas every week, every day from all types of people and we dog food those. And if we feel like there's
food those. And if we feel like there's actually real utility there, it's solving a real problem for us and there is a path towards actually hitting the quality threshold that we believe we need to hit, then we'll move on to this
next phase where we collect and refine eval to clarify product requirements and then hill climb through code through prompting and automated techniques like Jeba and then dog food as we always do
internally and then chip and I do want to kind of double down on these phases. The ideation phase is
these phases. The ideation phase is extremely important just as much as that refinement phase.
And our goal is to enable faster ideation and a more efficient path to shipping. Because with all these AI
shipping. Because with all these AI advancements every week, new possibilities are unlocked in DIA. And
it's up to us as a browser, as a product to get as many at bats with these new ideas and try out as many of them and explore as many of them as possible. At
the same time though not underestimating the path it takes to ship some of these ideas to productions as a high quality experience.
Next uh I want to talk about treating model behavior as a craft and discipline.
So what is model behavior to us? It's
the function that defines evaluates and ships the desired behavior models. It's
turning principles into product requirements, prompts, and evals, and ultimately shaping the behavior and the personality of our LLM products, and ultimately for us, our DIA assistant.
So, I'd buck it into a few different areas. First, it's that behavior design,
areas. First, it's that behavior design, defining the product experience we actually want, the style, the tone, the shape of responses in some cases. Then,
it's collecting that data for measurement and training, clarifying those product requirements through eval.
And last but not least, it's the model steering. It's the building of the
steering. It's the building of the product itself. It's the prompting. It's
product itself. It's the prompting. It's
the model selection. It's defining the what's in the context window, the parameters, etc. Um, and so much more.
And to us, that that process is iterative, very iterative. We build,
refine, we create evals, and then we ship, and then we collect more feedback and feed that into our iterative building process. That could be internal
building process. That could be internal feedback, and that could be also uh external feedback.
And so if I move on for a second, one analogy we've thought about uh is for model behaviors that to product design through the evolution of the internet.
At first websites were functional. They
got the job done. But over time that evolved as we tried to achieve more on the internet and technology advanced. Uh
product design and the craft of the internet itself grew as well as well as the complexity.
And so what might that be for model behavior? Well, at first it was
behavior? Well, at first it was functional. We had prompts. We had
functional. We had prompts. We had
evals. We had instructions in and output out. Now we frame it through agent
out. Now we frame it through agent behaviors. It's goal- directed
behaviors. It's goal- directed reasoning, the shaping of autonomous tasks, selfcorrection and learning, and even shaping the personality of the LM models themselves.
And so, what might the future hold? I'm
excited to see. But what we believe is that we are in the early days of building AI products and model behavior will continue to evolve and into a specialized and prevalent function of
its own even at product companies.
And the last thing I'll leave you with here is that the best people for it might just surprise you. One of my favorite stories about building DIA these last couple years has been uh the
formation of actually this model behavior team. As I mentioned earlier,
behavior team. As I mentioned earlier, uh engineers were writing the prompts at first and then we built these prompt tools to enable more people at the company to actually prompt and iterate.
And there was a person on our team on the strategy and ops team and he actually leveraged these prompt tools one weekend to rewrite all our prompts.
And he came in on a Monday morning and dropped a loom video sharing what he did, how he did it, and why. and a set of prompts and those prompts alone unlocked a new level of capability and
quality and experience in our product and consequentially uh it was the formation of our model behavior team and so one thing I'd emphasize to you all is to think about who are those people at
the company agnostic of their role who can help shape your product and help shape and steer the model itself it might not be an engineer or it might be it could also be someone on the strategy
and ops team next I want to talk about AI security as an emergent property of product building. And today I'm going to focus
building. And today I'm going to focus specifically on prompt injections.
So what is a prompt injection? Well,
it's a prompt attack in which a third party can override the instructions of an LLM to cause harm. That might be data exfiltration, the execution of malicious
commands, or ignoring safety rules.
And so here's an example in which you give uh the context of a website to an LLM and instruct it to summarize it.
Little did you know that there was a prompt injection hidden in that website's uh HTML.
So instead of actually summarizing the web page, the LM actually gets directed to open a new website, extracting your personal information and embedding it as get parameters in the website's URL,
effectively exfiltrating that data.
So, as a browser, prompt injections are extremely crucial for us to prevent.
They're critical to prevent because browsers sit at the middle of what we can call a lethal trifecta.
It has access to your private data. It
has exposure to untrusted content and it has the ability to externally communicate and for us that means opening websites, sending emails, scheduling events, etc. So, how do we
prevent this? Well, there's some
prevent this? Well, there's some technical strategies we can try. First
is wrapping that untrusted context in tags. You can tell the LM, listen to
tags. You can tell the LM, listen to these instructions around these tags and don't listen to the content around these tags. But this is easily escapable and
tags. But this is easily escapable and quite trivy, an attacker could still uh leverage a prompt injection on your browser.
Well, another solution we could try is separating that data and that instructions. We can assign uh the
instructions. We can assign uh the operating instructions to a system role and we can assign a user role for the content of the third party and even layer on randomly generated tags to wrap
that user content to be extra sure that the LM listens to the instructions and not the content. And while this can help, there are no guarantees and prompt
injections will still happen.
So what do we do? Well, it's on us to design a product with that in mind. We
have to blend technology approaches and user experience and design into a cohesive story that actually builds them from the ground up and solves it together.
So, what that might what that excuse me what might that be for a feature in DIA?
Well, let's take the autofill tool in DIA. The autofill tool allows you to
DIA. The autofill tool allows you to leverage an LLM with context, memory, and your details to fill forms on the internet. It's extremely powerful, but
internet. It's extremely powerful, but as you can imagine, it has some vulnerabilities. A prompt injection here
vulnerabilities. A prompt injection here could extract your data and put it on a form, and once it's on that form, it's out of your hands. So, we try to build with that in mind.
In this case, before the form is written to, we actually let the user read and confirm that data in plain text. This
doesn't prevent a prompt injection, but it gives the user control, awareness, and trust in what is happening. And this
is a framing we carry throughout our product and how we build every single feature. So here are some examples.
feature. So here are some examples.
Scheduling events in DIA, we have a similar confirmation step. Writing
emails India, we also have a similar confirmation step.
So I've talked about three different things here today. First is optimizing your tools and process for fast iteration. Second, treating model
iteration. Second, treating model behavior as a craft and discipline. And
third, AI security as an emergent property of building products.
But uh the last thing I want to leave you with, when we started on this journey to building DIA, we recognized a technology shift and we sought to evolve
our product of Arc. We initially came at it from a hey, how can we leverage AI to make ARC better, make the browser better? But what we quickly learned and
better? But what we quickly learned and adapted to was that it wasn't just a product evolution. It was a company one
product evolution. It was a company one and today I shared a glimpse of that.
How we build and how it's changed a team we've literally created around this and how we think about security for AI products. But really it's so much more.
products. But really it's so much more.
It goes beyond that. It's how we train everyone here. It's how we hire. It's
everyone here. It's how we hire. It's
how we communicate. It's how we collaborate and so much more. And if
there's one thing I'll leave you all with, if there's one thing we've learned over the last couple years, it's that when when you recognize that technology shift, you have to embrace it. And you
have to embrace it with conviction.
Thank you.
[applause and music] [music]
Loading video analysis...