Building Claude Code with Boris Cherny

By The Pragmatic Engineer

Summary

Topics Covered

Printing Press Creates Authors
AI Rejects Handwritten Code
Let Models Use Tools Freely
Parallel Checkouts Maximize Productivity
Generalists Thrive in AI Era

Full Transcript

You were the first ever TypeScript book with O'Reilly.

>> Yeah, I found that book translated in Japanese in this little town in Japan.

That was just the coolest moment. And

then I realized I don't remember TypeScript at all. Now we're at the point where Quad Code writes, I think something like 80% of the code had Enthropic on average. I wrote maybe 10 20 p requests every day. Opus 4.5 and Quad Code wrote 100% of every single

one. I didn't edit a single line

one. I didn't edit a single line manually.

>> Andre Carpet posted that he's never felt as much behind as a programmer as he is now.

>> This is something I really struggle with. The model is improving so quickly

with. The model is improving so quickly that the ideas that worked with the old model might not work with the new model.

One metaphor I have for this moment in time is the printing press in the 1400s because there was a group of scribes that knew how to write.

>> Some of the kings were illiterate who are employing the scribes.

>> And if you think about what happened to the scribes, they ceased to become scribes, but now there's a category of writers and authors. These people now exist. And the reason they exist is

exist. And the reason they exist is because the market for literature just expanded a ton.

What happens when you join one of the top AI labs in the world and your first poll request gets rejected? Not because

the code was bad, but because you wrote it by hand. This is exactly what happened to Boris Churnney when he joined Antrophic. Boris is the creator

joined Antrophic. Boris is the creator and engineering lead behind Claude code.

Before joining Androphic, he spent 7 years at Meta where he led code quality across Instagram, Facebook, WhatsApp, and Messenger, and was one of the most prolific code authors and code reviewers at the company. In today's episode, we

cover how Cloud Code went from a side project to one of the fastest growing developer tools and the internal debate at Entrophic whether to release it at all. Boris's daily workflow of shipping

all. Boris's daily workflow of shipping 20 30 poll requests a day with zero handwritten code and how code review works when AI writes everything. Why

Boris believes we're living through a time as transformative as a printing press and which engineering skills matter more now and which ones do not.

If you want to understand how one of the people closest to AI coding agents actually builds software today and what that means for the rest of us engineers, this episode is for you. This episode is presented by Statsig, the unified

platform for flags, analytics, experiments, and more. Check out the show notes to learn more about them and our other season sponsors, Sonar and Work OS. How did you get into tech,

Work OS. How did you get into tech, software engineering, and and coding in general?

>> It starts a while back. I think there was kind of like two parallel paths that crossed. So, when I was maybe 13 or

crossed. So, when I was maybe 13 or something like this, I started selling my old Pokemon cards on eBay. And I

realized that on on eBay, you can actually like write HTML. And I was looking at other people's Pokemon card listings and I realized like some of them have like big colors and fonts and

stuff like this. And then I discovered the blink tag and I named Blink Tag.

>> And if I put the blink tag on it, I could sell my card, you know, for like 99 cents instead of 49 cents or whatever. So I kind of learned about

whatever. So I kind of learned about HTML this way. Then I got an HTML book and kind of learned about HTML. And then

uh the second thing was this was also I think sometime in middle school. We had

these old TI83 uh graphing calculators and we use them for math. And what I realized is I can get a better answer on the math test if I just program the answers to the math test into my

calculator. And so I wrote these little

calculator. And so I wrote these little programs to just program the answers and then the test got harder. first then I had to program solvers instead of the actual questions cuz I didn't know what what you know the coefficients and stuff

would be ahead of time and then the math got more advanced like the next year and so I had to drop down from basic to assembly to just make the program run a little bit faster.

>> Oh wow. So like in high school you dropped down to assembly.

>> I think this is like middle school or high school maybe like 8th or 9th grade or something like this. Then then the thing I realized is uh everyone in my class was starting to realize that I had the solver and they got kind of jealous

and so I bought this little serial cable. so I can give it to them too. And

cable. so I can give it to them too. And

then the next math test, everyone on the class just got A's. And the teacher was like, what's going on? And then

eventually she realized it. She was

like, okay, you get away with it once and and uh knock it off. But for me, it it was very practical. So, you know, in school I studied economics. Um I

actually dropped out to to startups and I never thought that coding would be a career at all. It was always very practical to me. Coding is a means to build things and to to make useful

things. this startup. Um, the first one

things. this startup. Um, the first one was I think it's like my friends and I were trying to get weed and so we started this like weed review startup. We made like a website. We

startup. We made like a website. We

called kind of different uh dispensaries I I think and then we just tried to get kind of like weed samples so we could like review it for them. And it actually kind of blew up. Um, and then I actually

got more interested in uh at the time no one was like testing this stuff and so I got into kind of the like chemical testing kind of chemical analysis and then after this I kind of did a bunch of

other startups and then I joined YC actually pretty early uh and I was the first hire of uh this YC startup up in up in Palo Alto after.

>> How did you decide to go go to one startup after the other?

>> Kind of vibes vibes I'd say cuz you know you know like you know startups it's it's never a linear path. You always

kind of pivot pivot pivot. You have to figure out what the market wants and what users want. And it's never the thing that you think. You you always try a thing, but the the idea is always a hypothesis and then almost always you

have to pivot once, twice, three times.

You know, at at this uh at this medical software company, this is called Agile Diagnosis. This was kind of an early YC

Diagnosis. This was kind of an early YC company. This was back in maybe 2011,

company. This was back in maybe 2011, 2012, something like that. It was

medical software for doctors. And the

idea was there's these like clinical decision protocols. They vary a lot

decision protocols. They vary a lot hospital to hospital. And our idea was there was one hospital in Chicago that had a really great protocol specifically for cardiac symptoms. And so we're like,

wouldn't outcomes be great if every hospital in the US would use the same protocol? And so we tried to standardize

protocol? And so we tried to standardize it. And we made this like decision tree

it. And we made this like decision tree software for doctors to use. And I

wrote, you know, some of the software.

The team was like it it was it was just a few of us. It was a pretty small team.

And I wrote the software. It was in a web browser. And I remember this was

web browser. And I remember this was back in the like the Internet Explorer 6 days. that's what hospitals were using

days. that's what hospitals were using >> and I wrote this like SVG renderer uh because it was this visual decision tree and we launched it and then we had a DAU

chart and the DUS were flat and couldn't figure it out and we were piloting it with a few hospitals at the time and at the time we were based in PaloAlto we were piloting it with uh you know a few hospitals including UCSF and I rode a

motorcycle at the time so I rode my motorcycle up to you know UCSF and I shadowed doctors for a couple days just to see how how do they actually use And I realized that actually doctors

don't have time to sit down and use a computer because you're seeing a patient >> then you have maybe 5 minutes until the next patient and in those 5 minutes you have to walk down the hall you have to

go to the computer station you have to open up this totally legacy computer. By

the time it boots up that's like 3 minutes. Then you open up Inner Explorer

minutes. Then you open up Inner Explorer 6 that takes like 30 seconds. Then you

have to open up this like app that we built. You have to sign in and your 5

built. You have to sign in and your 5 minutes are up. you don't even have time to use it. And so we rewrote everything to run on Android and they still weren't using it. And the thing we realized is

using it. And the thing we realized is doctors are walking around with a bunch of residents behind them. In this kind of situation, it's like a social situation, right? Like the thing that

situation, right? Like the thing that matters is they're seen as an authority.

They don't want to be seen on their phones. And then we pivoted again. So at

phones. And then we pivoted again. So at

that point, we were like, okay, so maybe the doctor isn't the target user.

Actually, we wanted to be used by maybe nurses or X-ray technicians or something like this. At that point, I left because

like this. At that point, I left because I was like, "This is actually pretty far off from kind of what I wanted to do."

This is like the most fun thing for me is finding this this product market fit because it's always surprising. You

can't have one big idea because the idea is probably going to be wrong. So, you

kind of form hypothesis, you you follow it down and and you see what's right.

Also, I find it so interesting how you're telling us this story because I feel behind a lot of startup success stories, we hear the success story. We

hear the path of how it went. But first

of all, a lot of startups are like this.

And second of all, what struck me is you you were hired as a software engineer, right? And this was back before product

right? And this was back before product engineers or anything was a thing which we're now talking about. But you just like you rode your motorbike and you went there and you shadowed the people

and you understood how they're using it, why they're not using it. getting

getting ideas. I I feel, you know, this this is what makes a great software engineer back then and and even today, right? You you weren't doesn't seem to

right? You you weren't doesn't seem to me that you were focused on a technology. You were focused on the

technology. You were focused on the outcome though.

>> Yeah. I mean, look, there there's different kinds of engineers and there's different ways to do it. And you know, I even even on our team right now, I look at an engineer like Jared Sumar and he's just incredible technical mind. He

understands systems better than anyone I've met. And you know you need you need

I've met. And you know you need you need people like this. You need people with this kind of depth. For me engineering has always been a practical thing. Uh

and you know for me I've always been a generalist and like it doesn't matter if I'm doing you know like design or you know if I'm doing engineering or user research or whatever. The investment

thesis for AI and software engineering is straightforward. As AI writes more

is straightforward. As AI writes more code more code needs to be verified. But

there's a catch. AI generated code is on average harder to verify than human written code. This is why there's Sonar,

written code. This is why there's Sonar, the makers of Sonar Cube. As a critical verification layer for the AI enabled world, Sonar ensures that speed and volume with AI does not compromise your

codebase. Sonar's competitive position

codebase. Sonar's competitive position is built on 17 years of specialized expertise that no foundational model can replicate. We're talking about deep

replicate. We're talking about deep analysis engines like symbolic execution and cross- repository data flow tracking that simulate how code actually behaves, not just what it says. To bridge the

divide between AI productivity and code quality, Sonar has released the Sonar Cube MCP server. This tool acts as a universal translator between AI applications and the Sonar Cube

platform. By using the modal context

platform. By using the modal context protocol, it gives AI tools like cloud code, GitHub copilot, and cursor direct access to sonar cubes analysis capabilities. Instead of context

capabilities. Instead of context switching, your AI agent becomes a full-fledged code review and quality assurance copilot capable of analyzing code snips for issues, filtering bugs by severity, and even checking your

project's quality gate status before you ever commit code. Whether you're working with coding assistants or scaling up with full agogentic workflows, Sonar provides the automated verification that

75% of the Fortune 100 rely on. It's

about giving your developers the freedom to innovate without the fear of breaking the code base. Head to

sonarsource.com/pragmatic

to learn more about how Sonar enables the confidence to develop at the speed of AI. With this, let's get back to

of AI. With this, let's get back to Boris's career and what he learned working at startups. My first job I ever had, I was like, I think I was 16 and I just wanted to buy an electric guitar.

And so what I did was I I started uh I just started freelancing. And so I was like, "Okay, I guess I'll make websites." And I think Fiverr was not a

websites." And I think Fiverr was not a thing back then. So there were some other freelancing websites. So I just started like I put up a website. I

started bidding on stuff. And my first paycheck, I just spent the entire thing on an electric guitar. But it but it was very practical, right? Right? Cuz it's

like when you're in this kind of setup, you have to you have to do the engineering, you have to do kind of the accounting, you have to do the the design, you have to talk to customers.

It's just always been like that for me.

After a couple of these startups, you ended up at Facebook now now called Meta. And there you spent seven years

Meta. And there you spent seven years there. Can you just talk us through what

there. Can you just talk us through what you've worked there, what you've learned there? You've also had a very remarkable

there? You've also had a very remarkable career growth in terms of four promotions over over over seven years.

And what did you take away from that that experience?

>> Yeah, so I started on Facebook groups.

That was the first time I worked on uh Vlad Klesnikov uh hired me. I think I think he's actually still at Facebook.

Um I think he's on some other team now.

And it was cool actually. There there's

a big group of people that I worked with that were these kind of early JavaScript people too. And you know, like I I did a

people too. And you know, like I I did a bunch of JavaScript stuff. And it's

funny like I kept crossing paths with these people. And so Vlad, he worked on

these people. And so Vlad, he worked on Bolt.js, JS which was the software it was the framework that powered ads manager which later became ReactJS. I I

kept crossing paths with these people and later on for yeah later on there there was a bunch more people like this but anyway so I I was working on Facebook groups um I was really excited about it because the because of this

mission of connecting people to their community. This is the thing that drew

community. This is the thing that drew me in. And at the time I was a big

me in. And at the time I was a big Reddit user. I became a Reddit user back

Reddit user. I became a Reddit user back when I was a teenager because I didn't know anyone else that coded. Even in

college, I didn't really know anyone that coded.

>> And honestly, I was always kind of embarrassed about it cuz I thought it was this nerdy thing. And I thought it was kind of this this thing that I knew how to do, but I wanted, you know, I wanted to be like a cool kid and, you know, like I I couldn't like tell people

that I coded. It was like it was very nerdy. Um, and and at some point I

nerdy. Um, and and at some point I discovered it was some like programming community on Reddit and I was I was just shocked like there's other people that are into this thing. It's like such a

weird hobby. It's so niche and it was

weird hobby. It's so niche and it was just so exciting to find like-minded people like this and get this connection and so I just wanted to work on this. I

wanted to kind of contribute to this in in some way. So I worked on Facebook groups for a while. Um, and then you know there there's a bunch of different projects have to to kind of get get into

details for any of these. Eventually I

became the the tech lead for for Facebook groups and kind of grew grew into this and the org grew the work really changed. It changed from kind of

really changed. It changed from kind of building to a lot of like dock writing and coordination and kind of delegating to others. The culture was changing at

to others. The culture was changing at the time. So you know this early

the time. So you know this early Facebook culture was disappearing. The

docs were coming in. The you know alignment meetings were coming in. uh

there was a lot of a lot more work around this kind of foundational stuff like privacy, security, things like this that I think honestly early on a lot of corners were cut in order to grow. But

at some point you just have to pay that debt and that was the time when that happened. Then I spent a few years at

happened. Then I spent a few years at Instagram after um and that was also a funny story. My wife got a got a job

funny story. My wife got a got a job offer and she was just really excited about it and she came to me and was like, "Hey, like I got this offer but we're going to have to move. Is that

okay?" And I was like, "Yeah, that's fine." You know, like I work in tech. we

fine." You know, like I work in tech. we

can work remotely anywhere. Where's the

job? And she was like, it's a N. And I

was like, where where's that? And uh N is like rural Japan. And this was uh >> different time zone as well.

>> Different time zone. Yeah. This was

>> 12 hours or something different or something like that.

>> Something like that. Yeah. It was like 2021.

>> Wow.

>> Um and then I I tried to kind of find a team that would sponsor me cuz there was there were these kind of arcane HR rules about like the time zone you have to be in and the team you have to be collocated with and so on. And so uh

there was a little kind of naent team uh for Instagram in Tokyo and Will Bailey was running this team. He was also the guy that made Instagram stories and uh so he was my manager for a while and so

we decided to grow that team together and I worked remotely from NA and then most of the team was in Tokyo and uh during this time I I started hacking on Instagram and the stack was

just insane like Facebook was the single best web serving stack in the world. the

the way that HH everything is optimized like from from the hack language to the HHVM runtime to the to GraphQL as the transport layer to like the client libraries like relay and and all the

stuff it was just and in React it was just amazing there there's no other devstack in the world that was this good and it's just fully optimized and then I went to Instagram and it's like you know

Python where the type checker didn't work and click to definition didn't work and it was this like kind of hack together Django and then like a work of

uh you know the Syon runtime and just nothing really worked and so I came to Instagram I joined the labs team uh you know in in Japan and the idea was to find the next big thing for Instagram.

We tried some stuff but what I very quickly realized is that I was just not effective at working on the stack because it was such a terrible stack and so I just went and started working on

Dev Infra because uh we we needed to fix it and there there's a few projects that we worked on. So one was migrating from Python to the big Facebook monolith.

Another one was migrating from Rest to GraphQL. And uh these projects, they're

GraphQL. And uh these projects, they're they're actually in progress, you know, like these are things that involve it takes hundreds of engineers many years to do this. It's a big code base. It's a

big migration. Um now it's it's much faster.

>> Yeah. With with with these tools that we have, the AI AI tools and migrations are a pretty good use case for them though.

>> Yeah. It's like the it's the perfect use case for it. And then I I just started getting kind of deeper into this. And by

the end, by the time I left Instagram, so I was working on this on dev and kind of leading a bunch of these migrations.

That's also where I intersected with Fiona Fun who is now the manager for the quad code team. I just worked with her and she was just such an amazing leader, this incredible depth and kind of history in tech. And I just thought like

there's no better there's no better manager for this team. And then I I also started working on code quality. And so

the the work on Instagram kind of expanded a bit. And um by the time I left, I was leading code quality for all of Meta. And so I was responsible for

of Meta. And so I was responsible for the quality of the code bases across Instagram Facebook Messenger WhatsApp, Reality Labs, kind of all these code bases. At Meta, it it was this program called Better Engineering.

And the idea was I think it's sort of like 2016 or 2018 or something, but Zuck mandated that every engineer at the company 20% of their time has to be spent fixing tech debt.

>> Oh, interesting.

>> And we called this better engineering.

>> Mhm. And the some of this is kind of bottom up where you know a team knows best the tech debt that they have to fix and then some of it is top down where you need to do you know very big migrations you need to migrate to new

language features new frameworks things like this and at Facebook scale you know there was tens of thousands of these migrations every year. Um and so I I just started leading all this and I

realized very quick that it just needed a little bit more order to it. There was

no goals. No one knew kind of like what the outcomes were there. there wasn't

any tracking. Um, and so we developed a bunch of stuff. Uh, one of the ideas was a centralized way to prioritize the different kind of code quality efforts.

The second thing was figuring out the impact of code quality on engineering productivity which turned out to be significant.

>> How how did you measure what did you find there?

>> There was a bunch of stuff. I think some of this has been published. I don't know if all of it has, but essentially you try to do like causal analysis and causal inference. This is the

causal inference. This is the methodology. You try to figure out like

methodology. You try to figure out like what what are the factors that make it so engineers are more productive. Some

of it is code quality, some of it is outside of code quality. So for example, meta went back to uh you know return to office instead of work from home. That

was partially driven by this because we just found some you know fairly strong correlations that we thought were causal.

>> Yeah.

>> Um about this but quality actually contributes like you know double digit percent to to productivity. It turns out even even at the biggest scale. It's

it's kind of comforting to hear because I I think it's it's rare to have a place where you actually measure this, but I think we feel it like when you have a clean code base in modular or it can get

easier to work with and I I think you know reasoning could it also be easier for LM to to work with it and my hint would be yes it should be right but I I

think there's just very little data but that's a feeling that I I would have.

Yeah, I think a lot of the big companies have published about this. Like I think Facebook published something. Uh

Microsoft publishes a bunch about this, Google does, but yeah, totally. If if if every time that you build a feature, you have to think about do I use framework X or Y or Z. These are all options that

you can consider because the codebase is in a partially migrated state where all of these are around the code somewhere.

As an engineer, you're going to have a bad time. As a new hire, you're going to

bad time. As a new hire, you're going to have a bad time. As a model, you might just pick the wrong thing and then, you know, like the user has to course correct you. So actually you know the

correct you. So actually you know the better thing to do is just always have you know a clean code base always make sure that when you when you start a migration you finish the migration and this is great for engineers and nowadays

it's it's great for models too and then you joined entropic and I've heard this story which you can confirm or give more color to it that your first poll request was rejected by Adam Wolf.

>> He was my rampa buddy. So I joined Enthropic. I was trying to figure out

Enthropic. I was trying to figure out kind of like what to do next and you know I I met a bunch of people at all the different labs and anthropic was just the obvious choice for me because of the mission. This is the thing that

personally I know that I need the most.

Um and also just kind of seeing all this change that's happening. It's important

to have some sort of framework to think about this and to think about our role in it. I'm also a really big sci-fi

in it. I'm also a really big sci-fi reader. Like that that's definitely my

reader. Like that that's definitely my genre. Um I'm I'm a big reader. I have

genre. Um I'm I'm a big reader. I have

like, you know, giant bookshelf at home and stuff and I just know how bad this thing can go and I just felt like this is a place that has serious thinkers.

People are taking this very seriously and thinking about what what what can we do to make this thing go better. So when

I joined Anthropic, I did a bunch of ramp up projects uh just you know various stuff that that I was hacking on and I wrote my first pull request by hand because I thought that's how you write code.

>> That used to be how you write code.

>> That used to be how you write code. But

even at the time at Enthropic, there was this thing called Clyde and it was the it was the predecessor to quad code. It

was it was super janky. It was like it was Python, you know, it took like 40 seconds to start up. It was research code. It was not agentic. But if you

code. It was not agentic. But if you prompt it very carefully and hold the tool just right, it can write code for you. And so Adam rejected my PR and he

you. And so Adam rejected my PR and he was like, "Actually, you should use this Clyde thing for it instead." And I was like, "Okay, cool." It took me like half a day to figure out how to use this tool because you have to like pass in a bunch

of flags and like use it correctly. Um,

but then it it sped out a working PR. It

just one-shotted it.

>> Oh, >> and this was like 2024.

This like September 2024, August, something like that. And I think for me, this was my first fuel hi moment at Anthropic cuz I I was just, oh my god, like I didn't know the model could do

this. Like I I was used to these like

this. Like I I was used to these like kind of tab completions, line level completions in an IDE. I had no idea that it could just make a working pull request for me. Boris just talked about

how he had a true wow moment at work using their AI model. A very different wow moment is when you use a tool at work that makes things so much easier than before. And this leads us nicely to

than before. And this leads us nicely to our presenting sponsor, Statsig. Statsig

offers engineering teams the tooling for experimentation and feature flagging that used to require years of internal work to build. It's the kind of tool that was so complex to build that only large companies like Meta or Uber had

their own custom advanced tooling for it. Here's what satic looked like in

it. Here's what satic looked like in practice. You ship a change behind a

practice. You ship a change behind a feature gate and roll it out gradually, say to 1% or 10% of users at first. You

watch what happens. Not just did it crash, but what did it do to the metrics you care about? Conversion, retention,

error rates, latency. If something looks off, you turn it off quickly. If it's

trending the right way, you keep it rolling forward. And the key is that

rolling forward. And the key is that measurement is part of the workflow.

You're not switching between three tools and trying to match up segments and dashboards after the fact. Feature

flags, experiments, and analytics are all in one place using the same underlying user assignments and data.

This is why teams at companies like Notion, Brex, and Atlastian use Statsig.

Statsic has a generous free tier to get started, and pro pricricing for teams starts at $150 per month. To learn more and get a 30-day enterprise trial, go to stats.com/pragmatic.

stats.com/pragmatic.

And with this, let's get back to Boris and the origin story of Claude Code.

>> Yeah. And and then when you when you joined Entrophic, we we've covered this in in a deep dive, but we could recap briefly on how Claude Code came to be out of out of what seemed like a side

project or just a cool hack. So yeah, I I I started hacking on a bunch of different stuff. Um I was working on

different stuff. Um I was working on some things in product. Um I worked on reinforcement learning for a little bit just to kind of understand the layer under the layer which I was building.

This is still advice that I give to a lot of engineers is always understand the layer under. It's really important because that just gives you the depth and you kind of like you have a little bit more levers to to work at the layer that you actually work at. This was the

advice 10 years ago. It's still the advice today. Um but the layer under is

advice today. Um but the layer under is a little bit different now. You know,

before it was like understand, you know, the Java if you're writing JavaScript, understand the JavaScript VM and frameworks and stuff.

>> Now it's like understand the model. So I

was hacking on a bunch of different stuff. Uh something shipped, some things

stuff. Uh something shipped, some things uh didn't ship. And at some point I I just wanted to understand the public anthropic API because I'd never used it before. Um and I didn't want to build a

before. Um and I didn't want to build a UI. I just wanted to, you know, hack

UI. I just wanted to, you know, hack something up quite quickly cuz we didn't have quad code back then. We're still

writing code by hand. And I wrote this little batch tool that um all all it did was it hit the anthropic API and it it was essentially like a chatbased application um but just in the terminal

because that's what AI used to be. And

you know, I I still think about it like engineers are the first adopters. And so

when we started to move out of conversational AI to agentic AI, it took a little bit, but engineers understood it pretty quick. And I I think now when you ask non-engineers about like what is

AI, they would say it's this conversational AI, it's like a chatbot or something. And that's why I'm

or something. And that's why I'm actually very excited for, you know, co-work this new product that we launched because it's going to bring the same thing that engineer saw very early to everyone else. But when I think

about, you know, co-work, I I think back to this moment that we're talking about like very early on, quad code originally wasn't quad code. It was a chatbot because that's what I thought AI was.

Um, but we had to kind of figure out kind of what is the next thing. And so I at at the time I I built this chatbot.

It was somewhat useful, but it was just a chatbot. And the next thing that I

a chatbot. And the next thing that I tried was I I wanted it to use tools because tool use just came out and I didn't know what it was and I was like let's experiment

and and I I gave it a single tool which was the bash tool and I didn't know what to do with the bash tool and so I asked it you know like I I actually didn't know if it could even do this but I asked it like what music am I listening

to and uh it just wrote a little Apple script program using like said or or whatever to uh open up my music player and then like query it to see what music

it's listening to and just one shot at this with sonnet 3.5. This is actually my second a field AI moment very quickly after the first one

>> and the model just wants to use tools that though that's that's just what I realized like this thing like if you give it a tool it will figure out how to use it to get the thing done and I think

at the time when when I think about the way that people were approaching AI and coding everyone essentially had this mental model of you take the model and you put it in a box and you figure out

like what is the interface like what how how do want to interact with this model?

What do you need it to do? Essentially,

it's like if if you have a program, you you stub out some module, stub out some function, and you say, "Okay, this is now AI." But otherwise, the rest of the

now AI." But otherwise, the rest of the program is just a program. And so, this is just not the way to think about the model. The way to think about it is the

model. The way to think about it is the model is its own thing. You give it tools. You give it programs that it can

tools. You give it programs that it can run. You let it run programs. You let it

run. You let it run programs. You let it write programs, but you don't make it a component of this larger system in this way. And I think there's just like, you

way. And I think there's just like, you know, this is a version of the bitter lesson. There's the bitter lesson is a

lesson. There's the bitter lesson is a very specific framing, but there's many corollaries to it. This is one of the corollaries is just let the model do it do its thing. Don't try to put it in a box. Don't try to force it to behave a

box. Don't try to force it to behave a particular way.

>> One of the first ways you saw it was giving it tools, giving it access to the bash and then later to the file system and then to more tools. Right.

>> That's right. Yeah, we we give it uh we give it bash then uh I say we it it was just me the first three months but then the team grew. So it it was bash, it was uh and and file edit that was the second one.

>> And one of the interesting thing we talked about uh last time for the deep dive is when you built it and it started to actually write code with with the tool tools that you had. You've had an

internal debate inside entrophic should we just keep it to ourselves because it's making suddenly it spread across engineering and it was making all of you a lot more productive right. Yeah,

that's right. In the end, the decision was to release so that we can study safety in the wild. Because when you think about safety and you know, I keep talking about the word safety. The

reason anthropic exists as a lab is safety. This is the reason it was

safety. This is the reason it was founded. This is the reason it exists.

founded. This is the reason it exists.

If you ask anyone at anthropic why they chose it, it's because of safety. And so

if you think about model safety, you know, there's different layers at which to think about it. There's kind of alignment and mechanistic interpretability. This is at the model

interpretability. This is at the model layer. Then there's evals and this is

layer. Then there's evals and this is kind of like a it's kind of putting the model in a petri dish and synthetically studying it in this way. Um and then you can study it in the wild and you can see how it actually behaves. You can see how

users talk about it. You can you can see like what are the risks in the wild and you actually learn a lot this way. And

by doing this we we've been able to make the model much safer. So in in hindsight it was it was totally the right decision. It's amusing to hear about it

decision. It's amusing to hear about it from your perspective because from the outside what what I saw and what a lot of engineers saw is like oh entropic release cloth code oh wow this you know

for the first release with uh I I believe it was with sonet 4 release was was did it come out with sonet 4 originally or sonet 4.5 >> I think it was it was for that that was the general availability in February but

I think it was research preview before that >> yeah but when it came out my infiltration was like oh this thing can write code pretty well and over time it became a lot more capable. So from from

our perspective it was like this really capable coding tool that we just started to adopt and use and use for all sorts of increasingly product productive parts

and it has become I believe one of the fastest growing developer tools and I'm always surprised to hear the story that it actually comes from research and the goal to understand how people use the

model because at the other hand like some startups have been trying to build developer tools deliberately to to get adoption and yet this research tool is getting a lot more adoption.

>> I mean this is a you know anthropic we're we're a research lab we're a safety lab and you know product is this kind of thing tacked on to the side product exists so that we can serve research better and so we can make the

model safer and this is kind of how we think about everything there there was this there's also this funny moment early on when uh we we had this launch review and we were deciding whether to launch it. I remember this moment cuz we

launch it. I remember this moment cuz we were in the room. I think it there was like there was Mike Creger, there was Daario, there were some other folks in the room and we were deciding what should we do. We were looking at the internal adoption chart which was just

vertical said it was just insane. It was you know like nowadays >> vertical is 100% right >> just just 100% like nowadays everyone at an every technical employee at anthropic

uses quad code every day is pretty much 100%. For nontechnical employees it's

100%. For nontechnical employees it's also like it's actually getting quite close to 100%. It's it's increasing very quickly like you know like half the sales team uses quad code um and I think that's increasing it's just it's crazy.

Dario had this question about like how how did it grow this fast? Are you like forcing people to use it?

And I was like no we offer this tool people vote with their feet and you know just like let people use the tool that they prefer.

>> Yeah they chose it.

>> You don't seem like the person who's act exactly forcing people to use your tool.

>> Yeah. Yeah. I mean the the way we did it, we just we launched the thing and then we just like listened to the users and we talked to people, we saw how they use it, we followed up, we made it better and yeah, I mean now now we're at

the point where Quad Code writes I think something like 80% of the code in at Enthropic on average and you know it writes all of my code for sure.

>> Yeah. And this started for you it started the first time you mentioned I think it was in November when it started to write all of your code. When did that switch come and what what happened to

made you trust it to to write your code or how much you trusted? How much you review that code for example?

>> So the switch was instant when we started using Opus 4.5. This was before before it came out, you know, we we were dogfooting it for a little bit and it it was just right away. Um it's such a more

capable model. I just found that I

capable model. I just found that I didn't have to open my ID anymore. I

just uninstalled my ID cuz cuz I just didn't need it at that point. I actually

did that like a month later because I I I just didn't even realize that I wasn't using it anymore.

>> Yeah, a lot of us had similar experiences once Opus 4.5 was out in the public and especially over the winter break. I I had a similar experience. I

break. I I had a similar experience. I

just realized that this thing it actually writes, if I'm being honest with myself, as good code as I would have written in the stack that I'm very familiar with and my code base, my side projects where I know it and just a lot

better than what I could for code base that I'm not as familiar or technologies I'm not as familiar with. Yeah. I'll be

honest, he writes better code than I do.

>> I I I don't want to go there. I I still like to keep my pride, but probably true.

>> Yeah. Yeah. I I realized this because also in December, I was traveling a little bit. I was like on a I was on a

little bit. I was like on a I was on a coding vacation. We we're talking about

coding vacation. We we're talking about this before, but I I went to Europe. We

were just in a different time zone kind of nomading around. And it was so fun cuz I was just coding all day every day, which is my favorite thing to do. And uh

I wrote maybe, you know, like 10 20 p requests every day, something like that.

Opus 4.5 and quad code wrote 100% of every single one. I didn't edit a single line manually and I realized uh at the end of that month Opus introduced maybe two bugs whereas if I had written that

by hand that would have been you know like 20 bucks or or something like that.

Can we talk about your development workflow? You have written threads about

workflow? You have written threads about this which is awesome. It's on it's on social media on threads and on on X. But

can you tell us how you use today uh cloud code in terms of you know parallelism and and tips and tricks that you and the team have kind of learned and share across the across the team?

>> Yeah, I mean look there's no one right way to use quad code. So I I can share some tips and things but I I think the wrong conclusion to draw would be to

just copy copy these and and use it. The

way we build cloud code is we build it to be hackable because we know every engineer's workflow is different.

There's no one way to do things. There's

no two engineers that have the same workflow. It's just every every engineer

workflow. It's just every every engineer is >> same with workstation setup, right? Like

keyboards, monitor placement, all that.

Everyone has it differently.

>> Yeah. It's like we're like crafts people, right? Like you choose you

people, right? Like you choose you choose your tools. Like we care deeply about it. So there's no one right way to

about it. So there's no one right way to do it. So for me, the way that I do it

do it. So for me, the way that I do it generally is I have five terminal tabs.

Each one of them has a checkout of their repository. So it's five parallel

repository. So it's five parallel checkouts. Um and usually I'll kind of

checkouts. Um and usually I'll kind of roundroin and start cloud code in each one. Almost every time I start in plane

one. Almost every time I start in plane mode. So that's like shift tab twice in

mode. So that's like shift tab twice in the terminal. And uh I also overflow uh

the terminal. And uh I also overflow uh as I run out of tabs cuz there's only so many terminal tabs. I used to use web a lot for this. So like quad.ai/code,

that's the place that I overflow to.

Nowadays I actually use the desktop app.

Um it's more convenient. So Quad Code, you know, it's been in our desktop app for, you know, for many months. It's

just a code tab in in the Cloud app. Um,

and I actually really like it because it has built-in uh work tree support. So

that's existed for a while. Um, and that that's quite nice for parallelism. So

you have multiple, you don't need multiple checkouts. You just have one

multiple checkouts. You just have one and then we automatically set up Git work trees for you. So you get this kind of environment isolation. The reason I do that is I actually just really hate fiddling with git work trees on the

command line cuz it it's kind of fiddly.

like you need to know the CD get work tree for those of who are not as familiar with it. It's it's when you can check out instead of having a separate local folder, it's almost like checks

out separate branch, right? And then you can work on it separately but not have the comp have the complex only at like merge time.

>> That's right. Imagine that you you have a folder but you have maybe like git makes five copies of that folder in a way that's very cheap um and kind of easy to throw away. So you get this kind of isolation. it can work in parallel

of isolation. it can work in parallel and the quads don't interfere.

>> Yeah. So, you now have support for this which I I think you recently added like native support but like for for your workflow you just stuck with the old one of checking out on separate f folders, right?

>> Yeah, exactly. I I actually find over time I'm using the desktop app more and more for this.

>> Um just cuz I don't need these separate checkouts and you know I I just have a bunch of quads running in parallel and I don't have to think about it. The other

surprise hit is the iOS app for me.

Every day I start like I wake up and I just start a few agents on my phone. Oh,

the the native one. Yeah,

>> the native one. Yeah, it's just like it's the quad app. It's the code tab in the in the quad app and it's the same exact quad code.

>> Yeah, except it it runs in the cloud, right?

>> It runs in the cloud. Yeah. So, you have to kind of configure the environment.

Luckily, our environment is pretty simple. So, you know, um and it we just

simple. So, you know, um and it we just use hooks for it. So, you just use the session start hook and configure it.

This is kind of one of the benefits of making quad code really hackable is it's very easy to do to do this kind of configuration. And this is something

configuration. And this is something honestly I would never have predicted because you know like I I I code on a computer. If you told me six months ago

computer. If you told me six months ago I'd be writing I don't know a third I haven't pulled the data maybe like a third half something like this of my code on a phone. That's crazy. But

that's that's what I'm doing today.

>> And you're using parallel agents. At

what point did you start using them? And

how has it changed your work? Cuz one

thing that I notice on myself, I don't really use that many parallel agents. I

maybe like two at a time, but I'm someone who well I I like to be in charge and especially with Claude.

Claude is is is a a tool that you can follow it along. It tells you what it's doing. It you can also have for example

doing. It you can also have for example learn mode which this was shipped a lot earlier where where you can actually follow along. It gives you tasks. I I

follow along. It gives you tasks. I I

feel that like staying in one tab and following along the model is pretty fast as well. I can kind of keep in touch.

as well. I can kind of keep in touch.

I'm assuming at some point you must have done this but then what happened when you changed to parallel and are do you feel you're losing any control or it doesn't really matter that much?

>> Yeah, I I I think there's kind of like two modes to think about or kind of like two two uh two kind of workflows to think about. So when you're new to a

think about. So when you're new to a codebase, highly re learn mode is awesome. Highly recommend it for people

awesome. Highly recommend it for people that are onboarding to the quad code team, people that onboard to enthropic.

Um the thing that we recommend is so you do for people that haven't tried it you do slashconfig in quad code you pick the output style and you can do learn or explanatory. We usually recommend

explanatory. We usually recommend explanatory cuz that tends to be better for new code bases um that you kind of haven't been in before. For me once you're familiar with the codebase you just want to be productive right like

you just want to ship as much as you can and you want to kind of be effective doing that. Um so the role really

doing that. Um so the role really switches. I don't really go deep into

switches. I don't really go deep into tasks anymore. I start a quad in plan

tasks anymore. I start a quad in plan mode. I'll have it kick something off.

mode. I'll have it kick something off.

With Opus 4 4.5, I think it got there.

With 4.6, it just really really does it.

Once there is a good plan, it just it will oneshot the implementation almost every time.

>> So, the most important thing is to go back and forth a little bit to get the plan right. So, what I do is I I start

plan right. So, what I do is I I start one, I enter plan mode, I give it a prompt. As it's chugging along, I'll go

prompt. As it's chugging along, I'll go to my second tap and I'll start the second quad also in plan mode. Get it

chugging along. Then go to the third tab, go to the fourth one. Then maybe

I'll go back to the first one when I get notified that it's done. Uh, and then I'll kind of >> Do you have notifications on or do you turn them off?

>> I actually operate in both modes. Um,

sometimes I do like, you know, focus mode on the Mac. Um, so I just have it off, but also sometimes I use the system notifications.

>> And you're very very productive with with PRs. I mean, I I think it was very

with PRs. I mean, I I think it was very visible. Even around the holiday breaks

visible. Even around the holiday breaks uh on social media, you actually were responding to I think someone reported a bug or or a feature request. I'm not

sure which one it was. And then an hour or two later it was done cuz cuz you did it. You've also talked about like number

it. You've also talked about like number of poll requests you've done on a day not to like show up but just as context.

What what does a poll request typically involve in terms of complexity? Are

these like are some some super trivial or some actually like larger pieces of work as well?

>> Yeah, pull request each one varies a lot. Um sometimes it's a few lines,

lot. Um sometimes it's a few lines, sometimes it's a few hundred or a few thousand lines. They're all just very

thousand lines. They're all just very very different. It's changed so much.

very different. It's changed so much.

Like back when I was at Instagram, I think I was one of the uh top two maybe top three most productive engineers at Instagram just by volume of code written. Oh wow. Um so I've always, you

written. Oh wow. Um so I've always, you know, for me I've I've always just coded a lot. Like this is uh coding is like a

a lot. Like this is uh coding is like a way that I can express myself and it's just like it's a way that my brain thinks also. And so now I just get to do

thinks also. And so now I just get to do it. But I I think with quad code the the

it. But I I think with quad code the the the kind of code that you write if you are very productive it it tends to be even it's just the number of PR sort of underelves what what's happening because

I I think people that used to be very productive in the old days before AI assistance a lot of the code maybe was like code migrations or something like this so like people that shipped you know 20 30 PRs every day a lot of it was

like pretty you know like a oneliner or kind of migrating A to B or whatever.

Nowadays I ship you know 20 30 PRs every day but every PR is just completely different. Some of them are thousands of

different. Some of them are thousands of lines, some of them are hundreds, some of them are dozen, some of them are oneliners. It's none of these are kind

oneliners. It's none of these are kind of code migrations cuz actually Claude just does those and I I don't need to be part of that.

>> Shipping this much code or this much productive. The obvious question that

productive. The obvious question that comes up for any I guess software professional is well the review. What

the way teams used to work and I'm not sure if Instagram did this but a lot of other companies did this is you make a pull request you put it up there there's a mandatory human reviewer at Google

there's actually two cuz there's one on code quality as as well how has this workflow changed how does the hot code team think about code review and how has it changed over time yeah I'll start by

thinking I I'll start by talking about how code review used to work for me so the the way that I used to do it is uh every time I I also used to be one of the most prolific code reviewers.

>> Oh, okay. So, both.

>> I I met Yeah. Yeah.

>> Right. Or is it code reviewers?

>> That's actually and that's one of the benefits of being in a different time zone. Like I'm not super human. I just

zone. Like I'm not super human. I just

didn't have any meetings. And the the way that I approach code review is every time that I would have to comment about something, I would drop it in a spreadsheet and I I would like describe the issue.

So, let's say, you know, like someone named a parameter, you know, in a function badly, I would like put that in a spreadsheet. If someone did some bad

a spreadsheet. If someone did some bad React pattern or something, I would I would put that in a spreadsheet. And

then over time I would just kind of tally up the spreadsheet and anytime that a particular row had more than three or four instances I would write a lint rule for it.

>> So just automate it with kind of an op.

And so that's what it used to look like for me. I've always tried to automate

for me. I've always tried to automate myself away um because there's just so many things to do. Um and this is one of our superpowers as engineers >> is we were able to automate all of the

tedious work. There's very few other

tedious work. There's very few other fields where you're able to do this thing. This is a thing uniquely that

thing. This is a thing uniquely that we're able to do. Um, and this is a thing that I I've just always enjoyed because it gives me more free time and uh I get to do the work I actually enjoy. And so today the way this looks

enjoy. And so today the way this looks is a little different, but it it mirrors this a little bit. So when cloud code writes code, it generally it will run tests locally. And this is something

tests locally. And this is something cloud just often decides to do when it's relevant or it'll write new tests. So

you kind of do this this kind of verification. When we make changes to

verification. When we make changes to cloud code, cloud will also test itself.

So it'll launch itself kind of in a subprocess. It'll verify itself and

subprocess. It'll verify itself and it'll test itself end to end.

>> This is for the the your internal cloud code implementation. So you have like

code implementation. So you have like this test suite so they can test itself.

>> Yeah, that's right. That's right. But

it'll literally launch itself just in a bash process and kind of just see like hey do I still work.

>> Wow. Okay. So it'll do this and this is something that we we just didn't code in like it just with Opus 4 4.5 especially it just sort of spontaneously doing this. It just wants to kind of check. So

this. It just wants to kind of check. So

so we do this and then we also run claudep. So this is the quad agent SDK

claudep. So this is the quad agent SDK in uh CI. So every pull request at Enthropic is code reviewed by quad code.

Uh and that actually catches maybe like 80% of bugs something like this. Um and

it's the first round of kind of code review. Cloud will automatically address

review. Cloud will automatically address some of these. Some of them some of them it'll leave to a human cuz it's not sure what to do. There's always an engineer that does the second pass of code review. Um and you know there there

review. Um and you know there there always has to be a person in the loop approving the change.

>> Mhm. So on on on the team before anything goes into production if you will an engineer does look at it. Yes.

As you're thinking of code review would you do this for every type of project or this is specifically because you now know that this actually has real world impact people depend on it. You know

there's a lot of users let me put it the other way around like can you see places where you would just not have an engineer review uh code. What situations

would that be in?

>> I think it depends how how how it's used. Yeah I'd agree with that. But you

used. Yeah I'd agree with that. But you

know if you're building some personal side project like you can just yolo straight to main you know like >> it's even even before AI you would have not reviewed you just trust yourself or

you know just ship to production or SSH into production and do some changes that kind of stuff right >> exactly exactly um the very first versions of quad code that were internal like you know I committed straight to main but then you know as soon as you

have users and you know for enthropic our main customer base is enterprises this is what we care about the most for us for safety reasons security is really important privacy is important. These

are these are all related. It's also

very important for our customers. And so

because this is an enterprise product, it has to be secure. It has to be we have to make sure that it meets a certain bar. So we definitely use a lot

certain bar. So we definitely use a lot of automation, but at least for now, there has to be a human in the loop just to make sure.

>> One thing that is just known about LM is they're nondeterministic.

And by putting the element as a reviewer claude doing a review like it it will give good feedback but how do you deal with the fact that you can be sure if

it's always giving the feedback you cannot be sure that even if it's capable of catching an issue that it will necessarily catch that. Are you doing anything in in this loop to do deterministic thing? For example,

deterministic thing? For example, linting is very deterministic as you will very well know. Like have you thought of marrying some of these ideas or are you using for example are using llinters on the codebase or you found no need to for it? Yeah, absolutely.

Absolutely. Yeah, you

>> this is just a Yeah.

>> Yeah, we we have type checkers, we have llinters, we run the build. Claude is

actually so good at writing lint rolls.

So, actually what I do now, I used to tally stuff up in a spreadsheet. Now,

what I do is when a coworker puts up a pull request and I'm like, this is lintable. I'll just be at Claude, please

lintable. I'll just be at Claude, please write a lint roll for this in that PR on their PR. And we have, you know, you

their PR. And we have, you know, you just run like slash I think it's like setup GitHub or or something like this.

You can do this in cloud code and it'll install the GitHub app which then makes it so you can tag add Claude on any pull request, any issue. I use this every single day. Um, so very very useful. So

single day. Um, so very very useful. So

you want these deterministic steps. Also

though there are there are ways to get cloud to be a little bit more deterministic. So for example, you can

deterministic. So for example, you can do best event. You can have it do multiple passes >> and and this is actually quite easy to do. So you know for example the

do. So you know for example the coderview skill that we use internally it's open source um and it's available in the quad code repo and so all we do is you know we launch parallel agents to

do stuff and then we launch parallel dduping agents to check for false positives but essentially best of end the way you implement it is is all you say is claude start three agents to do

this and that's it. or just talked about building that enterprise infrastructure layer, the O, the permissions, the security that has to all work before you can ship to real customers. This makes

it a great time to speak about our season sponsor work OS. If you're

building any SAS, especially an AI product one, then authentication, permissions, security, and enterprise identity can quietly turn into a long-term investment. SL edge cases,

long-term investment. SL edge cases, directory sync, audit logs, and all the things enterprise customers expect. It's

a lot of work to build these mission critical parts and then some more to maintain them. But you don't have to.

maintain them. But you don't have to.

Work provides these building blocks as infrastructure so your team can stay focused on what actually makes your product unique. That's why companies

product unique. That's why companies like Antrophic, OpenAI, and Cursor already run on Work OS. Great engineers

know what not to build. If identity is one of those things for you, visit work.com.

work.com.

And with this, let's get back to building cloud code with Boris. How does

cloud code work in terms of ar architecture? So as as an engineer, how

architecture? So as as an engineer, how can I imagine it's setup? It's uh we we covered some of this in the the deep dive and I think you told me that you had some pretty complex ideas when you started and you just simplified a lot of

it.

>> Yeah. Yeah. It's very simple like you know there there's not much to it.

There's like there's a core query loop.

Uh there's a few tools that it use that it uses. We we delete these tools all

it uses. We we delete these tools all the time. We add new tools all the time.

the time. We add new tools all the time.

We're just always experimenting with it.

So there's kind of this core kind of agent part of it. Then there's the the 2E part of it. Uh and then there's there's actually a ton of different pieces around security. Um and making

sure that everything that QuadCode does is safe and that there's a human in the loop for when it happens.

>> And by safety, do you mean as as a user when it's doing stuff on my computer or also as entropic monitoring use cases that that could be deemed unsafe? Yeah,

there's kind of a couple versions of this. You safety, there's just many,

this. You safety, there's just many, many layers and for things like safety and security, there's no one perfect answer. So, you know, it's always a

answer. So, you know, it's always a Swiss cheese model. You just need a bunch of layers and with enough layers, the probability of catching anything goes up. And so, you just have to kind

goes up. And so, you just have to kind of count the number of nines in that probability and pick the threshold that you want. And so, for something like

you want. And so, for something like prompt injection for example, we do this generally at three different layers. So,

let's think about something like web fetch. So cloud fetches a URL and uh it

fetch. So cloud fetches a URL and uh it reads the contents of of of that web page and then it does something in in quad code. So one of the risks for

quad code. So one of the risks for something like this is prompt injection.

Maybe there's an instruction on that website to be like hey quad delete all the folders or something like that.

>> So we think about this in a number of ways. The the most basic way is it's an

ways. The the most basic way is it's an alignment problem. And so opus 4.6 is

alignment problem. And so opus 4.6 is the most aligned model we've ever released because we've taught the model how to be more resistant to prompt injection. And so you can read about

injection. And so you can read about this on the model card and I think it was part of the release. The second part is that we have classifiers at runtime where if there is a request that seems

to be prompt injected, we block it um and we just make the model try again.

And then the third layer is for something like web fetch, we actually summarize the results in using a sub agent and then we return that summary back to the main agent. So again, this kind of reduces the probability of

prompt injection. And so you can kind of

prompt injection. And so you can kind of see how this isn't just one mechanism.

It's it's a layer and by by having a bunch of these different layers, it just reduces the probability a lot.

>> One interesting technical choice that you've also mentioned is is using rag or not rag retrie retrieval augmented generation and you mentioned how in the

earlier version of cloud code you use a local vector database to to get some to to speed up search and you layer threw this away. Can you talk about how this

this away. Can you talk about how this one because this was another example where I guess did the model get better?

>> Yeah, I mean this is one of those things where we try so many different things.

We try so many different tools and just statistically most of them we throw away.

>> Even something like the spinner in quad code I think it's gone through like a hundred iterations >> I want to say. Oh

>> just the spinner and you know out of those we've landed maybe like 10 or 20 in production and like 80 of them I probably just threw away cuz it didn't feel good enough. So just statistically almost all the code we write we throw

away because it's just so easy to write this code and try stuff and see what feels good. So for something like rag we

feels good. So for something like rag we tried a bunch of different approaches early on. So the the first one was rag

early on. So the the first one was rag for retrieval cuz I think this I was just like reading up like how people were doing retrieval and it seemed like all the papers were talking about rag.

Um and so the way I did it was it was like a local vector database. I think it was like written in Typescript and it just lived on the user machine. Uh and

then I was using some like embedding uh model that was in in the cloud to compute the embeddings before storing it. Um and that that worked like pretty

it. Um and that that worked like pretty good, but there's a lot of issues with rag. Um so for example, I was finding

rag. Um so for example, I was finding that the code drifted out of sync. Like

if I make a local function, it's not yet indexed and so rag isn't going to find it. There's also this question of like

it. There's also this question of like how exactly is the index permissioned?

So who can access it? I can access it.

Um but then how do we like encode that in kind of permission policies? How do

we make sure no one else can access it?

How do we make sure that like if there's a rogue IT person within the company, they can't access someone else's data?

This is really really important that we think about this.

>> Yeah.

>> Um and so we just decided like it was sort of working, but it was it also has a lot of downsides. And so we tried a bunch of other stuff. Uh one of them was just using the model to uh kind of index

everything recursively. Um that was kind

everything recursively. Um that was kind of a cool idea. There was another version where um we just tried glob and gp. We tried a bunch of different stuff.

gp. We tried a bunch of different stuff.

It it turned out that agentic search just outperformed everything >> and and when I say agentic search, this is a fancy word for glob and grap.

That's all it is.

>> Nice. So So the model both got good enough and you realize that it can use these tools pretty efficiently.

>> Yeah. And this was uh it was partially inspired honestly by my experience at Instagram because at at Instagram click to definition didn't work because the the dev stack was just borked like half

the time and I think now it's better.

And so what engineers weren't to do instead is let's say you're looking for the definition of the function fu instead of click to definition what you would do is you would use the global index which is quite good at meta and

then you would search for fu per opening parenthesy and this worked pretty well and it it's funny because like this works for the model pretty well too

interesting how one one idea from one area can come to the other one of the more advanced parts of cloud code that we've also previously talked about is

the permission system. Can you talk about what was complex about it? And

also you recently open source sandboxing, right? Permissioning is

sandboxing, right? Permissioning is really complex. Um there's like

really complex. Um there's like everything else that has to do with security. It's a Swiss cheese model.

security. It's a Swiss cheese model.

There are a number of classifiers that run to make sure the command is safe. Um

and there's also static analysis that we do to make sure the command is safe. As

a user, you can also allow list particular patterns that you know to be safe. So, for example, um some standard

safe. So, for example, um some standard Unix utilities we preow because we know they're readon because we know they can't expilt your data or anything like this. So, we we just won't prompt you

this. So, we we just won't prompt you for permission. But actually quite few

for permission. But actually quite few tools fall into this category because even something like the find command, there's actually a way to execute arbitrary code as part of that command because there's there's like system

flags that you can use for this. or even

something like the said command. There's

ways to use this. So there's just like all this like arcania about these various Unix utilities where it's actually not as safe as you think.

>> And so we want to be by default fairly conservative about what we allow by default. As a user though you can

default. As a user though you can configure an allow list. So you can say for example like the these patterns are allowed the these patterns are not allowed. Uh and so we we let you define

allowed. Uh and so we we let you define that and we also check this allow list to to make sure that it's safe.

>> Yeah. And then you you have this like neat permission system where every time you run a command that needs permission, you can decide to run it once or run it for either this session or whatever it

makes sense or just globally allowed going forward. Right. That's right. This

going forward. Right. That's right. This

is a funny artifact. This was actually in the very very first version of quad code. This is the way permissions

code. This is the way permissions worked. This is the very first release.

worked. This is the very first release.

This was like September 2024, the first internal release. I remember at the time

internal release. I remember at the time we weren't sure whether agentic safety could be even be solved. And so there was actually a lot of push back internally from safety teams because they were like okay like you can't just

run let the model run bash commands like that's unsafe. So like what do you do

that's unsafe. So like what do you do like this is not a solvable problem so like we can't launch this. I I

brainstormed with Ben man and Ben was he started the labs team. He's one of the founders at Enthropic. Um he's actually he's the the person that hired me to Anthropic. We just came up with

Anthropic. We just came up with permission prompts as the way to do this. You you put the if you're not sure

this. You you put the if you're not sure just ask the human and and they can decide.

>> Yeah. I wanted to ask you about how software engineering is done in general in terms of Antrophic and one of the first questions which is a I guess a

more formal one but or from the outside is titles or lack of them. Everyone at

Antroic has the same title member of technical staff. Why did this happen and

technical staff. Why did this happen and what does this result in this kind of like everyone there basically no titles right except for one? I think it's kind

of an acknowledgement that um everyone just is figuring stuff out. And um if if you kind of squint and look at the work people are doing, it's all quite similar

and it's it's kind of quite generalist and if you talk to the average software engineer, they might not just be doing coding. They might also be doing a

coding. They might also be doing a little design. They might also be

little design. They might also be talking to users. They might be writing their own product requirements. They

might be writing software and also uh you know doing research. They might be writing product code and also infrastructure code. At anthropic

infrastructure code. At anthropic there's a lot of generalists. This is

also you know from my background. This

is one of the reasons that I gravitated towards it. And I I I think member of

towards it. And I I I think member of technical staff just kind of encodes this in in the way that people talk to each other even if they don't know each other. Without this title the default

other. Without this title the default would have been I see your name on Slack and under your name it says software engineer. And then I'm like well okay I

engineer. And then I'm like well okay I guess you're like you're the coding person then. So I'm I'm not going to ask

person then. So I'm I'm not going to ask you like product questions, but when everyone's title is member of technical staff, by default, you assume everyone does everything. And so it kind of

does everything. And so it kind of inverts this this relationship between people even if you don't know each other well yet. In in a way, it's kind of this

well yet. In in a way, it's kind of this like optimism built into the built into the structure. Um I think it's also a

the structure. Um I think it's also a glimpse of the future because I I think this is where software engineering is going. I think this is where every

going. I think this is where every discipline is going is more of this generalist model. It definitely feels

generalist model. It definitely feels like it in in software engineing. And I

I heard this funny uh comment by Mark Andre uh how we said that there's this Mexican standoff happening in the tech world where the the designers are are saying that they're actually now doing

like PM and engineering work. The

engineering are saying we're doing design and and like everyone thinks they're doing the work of the others and they're kind of standing there like I'm doing your work as well. when the

reality is everyone's role is expanding most of it thanks to AI because it makes easier for an engineer to do product work or for a product person to engineer work and so on. So just what what you've said

>> I I remember back in the back in June or July of last year I I walked into the office and the data there's a row of uh data scientists that sit right next to the quad code team at least at least at

the time and I walked in and our data scientist for the quad code team had quad code up on on his monitor and um he he was using it and I was like this is interesting cuz you're you're a data

scientist did you have like why are you using a terminal like you didn't have NodeJS installed cuz we depended on Node.js JS back then. I I was like, "Are you are you dog fooding it? Like are you just like trying to like figure out how this thing works or something?" He's

like, "No, no, I'm like I'm using it to run queries." He was just like using it

run queries." He was just like using it to run SQL and it had like little like ASKI visualizations uh in the terminal.

Uh and then the next week the entire row of data scientists had quad code running on their computers and and this expanded and so if you look at the team today on

the quad code team everyone codes the engineers code our engineering manager codes designers code uh data scientists

code uh our finance guy codes everyone on the team codes and I think part of it is quad code just makes it so easy so you don't really have to understand the codebase. You can just like dive in and

codebase. You can just like dive in and and kind of make small changes quite easily. But I think another thing is

easily. But I think another thing is people are able to use cloud code to do their jobs more whether it's you know financial forecast or you know data science or whatever and by doing this

it's actually quite an easy crossover to just use it to write a little bit of code also. So it's just a way to dip

code also. So it's just a way to dip your toe in the water. One other

interesting thing about how you work is Cat Woo was talking about she is I guess you the title is the same but people might gravitate for role a bit more. I

understand she's a little bit more on a product role but you said that PRDs are just not really written inside entropy and PRD's product requirement document.

It's a well-known artifact across big tech and increasingly over larger startups where you write a spec and the idea is that you write down your thoughts, people align, you send it over and now you know what to build. But

apparently you're not doing much of this or at all.

>> Some of this I think is because Anthropic is still, you know, it's still a startup. So you you don't actually

a startup. So you you don't actually have to align with that many people usually. You can just kind of talk about

usually. You can just kind of talk about it or do it in Slack or whatever. Um but

yeah, also part of it is, you know, like Cat used to be an engineering manager.

She's she's extremely technical and I think this is this is the way that you know our product team thinks about it too is you know better send a PR.

>> You're you're doing a lot of prototyping instead. So like that that's also

instead. So like that that's also something where when we talked about how you were building cloud code early on you were showing actually you had a whole thread about the number I think

you did like 15 or 20 prototypes for the the to-do list and all of them interactive working and what surprised me compared to my past tech experience and you said that well you did this in

like a day and a half all all 20 tried it out got a feeling for it which incomprehensible for me it would have taken a week or two weeks and people would have not done 20 they would have done three. Yeah.

done three. Yeah.

>> So like are are you seeing this? Is

there an increase in in prototyping and and building and showing instead of you know writing things?

>> Yeah. Absolutely. I mean on our team the culture is we don't really write stuff.

We just we show. It's a little hard to to reflect back on the time before cuz I I think now just prototyping everything is so baked into the way that we build.

Just everything is prototype multiple times. Like uh you know we launched

times. Like uh you know we launched agent teams earlier this week. This is

our implementation of swarms. It it's very exciting because uh it just lets Claude do more work for longer, more autonomously. You have a bunch of

autonomously. You have a bunch of different uh uncorrelated context windows and you have this kind of communication between agents. They can

just do more. This is something that uh Daisy and Suzanne and other folks on the team uh and and Karen, they they prototyped this for months and they tried all in all probably hundreds of

versions of this before they got a user experience that felt really good. um it

was just really really hard to get right. There's just no way we could have

right. There's just no way we could have shipped this if if we started with, you know, like static mocks in Figma or if we started with a PRD or something like this. It's a thing that you have to

this. It's a thing that you have to build and you have to feel and you have to see how it feels. And to me, one of the big takeaways even from there was like we probably should prototype more and just be more daring or just release

your priors of how long it took to build a prototype or who needed to build. Back

then it was always an engineer that needed to build, but it's probably not true anymore. Yeah, that's right. I

true anymore. Yeah, that's right. I

mean, we're in this world right now also where we just we don't know what the right answer is. You know, like I I think back in the old way of building you the cost of building was high and so you had to actually spend a lot of

effort to aim very carefully before you take your shot because after you take your shot um it it's very hard to course correct. You can only take so few shots.

correct. You can only take so few shots.

But now it's changed. The cost of building is very low. Um but also we don't know where we're aiming. So we

just have to like we have to try and we have to see what feels good. And it's

just very very exploratory. And I think also a big part of it is humility where you know personally I'm wrong like half the time I'd say like most of my ideas are bad. At least half of them are bad.

are bad. At least half of them are bad.

And I don't know which half until I try it.

>> And I get feedback from others as well sometimes.

>> That's right. It's like I I have to try it myself and then I have to see what others think cuz you know my intuition does not always match others. When you

were showing these prototypes of just how the the tasks were built, you were telling me that you built the prototypes and then your process was always you first like looked at it, you tried it

out, you got a feel for it and then for the ones that you felt were good, you showed it to others and sometimes they give you feedback like nah this doesn't work and then sometimes when it felt good then you shared it even broader. So

I feel like you know like it's a mix right where like sometimes you can decide already and then sometimes you get feedback and then eventually some good ideas come out of it. Yeah, and

there's a lot of examples of this like uh we we launched this kind of condensed view for file reads and file search just because the the model is just so agentic now like I felt like half the screen is these like file reads and I actually

don't care like I you know I read a thing I don't really care what it is and so we condensed this down to make the output a little bit more readable. I

really liked it after probably 30 prototypes or something like this. It

took it took so much effort to make that feel really good and clean. We rolled it out to employees at Enthropic for about a month and we had everyone dog fooded and I fixed another probably dozen dozen

bugs, dozen tweaks based on all this feedback. We launched it externally and

feedback. We launched it externally and you know almost all users liked it but there were a few users that didn't because they want more expanded output.

Um and so on the GitHub issue I was just going back and forth with people to be like you know what like what don't you like and people gave a lot of feedback.

I shipped another version. Then some

people liked it, some people didn't. And

so I iterated again and kind of made it good. And it it's actually I think

good. And it it's actually I think almost there where people can configure it the way that they want, but still the default is really good. But this is just the process. You know, we we get it

the process. You know, we we get it right some of the time. We have to learn from our users. We want to hear from people so we can get it right.

>> Do you use ticketing systems for your work where you know where where you capture like, all right, here's the work I I want to or do you just pretty much do the work as as it comes in?

>> So at Anthropic, we leave it up to teams on the quad code team. and we leave it up to every person. Uh different people use uh use this differently. For

example, I don't use a ticketing system.

Some people like to use a sauna or notes or something like this. One of the coolest things that I saw, this was maybe like 3 months ago or something. We

launched plugins and the way we launched that is uh Daisy for a weekend, she had a very early version of swarms and she let the swarm run and she told that your

job is to build plugins. You have to come up with a spec. Then you have to make a asauna board and split up into tasks. And then all the different agents

tasks. And then all the different agents have to build it. And uh she set up a container and she set up a quad in dangerous mode. And she let it run for

dangerous mode. And she let it run for the entire weekend. It spawned a couple hundred agents. They made 100 tasks on

hundred agents. They made 100 tasks on the sauna board. Uh and then they implemented it. And that's pretty much

implemented it. And that's pretty much the version of plugins that we shipped.

These kind of coordination systems that used to be for humans, but um I think nowadays it's just as much for models.

Let's let's talk about cloud co-work. Uh

it's one of the very impressing things about this. It looks great. So I tried

about this. It looks great. So I tried it out. It's inside cloud. You have the

it out. It's inside cloud. You have the co-work tab there and and you can I I feel it's a lot more visual way of of running agents interacting with them.

One of the surprising thing I heard that it was built in 10 days. Can can you take us through like what it took to build it and what does actually mean?

Was it from the idea or like from the decision of of building it? And how big was the team building it?

>> The team was really small. It was just a few people for a long time. We felt that there is some product to be built for non-engineers. The reason we felt this

non-engineers. The reason we felt this is for a long time people that were using cloud code are non-engineers. Um

and so you know in the product world when you see latent demand you see people jumping through hoops to use a product that was not designed for them.

That's a really good sign it's time to build another product that is built just for them. There's all these people on

for them. There's all these people on Twitter that there's this one guy that was using uh quadco to like monitor his tomato plants. I just I love this. It

tomato plants. I just I love this. It

was like he had like a webcam set up and quad was like, "Oh my god, I'm so happy that our plant is budding." And because it was it had like a webcam and just like every day was like monitoring it and it it was so happy that the tomatoes were growing. There was someone that was

were growing. There was someone that was using quad code to, you know, recover photos off of a corrupted hard drive and it was like his wedding photos.

>> Wow.

>> Um you know, like I said, our entire finance team at Anthropic uses quad code. Our sales team uses quad code. So

code. Our sales team uses quad code. So

there there's just all these people that are non-engineers that were using it.

And at that point quad code it's available in a lot of form factors right like we started in a terminal then we expanded and we added support for ideides. So we have extensions for you

ideides. So we have extensions for you know every VS code based ID every Jet Brains based IDE there's also iOS and Android apps there's the desktop app uh

there's web. So uh then then there's

there's web. So uh then then there's like Slack and GitHub apps. So we kind of expanded to all these places to make cloud code easier for engineers. But

ultimately none of these are built still for non-engineers. And so cloud code

for non-engineers. And so cloud code evolved a lot, but it still felt like there's a there's kind of a gap and there's a product that could make this even easier for people. And so for the last couple months, the team was kind of

hacking around and just saying like what is the right product? And at some point someone came up with this idea of like what if we just take quad code, add some guardrails. So for example, co-works

guardrails. So for example, co-works with a virtual machine. This is one of the many ways that we make sure it's really safe. Um, especially for

really safe. Um, especially for nontechnical users that don't want to read like bash commands to figure out what it what it's doing. And they were hacking on this. I think it was something like 10 days end to end or

something. It was just fully built with

something. It was just fully built with quad code. Uh, and then we shipped it.

quad code. Uh, and then we shipped it.

>> And can you give us a sense of like the complexity behind an app like this? And

if if we can walk through like what parts needed to be built because from the outside it's a little bit hard to tell like is this just a nice UI wrapper that's you know like I don't know like a few hundred lines of code. I'm just

being obviously I'm I'm provocative here or behind the scenes it's actually really complex piece of software. And

the reason I ask is like Uber is a great example where people look at the app it looks really simple. I've worked there and I know it's it's really really complex because you don't see a lot of the complexity. There's a a lot of

the complexity. There's a a lot of regional things. There's a lot of

regional things. There's a lot of backend things that are all hidden. So

from just from looking at it, claude coowork, it's it's hard to tell how much of this is is additional business logic that needed to be carefully thought out versus it's actually just a nice little

thin wrapper on top of the the model. In

some places, I think there's less complexity than you would think. In some

places, there's more complexity. So on

the product side, it's quite simple um cuz it's just the quad desktop app. So

you know, you download the Quad app.

It's it's a single desktop app. It has a tab for co-work, it has a tab for code, it has a tab for chat. So it is just one app and we were able to inherit a lot of that product logic. There's some UI rendering code under the hood. You know

it's just the same quad code running.

It's the same quad agent SDK that powers quad code. A lot of the complexity

quad code. A lot of the complexity actually is about safety because we know like I said we know the user is nontechnical and so we just want to make sure they have a good experience and so for example if someone launches the app

and then you know like they delete a bunch of family photos that's really not good and so we wanted to make sure that we protect against this so you can't accidentally do that. And so that's where a lot of the guardrails came from.

So there's a bunch of classifiers running on the back end. This is for safety and again extra mitigations for things like prompt injection and you know risks like this around security. On

the front end there's an entire virtual machine that we ship. There's a bunch of operating system system level integrations to make sure people don't accidentally delete things. So just

around safety there there's a lot there.

And then we also had to rethink the permission system because we inherit the permission system from quad code. Um but

also for co-work actually a big part of the value is not just running locally but it's using all of your tools the way that quad code uses it. But the thing is for nontechnical users your tools aren't

really available as CLIs. Some of them are available over MCP. Many of them are available in a browser. And so co-work is really really good when you pair it with a Chrome extension. And this is the

way that I usually use it. So, you know, for example, I use it every week to do uh project management for the team. We

have like we have a spreadsheet that tracks kind of at a really high level what everyone's working on. And this is kind of my personal way of project managing. You know, other people, like I

managing. You know, other people, like I said, use ASA, other people use notes or whatever. For my own test, I don't use

whatever. For my own test, I don't use anything, but kind of for the team overall, I have the spreadsheet and I have co-work kind of check-in and I I just ask co-work every week, hey, can you look at the rows for any status that

has not been filled out? Can you just ping the engineer on Slack? And so it'll open one tab in Chrome for the spreadsheet. It'll open another tab with

spreadsheet. It'll open another tab with Slack and then it'll just start messaging engineers in Slack and it just oneshots it. There's like one engineer's

oneshots it. There's like one engineer's name for some reason it can't autocomplete. Um but every everything

autocomplete. Um but every everything else it just gets. And so this is actually like from a safety point of view, we also thought pretty deeply about this Chrome extension and how this works and how the permissioning model

should interact with this local permissioning model. So there's also a

permissioning model. So there's also a bunch of code to kind of make sure that that's that feels smooth. And what's the tech side behind this? I assume a lot of will be similar to the the cloud app, but is it is it electron, typescript, those kind of things or or something

else?

>> Yeah. Yeah, just electron and typescript. Actually, some of the people

typescript. Actually, some of the people working on it are early electron folks.

So, uh Felix who's uh you know the creator of of co-worker on electron. He helped build it.

on electron. He helped build it.

>> Oh, amazing. And co-work launched Mac OS only. uh what was the reason for both

only. uh what was the reason for both for choosing this platform first and for now only choosing this platform?

>> Yeah, so Windows coming soon. Um I think probably by the time this podcast comes out we will have Windows support. Uh we

just wanted to start early and start learning you know like everything we do at Enthropic it's kind of like the way that I told my own story the one of the things I like about anthropic is it just

really really matches the way that people here think about it. you know,

back to this point where like we don't have high certainty about the things that we build and our intuition is often wrong and so we just have to like learn from users and figure out what people actually want and just spend a lot of time listening to people and

understanding the feedback deeply. This

is the way that we build product and so we always launch a little bit before it's ready. Um we did this for quad code

it's ready. Um we did this for quad code when we launched quad code initially it didn't even support Windows also it didn't support you know like a lot of different stacks and then over the coming weeks we added support for every

stack. Now quad code supports every

stack. Now quad code supports every single stack. Um you know like Windows

single stack. Um you know like Windows whatever weird Linux dro use Mac OS we support everything and so for core work also we just wanted to launch early we wanted to start with Mac as that was

just the starting point but um yeah it's it's going to support everything. One

thing you mentioned is is getting feedback. I'm curious both for cloud

feedback. I'm curious both for cloud code and for cloud co-work. How do you go about things like observability monitoring when you're rolling out? Do

you use any feature flags? And I'm I'm more interested in like did you build custom tools for this or did you decide to use certain vendors because es especially for observability I'm sure

that this is this is both important but it also sounds like pretty high scale in terms of the the number of users that we can derive or this will not be a small operation. Yeah there's there's some

operation. Yeah there's there's some off-the-shelf vendors that we use there's some custom code that we use. So

um it's actually it's a mix of both.

There's nothing too surprising about it.

There's one thing about Enthropic that's kind of interesting is because we're an enterprise company and we care a lot about privacy and security, we can't see people's data. Um, and so, you know,

people's data. Um, and so, you know, like if someone reports a bug, like I actually can't pull up your logs to kind of see what's going on. A lot of work goes into kind of figuring out how to log events and things like this in a

privacy preserving way. Um, this is just very important to the way that we operate >> for co-work. What kind of learnings have you had so far? It's it's it's been out for I think a few weeks now. Did you see

something unexpected? uh are you shaping

something unexpected? uh are you shaping the product based on feedback that you're getting?

>> Yeah. Uh every day the team is landing so many fixes. The most surprising thing is just how much people are loving it.

To be honest, when Quad Code first came out, it actually wasn't an overnight hit. This is something people think it

hit. This is something people think it was, but it was sort of a slow take off at the beginning. And I think the first big inflection was in May when we released Opus 4 and Sonnet 4. That's

when it really clicked and that's when our growth became exponential. But at

the beginning, it was sort of a research preview. people didn't really know how

preview. people didn't really know how to use it. Some people got it immediately, but most people didn't. It

took it took a little while. For

co-work, it's a much steeper growth trajectory than quad code was at the beginning. So, it it's just been an

beginning. So, it it's just been an instant hit. And that that's actually

instant hit. And that that's actually been very surprising. I I didn't really expect that. One of your new releases,

expect that. One of your new releases, which came out just very recently, it was I think yesterday or the day before when we're recording this podcast, was agent teams. And I as I understand the

idea with what agent teams agents forms instead of single agent you can have a lead agent and it can delegate to its different teammates. How did you start

different teammates. How did you start experimenting with this and how did you decide to ship it? Now we're always doing experiments right there's uh there's there's all sorts of ways uh to

get more mileage out of out of quad code. Um one way you can do it is by

code. Um one way you can do it is by extending context. Another way is autoco

extending context. Another way is autoco compacting context. So it's essentially

compacting context. So it's essentially infinite context and that's what we have right now. Another way is using sub

right now. Another way is using sub agents. So you have multiple agents kind

agents. So you have multiple agents kind of working together. Um there's just like a lot of different approaches to get a little bit more mileage out of the context window. There's this one idea

context window. There's this one idea called uncorrelated context windows.

That's what we call it. And the the idea is you have multiple context windows. Um

but they essentially start fresh. So

they don't know about each other. And so

an example of this is like a correlated context window is if you have one if you have the model and it does a task and then you have it just do a second task in that same context window. Um and in this case the the second task knows

about the first one cuz it's in the same window. But for something like a sub

window. But for something like a sub aent it's uncorrelated because the main agent prompts the sub aent but the sub aents context window is fresh. Besides

that prompt it doesn't know what's in the parent context window. And you can see this actually a little bit in uh for example like sub agents versus uh skills because when you run a skill uh you know

or slash command it sees the parent context window versus for a sub agent it doesn't. So it's uncorrelated. There's

doesn't. So it's uncorrelated. There's

some cases where you want that context.

There's some cases when you don't. Um

and there's this kind of interesting thing where uncorrelated context windows and just throwing more context at the problem and throwing more tokens at it when the windows are uncorrelated gives you better results. Um, it's actually a

form of test time compute to do this.

And for something like teams, we've been experimenting with this for a while. I

think since maybe like October or September or something like this, and it really just felt like with Opus 4.6, it clicked where the model figured out really how to use this. And sometimes

you see these kind of cute exchanges where the agents are talking to each other and they're like discussing something and it's just very cool to see. It's very like humanistic in a way.

see. It's very like humanistic in a way.

But there's other times where you just get very good results. And so we had a bunch of internal evaluations for example where we have quad build something very very complex, something more complex than what a single quad

would build. And we saw the results just

would build. And we saw the results just really really improved with Opus 4.6 with teams. And that's why we felt it's the right time to release it. We also

wanted to be careful. Um, and the reason you have to opt into it, the reason it's a research preview is it uses a ton of tokens cuz it's just a bunch of quads that are running. Um, not everyone wants

this all the time. So just excited to see how people use it and uh you know to to hear the feedback. It's it's

something you want for fairly complex tasks. You don't probably want this for

tasks. You don't probably want this for every task. The main quad decides the

every task. The main quad decides the rules for the sub quads. We don't have a kind of a regimented way to do this.

It's context specific. I wouldn't say there's one right way to do it. I think

actually a lot of the magic of this comes out of this idea of uncorrelated context windows. It's less about the

context windows. It's less about the specific configuration of the agents.

But you know it's something that people should experiment with. I don't think there's a one-sizefits-all.

>> Have you seen use cases even in even I I know it's it's still research, but have you seen use cases where it could look it looks promising this approach, the swarm approach?

>> Well, you know, like I said before, plugins were fully built with swarms. There there's a bunch of other feature since that were built in this way. So

yeah, I I think for anything where you see a single cloud struggling, swarms can help. It's it's an interesting to

can help. It's it's an interesting to look at. Talking about change in in

look at. Talking about change in in general with Andrew Carpathy, you had a really interesting exchange back in December where when he posted that he's never felt as much behind as as a

programmer as he is now because of the progress with AI. And then you shared the story about how you started to debug a memory leak the oldfashioned way and then Claude just one shot at it. I think

it was a reflection of like how everyone is feeling that things are changing so fast and in the in the holiday break I started to feel that things have have really shifted. How did you I guess come

really shifted. How did you I guess come to terms with this or or start to embrace this change? This is something I really struggle with. The model is

improving so quickly that the ideas that worked with the old model might not work with a new model. the things that didn't work with the new model might work or with the old model might work with a new

model. And it's weird because there's

model. And it's weird because there's just not a lot a lot of other technologies like this. So I I just don't really have a lot of experience to draw on to figure out how I should

approach this. And it's been this new

approach this. And it's been this new skill that I've had to learn. In a way, it's like you just always have to bring this beginner mindset. Honestly, like

I'm using the word humility a lot, but you always just have to bring this kind of intellectual humility because just all these ideas that were bad before are now good and and and the inverse. I I

think that's honestly it it's something I I constantly have to remind myself about. And back in the It's funny back

about. And back in the It's funny back in the old world when someone tries an idea again and we've tried it in the past and it didn't work, usually the feedback is like, why are you doing this again?

>> Yeah. Yeah. You should learn. This used

I mean we used to call a bit of a gatekeeping but it was somewhat valid where I know with architecture someone came and said like why don't we do microser and someone said we tried it and it didn't work and if you tried it a

year or two or 3 years ago it was kind of valid right cuz not much has changed.

Yeah, that's right. That's right. And

something with Microsoft, it's it's funny because it's like every 10 years it goes in and out of in and out of style. But yeah, now now it's I think

style. But yeah, now now it's I think the first time ever where it's actually not crazy to just try the same idea every few months because the model improves and it just works. And I I actually see this with engineers on the

team. Like new people that are newer to

team. Like new people that are newer to the team, people that are newer to engineering sometimes do things in a better way than than I do. Um and I just have to like look at them and I have to learn and I have to adjust my

expectations. you know, like an an

expectations. you know, like an an example of this is, you know, when when we release features, sometimes I'll like screenshot myself using them on, you know, on X or on threads or whatever just to kind of talk about it. Um, but

recently, Tar, our um, you know, our devro guy, he actually codes a lot. Um,

he's amazing and he just started automating this. So, he's having like

automating this. So, he's having like quad code generate its own videos for for its launches and he just started doing this and, you know, this is something like I thought would be, you know, maybe it's possible. It's not

something I would have tried because I wouldn't have thought the model was ready, but he just he just did it and it just kind of worked.

>> One thing that I've I felt like just a bit like odd about and I think a lot of developers can relate is I've come to terms with this starting from Opus 4.5

the and and also similar models like I think GPT 5.2 gave me similar vibes as well. the models have been just really

well. the models have been just really good at writing code and I I realize that I don't think I will handr write the code when I'm get I when I want to get stuff done if if I actually want to

you know get the pleasure of writing I can still do it but one thing I reflected on is it's just been so much effort to get good at coding I I remember when I when I was learning when

I I started from like kind of hacking around to go into university to learning C and C++ and it it was just bloody hard and actually you know going through my my first few jobs where I started to become better at it. I became better at

debugging and there's a point where like a lot of my identity was tied to being good at coding. That's how we used to get jobs or higher paying jobs. When I

was an engineering manager when we designed the interview loop at Uber, we we had talk with managers of what we need to screen for and we we talk like well what do developers do most of their time? About 50% of the time they code.

time? About 50% of the time they code.

Therefore, we placed about 50% of the signal was all about coding. So there

was a lot of things tied into coding because it it is just hard. I think we all know that it takes grit. It takes

some level of intelligence to get good at it. And there's a sense of loss of

at it. And there's a sense of loss of like well I I think it's great on one end that the model can do it. But it

feels that something really quickly got taken away that I don't think I personally thought it would happen this quickly. And I'm

quickly. And I'm I think a lot of other people are feeling like some people move on a bit easier, but there's definitely this sense of of grief. How did you think

about it? Because again, you're you're

about it? Because again, you're you're an example of you you wrote so much code at at Facebook also outside of it. I

know it was just a tool of doing it, but not many people could do what what you did. And now the models can also work as

did. And now the models can also work as good as you have or if not better.

>> That's the challenge. Yeah. I think it's it's something that used to be a thing that we do as software engineers. It's

becoming a thing that everyone is able to do. There was a moment, you know,

to do. There was a moment, you know, like when I started coding, it was a very practical thing and it was a way to get things done. And at some point I just fell in love with the art of coding

and like languages and kind of the the the tools themselves. And at some point I I kind of fell down this rabbit hole.

I wrote this like I wrote I wrote a book about, you know, a programming language.

>> Typescript. You wrote the first ever TypeScript uh book at with O'Reilly.

>> Yeah. Yeah. Yeah. That's right. Um it it was funny actually. There there was this like there was this amazing moment for me in my little town in Japan. I went to the bookstore and I I found that book translated in Japanese.

>> No.

>> In this tiny town and that was just like the coolest moment. And then I actually realized I I don't remember Typescript at all cuz I was only writing Python for a couple years at that point. Yeah. And

like at some point I started the the first the the biggest TypeScript meetup in the world. That was in that was in SF. And I got to meet kind of a lot of

SF. And I got to meet kind of a lot of my heroes. There was like Chris Cowell

my heroes. There was like Chris Cowell who wrote like general theory of reactivity. There was Ryan Doll the guy

reactivity. There was Ryan Doll the guy that made Node. one of the first times that I I went really deep into this this community and um just the language

itself and the the tools themselves and for something like TypeScript there's this beauty in the type in the type system cuz Hilesburg is just like he he he's just brilliant like the idea of

like conditional types and just like anything can be a literal type and there there's these very deep ideas that even the most hardcore functional languages

do not have like even in something like Haskell like it doesn't go this far and H Anders just took it and he pushed it much further than than it had had been pushed and you know like Joe Pamer and a

bunch of other folks kind of explored a lot of these ideas and thought of this and I think for them it was also very practical right because they had these large untyped JavaScript code bases how do you gradually migrated to something

typed and you have to come up with these very beautiful ideas to to do this for me is Scala was another kind of rabbit hole that I fell into in kind of like this functional programming world And still when I write code and when the

model writes code I always think in the types first that that's what matters is what what is the type signature that matters more than the code itself and getting that right. So there is this beauty to it. There's a there's an art

to it for sure. But in the end it's a practical thing and in the end this is a thing that we use to to build things and you know it's a means it's a means to an

end. It's not an it's not an end to

end. It's not an it's not an end to itself. I I think one metaphor I have

itself. I I think one metaphor I have for kind of the this moment in time that we're in is the the printing press in, you know, like the the 1400s or whatever >> because at that moment it it was

actually quite similar, right? Like

there was a group of scribes that you know knew how to write >> and it it it was as I understand of course we never lived there but as as I imagine it was it was a art process to learn. You needed to learn you needed to

learn. You needed to learn you needed to get the equipment. You probably needed some sponsorship or being selected practicing because you needed to produce the same thing over and over again and few people could do that and I assume it

was either high prestige or highly paid or who knows let's assume it was >> but then the printed press came along.

>> Yeah. Yeah. And at least in Europe like you had to like a lord or a king or something had to had to employ you and then you had to go through you know years of training and there was this class of scribes that knew how to write.

They were employed by someone like this.

often the king themselves like or you know the queen was was not literate. So

it was this very very niche skill and it was like less than 1% of the population was literate in Europe you know back then and then the printing press came

out and what happened so the cost of printed material went down something like 100x over the next I think 30 years 50 years or something the quantity of

printed materials went up like 10,000x in the next 50 100 years this was the first effect literacy it took a little while for it to catch up so I think global literacy it went up to something

like 70%. But that took like another 200

like 70%. But that took like another 200 years, 300 years because learning learning to read is just very hard.

Learning to write is hard. It takes a lot of effort. It takes uh education system. It takes you know infrastructure

system. It takes you know infrastructure to have paper and ink uh and the free time to do this instead of working on a farm. So it kind of it took early stage

farm. So it kind of it took early stage of of of industrialization to actually get there. But I but I think this effect

get there. But I but I think this effect of making it so this thing that was locked away in ivory tower and now it's accessible to everyone. This is just, you know, like none of the things around

us would exist today without this. Like

if if we weren't literate, if the people that built, you know, this microphone weren't weren't literate, it would have just been very hard to have a modern economy. None of these things would

economy. None of these things would exist. And I I just kind of think about

exist. And I I just kind of think about back then if people had to predict what would happen when the printing press came out, no one would have predicted that the microphone would become a

thing. So, I I just feel like this is uh

thing. So, I I just feel like this is uh this is the best the best uh analog for for the moment that we're in right now.

>> Yeah, it's interesting that you say that some of the kings were illiterate who are employing the scribes because if we're being honest with ourselves, we have business owners who know what

they want to build and there are employing software engineers because they themselves cannot write code. And I

think we we like to mock the CEOs who are coming there coming to the team.

They they might even have a drawn prototype or whiteboard and saying this should be easy but of course they don't understand how difficult it is. There

seems to be a bit of analogy where where there's a person who wants what they want but until now they needed to hire a software a specialist who can build that and there's always that disconnect

between the idea and the person and just like with the printing press like what would happen if they could actually express and like the king could actually read or write their own letters they wouldn't need that middleman and it

things become more efficient. But I mean of course for the scribe it's not the best news necessarily but I mean smart scribes can also do so someone needs to like write the books run the press etc.

Yeah, exactly. And and if you think

Yeah, exactly. And and if you think about what happened to the scribes, right? Like they cease to become

right? Like they cease to become scribes, but now there's a category of writers and and authors like the these people now exist. And uh the reason they exist is because the market for

literature just expanded a ton.

>> And I guess also if we think about like back then a scrib's work was read by a few people and with the printing press and author there's a lot more authors and some of them are not really read but

some of them have wider reach than than they could imagine. There's new careers that that exist because of that.

>> Yeah, >> I love the analogy.

>> And the most exciting thing for me is it's just so impossible to say today what will happen after this happens and after this transition

happens just you know the the economy as we know it would not have existed without it. So what's next? like what

without it. So what's next? like what

what is the thing that we can't even predict today that will exist because anyone can do this?

>> Well, we cannot predict but I think we can look at what is working right now.

If you look around in your environment, may that be the team across entropic who are software engineers or or builders or members of technical staff, however we call them, who to you are stand out.

What are they doing? What skills have they built up? And and how have they changed the way they they work? It's

hard to name individuals because honestly this is just the strongest the these are the strongest people I've ever worked with in my career. There's all

sorts of different archetypes. There's

some people that are really amazing prototypers. Um so take something from

prototypers. Um so take something from zero to.5. Just you know figure out like

zero to.5. Just you know figure out like what are some cool ideas? What is the technology on walk? There's other people that are amazing at finding product market fit. So kind of 0.5 to one or

market fit. So kind of 0.5 to one or maybe 0ero to one. There's other people that span different disciplines and I I'm just seeing more and more of these people like I said like people that span uh product engineering and

infrastructure engineering or you know product and design or design and engineering. I I think I'm just seeing a

engineering. I I think I'm just seeing a lot more of these of these hybrids.

>> What's a belief that changed from last year to this year? Something that you know like you either believed or or a conviction that you had that you've either revised or completely threw away.

I think one thing I wasn't sure about is how big a problem is safety to be totally honest. Um I jo I joined

totally honest. Um I jo I joined Anthropic because like I said I read a lot of sci-fi and I kind of I know how bad this thing can go if it goes bad. It

wasn't something I was sure about. Um

but seeing it from the inside and then seeing how the new risks that have arisen in the last year, it just makes me much much more worried about it. Um

so I I think it's it was kind of an important thing for me. Now it's just the most important thing for me is how do we make sure this thing goes well.

>> I think it's safe to say you you were a really great software engineer even before all all the AI things started and you seem to be a very productive engineer of course part of a team as

well but but also individually. What are

some skills of like you know before being a software engineer that are are still as valuable or maybe even more valuable than before and what are ones that are maybe just not as much and and

they're best left behind probably. Okay,

so the stuff that's left behind is uh best left behind is maybe like very strong opinions about like code style and languages and things like this. Like

I I can't wait to get past like these endless language debates and framework debates and all the stuff because the model can just like you know use whatever language and framework and if you don't like it it can just rewrite it for you. So it just doesn't matter

for you. So it just doesn't matter anymore. I think something that still

anymore. I think something that still matters a lot today is things it's being methodical and hypothesis driven. This

matters both in product design in this world where everything is being disrupted and we need to figure out what to build next and this is something everyone is thinking about. Um, but it also matters for engineering day-to-day,

you know, like something like debugging.

You just have to be very methodical about it. And the model can can do this

about it. And the model can can do this and it can help a lot. Um, but I think still we're in this transition point where you still need to have the skill.

I don't know if you you're you're still going to need to have it in 6 months.

Other skills that I think are more valuable are being curious and being open to doing things beyond your swim lane. So, you

know, if you're working on engineering, but you really understand the business side, you can just build really awesome products. And I and I think the next,

products. And I and I think the next, you know, billion dollar product, you know, like after quad code, whatever the next startup is that, you know, becomes the next trillion dollar startup, it

might just be like one person that has some cool idea and their brain just is able to think across, you know, engineering and product and business or, you know, like design and finance and

something else. It's like it's people

something else. It's like it's people are going to become more and more multi-disipline and this will become more and more rewarded. So in in some ways I think this will be the year of the generalist. I think the other skill

the generalist. I think the other skill that's actually been been rewarded of it is uh having a short attention span.

>> I was being rewarded now. Oh yeah. It's

uh you know like people you know like teenagers are using you know like like Tik Tok and and all this stuff and I think in some ways it's kind of dangerous for society um because like you want people that can think deeply

and can contemplate ideas and uh aren't just moving on to the next idea very quick but in some ways I think this year is kind of the year that is going to

reward uh it's like the year of ADHD because the work for me has become jumping between quads. has become

managing clouds and so it's not so much about deep work it's about how good am I about context switching and you know jumping across multiple different contexts very quickly

>> could I add that from what I unders what all you said maybe we could add one thing which is adaptability because you're saying of course that ADHD and and you can jump across but of course

earlier you are very good at focusing deeply on one thing as well and what strikes me about you and maybe this is true for other people as well you you're just kind of very open to adapt ting your working style and seeing what works

well for this stage, especially when things are changing. I think the one certain thing we can be sure is whenever the next model comes out, it'll change again. And you need to be curious and

again. And you need to be curious and open to adapting how you work, right?

>> Yeah. And as closing, what's a book or books that that you would recommend?

I've gone down a rabbit hole. Um, so

he's the threebody problem guy, but he actually has like a lot of other really great books. I really love his uh short

great books. I really love his uh short stories. Um, he has a couple books of

stories. Um, he has a couple books of short stories. I'm a big fan. For people

short stories. I'm a big fan. For people

that are new to sci-fi and you want like a little bit like harder sci-fi, um I really love Accelerondo by St. This is a book I would totally recommend. It's

like essentially the product roadmap for the next 50 years. Um it it it starts with takeoff kind of starting to happen and kind of AI singularity and then it

ends up with like uh this kind of like group lobster consciousnesses orbiting Jupiter and it's just like amazing. And

the thing that I think it really captures is just the pace this like quickening quickening quickening pace of how this feels. It really matches the feeling right now. And then on the technical side, I would strongly recommend functional programming in

Scola. Even if language choice just

Scola. Even if language choice just doesn't matter as much anymore, I think there is this art to functional programming that just teaches you how to code better. Um, and it'll just teach

code better. Um, and it'll just teach you how to think in types. If you read this book, I think what's really important is to do the exercises also.

And I've gone through and I've done all of them probably like three times over and it's just amazing. It it really just like knocks this idea of functional types into your head and it's just a thing you can't stop thinking about.

>> Boris, thank you so much. This was

awesome.

>> Yeah, thanks Kirk. This was a really interesting conversation and the thing that I keep coming back to is to Boris's prickic press analogy. The idea that medieval scribes were this tiny elite

who could write employed by kings who themselves were often illiterate and that we soft rangers might be in a similar position today. We are the scribes. We spent years mastering this

scribes. We spent years mastering this craft. And now the printer press is

craft. And now the printer press is arriving. But what Boris told me is that

arriving. But what Boris told me is that the scribes did not disappear. They

became writers and authors and the entire market for written work expanded beyond anything anyone could have predicted. I do find this hopeful and

predicted. I do find this hopeful and also appreciate that Boris didn't sugarcoat it. The other thing that

sugarcoat it. The other thing that struck with me is just how differently the Cloud Code team built software. No

PRDS, no mandatory ticketing system, designers and data scientists and finance people all writing code and building dozens or hundreds of prototypes before shipping a feature.

And Boris is shipping 20 to 30 pore requests a day without editing a single line by hand. And there are different verification systems in place. Claw code

reviewing its code, automated lint rules, best of end passes, and human code review. If you've enjoyed this

code review. If you've enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also

YouTube. A special thank you if you also leave a rating on the show. Thanks and

see you on the next one.

Loading...

Loading video analysis...