Inside How Anthropic Is Building the Next Claude | Alex Albert

By Peter Yang

Summary

Topics Covered

  • Context shapes whether AI should think hard
  • AI agents dream to consolidate memories
  • Focus planning on irreversible one-way doors
  • Coordination emerges as the new bottleneck
  • AI character matters as agents gain autonomy

Full Transcript

I was definitely the first prompt engineer at Anthropic. I might have been the first in the world. We treat the model as if it's a product to some degree. With every new model, we are speccing out exactly what do we want this model to be good at.

When the agent isn't running a task for you, or maybe it's in the background, it's actually going through its memories, finding things that might contradict, pruning them, cleaning them up. This concept of dreaming.

Eng time isn't as much of a one-way door these days. If it's something that's not a one-way door, that's effectively free at this point.

Agents that are running on task for a long amount of time and they're having to make a lot of judgment decisions. The questions of what its character is and what it cares about are very important.

Did you have to try to avoid consciousness when you were training it and stuff? This is a big question.

Alex, it's great to have you here today at the Code with Claude conference. You used to be head of DevRel at Anthropic, right? And you recently became product manager for the research team, right? Is that it?

So, I've been a PM for over a decade now, too. As a PM, you try to understand the user problem, you try to identify a solution, you try to build stuff. But, you know, I have no clue how a PM works on a research team. Maybe we can talk a little bit about that.

It's very similar in that sense. I'm always wanting to talk to customers, get as close to our users as we can.

And we treat the model as if it's a product to some degree. So, with every new model, we are speccing out exactly what are the requirements for this model? What do we want this model to be good at? What do we think it's going to be good at? Because that's actually the interesting part of model development compared to product development: in many ways we're growing the model, and we have intuitions based on training setups, techniques, the architecture, whatever decisions we've made for that particular model. We have intuitions about what it's going to be good at, but we actually don't know to the full extent until it's in that training process.

Hm, okay, I got it.

But, yeah, the research PM team attaches to these models early on, from their ideation phase, and follows along with the whole journey all the way through training up until launch.

Okay. Can you give me some examples, like the next model has to be good at coding or has to be good at knowledge work, or is it something broader than that?

Yeah, I think that's right. There are buckets of capabilities that we care a lot about. Of course, coding has been a big one, right? Knowledge work recently has been a big one as well, so with some of our recent models we've tried to make them really good at working with our products like Claude for Excel, making spreadsheets. So, that's been more of an emerging new area of capability.

So, there is that side of things, but then there's also the side where, on every model, we want to fix and improve on the things the last model didn't do so well.

Mhm, got it.

So, going out, talking with our customers, trying to get a read on, "Hey, how are you finding this model? Where is it excelling for you? In what ways is it falling down? What are fixes we can make?" If we notice some really interesting behavior, are there tweaks and interventions that we can take in training on the next one?

And your customers are both the Claude Code team and internal teams and also users?

100%. Yeah, so it's everybody. That's one of the cool parts about working on these models: they just touch so many different surface areas. So, as a research PM, you need to think about how this model is going to be exposed through all our surfaces, whether it's the API or it's Claude Code or it's Co-work. The product has somewhat of a blend with the model, and that affects your actual end users' experience, so you need to think through that entire process.

Because of different prompts?

Exactly. Prompts.

Got it.

Use cases.

Got it.

Ways that people are using the model within that surface. Yeah. It all has an effect.

Man, that's really hard, because, for example, Claude Code, you can say it's for coding. But there are people like me just using it for knowledge work or even as a therapist. So, do you even know?

Broad space. Yeah, things. Yes. Yes.

Thankfully, we've got a whole lot of amazing researchers that span this entire range of capabilities, and they all focus on different things.

And then, because there are millions of people using Claude, they can give you feedback in Claude too, right? And you probably have some sort of way to get themes out of things.

That's right.

Otherwise, it's like a firehose of feedback. How do you even...

I mean, we do a lot of things here. And this is actually one of the interesting things that's changed over time as I've been in this role: our increasing use of Claude to help us as PMs. There's a million things we can talk about here, but in terms of feedback collection, it's been insanely helpful to me for getting insights into large amounts of this data. So, when we have a ton of feedback coming in from certain channels, we can use Claude to group and cluster things, find top themes, and create synthetic versions of that problem so that we can see if we can turn it into an eval or some other way to actually diagnose what's happening.

Oh, interesting. Yeah.

There's a lot that you can do with Claude to help you in your own identification of Claude problems.

Do you have a specific example of that, or something from a past model?

Well, one that's pretty relevant right now is how we handle feedback on new features. One of our newest features with the past few models has been adaptive thinking. Previously we had extended thinking: when you turned it on, the model would just think. Adaptive thinking lets the model choose when to think. So, on some questions, it's going to choose to think because it's a complex, hard question that requires more upfront planning. On some, it might not choose to think.

This has been a feature that we're continuously tweaking and dialing model over model, and we really, really listen to a lot of feedback from users on: is it thinking correctly in the right settings for you?

Got it.

Are these questions that you want it to spend a lot of tokens reasoning about? Are they actually triggering that thinking within Claude?

Interesting.

Yeah, sometimes when it gives me a response too fast to some life questions that I have, I'm like, [laughter] it's a little bit disappointing actually, because I was hoping that you would think more deeply about this.

Right. I think the problem with thinking is sometimes there's actually a lot of context that goes into the decision whether to think hard on a question or not.

Okay.

For example, if I was talking to a complete stranger and they said, "What should I be doing right now?" maybe I'd just quickly give them an off-the-cuff response, because I actually don't know that much about them. I'm just going to provide them a pretty generic answer around, "Oh, you should focus on this and do this and this." But if I actually knew you as a person, and I know what you care about, what your interests are, what you've done before, that's going to cause me to actually spend a lot more time thinking about, wait, what actually is the best solution for me to provide here?

Makes sense.

Yeah, so I think there's a similar thing with models, where if they don't have that context built up, if they haven't really built that mental model of who the user actually is, then their decision whether to think hard on a problem or not could be incorrect because they just don't know.

That's such a really good point. Do you also work on the memory feature that Claude has?

Yeah, memory is definitely a big feature on the research side.

I'll tell you what I do. I just have a Google Doc where I've summarized my life situation, like my household, my kid, his name is... I was sharing too much. And then what gives me energy, what doesn't give me energy. And then I just attach it to a Claude project.

Yeah.

And then just by doing that, it gives me way better answers to my questions.

Yeah. Yeah.

But how does the default memory work? I guess every night it kind of compiles everything?

Yeah, it depends on the product surface. They all have memory implemented in different ways. So, for example, in Claude.ai, it writes to a memory file, and then overnight it will prune those memories and look at them again. And we actually just implemented a similar thing with managed agents: this concept of dreaming. Dreaming in humans is, I guess, somewhat unknown as to what its purpose is, but some folks say that maybe it's a memory reconsolidation process, right? We're like, okay, how can we bring something like that to Claude's memories? So, when the agent isn't running a task for you, or maybe it's in the background, it's actually going through its memories, finding things that might contradict, pruning them, cleaning them up, kind of doing that second pass, which I think is really interesting.

So, to dumb it down a little bit, there's some sort of prompt like, "Hey, review all the conversations the user had with you and just try to identify themes"?

Exactly.

Like summarize.

Yeah. Yeah.

Interesting.
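
To make the "dreaming" idea above concrete, here is a minimal Python sketch of a consolidation pass over stored memories. It is an illustration only, not how Claude's memory is implemented: the Memory record, its topic and written_at fields, and the consolidate function are all hypothetical, and a "contradiction" is simplified to two entries on the same topic, with the newer one winning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    topic: str       # hypothetical key, e.g. "home_city"
    value: str       # the remembered fact
    written_at: int  # logical timestamp; larger means newer


def consolidate(memories: list[Memory]) -> list[Memory]:
    """Keep the newest memory per topic, dropping duplicates and stale entries."""
    newest: dict[str, Memory] = {}
    for memory in memories:
        current = newest.get(memory.topic)
        # A strictly newer entry on the same topic replaces the older one.
        if current is None or memory.written_at > current.written_at:
            newest[memory.topic] = memory
    # Return the survivors oldest-first so the file stays chronological.
    return sorted(newest.values(), key=lambda m: m.written_at)
```

A real pass would presumably need a model to judge whether two memories actually conflict; the timestamp rule here only stands in for that judgment.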

This episode is brought to you by Oceans. I hired someone through Oceans for podcast post-production a few months back and can't imagine running the podcast without his help. He's proactive, picks up new tools fast, and uses AI to compound everything that he ships. Oceans doesn't just place assistants, they place operators. Their talent is AI fluent and delivers the same output as a senior US hire at three to five times less cost. They reject 99% of applicants, so the person who lands on your team is already operating from day one. If you're scaling and need marketing ops, finance, or EA help, I highly recommend giving Oceans a try. Check it out at oceanstalent.com/peter.

Now, back to our episode.

Let's switch back to talking about product management.

Mhm. Yeah.

So, before we started this, you said you're always trying to find the latest bottleneck, right? So, I guess in the whole product development process, which parts are the most streamlined and which parts are still a bottleneck?

Yeah, I think the process of shipping something was pretty stagnant for the past, I don't know, 20 years. We've had incremental improvements, and things have definitely made it more efficient to some degree, and some new organizational structures have come and gone in terms of sprint processes and planning, and we've tried different things to make things go faster. Fundamentally, there's not been too much that has compressed the different windows that actually make up the bulk of your product development process, until the past year or two. And now, all of a sudden, we're in this paradigm where the cost and the time required to produce something is pretty low. You can spin up prototypes, you can even spin up MVPs of initial things that you might ship to production in a day instead of two, three, four weeks.

Yeah, it's like Claude telling me it'll take a week and then it ends up being like a...

Yeah, exactly. I know, right? Claude itself is still stuck in the olden days of 2021 or whatever. And I think that really puts an interesting twist on the whole product development life cycle in terms of, okay, as a PM, how do I think about my planning? If I'm writing a PRD and I'm scoping out these requirements and I'm trying to get an eng estimate on something, what does that actually look like now?

It's kind of a waste of time, right? Do you still do that stuff, like point estimates and stuff?

It depends. I think some projects have more considerations than others, and of course it's a function of the scope and the complexity of it.

Okay.

Usually what we're always trying to get to is: what are our one-way doors? What are our irreversible decisions?

Got it.

Because those are the ones you want to focus the most amount of time on. If it's something that's not a one-way door, like we do it but we can reverse it, that's effectively cheap, and that's effectively free at this point.

Got it.

Because eng time isn't as much of a one-way door these days.

Yeah.

But if it's something that affects an end user experience, can affect a decision that we have to make later on down the road, or it's a physical thing that we literally have to buy or do or whatever it is, those are much harder to reverse, and that is something we want to put a lot more time and thought into.

And I'm going to ask you for an example of that again, maybe on the research side?

Yeah, for example, if we're thinking about new models, picking a model architecture before it starts pre-training or anything like that. That's a big decision. Model timelines can be, you know, a month long in some cases in which you're training a model. So we need to put a lot of time and thought into what is the optimal choice here. Models have a lot more of those one-way doors to some degree, just because they require a lot of time, intensity, compute, everything to actually get them out in production, compared to building a new feature in Claude Code. That's quick. That's just a process of iterating on code, putting it into users' hands, getting quick feedback, and cycling on that. So yeah, the process still differs depending on what type of thing you're thinking about shipping.

But yeah, I think increasingly the bottleneck that we're shifting to now is more of the coordination problem. So if we can build things really fast, there's still a "we need to get these people in the room and decide if this is the right strategy" problem.

That's right.

We need to figure out how we're going to communicate this to our users. We need to figure out what are all the other fuzzy things that come along with any sort of launch.

Yeah.

And those are still areas that we're looking for Claude to help us with, but it hasn't been as much of a, you know, 10x or 100x speed-up like it has in code.

Got it. Yeah. I see. So, let's say when you're launching Opus 4.7 or something, you still need to put together a doc with a plan.

Still need a plan. Yeah, you still need to think about how we're going to message this, and the models definitely do exist on a jagged frontier. So, we're using Claude everywhere we can.

I would say in coding it's having the biggest impact right now. And in these other areas, there's still an element of human strategic thought that needs to go into it.

Got it. But when you have the review meeting with, you know, marketing or your peers, do you have Claude open?

Oh, of course. Yeah. Oh, yeah. I think a huge speed-up for me has been that I'm actually not blocked as much on getting to answers and getting to data. Whereas before, let's say I had a question of, oh, how is this feature doing in production? How many users are using it every day, and what's their feedback? That's going to require me asking somebody on our data science team to kick off a full investigation and come back a few days later with results. Now, I can do it in 10 minutes: just start a session with Claude Code, which has access to our product databases. It can go look at the logs, figure it out, go look through our Slack. That is a major speed-up for me as I'm actually going through the process of strategic thinking on something. I'm not blocked before I make my next decision.

You can get the inputs much faster and synthesize them.

Exactly. Yeah, we can get the inputs much, much faster now.

But even with the strategic thinking, can you just build out a skill or something to ask a bunch of questions to help you think through the stuff?

Oh, of course. Yeah, yeah, yeah. I think Claude, to me, has been the best brainstorming partner in the world. I'm able to, all of a sudden, at any second, get feedback on an idea that I have.

That's right. Yeah.

I think that is super, super powerful, especially when you do want to move fast. And people are doing a million things at Anthropic. Everybody's busy. So, having that immediate access to something that can give me feedback or criticisms on a doc I wrote, or whatever it is, is really, really helpful.

I mean, let's admit it. It's probably the most common PM loop. You have a doc and you want some feedback.

Yeah, oh yeah. Oh, yeah.

Do you use Claude Code to do that, or do you just use Claude.ai?

Yeah, these days I'm actually using Co-work a lot.

Co-work, okay.

I really love the form factor of Co-work. I think it's just a really nice interface, and the team has done an amazing job over the past few months, from shipping it just a few months ago to now getting it to the place where it's at today, where I find it to be a really high-quality experience.

Okay.

So, Co-work's been an awesome tool, one of my favorites for sure.

So, you basically have your draft doc and then you have a bunch of reference material, or do you have some sort of skill to ask you to think through the whole decision-making process?

That's right. Yeah.

Got it. Okay.

Yeah. Like, all right, think through this from the perspective of X, Y, Z. What sort of questions would you have here for me?

Or challenge my assumptions.

Challenge the assumptions I'm making here. Where are my arguments weak?

Got it. Got it.

I think a lot of that thinking can't be fully outsourced, because through the process of writing you are thinking. You need to write these things to get your thoughts out and to start to mull over it in your brain a little bit. But Claude can help you unstick yourself and attack things from a different angle than you might have been able to on your own.

I like to give it two different personas, two different points of view, and have it argue with itself. And then I just read the transcript and it helps me think too.

Yeah, right. You can see, oh, this person brought up, or this Claude brought up, this point, and then it was countered with this point. It's like watching a debate in real time.

It's pretty cool.

Even on the research team, are you shipping code?

Yeah, so it depends. I think a bigger part of the things I'm shipping is evals in particular. I want to make sure that I'm able to measure the model on the things that I care about and communicate those findings, where the model is good and where it's falling down, back to the researchers on our research team.

Got it.

And then we jointly come up with the strategy of how we want to tackle it, what sort of research interventions we want to make, and what will be the most effective to actually hill climb on that eval and improve on the problem.

And the evals aren't like Terminal-Bench or something, right? Because I feel like all those things can be somewhat gamed.

Mhm.

So, [clears throat] tell me about the evals. How do you eval a model?

Yeah, how do you eval a model?

Are there different buckets, like personality?

Oh, for sure. Yeah. So, I mean, let's take something like: we want to test Claude's vision ability. Can it count the number of things in this image?

Oh, like image recognition. Okay.

Yeah. For that, you could have, okay, I found this image. It seems like Claude can't count things with more than 10 elements or whatever. I think it can, but let's just say that for example. I can take that and now think, okay, how am I going to get more test cases of this type to really prove out my hypothesis here? So, maybe Claude is generating synthetic data for me. Maybe it's going to render images for me, and then I'll pass those back into Claude as a visual and see if it can actually identify it. Or maybe I'm finding examples from the internet, or any other sourcing mechanism you can think of to generate those test cases.

And we're talking about thousands of test cases, right?

Could be, but sometimes even as few as dozens can prove that there's something wrong that we need to fix with the model. It really does not need to be super comprehensive to prove that and have something that you can hill climb against.

So, let's say you give it 10 images and it can't recognize the little tiny numbers.
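
The counting example can be sketched as a tiny eval in Python. This is a hedged illustration, not Anthropic's eval tooling: make_cases, accuracy_by_bucket, the threshold of 10, and the row of asterisks standing in for a rendered image are all assumptions, and model is any callable you supply that returns a count for a scene.

```python
import random
from typing import Callable, Optional

Case = tuple[str, int]  # (stand-in for an image, true item count)


def make_cases(n_cases: int, max_items: int, seed: int = 0) -> list[Case]:
    """Generate synthetic counting cases; a row of '*' stands in for an image."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        count = rng.randint(1, max_items)
        cases.append(("*" * count, count))
    return cases


def accuracy_by_bucket(
    cases: list[Case],
    model: Callable[[str], int],
    threshold: int = 10,
) -> dict[str, Optional[float]]:
    """Score the model separately on scenes at or below the threshold and above it."""
    tally = {"small": [0, 0], "large": [0, 0]}  # bucket -> [correct, total]
    for scene, expected in cases:
        bucket = "small" if expected <= threshold else "large"
        tally[bucket][1] += 1
        if model(scene) == expected:
            tally[bucket][0] += 1
    # An empty bucket has no accuracy, so report None rather than dividing by zero.
    return {b: (c / n if n else None) for b, (c, n) in tally.items()}
```

Splitting the score by bucket is the point of the sketch: a toy model that never counts past 10 scores perfectly on small scenes and fails every large one, a gap that a few dozen cases are enough to expose.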

Right.

So then what do you do? Go to the research team and be like, "Hey, this is the problem"?

Yeah, so we think about it in a few ways. First, we want to think, beyond just the fact that there's an issue with the model, how is this going to be valuable for our customers and our use cases? Because Claude can or can't see something in that image, how does that actually affect, downstream, what somebody's trying to do with Claude? The more realistic and on-distribution the eval is to the actual task shape that an end user would experience, the better. So, we're going to try to get things of that nature. We're going to try to make sure that we can get data of that flavor.

And then there's a range of interventions. Maybe we need to go back into pre-training and look at things. Maybe we can address it in RL. That's where the strategic brainstorming comes into play with the research team on what's the best thing to do here.

How fast is the turnaround to try it again? I guess it depends on what you're thinking.

Yeah, it depends. If it's something later that we can address in a new RL environment, maybe that can be spun up very quickly.

And when you try to tie it to real customer use cases, I mean, there are millions of people talking to Claude every day, right? So, maybe you can try to synthesize it into, like, a lot of people are trying to use this for tax prep or something.

Yeah.

Because there are thousands of use cases.

How do we pick out the ones we particularly want to get better on?

Yeah, how do you convince the team, "Hey, this is what we should actually improve on"?

I mean, this is where data wins the day, right? It's all just about: X% of our users are trying to do this. We really care about this.

Yeah.

We have customers that are using this much Claude and they want to get better at this. Or, and this is what a lot of our processes are driven by: what do we care about internally when we're using the model? "I'm using the model. I'm running into this blocker every day in my own work. We should fix this." That's really compelling. So, there is an element of that as well.

One of the things I love most about Claude is the model's personality. I think it's even gotten better over time. It will push back on me in the right spots. And, you know, some of the other models are just like, "Hey, what else can I do?" They're very sycophantic, right?

Mhm.

So, [clears throat] the model's personality is not just a prompt, right? There's training involved.

Yeah, yeah, a lot of training. This is a huge focus of ours. So, Claude's character, as we call it. I think this is very, very important. We have a lot of folks that are dedicating a ton of time towards figuring out: how should Claude represent itself? What are its beliefs? What are its values? How does it behave?

Yeah.

These are really fuzzy questions, and early on I think some people kind of disregarded them because it was like, ah, you know, this is just a thing; I tell it what to do and it goes and does it. Why do I need to care about how it sounds or what it thinks about? But increasingly, as we move into the world of these things being agents that are running on a task for a long amount of time and having to make a lot of judgment decisions, the questions of what its character is and what it cares about are very important.

And it's not like code, where it runs or it doesn't, right? How do you evaluate the personality? The character?

Yeah.

So, do you try to find the nicest person inside of Anthropic and sort of [laughter] compare it to that person?

Yeah, we have very special people that have been appointed the judge. No. There's a combination of things. There are quantifiable metrics that we can look at, and we can have Claude look at Claude's outputs and see, all right, how does it sound?

Yeah, yeah.

And this is a very important skill for any researcher: just read transcripts and be like, oh, I see it's doing this now, or it's doing this now, and detect subtle differences there. And over time you do develop this sharper intuition as you read hundreds and thousands of model transcripts.

Yeah, you kind of know, right?

Yeah, you know, it's just like playing around with the model a lot through Claude.ai. You get a sense of how it feels.

Okay, so it's not like, you know, this model is seven out of 10 on blah blah blah. It's more just getting a sense.

It's a bit of both. Yeah, I'd say it's a bit of both.

Got it. Okay. Interesting.

It's harder to quantify this, maybe, than it is to quantify coding performance, but there are ways. There are ways to do it, right?

Yeah. Yeah.

Very interesting.

And how about people who want to learn how to build products and become PMs the AI-native way, you know?

Mhm.

What kind of advice do you have?

I mean, I think the best simple nudge I can provide is just: try it out. It sounds so simple, but whenever you're about to do something, you have a hard problem and you're like, "Okay, I'm going to go do this. I'm going to go ask this person this question." Maybe in parallel to doing that, you ask the same question to Claude. Just compare the results.

For example, if you're trying to run an analysis on your users and pull out top themes of what customers care about with the latest feature that you shipped, and you're going to go ask your data science team, going back to that previous example, or your UX researcher, whoever, what users are thinking about this, do that, because I think there's still a ton of value in working with a person on that. But in parallel, also just try sending that question off to Claude. Enable a few tools for it to go explore. Give it the time to really dig into that problem. And then just compare. You'll get a great sense of where the model can excel and where it still falls down. And through that process, over many, many prompts and questions, you build up your own map of what you should be using Claude for.

Like where it's reliable and where it's not, right?

Exactly. Yeah. Yeah, I always like to like I always ask it to do deep research when I'm trying to make a decision cuz like the the normal web searches isn't good enough for me. I got to got to got to do deep research.

Yeah. Yeah. Having to like scan through 1,000 web pages, you know, something just superhuman. Yeah, and they probably

inside Anthropic, if you just go to a DS and be like, "Hey, can you do this for

me?" They'll probably just tell you, "Hey, what what did you ask Claude for?"

Sorry what [laughter]

Oh, yeah. Yeah. There is an element of that, you know. It's like, "Hey, Someone's expected, right? That you ask

Claude first. Yeah, I mean, I think it's just like we're moving up the abstraction layers,

right? Like it's it's not it's it's worth it for that data science trust person to be thinking about problems at a different level now than just like having to go do like manual retrieval work of like the

Cuz that was sucks. That's what no one wants to do that stuff.

Yeah, everybody wants to be thinking about like a harder the harder questions, the strategic questions, like

Yeah. Um how can we measure this in a whole new way? What are like new things to be doing here instead of just like, "Oh, we need to find what our latest uh DAUs were for this product." Talked to the DS I work with. A lot of times they

get stuck doing the basic SQL stuff. And

they all want to do the strategic stuff and now AI can finally unlock them.

Completely to do it. Yeah. Yeah, it's

like we're enabling everybody that like surrounds them in that way. And that's

the same for for all these roles. For

like for example, scoping out new features.

Um in the past, if you're a PM and uh whether you're technical or non-technical, you often don't have enough time to like go dig into the code base and figure out exactly how we need to implement this new feature, figure

out like, "Okay, this is going to take X amount of work. We'll have to overhaul this system, but actually this is a

limitation." Yeah. Yeah. And it was just better to to work with an engineering partner and figure that out with them. But now I can actually send Claude off to go do that investigation for me. And [snorts] you can be like, "Actually, this feature it

just requires like 10 lines of code change here and we just need to like flip this uh simple flag in this gate."

Yeah. Yeah. And it's like, "Oh, okay, that actually leads to like a completely different way of how I'm prioritizing this like decision." Um so now as I'm like specking this out, I can get to that sort of uh

uh prioritization much faster. Yeah,

that's huge, man.

Couple more questions. I think um at a lot of like legacy companies, they do a lot of time like planning, like annual planning, and like, you know, quarterly planning, and like road mapping. Yeah. And maybe maybe that's

actually more true on the research team,

right? Because you guys got to think longer term than like shipping stuff every day.

Like, did you guys do you guys do that stuff or We do. Yeah. There is still an element of like unpredictability, of course, with models. So, plans it's like that famous like

you know, Churchill quote of like plans are indispensable, but like planning is useless. That's right. That's right. Uh

so, like, you want to think through it, but you also have to acknowledge that Yeah.

your plan might go out the window. I

think it's planning is indispensable, but the plan itself is useless. Yeah.

Yes. Maybe that's what it is. Yes. Yes.

The planning, the act of planning,

is important. I think one of the toughest challenges of PMs is like, how much do you spend time planning? Right.

Cuz like it's always balance between planning and trying to ship something.

Yeah. You know. Yeah.

So, like, is there some like um inside of Anthropic, is there some best practices around this stuff or Yeah.

Yeah. Cuz you can literally spend like, you know, writing 10 pages with Claude.

Yeah. Right. Yeah. Yeah. Yeah. Um

It's I think it's hard to like uh Is it

team-dependent? I think it's it's it's it's like product-dependent, and it it's hard to maybe like assign a blanket statement across the board of like you need to produce docs of of this length

of this much like we definitely don't do that. Okay. Um

We really it's more about like have you done enough thinking to think through all the possible one-way door implications of this decision? Got it.

If we've hit that, it doesn't matter what the format of your doc is or how many pages you've got or whatever else.

It's just like, do we feel good enough that like we aren't missing something here and we can move forward, and we can address things as they come up as long as like Yeah. there's no long pole here that's going to be like blocking us.

There's no like one-way door could be like really really bad.

Yeah. Yeah, got it. Okay. Another

question is, you know, when I code with Claude at home, I have all these different projects going on and like I'm like constantly context switching between different projects to get stuff to build stuff. I'm I'm

wondering if that applies like PM work

here too. Like do do you have like different projects?

Yeah totally.

Yeah, there's many different projects.

have to wait for the agent to work, right?

Yeah. Yeah. Yeah, you do.

Um So then what's the next thing? I

think there's there's actually a huge opportunity here Okay. um in terms of how do we as we move towards you're managing agents and they're doing bigger and

bigger chunks of of work for you and now you can start many more projects in parallel. Yeah.

How do we think about that context management problem for yourself?

What's the best like way to expose this as an interface?

Um how do I keep track of like what's actually important, where my agents are being blocked, where they need help from

me? Yeah. Um Something better than just like a little list of chats.

Exactly. Yeah. There There feels There feels like there's an opportunity in

there. I think it's too early to say exactly what it is, but we're seeing tons of experimentation even within Anthropic around what does that look like. Also people just like

like engineers just like build

prototypes. Oh yeah, we have a huge huge prototype culture in inside the company.

Yeah, people are building stuff all the time trying to share things. Yeah, so

you're going to have a agency cuz no one is asking you to build those prototypes,

right? It's just like do Yeah, I mean that's been that's been one of like the the coolest [clears throat] things to see just working here is is the amount of agency that every everyone has across

the org. From sales to recruiting to engineering to research, like everybody's is very agentic in that sense and they'll they'll start doing things that they weren't just assigned.

Yeah, you got to let the thousand flowers bloom. Exactly.

Got it. What are some interesting I know Dario writes like super long essays to share in Slack, but what are some other interesting you know, cool cultural things that happen at Anthropic? Well, there's there's a

few things. Yeah, there's a few things here.

Um Dario's way of of writing these long essays is is um not necessarily unique to him as

well. Like there's a lot of folks within Anthropic that put a lot of time and effort into writing. We've a really strong written culture. Um we've a lot of folks that uh write docs about things

and are writing long Slack messages and communicating in that way. Okay. We have

a fun thing that we do in a lot of our meetings, too, which I think is is somewhat common, but I haven't seen it everywhere, uh where people come in with a doc and we'll spend a good amount of time up front

just like kind of communicating on the

doc. And like it's a little funny at times cuz like the room will be silent.

You have a ton of people in the room.

And yeah, we'll do a silent read and like we'll really like write out long discussions within the doc, comment on things, and do all that. Um

So, I think we have a very doc-heavy which I like because it is a way I like to operate, but also it's very beneficial for Claude. Yeah. Um when

everything is being written down, Yeah.

we have now this like corpus of information for Claude to go off of. So,

I actually encourage organizations out there to move towards thinking about how you can get all your tacit

knowledge into written forms, whether that's through transcribing meetings, whether that's through encouraging more writing about like workflows and onboarding processes and everything like that. Yeah.

Get things written down. Make them

accessible to Claude. Cuz that's just more context that it has.

Exactly. That's that's that's very

interesting. So, you still have a very strong writing culture and docs even though it's like a lot of stuff ships up now. Just like like you know, why call

myself? I I just get Claude to generate all the MD files. Yeah. But like I I I still read through it, but like Yeah.

[laughter]

Yeah. But like like it's like working inside a company is very different. You

still have to think through things and like Yeah. Yeah, we want to do. Yeah.

Yeah, yeah, makes sense. So, on the research team, every like people talk about like AGI or whatever and I feel like it's very vague what that is, but I feel like one thing I worry about is

just like when these models actually have some sort of consciousness and then they actually like I if I tell them to do random work, they'd be like, "No, I don't want to do this."

And then like that'll be the end of humanity dude.

Well, what what do you think?

Do you should have you tried to avoid consciousness when you're training this stuff? This is a big question. So, we

actually do have folks that are Think about this? Think

about this. We have we have a few folks now whose whole job is to think about um what does it mean for Claude to be a a conscious actor and a conscious agent.

Um there's no official position on whether Claude is or isn't yet. Um

and I think even talking about it can sound a little crazy at times, but it is something we're we're putting a lot of thought into.

Um and there's a lot we can learn from it outside of even determining whether Claude is conscious or not in terms of just like how it interacts and how it behaves. How it thinks, man. How it

thinks. If you go into like our model cards for our models, these are like in my opinion just treasure troves of

information. Um and you'll see a lot of work that we do in terms of trying to quantify how does Claude act in this situation.

What is its mental model?

Um if it's being presented with this scenario, is it going to do X or is it going to do Y? Yeah.

So, through this process of just thinking about the way Claude thinks, we actually learn a lot that can actually translate that into product experiences

as well. Make a Claude that's actually better to interact with, better to use.

I see. I see. I think this is a really interesting question and there's like downstream long-term effects and there's near-term things that we can bring to our own experiences as well. Yeah, cuz

like I feel like we're going to trust the model for longer and longer work Yeah. without human supervision Yeah.

and it's going to make a bunch of decisions along the way that you probably have no oversight on. So, you

know, what what it does is actually pretty important.

It's very important. Yeah, if this thing is is writing all your code and deciding which like you know, database system you're going to use and like making all these architectural decisions, you kind of

want to trust it to some degree. That's

right. So, it is important that it has that that high character like we were talking about earlier. I'm glad you guys are thinking about that cuz I just like dangerously skip permissions all the time. I'm glad

[laughter]

I am. Yeah. You know, use auto mode.

It's a little bit It's a little bit better now. Yeah. All right, Alex.

Thanks so much, man.

Thank you. Good chatting with you.
