Building more effective AI agents

By Anthropic

Summary

## Key takeaways - **Coding unlocks agent capabilities beyond code**: An agent's ability to code is a foundational skill that enables it to perform a wide range of tasks, including web searches, planning, and generating complex artifacts like detailed diagrams more efficiently than direct manual creation. [01:30], [02:50] - **Claude Code SDK streamlines agent development**: The Claude Code SDK provides a pre-built agent loop, tools, and functionalities, allowing developers to focus on custom business logic and specific tools rather than reinventing core agent infrastructure. [03:50], [04:23] - **Skills equip agents with direct resources**: Claude Skills extend the concept of Claude MD files by allowing agents to utilize any file type, such as PowerPoint templates, helper scripts, or image assets, providing them with necessary resources beyond just instructions. [05:32], [06:08] - **Multi-agent systems offer parallel processing power**: Multi-agent systems allow multiple Claudes to work on a problem simultaneously, delegating tasks to sub-agents that can operate in parallel, leading to faster results and better overall answers, similar to a group of people collaborating. [09:56], [10:27] - **Simplicity is key in complex agent systems**: Despite the potential for complex multi-agent systems, it's crucial to start with the simplest possible solution and gradually add complexity only as needed to maintain observability and efficiency, avoiding 'dead weight'. [09:03], [09:30] - **Tools should mirror UI, not API**: When creating tools or MCPs for agents, they should be designed to mirror the user interface experience, presenting information all at once with minimal interaction, rather than a one-to-one mapping with backend API endpoints. [16:17], [16:59]

Topics Covered

Coding is the fundamental skill for all agents.
Skills provide agents with resources, not just instructions.
Agents outperform workflows for quality, not latency.
Multi-agent systems delegate tasks for parallel processing.
Tools should map to user interfaces, not APIs.

Full Transcript

- I think there's also a lot of interesting things

to explore of multi-agent as a form of test time compute.

- Basically letting Claude,

many Claudes work on a problem can be, you know,

get you a better final answer than just one.

- Hey, I'm Alex, I lead Claude Relations here at Anthropic.

Today we're gonna be talking about

building more effective agents

and I'm joined by my colleague.

- I'm Erik, I work on multi-agent research

here at Anthropic.

- Erik, to kick us off here,

can you just explain why Claude is so good at agent tasks?

- Yeah, sure.

So during our training,

we let Claude practice being an agent.

We give it open-ended problems for it to work on

where it can take many steps

and use tools, explore where it is and what it's working on

before giving a final answer.

And by getting lots of practice at being an agent,

Claude becomes really good at this.

- Okay, so it's these long running tasks

and a variety of domains basically.

And through the process of RL

and other training mechanisms,

Claude is learning an objective of how to do these things

with basically limited guidance or feedback.

- Exactly, we do lots of RL on coding tasks,

on search tasks, lots of things for Claude

to practice being an agent in different environments.

- There's kind of this conception, I think of Claude models

that they're really, really strong in code,

but that doesn't always maybe transfer into other domains

or that coding is its own separate thing.

What are your kind of views on that generally?

- So coding has been the first task

that we've really focused on,

but once you have an amazing coding agent,

a coding agent can do any other kind of work.

- If you need to do search, you can do web search,

you know, via APIs, you can plan a weekend by,

you know, creating a schedule.

So we really see coding as a very fundamental skill

for an agent that's gonna have a lot of spillover effect,

to be able to make Claude great at all sorts of things

and sort of like train on the hardest thing first

and then everything else will become easy.

- One interesting thing I've seen here recently

with a feature that we released in Claude AI on the web

was the ability for Claude to create actual files

through writing code.

So it was like writing a Python script

and then the Python script got ran

and all of a sudden you have like a Excel sheet

that popped out of that.

Is that kind of the future direction that we're headed

is like Claude's writing scripts

and taking actions on computers to create files

or do things that are traditionally not code related?

- I think that's one of the really effective ways

Claude will be able to do these things.

Actually, just a few days ago Claude was helping me

make some diagrams for a presentation

and it was able to create files

just by writing out the SVGs,

but then I wanted it to make a much more detailed diagram

that would need a lot of repetition

and so Claude was actually able to do this

by writing some code to generate the SVG,

which ran much, much faster than Claude itself

needing to write you know,

it was a very, very repetitive image file

with lots and lots of sort of detailed patterns in it.

- Yeah. - So, yeah,

I think that for a lot of cases writing code

to produce some artifact will be much better

than just trying to create that artifact directly.

So it's one way to do it for harder cases.

- Okay, right, yeah.

Code allows for kind of this speed up

that's not even possible with like a human like clicking

and dragging and using their mouse on a computer.

Like repeated actions.

- Exactly, Claude gets a for loop.

- Yeah, if you're a developer

and you're building an agent with Claude,

one thing that we've started to see become really popular

is this Claude Code SDK.

Can you walk me through what that is

and how you're seeing developers starting to use that?

- Yeah, so we're really excited about developers

using the Claude Code SDK.

This is something where previously if you wanted

to build a coding agent or sort of any agent,

you had to really go from

nothing but hitting an API endpoint,

build the loops yourself, build all the tools,

build executing these tools,

interacting with files, interacting with MCP.

We basically have already built all of that into Claude Code

and even though its name is Claude Code,

really Claude Code is just a general purpose agent

that is most often used for code.

Yeah, we are encouraging a lot of developers to use this SDK

as the core of their agent loop

and that way they don't have to spend a lot of time

reinventing the wheel that we've already put a lot of time

into polishing and perfecting that core agent loop

and instead they can use that

and then just add their tools

for their own custom business logic

or affordances into that via MCP.

- Right, so it offers that sort of customizability

to where you can remove the coding-specific bits.

- Exactly. - And put in

whatever sort of prompt or tools that you need,

just like slots nicely into the scaffold.

- Yeah, I think also the people

have been using Claude Code for all sorts of things.

I think the, my strangest use of Claude Code

is I once had it plan a date for me

where I did a bunch of web searches,

found interesting activities and restaurants in the area

and so not code related at all, but it has all the tools.

- How was the date?

- It was pretty good.

It was great, yeah. - Claude did a good job?

- Yeah, Filoli Gardens

and then a Chinese restaurant nearby.

- Wow, okay. - Claude did a good job.

- I'm impressed. - Yeah.

One other thing on Claude Code

that has been another popular feature

I've seen a lot of software engineers use lately

is Claude MD files.

So these are files that you, you know,

define within a project

and gives Claude relevant information about

what your programming style is

or like what the layout of the directories are,

things like that.

We've now launched a similar concept

that maybe takes a step further called Skills.

Can you explain what Skills are

and how we're starting to see developers use them

and what they mean for Agents?

- Yeah, so Claude Skills are a very exciting extension

of Claude MD files where instead of just giving it

notes files, you can give it any sort of file.

That can be PowerPoint template files, it can be code

and like helper scripts that you want it to use.

It can be images or assets.

And I think this extension of not just instructions

but resources for the agent to use

is a really, really powerful tool where you might say,

not just these in are my instructions

for making PowerPoint presentations, but here's, you know,

the head shots of all of our company leadership

that you might need to reuse in many presentations

and just giving it all to Claude in a reusable way.

So it has everything it needs right there.

- One analogy I've heard used internally

that I really, really like is,

it's kind of like in "The Matrix"

when Neo is learning kung fu for the first time

and they like inject him with the Kung Fu information

and all of a sudden he is like a Kung Fu master.

That feels like very similar to when I give Claude a skill

of some type of like, here's how you create spreadsheets.

And it's like, oh, all of a sudden

Claude's like a banker now

and it can create a financial model for me.

- That and where they load in all of the racks of equipment

and tools and stuff for them to grab.

- Yes. - It's like, you know,

you can start with these things, not just instructions.

- Yeah, I love that.

Switching gears a little bit,

so last time we chatted on camera here a few months back

and we were talking about Agents

and at the time we were in this transition

from maybe workflows which are like very defined ways

of how you chain together prompts

to what was just like a single agent system

where you're running a model in a loop.

Since then, what's been the evolution in the space?

- Yeah, so we've really seen Agents take over from workflows

where Claude has gotten so good at responding to feedback

and correcting its own work

that now Agent loops really dramatically outperform

workflows for most things where you care most about

absolute quality.

Workflows are still great,

where you need very low latency

and you want Claude to just give a best answer, single shot.

Agents are really, really high performance now.

I think one of the things that I've seen develop since then

is what I call workflows of Agents.

Whereas previously an application might have had

a workflow that had Claude in single shot

write a SQL command in order to load data

and then that would go to another step in the workflow

where it would then write a chart to display that data.

And if the SQL command failed, you know this,

it doesn't know that it's not returning any data

and then the second step of the workflow--

- Right. - Is kind of screwed.

- Completely falls apart.

- But now I've seen people where each one of those steps

in the workflow is actually a closed loop

where instead of just writing a single attempt

at a SQL query, it then runs, Claude sees the output

and then it can keep iterating and repeat

until it knows that it got the right value

and then it transitions to the next step in the workflow.

- Okay, interesting.

So yes, this evolution I guess

of like chaining together prompts

to now chaining together agents in these loops themself,

we'll see where that goes from there.

One other big topic of discussion,

I feel like that has taken a lot more chatter as of late

is this question around observability and verification.

Can you explain what that challenge is

and how people are starting to think about it?

- Yeah, so observability is very hard for Agents,

especially as the systems get more complex

and I think that's one of the reasons

where I still really believe

that even though the models are much more capable today

than they were a year ago and they can work better

in an agent or even more complex setups,

I think that simplicity is still a really important thing

and that even though you can build a big workflow of agents,

you should still start sort of by

from the simplest possible thing

and then work up to a more complex solution.

And you know, that's first trying single shotting things

or trying, you know, single shot prompt to Claude Code SDK,

which is now just sort of such a simple, easy thing to use.

And then I think only as needed adding layers

and layers of complexity

because that's gonna make the absorbability harder.

- Another term here maybe in parallel to workflows

of agents is multi-agent, is that the same thing

or is that something different?

- Yeah, so multi-agent is my main area of research now.

I'd say it's pretty different from a workflow of agent.

- Okay. - Workflows of agents,

where sort of an one agent goes,

finishes and then it transitions

or its output gets sent to the next agent to work on.

Multi-agent is where fundamentally you have multiple agents

or multiple Claudes working at the same time

where maybe one parent agent delegates tasks

to five sub-agents that can each then work in parallel.

And this is how our deep research

search product works is the main orchestrator agent

will decide and create several sub-agents.

They can do lots of searches in parallel

and that's way better for the user

because you know, all this happens in parallel

and you get the answer back much sooner.

We also see things like in Claude Code

the model will use a subagent.

So if something, if some sub-task is gonna take

tens of thousands of tokens,

like maybe finding a certain implementation of a class,

but the answer really boils down to something very small,

it can do that work in a sub-agent

to protect the main context from all of that,

those tokens that aren't necessary for the main work.

So yeah, basically can offload this piece of work

and just get back the final answer that it needs.

- So are we exposing then this subagent in this case

is like a tool that Claude can call upon?

- Exactly. - Pass in,

it'll pass in the prompt

as like a parameter or something?

- Exactly, yeah.

So to the, to Claude sub-agents look like a tool

where it can pass prompts to the sub-agents

that will then go and do work.

And part of my research is training Claude

to be a better manager and know how to--

- Oh interesting. - Give clear instructions

to its sub-agents and make sure

that they gets the right things

and needs out of them.

- How is this different than,

or is this maybe like a specialized part

of tool calling overall or is it different in some ways?

- I would say that this uses the framework of tool calling

for that communication protocol

and it just happens to be a tool that itself

is backed by Claude, by another Claude.

- Does Claude have like an intuitive understanding

of what a subagent is or do we have to like teach it?

Like you're actually talking to another version

of yourself, Claude,

like don't get freaked out sort of thing?

- I would say that Claude makes a lot of the same mistakes

that first time managers make

of where it will give incomplete

or sort of unclear instructions.

- Right, right. - To a sub-agent.

- Right. - And you know,

kind of expect the subagent

to have the right context when actually it doesn't.

And I think something we've seen

during training on sub-agents is that Claude

starts to get much more verbose and much more detailed

and give its subagent the overall context

of what's going on. - Interesting.

- So that they can do better work

that adds them to the whole, so.

I'd say that, you know, it definitely Claude,

Claude has a lot to learn

and is learning to get better at this.

- Okay, cool. - Yeah.

- What are, what are some of the use cases here?

So their search is one in like preserving context,

is there other things

that people are using multi-agent for right now?

- Yeah, I think coding is,

there's a lot of subagent use in coding.

Anything that can be parallelized or MapReduced.

If you have something where you need to produce

a lot of output or there's maybe 10 parts

of some output you're creating,

if you can split that up among 10 sub-agents,

that can be really, really effective for saving context

and getting faster results.

I think there's also a lot of interesting things to explore

of multi-agent as a form of test time compute.

- Basically letting Claude,

many Claudes work on a problem can be, you know,

get you a better final answer than just one.

Just like with people, you know,

a bunch of people putting their heads together

can get better results.

- In that case, are we specializing these agents in any way?

Do we gear them towards like one type of persona or another,

or is it just kind of let them take whatever form?

- I think you can do either.

You know, sometimes it's helpful to give a bunch of people

the same exact task and see

what the different answers they come up with are.

Sometimes it's good to have many people

or many agents work from different approaches

to the same problem or split it up.

One thing I've seen a lot is customers

that have a lot of tools, maybe 100 or 200 tools

that they want an agent to use,

they found that it's really good to split up

those tools among sub-agents.

So the main agent, all it has to know is hey,

I want to use this bucket of tools

and then there's a subagent that goes

and does the actual work there.

So that each subagent just has maybe 20 tools

that it needs to understand and know how to use.

- Have we tried like scaling agents like all the way up?

Like what happens if you have like a thousand versions

of Claude all working on one problem?

Does it just turn into chaos?

- I've not tried that yet. - Okay.

- But I'll get back to you.

- Good research idea right there.

What are some of other like failure modes

that we're seeing right now with agents or multi-agents?

- Yeah, I think just like any sort of complex system,

I think it's easy to overbuild something

and lose a lot of efficiency

and just create sort of a lot of like dead weight.

And so I've seen overbuilt multi-agent systems

spend too much time just talking back and forth

with each other and not actually making progress

on the main task, and you know,

human agents or human organizations suffer from the stew.

As companies get bigger,

you have more communication overhead and you know,

less and less work is actually, you know,

the people on the ground making progress on things.

- And so I think that's another interesting thing to study

is like how can we make organizations of Claudes

very effective while keeping the overhead small?

- If I'm a developer and I want to get started with agents,

whether I'm building on the Claude Code SDK

or just trying to on my own, do you have any tips

or best practices that you'd give them?

- Yeah, I think the best practices really remain

start simple and make sure, you know,

you only add complexities you need.

I think another really important thing

is think from the point of view of your agents.

If you are giving Claude tools or prompts

or sort of any affordances, put yourself in Claude's shoes

and read what it actually gets, what it sees as the model

and make sure there's actually enough information there

for you to solve the problem.

It's very easy to sort of forget, you know,

that we're seeing everything

and the model only sees what we showed.

- Right. - And it's.

- Yeah. - Yeah, I feel like

it's always important to go back

into like the raw transcript of like your tool calls

and your logs and everything and just view that.

- Exactly, and I think another thing

is that as people are building more things like MCPS

and trying to connect Claude to more things,

I think a very natural first instinct that people have

that's very wrong is that an MCP

or tools should be one-to-one with your API

and I think actually tools for the model or MCPs

should be one-to-one with your UI, not your API.

Because ultimately the model is a user of these things.

It's not, it doesn't work like a traditional program.

So if your API might have three separate endpoints

for say loading a Slack conversation

and turning a user ID into a username

and turning a channel ID into a channel name.

If those are the tools you give the model

to understand Slack, for it to understand anything,

it's gonna have to make three tool calls.

Versus as a user, you know,

we just see everything all nicely rendered.

- Oh, that's interesting. - So you wanna create a tool

or an MCP for the model

that it presents everything all at once

with as little interaction as possible.

Just like for a user it'll be terrible

if every time you had Slack

you had to like click on a user ID

to see what the name was, et cetera.

- Right, right, I like that.

So kind of working back from like the end state almost

instead of like just trying to map

the technical specs one to one.

- Exactly, and sort of surround

whatever context you need with it.

- What do you think the future of Agents

has in store for us?

Any predictions on these next six to 12 months?

- I think Agents are gonna become a lot more pervasive

sort of starting in areas that are verifiable

like software engineering.

You know, coding agents have already changed how I work

and how tons of people at Anthropic work

and I think there's still a huge amount

to be be gained there.

I think one of the really exciting things is if agents

can start getting better at verifying their own work

with things like computer use of, they can write a web app,

but can they go actually open it up and test it

and then find their own bug

instead of you needing to do that.

I think that's one of the most exciting things.

- Yeah. - Is like closing

that loop of testing

so that I don't have to be Claude's QA engineer.

- Right, so kind of combining all these things

from the software engineering abilities

to the computer use abilities

once we put all these pieces together.

- Yep, and I think the computer use

is also gonna really open up a lot of other avenues

and domains where agents

have been sort of locked out of so far.

- What would be an example of that?

- I think that if you want to have Claude

sort of do work for you in a Google Doc.

- Yeah. - Right now it's,

you know, Claude can write for you

but you're copy and pasting back and forth.

- Right. - But if you have computer use

and you say, Hey Claude, can you clean up this Google doc?

It can just do it right there for you.

Scrolling around, clicking, editing the text

and that's just such a nicer experience

than needing to copy and paste back and forth.

- Yeah. - It's like wherever you are,

Claude can be there with you if it has with computer use.

- Well I'm very excited to have Claude write my Google Docs

and respond to all of my comments for me.

- Exactly. - That'd be

a very nice future.

Erik, this has been great.

Thank you so much for the conversation.

- Absolutely, thank you.

Loading...

Loading video analysis...