AWS re:Invent 2025 - Architecting Scalable AI Agents using Amazon Bedrock AgentCore (TNC330)
By AWS Events
Summary
Topics Covered
- AgentCore Treats Agents as Microservices
- Runtime Framework Flexibility Wins
- Dual Token Swapping Secures Tools
- Memory Prevents Goldfish Conversations
- Evaluations Measure Agent Quality
Full Transcript
So, we are here in the last session of the day on this stage.
And we are going to talk about a subject that I think came up again and again for you during the whole year.
For me, it did.
We are going to talk about agentic assistants.
But an agentic assistant is not a simple solution where we just need a knowledge base and a model and it simply answers questions.
We are thinking about a more sophisticated system.
Think about an approach where we are treating our agentic assistant like a microservices architecture.
So in this session, we are going to talk about AgentCore.
My name is Marilia Brito.
I'm a senior technical instructor and leader of the instructors in São Paulo, Brazil.
So, this is our agenda.
We're going to check every single component of this modular architecture that AgentCore has.
But let's start simple, right?
First, we have the elements to build this agentic assistant.
We have the goals and instructions, so the assistant has something to perform.
And also the tools.
The tools are able to reach internal and external systems. The tool can be a Lambda function, why not?
Or it can be an external API as well.
It depends on our business necessities.
We also have here an important element of any assistant: observability.
We must understand how our components are working, individually and together, and how the agent is behaving.
So, what's happening during the reasoning loops?
What's happening when we call a different tool?
Is this the correct tool?
So we are only going to know about it if we have monitoring.
But, let me ask you a question.
How many of you built an agentic assistant end to end in this last year, with any framework, it doesn't matter which?
Oh, we have a few people over here, but I have a different question right now.
Was it easy?
Was it easy to manage the components, or not?
No, it's not easy.
Because in this microservice architecture...
Well, the premise is simple.
We have a distributed system, so we have many components to take care of.
And sometimes we don't have the purpose-built solution to work with this.
So, we must think about infrastructure.
We must think about GPUs.
We must think about maybe more than one model for a specific purpose.
And sometimes people tend to think, "I'm going to work with the biggest model that ever existed, with billions of parameters, and that will be enough for me."
But, actually, this doesn't matter.
What matters is to structure your architecture with the modular components that will be configured according to your needs.
So that's why we are going to talk about AgentCore.
AgentCore is the managed and purpose-built service focused on bringing all the magic that we have in agentic assistants to a place where we are going to think about this modular architecture piece by piece, configuring it according to our needs.
And, firstly, we can think that, "Oh, but this is too much complexity."
No, it's much simpler than you are thinking.
On AgentCore, we have these modules.
So the first module, the runtime, will take care of the execution environment, working with isolated sessions.
After that, we have the AgentCore gateway.
The gateway will expose the tools, even if the tools are internal.
But we don't want to provide full access to every single tool, so we have AgentCore Identity to take care of the authentication and the authorization of the actions that the agent is going to perform.
Observability, because we said that monitoring is very important, and it is.
And also memory, so we can work with context.
And now we are going to explore every single component here.
Also, on Monday there were a few launches around AgentCore, and I tried to incorporate a little bit of those launches into this presentation.
Okay, I woke up today at 5:30 to insert this news, so I hope you enjoy it.
Let's start with runtime.
The runtime is the component where the reasoning loop will occur.
So here we have a standardized execution environment that is isolated for each session.
So one session is not able to see another session, even when using the same agent.
And the magical thing about the runtime is that...
Okay, I know everybody has their favorite model.
Everybody has their favorite framework.
And it's okay, I have mine too.
But for a runtime, it doesn't matter.
You can choose whatever open source framework you want.
So if you want to work with Strands, it's okay.
CrewAI, LlamaIndex, LangChain, they will all work in the same way.
You can also bring models from Amazon Bedrock, Amazon SageMaker AI, and external models.
So, we have a clear use case dependency here and we have a model and framework flexibility as well.
So it doesn't matter if you have a simple workflow: I just have to check a knowledge base and a calculation tool, and that's it.
Runtime can handle this.
I have to work with 100 tools.
I have to access 30 different knowledge bases.
It's okay.
It will all work at the same time.
How can we configure the runtime?
First, for our recipe, we need the model and also the framework.
After that, we have the runtime decorator.
And the runtime decorator defines how the runtime should execute the agent and the tools.
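As a rough illustration of what that decorator looks like in code, here is a minimal entrypoint sketch, assuming the bedrock-agentcore Python SDK and the Strands Agents framework (any framework would do); the names follow the public starter examples and should be treated as illustrative rather than authoritative.

```python
# Minimal runtime entrypoint sketch, assuming the bedrock-agentcore Python SDK
# and Strands Agents are installed; treat names as illustrative, not authoritative.
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()   # wraps the agent as a service the runtime can host
agent = Agent()               # any framework and any model could sit here instead

@app.entrypoint               # the runtime decorator: how the runtime executes the agent
def invoke(payload):
    """Called once per request, inside an isolated runtime session."""
    user_message = payload.get("prompt", "")
    result = agent(user_message)          # the reasoning loop runs here
    return {"result": str(result)}        # returned through the runtime endpoint

if __name__ == "__main__":
    app.run()                 # serves the agent; in AgentCore this runs inside the container
```

The same handler shape works whether the agent uses one tool or a hundred; the runtime only cares about the entrypoint contract.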
The second element here is the identity configuration.
And the identity configuration will be part of a different module that we are going to see next, but it will define authorization here.
So, which actions are we going to be able to perform in the session?
And the observability configuration.
So, which metrics are we going to use to check our system and whether it is performing in the way that we are expecting?
We are going to package this using Docker.
And after that, the Docker image will be registered in ECR, the Elastic Container Registry.
After that, we are going to launch our execution environment.
So, when there is a need to call an agent, the execution environment will be launched, and we have the runtime agent, which is where the reasoning is going to happen, and the runtime endpoint.
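To make the packaging and launch steps concrete, here is a hedged sketch using the boto3 control-plane and data-plane clients for AgentCore; the operation and parameter names follow the API documentation as I recall it, so verify them before relying on this.

```python
# Sketch: register the ECR image as a runtime, then invoke it per session.
# Parameter names are assumptions based on the AgentCore API docs; verify them.
import boto3, json, uuid

control = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

runtime = control.create_agent_runtime(
    agentRuntimeName="assistant_runtime",   # hypothetical name
    agentRuntimeArtifact={"containerConfiguration": {
        "containerUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/assistant:latest",
    }},
    networkConfiguration={"networkMode": "PUBLIC"},
    roleArn="arn:aws:iam::123456789012:role/AgentRuntimeRole",
)

# Data plane: each distinct runtimeSessionId gets its own isolated execution environment.
agentcore = boto3.client("bedrock-agentcore", region_name="us-east-1")
response = agentcore.invoke_agent_runtime(
    agentRuntimeArn=runtime["agentRuntimeArn"],
    runtimeSessionId=str(uuid.uuid4()),      # new ID -> new isolated session
    payload=json.dumps({"prompt": "Summarize today's reservations."}),
)
print(response["response"].read())           # response field name as I recall it; confirm in your SDK version
```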
We have a different element that we are going to talk about.
The name is gateway, but this is how the gateway is going to connect to the runtime.
But let's pretend that we already started our runtime.
Let's talk about the life cycle.
In the life cycle of the runtime, first, the session is active.
So we are performing actions.
We are using tools, we are performing reasoning, we are checking the knowledge bases, the system is observing, okay?
So, we are working with something here.
But after a task is finished, the session will turn to an idle stage.
And the idle stage, by default, stays up for up to 15 minutes.
But idle is okay; it's just waiting for a new task.
But if nothing happens in these 15 minutes, the session will be terminated.
But what if 15 minutes of waiting is not enough for us?
We want to maintain the environment warm.
We don't want to start a cold session.
So in this case, we can extend the maximum session duration for up to eight hours.
After eight hours, the runtime environment will be terminated.
Okay?
So, let's proceed to the identity module.
The identity module will take care of authorization and authentication, and there are three different ways that we can work with this component.
So, we can use IAM.
We can use also Amazon Cognito or an external provider like Okta, Microsoft Entra ID.
So, in this case, it's a little bit different how we are going to work with tokens.
Yes, there are tokens involved.
With identity, we work with two separate flows.
The inbound and the outbound.
And we can see that the inbound authorization happens before the request reaches the agent.
So first of all, any user, because usually we are working with users, right?
Users are going to use these agents to get an answer about something.
So in this case, first, the session must be authorized on the inbound, using IAM or using OAuth for an external provider.
After everything is approved, the agent is going to check the input, so the instructions that we sent, decompose it into steps, and each step is going to be sent to a different tool.
If the tool is internal, a Lambda function for instance, we are going to use IAM.
The request is signed with IAM, as before, to authorize this kind of operation.
But what if the tool is external?
In this case, we are going to use the OAuth.
So we are trying to reach an external system, it's okay, we are going to use the OAuth.
But what actually happened in the outbound?
Are we going to use the same tokens that we received in the inbound?
No.
The first token, which will be stored in the next component that we are going to check, will be swapped through a trust relationship.
So if you were able to access the agent, I'm going to take your token and swap it for a new token to authorize this external access.
But what about cases where we started the inbound with IAM or OAuth, and in the second part, the outbound, we are going to use a different kind of authentication?
What is going to happen?
It's pretty similar, but the token is not renewed, it is not swapped.
In this case, the first token will be used to fetch the second token.
So, that's how we can access these external tools.
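As a sketch of how the outbound token swap can look in code, assuming the identity helpers in the bedrock-agentcore SDK: the decorator name, parameters, and provider below are taken from public samples and are assumptions to verify, not the definitive API.

```python
# Sketch of outbound access through AgentCore Identity; names are assumptions
# based on public samples (verify against the current bedrock-agentcore SDK).
from bedrock_agentcore.identity.auth import requires_access_token

@requires_access_token(
    provider_name="my-calendar-provider",     # hypothetical OAuth2 credential provider registered in Identity
    scopes=["https://www.googleapis.com/auth/calendar.readonly"],
    auth_flow="USER_FEDERATION",               # swap the inbound user identity for a scoped outbound token
    on_auth_url=lambda url: print(f"Authorize here: {url}"),  # surfaced only if user consent is still needed
)
async def list_calendar_events(*, access_token: str):
    # The decorator fetches the outbound token from the token vault and injects it here;
    # the inbound token itself is never forwarded to the external tool.
    print(f"Calling the external API with a scoped token: {access_token[:8]}...")
```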
The next component is the gateway.
So how are we going to expose the tools?
Well, we can expose the tools, so RESTful OpenAPI endpoints, Lambda functions, or even MCP servers, using the gateway.
And there are a few strategies that we can adopt to improve how these tools are going to be used.
So not just by keywords.
In this case, we can use also the semantic search to match the right tool.
So let's explore the AgentCore gateway.
It starts with the architecture that we checked before.
So I have a user prompt reaching the agent.
We already had the inbound authentication before it, and now the gateway must provide the access for the tools.
Okay.
Take a look at that secure token vault.
This is pretty important.
Why were we talking about the identity module?
I told you that the tokens are stored somewhere.
So this is the place where the tokens will be stored.
We don't want to store the tokens inside the runtime because the runtime will have plenty of things happening over there.
We want to store this externally, so it makes sense to be inside the gateway.
When we are working with the gateway, first we have a session ID that is provided by the runtime.
And this session ID will be bound to the workload ID.
So, what the agent is going to perform here.
Okay, we have these two IDs that we're going to check later in the observability module.
So in this case, it's providing the safe access.
I'm not going to say it's the standard practice to work with MCP for external tools.
You can use a Lambda function as well to access an external API.
But I must say that, in my conversations with customers, MCP is the king of popularity.
So here we have the access for the external tools.
This is new.
This was launched Monday.
In the identity configuration, yes, we have security over there.
We are checking whether someone can assume a session inside an agent and start the runtime.
But with this new capability, policies in AgentCore, which is still in preview, it's pretty fresh, we can work with more detail, more granularity in our policies.
You can write the policy using natural language, in this case in English.
But if you prefer, you can use the Cedar language as well to structure this policy.
So here we're going to think about the tools that are going to be accessed through the gateway.
If you want to know more about these capabilities, there is a blog post from Danilo Poccia.
And there is this one and four more launches.
Okay, I don't have the QR code.
I'm sorry, but I think you will find it very easily.
I have a quick demo about the gateway.
So it's the code from a workshop where we have the creation of the gateway, providing the name and the description.
I'm going to use semantic search to find the correct tools, in this case.
Okay, after that, there is the creation of the gateway.
And I'm going to use a KMS key.
This is the ARN.
Amazon Resource Name.
The protocol. MCP.
Now we are going to add the target in the gateway.
So we have to pass the gateway ID, the name of the target, the target description, a Lambda function in this case that I'm going to use.
Okay.
And for authentication, I'm working with Cognito.
This is the architecture that we already checked on the earlier slide.
So we have Lambda functions here and we also have external APIs.
But in this demo, I'm going to focus more on the Lambda function.
Okay. The gateway was created.
And now we are going to add the targets.
So first we have a Lambda function that will handle restaurant reservations.
And also a calculator.
A different function with a calculator.
But just to show that we can add more tools.
So, up to 1,000 tools per target; I'm adding more calculators over here.
I'm actually duplicating it, but we are going to interpret them as more tools.
Okay.
And after that, we are going to print and show that the tools were added.
I swear it works.
Perfect.
And then we are going to list the tools.
And it's done.
So up to 100 targets per gateway and up to 1000 tools.
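For reference, the two demo steps roughly map to these boto3 calls; the request shapes below are assumptions reconstructed from the workshop and the control-plane documentation, so treat them as a sketch and check the current API.

```python
# Sketch of the gateway demo: create an MCP gateway with semantic tool search
# and Cognito (JWT) inbound auth, then add a Lambda target. Shapes are assumptions.
import boto3

control = boto3.client("bedrock-agentcore-control", region_name="us-east-1")

gateway = control.create_gateway(
    name="restaurant-gateway",                                   # hypothetical name
    roleArn="arn:aws:iam::123456789012:role/GatewayRole",
    protocolType="MCP",
    protocolConfiguration={"mcp": {"searchType": "SEMANTIC"}},    # semantic search over the tools
    authorizerType="CUSTOM_JWT",
    authorizerConfiguration={"customJWTAuthorizer": {
        "discoveryUrl": "https://cognito-idp.us-east-1.amazonaws.com/USER_POOL_ID/.well-known/openid-configuration",
        "allowedClients": ["COGNITO_APP_CLIENT_ID"],
    }},
)

control.create_gateway_target(
    gatewayIdentifier=gateway["gatewayId"],
    name="restaurant-reservations",
    description="Books and cancels restaurant reservations",
    targetConfiguration={"mcp": {"lambda": {
        "lambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:reservations",
        "toolSchema": {"inlinePayload": [{
            "name": "book_table",
            "description": "Book a table for a given date",
            "inputSchema": {"type": "object",
                            "properties": {"date": {"type": "string"}},
                            "required": ["date"]},
        }]},
    }}},
    credentialProviderConfigurations=[{"credentialProviderType": "GATEWAY_IAM_ROLE"}],
)
```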
Let's move to a different module now, the memory module.
It's very important to think about the memory because we don't want to repeat every single word in every single interaction with an agent.
It's pretty boring, right?
It's like talking to a goldfish.
So an agent without memory is like a goldfish.
It will forget what you said three seconds before.
This is a little bit exaggerated, I know, but it's almost the same case.
So here we have a person asking about how to switch apps on the phone.
We have the answers.
So, what model are you using?
And the model.
Okay, we have an answer how to switch the apps.
Two days before this, how do I take a screenshot on my phone?
Oh, what's the model?
We had to explain everything.
So, we don't want to have this kind of problem.
Our customers are not going to be happy to repeat every single thing again.
There are two kinds...
Actually, now there are three kinds of memory management on AgentCore.
The first one is the short-term memory, so the memory that we are going to use during a session.
So here we have three components.
So the first one, the events, are the records of a single interaction between the user and the agent.
So we can have the two actors, the assistant messages and also the user messages, included here.
We can store these one by one or in batches.
It depends on the volume of the conversation.
The actors.
So, which users and which agents are included in this conversation.
And the session.
So, in which period we had this conversation.
Here is a small example.
But it happened on the same day, so it's not the same case that we were talking about in the first slide of this component.
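A minimal sketch of storing short-term events, assuming the MemoryClient helper from the bedrock-agentcore SDK; the method names mirror the public samples and should be verified against the current package.

```python
# Sketch: store and reload short-term events (method names are assumptions from SDK samples).
from bedrock_agentcore.memory import MemoryClient

client = MemoryClient(region_name="us-east-1")
memory = client.create_memory_and_wait(name="PhoneSupportMemory", strategies=[])  # short-term only

# One event = one exchange, tied to an actor (who) and a session (when).
client.create_event(
    memory_id=memory["id"],                   # key name as I recall the samples; confirm in your SDK version
    actor_id="user-123",                      # hypothetical actor
    session_id="session-2025-06-01",          # hypothetical session
    messages=[
        ("How do I take a screenshot on my phone?", "USER"),
        ("Press the power and volume-down buttons together.", "ASSISTANT"),
    ],
)

# The agent reloads the recent turns of the same session as conversational context.
recent = client.list_events(memory_id=memory["id"], actor_id="user-123",
                            session_id="session-2025-06-01", max_results=10)
```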
So let's check now the long-term memory.
In the long-term memory, I'm going to talk a little bit about summary, user preference, semantics, but there is a new capability, okay?
This new capability actually helps the agent and the memory to learn more about how you are communicating with your agent.
So, what's the tone that this customer is using?
What are the subjects that this customer is interested in?
What about the details about the subject?
So in this case, we are going to collect this kind of information for a better experience.
But let's check the other components.
The user preference.
So, here we have a different conversation.
I like this brand of headphones.
My company gives me 25% off the item, and I prefer white open-ear headphones.
White?
No, it gets dirty.
Okay.
This conversation, this session, is stored in the memory.
And now we have the preference of these users.
So it depends on the context, it depends on the interactions that you want to store.
If you're not selling electronics, this makes no sense.
So, we are storing the color, the brand, and the model.
Four days.
So, four days after that, on June 5th: I want new headphones.
It's time to change the old ones.
And the agent will be smart enough to fetch the previous information and include it in this conversation, even with the 25% discount.
Semantic.
We are not going to store every single sentence in any of the cases that we are checking here.
So with semantics, we are choosing, well, not us, the model is choosing, the most important information from this conversation.
So, a summary.
It's pretty hard to make this person happy.
Now this person wants to return the headphones they bought last week, because they found them $100 cheaper elsewhere.
But our agent is pretty smart, so it will help this customer pay less, and will refund the $100 difference.
I really like this agent.
Summary.
Well, in this case, we have just a summarization.
I told you that we don't keep all the lines of a chat, just the preferences, or the semantics, or whatever is similar, and here, the summary.
The person broke the headphones.
So, now, "I broke the headphones," and the agent is going to help, maybe to get them fixed or maybe to send the customer to some kind of shop where they can fix these headphones.
What about memory retention?
The default storage of the memory is seven days, but you can store the memory for up to one year.
A lot of time.
Sometimes we don't remember what we said one year ago, right?
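Putting the long-term strategies and the retention setting together, a hedged configuration sketch could look like this; the strategy keys and the retention parameter are assumptions based on public samples, so verify them against the SDK.

```python
# Sketch: long-term strategies (user preference, semantic, summary) plus retention.
# Strategy keys and parameters are assumptions; verify against the bedrock-agentcore SDK.
from bedrock_agentcore.memory import MemoryClient

client = MemoryClient(region_name="us-east-1")
memory = client.create_memory_and_wait(
    name="HeadphoneStoreMemory",                 # hypothetical name
    event_expiry_days=365,                        # keep raw events up to one year (default is seven days)
    strategies=[
        {"userPreferenceMemoryStrategy": {"name": "preferences",
                                          "namespaces": ["/users/{actorId}/preferences"]}},
        {"semanticMemoryStrategy": {"name": "facts",
                                    "namespaces": ["/users/{actorId}/facts"]}},
        {"summaryMemoryStrategy": {"name": "summaries",
                                   "namespaces": ["/users/{actorId}/{sessionId}/summary"]}},
    ],
)

# Days later, the agent pulls the extracted preferences back into context.
records = client.retrieve_memories(
    memory_id=memory["id"],
    namespace="/users/user-123/preferences",
    query="preferred headphones and available discount",
)
```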
Now let's check the browser.
The browser module, we are going to use...
That's right.
We are going to use it when there is a need to access a web interface, so a website where we must search for something.
So, in this case, we can also use the security controls of a VPC.
We have a micro VM that we are going to use for this kind of operation.
Actually, for almost everything on AgentCore, we are going to use a micro VM.
But it's serverless, so we don't have to worry about the size of the instance or even the scalability configurations.
This is built into the service.
But, in this case, we can control the network level security, using the Amazon VPC.
So, what happens when we use the browser?
First, there is the user.
"Oh, I want to buy new shoes."
Me too.
Send this to the agent and the agent will invoke the model, but the model is not responsible for accessing the site.
The browser is going to access the site.
So, in this case, we are going to use a computer-use call, indicating what actions need to be performed in this browser.
So, here it's pretty simple, right?
We are going to click using the left button, and we also have the position to click on the screen.
This will be translated into CDP commands and we are going to use the browser as a headless browser, so similar to Selenium, but we're going to use a Playwright browser.
And then we had a screenshot, and this was returned to our customer.
But I have this in more detail.
How do we work with the browser?
First, in this example, we have a situation where I asked it to search on Amazon for a MacBook, list the options, and return the description of the first option.
So, first here, we are starting with the imports that are necessary.
Okay, so the browser client from AgentCore, JSON, and a few other kinds of imports.
But to open our flow, I'm going to use Nova Act.
Nova Act is what will translate the instructions into the actions that need to be performed by the browser.
Okay, so to click, to copy, to type, to take a screenshot.
So here we are starting the session, and I also want to see what the browser is doing.
Yeah, we can check.
So we're going to use the DCV SDK as well, to live-stream what is happening in the browser to a local server, using port 8000.
And this session is also isolated.
We are going to work with signed URLs to perform this kind of access.
There is also the configuration about the user interface that we can see in the live browser.
And the actions that are going to be performed.
So we are going to use the Playwright with the Nova Act.
I need an API key.
And this is also part of the layout that we are going to see.
Now we are starting the viewer session.
Generating the pre-signed URL, and it's temporary access; when we work with a pre-signed URL, there is a token that will be used.
Starting the new browser session, using the Amazon.com site.
We have also the instructions over there and part of the reasoning.
So, I am on the Amazon page, my task is to search for MacBooks and extract the details of the first one.
This is the page.
Oh, I thought it was a video.
(Marilia chuckles) But this is the page after the first option has been chosen, has been selected.
This information, from "Apple" down to "Midnight," is what we are going to collect.
So, the description of the product.
And as we can see here in the first box, that's what we collected.
There are also the execution metrics, so the prompts that were used and the number of steps that were executed on the site.
So first we open the site, type the search, list the results, and copy the first item.
So, this is the browser: a secure sandbox so we can perform external actions on websites.
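As a rough sketch of the lower-level path described here, CDP commands driving a headless Playwright browser inside the sandbox, assuming the browser client helper in the bedrock-agentcore SDK; the demo layers Nova Act on top of this, and the selectors below are purely illustrative.

```python
# Sketch: drive the AgentCore browser sandbox from Playwright over CDP.
# Helper names are assumptions from SDK samples; selectors are illustrative only.
from bedrock_agentcore.tools.browser_client import browser_session
from playwright.sync_api import sync_playwright

with browser_session("us-east-1") as client:              # isolated, serverless browser micro VM
    ws_url, headers = client.generate_ws_headers()          # signed WebSocket endpoint for this session

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url, headers=headers)
        context = browser.contexts[0]                        # the remote browser's default context
        page = context.pages[0] if context.pages else context.new_page()

        # In the demo these steps are decided by the agent; here they are hard-coded.
        page.goto("https://www.amazon.com")
        page.fill("input#twotabsearchtextbox", "MacBook")    # illustrative selector
        page.keyboard.press("Enter")
        page.screenshot(path="results.png")                  # the screenshot returned to the user
```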
One more.
The code interpreter.
We can say that browser and code interpreter are tools as well.
The code interpreter is there to run your code, to execute your code, also in an isolated, serverless environment using a micro VM.
This is not going to change, okay?
We can use JavaScript, TypeScript, and Python, for these executions, and there are two ways to work with that.
The first is the system code interpreter.
In this case, we are going to use the built-in libraries and all the built-in configurations to execute simple code that does not require importing specific or custom libraries.
So the agent receives the instruction, decides that it is necessary to call the code interpreter, then the session will be mapped to a specific VM, used only for this case, for this session.
We have also a terminal over there and a file system.
This file system is ephemeral, so after the execution, it will be terminated.
And then we have the result of the execution being returned to the agent.
But if I need a higher degree of customization, there is a custom code interpreter.
So, in this case, you are going to package the requirements to use in this ecosystem.
You package all the dependencies.
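A minimal sketch of calling the system code interpreter, assuming the code interpreter helper in the bedrock-agentcore SDK; the invoke parameters follow the samples I have seen and should be verified before use.

```python
# Sketch: run Python inside the code interpreter sandbox (names are assumptions from SDK samples).
from bedrock_agentcore.tools.code_interpreter_client import code_session

with code_session("us-east-1") as client:        # isolated micro VM with an ephemeral file system
    response = client.invoke("executeCode", {
        "language": "python",                     # JavaScript and TypeScript are also supported
        "code": "total = sum(range(1, 101))\nprint(total)",
    })
    # Results stream back from the sandbox; the session is torn down afterwards.
    for event in response.get("stream", []):
        print(event.get("result"))
```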
Our last module; a bunch of modules, right?
But this is good because we can configure observability with a higher degree of granularity.
So with observability, we have full enterprise-grade visibility.
It's compatible with OpenTelemetry, so you can check your metrics outside the service if you want, or you can check them inside CloudWatch as well.
So we can see tool use, agent actions, using detailed traces.
And there are three different views that will be provided.
The first view is the GenAI observability dashboard; I'm inside the CloudWatch console right now.
We can see, almost at a glance, how many agents were instrumented, so we can check the metrics.
So, just a view of the agents, the sessions that are active, and the traces that we collected.
The second view that we have here is the agent-specific metrics.
So, how many sessions per agent, and also the token consumption, the latency.
So, we can see how the agent is performing, but what if we need more granularity?
In this case, we can check a specific session that ran in the runtime.
I can check the traces as well.
Here we only have three traces, so it was a very small execution.
But there is something new over here as well.
And this is pretty amazing because we were talking about latency, how the agent is performing, but what about the quality of the answers?
Now, with AgentCore, we have evaluations.
It is still in preview, but we can check metrics like accuracy, helpfulness, faithfulness, harmlessness, and stereotyping.
But sometimes these metrics are not enough, and that's okay.
You can work with custom metrics; you have to describe the metric and also provide a scoring system.
It can be numerical or it can also be personalized.
For those who already worked with Bedrock evaluations, the interface looks almost the same.
But in this case, where we are working with the built-in metrics, sorry, I said accuracy, but I was trying to say faithfulness, we are working with a programmatic evaluation.
When we are talking about custom metrics, we are going to use an LLM as a judge, so almost the same way that we work outside AgentCore.
So, I hope you liked it, our tour of the components of AgentCore.
And the main thing here is that now we can work with a higher degree of customization.
We can use whatever model fits you best.
We can use any framework that your team prefers, or even you, why not?
You can work on your personal project.
It's for your personal use too.
So, if you have any questions, I'll be available next to the stage, and I hope you enjoy it.
We have a few slides here about the Developer Community, but today is the last day of the expo.
You have to run to reach the other hotel, to reach the Venetian.
During the event we had several activations from Training and Certification with the gamification team, the amazing gamification team.
Also, if you want to know more about the cloud, if you want to know more about agents, you can find it for free inside the Skill Builder platform.
So, more than 1,000 trainings for you to gain more knowledge.
Here we also have training paths to gain more experience in specific areas.
And I want to see everybody here with certifications.
I'm a golden jacket myself.
I want to see everybody become a golden jacket too.
And I want to say thank you so much for being here in the last session, and I hope you had an amazing event.