
Guide to Architecting Secure AI Agents: Best Practices for Safety

By IBM Technology

Summary

Topics Covered

  • The AI Paradigm Shift: From Static Code to Adaptive Systems
  • Six Ways AI Agents Expand Your Attack Surface
  • Prompt Injection: The Number One Threat to AI Systems
  • Just-in-Time Access: Minimal Exposure for AI Agents
  • AI Firewalls: Your Proxy Shield Against Prompt Injections

Full Transcript

AI agents are all the rage. These systems are able to perceive context, reason over goals and constraints, and take actions through tools and services. Some have called them "models using tools in a loop." Making agents even more powerful is their ability to do all of this autonomously, without human intervention. Just tell the agent what you want done and it will figure out the details and make it happen. But with that kind of power also comes considerable risk. We need to make sure certain things are true: that these agents operate within explicit boundaries, and that they provide observable traces of the decisions they make and the actions they take. In other words, they need to be secured, to make sure that they aren't leaking data or being hijacked by an attacker. They need to be governed, so that they're reliable and operate within the context we expect. And they need to be audited, to ensure compliance with organizational policies and regulatory requirements.

IBM and Anthropic recently released a guide to architecting secure enterprise AI agents with MCP. I think you're going to find it interesting. In this video, we're going to dive into that document to discuss the risks that come with AI agents and how you can address them with secure architecture principles.

When we're talking about agents, there's a paradigm

shift that we have to consider. The paradigm shift I'm talking about is one where we move from deterministic logic, where if you give the system a single set of inputs, you'll always get the same outputs, to a more probabilistic type of system, one based on probabilities and statistics. The pattern might resolve to one answer in one case, and even with identical inputs, we might get a different decision in another case. So it makes dynamic decisions. We're also moving from a static environment to a more adaptive environment, one that learns over time. It evolves its behavior based upon interaction and based upon human feedback. If we give it a thumbs-up, it will do more of the same thing. And we move from a mindset of code first (I need to implement this thing, write the code, come up with the recipe) to one of evaluation first. Evaluation is now the main thing that we're focusing on: it's less about implementation details and a lot more about measuring outcomes and seeing if those outcomes are moving us closer to the goal that was stated for this particular system.

All of this needs an agent development lifecycle, a structured way that we approach all of this. So how would we do that? Well, it should involve building and then managing the system. So

we're going to start off with planning the system, then move to coding, then move to testing. That's pretty much the build part of this system. Then we're going to go from there to debugging, deploying, and monitoring, and then ultimately back into planning again. That's monitoring and managing the system. So we have a structured way to look at all of this, and we do it with a DevSecOps approach. If you know about DevOps, in that case we're taking the development aspects and integrating them with the operational aspects. We don't just write the code and then throw it over the wall; we're involving operations throughout the process and giving them a way to integrate those things. With DevSecOps, I'm now inserting security at the beginning and the middle and the end, so security is something that is preserved throughout. Ultimately we want agents that are going to be safe and reliable. They need to be secure and they need to be aligned with the organization's goals.

Let's take a look at the security threats that an agent

could potentially face. Right off the bat, an agent is going to expand our attack surface. Every new technology does that to us, and AI is no different.

So we've got the AI portion of the agent that now is something someone could attack. But there's

also the MCP protocol. That's the thing that allows agents to talk to tools and other services.
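As a rough sketch of how that tool boundary can be tightened (the tool names and helper function here are hypothetical, not from the IBM/Anthropic guide), an agent host can refuse any tool call that isn't explicitly allowlisted:

```python
# Hypothetical sketch: gate every MCP-style tool call through an explicit
# allowlist, so a hijacked agent can only reach tools we deliberately exposed.

ALLOWED_TOOLS = {
    "search_docs": {"query"},            # tool name -> permitted argument names
    "create_ticket": {"title", "body"},
}

def gate_tool_call(tool: str, args: dict) -> bool:
    """Return True only if the tool and every argument are explicitly allowed."""
    if tool not in ALLOWED_TOOLS:
        return False
    return set(args) <= ALLOWED_TOOLS[tool]

print(gate_tool_call("search_docs", {"query": "vacation policy"}))  # True
print(gate_tool_call("delete_db", {"name": "prod"}))                # False
```

Deny-by-default like this is the point: anything not on the list, including unexpected extra arguments, is rejected before it ever reaches a tool.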

MCP is yet another potential attack vector. We could also have a case of excessive access or excessive agency, where the agent has more control and more access to things than it needs. It could also be the case that the agent takes it upon itself to escalate its own privileges, which we don't want it to do unless that's something we intended. The agent could leak data; we need to make sure that doesn't occur. The agent could be prompt-injected. That's our number one attack type against large language models: a prompt injection is where someone injects commands into the system and takes control of it remotely through those commands. The agent could also become, if we're not really careful, an attack amplifier. Because it operates autonomously and has certain capabilities, if it becomes compromised, it's going to do damage at light speed, in a way that we may not be able to control. And ultimately we have to make sure that we don't drift out of compliance, that we stay where we need this thing to be. So those are the threats. What are we supposed to do about them? Well, we'll

start off with system controls. What kinds of things do we need to have? Well, first of all, I want this thing tightly controlled. I need it constrained, operating within the boundaries that we expect it to operate within. I want it permissioned, using the concept of role-based access control: we need to assign roles to these agents, just as we do for users, and not let them do things that they shouldn't be doing. RBAC usually means role-based access control, but I often also think of it in terms of risk-based access control: we give you only the amount of access you need, based upon how high-risk the thing you're doing is going to be. And then we want it sandboxed. Have the agent operate within that sandbox so that if something goes wrong, it can't do damage outside of it.

What are the design principles, then, that we can apply to these system controls? We need to consider what is acceptable agency: what are the things that we want this agent to be able to do, and what are the things we don't want it to be able to do? We need it to be interoperable. It has to interoperate and integrate with lots of different tools if it's going to be effective, but we need to know what those tools are and what they are ultimately going to do, because those things create downstream risk for us. We need to make it secure by design. Security is not something that's done well if you bolt it on after the fact; it needs to be in there right from the start. It needs to align with the business goals and meet the business objectives. We also need to make sure that this thing mitigates risk. It's going to introduce new risk, and we certainly don't want it to introduce more risk than we have to, so we mitigate that as much as we can. We're going to continuously observe the reasoning and govern for compliance. We need to have oversight into what this thing is doing, because it's operating autonomously and we need some checks and balances in place. We need to understand what the key performance indicators are, make sure that we're mapping to those (the business will tell us what they are), and make sure we're in alignment. And then we need to have boundaries: the principle of least privilege, which I've talked about many times before. The principle of least privilege says a system or a person can only have access to just what they need to do their job and nothing more, and the instant they don't need it, we take those accesses away. We need to implement those boundaries and really lean into that principle of least privilege, especially when it comes to agents. It's never been more important than it is now. And then ultimately, I'm going to say, we need a human in the loop. I keep talking about oversight; this is where the oversight comes.
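To make those two ideas concrete, here is a minimal sketch, assuming a hypothetical role table and approval flag, of how least privilege and a human-in-the-loop check might combine in a single authorization gate:

```python
# Hypothetical sketch: an agent may only perform actions its role grants
# (least privilege), and high-risk actions additionally require explicit
# human approval before they run (human in the loop).

ROLE_PERMISSIONS = {
    "support-agent": {"read_ticket", "update_ticket"},
}
HIGH_RISK = {"update_ticket"}  # actions that need human sign-off

def authorize(role: str, action: str, human_approved: bool = False) -> bool:
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False  # least privilege: not explicitly granted, so denied
    if action in HIGH_RISK and not human_approved:
        return False  # oversight: a human must approve high-risk actions
    return True

print(authorize("support-agent", "read_ticket"))                        # True
print(authorize("support-agent", "update_ticket"))                      # False
print(authorize("support-agent", "update_ticket", human_approved=True)) # True
print(authorize("support-agent", "delete_ticket"))                      # False
```

The shape matters more than the details: the default answer is "no," and escalation requires a person, not the agent itself.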

Now let's talk about a security framework that embodies and encompasses those kinds of design principles that I just talked about. We're going to start off with identity and access management.

So I'm drawing a person here, but we actually also have to consider nonhuman identities.

These nonhuman identities are what the agents are. They need to have their own unique credentials, their own logins, their own way of us making sure that they are uniquely them. And if something happens, I want to be able to trace it back to which agent was the one misbehaving. Just as users shouldn't be sharing passwords, agents shouldn't be sharing credentials; they should be

unique. We want just-in-time access: we give the agent the ability to do the thing it needs to do, and then take that away when it no longer needs it. It might even be time-based, so you have this access only for a few minutes or a few hours or a day. We're going to do role-based access control, assigning certain roles to agents just as we do with users, and make sure that those agents stay in conformance with those roles.
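As an illustration of that just-in-time, time-based idea (the helper functions and field names here are hypothetical, not a real credential API), a broker might mint short-lived, role-scoped tokens like this:

```python
# Hypothetical sketch of just-in-time access: each agent gets a credential
# scoped to one role that expires after a few minutes, so standing access
# never accumulates and every token is unique to its agent.

import time
import secrets

def issue_credential(agent_id: str, role: str, ttl_seconds: int = 300) -> dict:
    """Mint a short-lived, role-scoped credential for a single agent."""
    return {
        "agent_id": agent_id,                # traceable back to one agent
        "role": role,                        # role-based scoping
        "token": secrets.token_hex(16),      # unique, never shared
        "expires_at": time.time() + ttl_seconds,
    }

def is_valid(cred: dict) -> bool:
    """A credential is only honored while its time window is open."""
    return time.time() < cred["expires_at"]

cred = issue_credential("billing-agent-7", "invoice-reader", ttl_seconds=300)
print(is_valid(cred))   # True right after issuance

cred["expires_at"] = time.time() - 1  # simulate the window closing
print(is_valid(cred))   # False once expired
```

Because the credential carries its own agent ID and expiry, audit logs can tie every action to exactly one agent within exactly one time window.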

And then ultimately I need to audit all of this. In other words, I need to be able to go back and see if I did all of these other things correctly, so that I have oversight.

Now, the next piece we're going to talk about is the data and the model. Think about a user that might be trying to come in and hit our AI system. If they can get to it directly, they could send in a prompt injection or all kinds of other things. What I'd rather do, instead of having them go directly, is have them go through an AI firewall, a proxy, a gateway, however you want to think about it. Have them come through there, have it enforce policy, have it look for prompt injections and things of that sort, and only then does the request hit the AI. That's how we can secure a large language model. We could also do this for MCP, the protocol where we've got an agent talking out to a tool or a service. We could have that traffic come through this same firewall, this same proxy. In this case we could be looking, for instance, for data loss prevention: if we see data coming back out of the system through an MCP call, that might be something we'd want to know about. And I already gave the example of prompt injections that we might want to be detecting there. Now, the

next thing we're going to take a look at is threats. We've got these threats that are sitting out there, and we need to be able to detect them in real time. That means I need monitoring capabilities that see what the agents are doing, the tools they're calling, the services they're using, and what the effects of those are, and that can look over time and understand if an agent is doing something abnormal: if it's getting too much access, if it's downloading too much data, if it's changing things that it shouldn't be. I need to have alarms in place that will detect that. But that's reactive; we also want to be proactive. I need to be able to do threat hunting: come up with a hypothesis about what might be happening, and then go out and see if it's actually happening. So be proactive about looking at the security of these agents. And then finally, I want to be able to assess the risk of these systems. So as

I just said, we have to see what kind of risk this system is exposing us to. I want to understand what the agent is doing, what it's able to do, where its limitations are, and where it's going beyond those limitations. I also mentioned earlier that we want security across the entire agent development lifecycle. And then we need to monitor this system, which is part of detection, but is also looking for other things like configuration drift. If these agents are doing operations on the system itself, they may change some of their own parameters. We want to make sure that the model doesn't drift over time, that the configuration of the system doesn't change over time in ways that are unexpected, and we want to look at their access patterns: what is this agent doing, and is it doing the right things or not?

AI agents have tremendous potential to improve productivity and find solutions where previously there were none, but they also extend the attack surface and, if we aren't careful, amplify risk. The good news is that solid guidance exists in the form of the collaboration from IBM and Anthropic that deals

with architecting secure AI agents, which is linked in the description below. For those who get it right, agents will be the competitive differentiator. And for those who don't, well, you don't want to think about that.
