AWS re:Invent 2022 - [NEW] Introducing Amazon VPC Lattice: Simplifying app networking (NET215)
By AWS Events
Summary
Topics Covered
- Microservices Explode Networking Complexity
- IP Addresses Fail as Identity Framework
- Admins and Developers Tug-of-War Hurts All
- Service Networks Draw Application Boundaries
- IAM Policies Secure Service Traffic Natively
Full Transcript
Hello everyone. I'm Satya Ramaseeshian, and with me is Justin Davies. We're both product managers in the EC2 networking organization. We're so excited to be here. Before we get started, I want to thank everyone attending in person. We're going to be introducing VPC Lattice, a new EC2 networking service that launched earlier this week in preview.
So Lattice was built to answer one key question. How do we make it simple for developers to connect, secure, and monitor their services without sacrificing the controls admins need to audit and secure their environments? To put
it in simple terms, with Lattice, we strive to solve key pain points for both developers as well as admins. A quick run-through of the agenda. This is
a 200-level session, and I will be covering the need for service-to-service communication, an overview of current solutions, and some of the key pain points.
I will then turn things over to Justin, who will be covering Lattice's key features and functionality, some popular use cases, and how you can sign up. So let's get started. Over the last few years, a major trend we've observed is interest among our customers in modernizing their applications: transforming an application from one big monolith into tens, hundreds, or even thousands of smaller, decoupled services, popularly known as microservices.
In the context of Lattice, we just refer to these as services, and Justin and I will do the same throughout this presentation. The shift to service-oriented architectures has become popular because it has the potential to give you developer agility, reduce your costs, and scale your applications faster. But as your service-oriented applications scale, there are two things you need to keep in mind: how do I keep my scope of impact as small as possible, and how do I secure my workloads by limiting exposure?
Today, one of the most effective ways to do this is to adopt a multi-account, multi-VPC strategy. This also tends to be an obvious choice, because it is easy for personnel within an organization to stand up environments in their own VPC or their own account. And this is crucial when you are building these service-oriented architectures, where teams are distributed and personnel can come from any org. Apart from the quick start, a multi-account, multi-VPC strategy, when used with the right controls, can also reduce your scope of impact and limit your exposure. This is because you can think of your VPC as your networking boundary: nothing can get in or out of it unless you poke some kind of hole in it. By the same token, you can think of your account as your permission boundary. Making these boundaries smaller and intentional enables granular permission controls, which can increase your application's overall security posture.
But the challenge is that you have these services that all depend on each other, spread across multiple VPCs and accounts. You don't have a nice mapping of your services, or your overall application boundary, to your network and VPC boundaries. So how do you ensure you have the right connectivity solution? How can you be certain that the right services are talking to each other and the wrong ones aren't? The answers to these questions are not straightforward. As a result, modernization can introduce increased networking complexity.
Apart from the networking complexity, typical modernization efforts are indexed on one type of compute platform, typically containers. However, wouldn't it be nice if you could choose any type of compute platform (instances, containers, or serverless) as you build your application? Wouldn't it be nice for your developers to have this choice so that they can iterate on their business logic with the platform that best fits their needs? The challenge this introduces is that each compute platform has its own specific solution to enable service-to-service communication, which exacerbates the networking complexity you need to deal with. So your trade-off winds up being: do I give flexibility with my compute platform, or do I give consistency in deployment? In this trade-off, organizations typically choose consistency in deployment.
Now, within EC2 and within VPC, we have a number of solutions for connectivity that you can use for various use cases. But staying current with the latest and greatest requires one to become a networking wizard. However,
we don't want our developers to be networking wizards. We want them to iterate fast on their business logic. At the same time, we want to empower our admins to enforce the security posture as well as the audit controls that they need in order to be successful with their environment.
So let's talk about current solutions for the network portion of things.
So today, if you have VPCs and services in those VPCs, the way you establish connectivity with them is by poking a hole in them. One option is to use an internet gateway. With this, you expose your VPCs to the internet, and you have the ability to use EIPs and IPv6 to communicate across your VPCs over the Amazon backbone. A second option is to poke a medium-sized hole, in which you use a specific range of private IPs to communicate across VPCs. This is typically available with AWS Transit Gateway or VPC peering. We consider Transit Gateway a slightly bigger hole than VPC peering, because Transit Gateway involves multiple VPCs, whereas peering is one-to-one between two VPCs. A final option is to use AWS PrivateLink, where a consumer VPC establishes a unidirectional connection to a service behind a load balancer. Now, as you go from top to bottom, the size of the hole shrinks, and you can use additional VPC features like security groups, network ACLs, firewalls, et cetera, to further lock things down. But how do you know you're making the right connectivity choice for the short term as well as the long term? And how do you ensure that nothing falls through the cracks when services can be so different within your application, and it's built by teams that are distributed?
So far, what we've talked about is just network connectivity, and that is just the plumbing. And I don't mean that in a bad way, but the internet might be a bit of a boring place if it were just connectivity and no applications, right? In the same way, for a service-oriented architecture or a modernized application, you need network connectivity and also application awareness. So, what do you need beyond network connectivity? To build secure, scalable, and operationally ready applications, you need service discovery. You need traffic management. You need load balancing. You need authentication, authorization, as well as observability. For example, how do you use a consolidated mechanism for service discovery when operating across VPCs, accounts, and various compute platforms? How can you route and load balance to VPCs and accounts that are different from where your clients are? How do you have a monitoring solution that is consistent across compute platforms? Imagine if you had the same solution for Kubernetes as well as Lambda. And how do you have confidence in your application's security posture?
Let's double-click a little bit on security. It's not that people don't like their networks; it's just that they don't want the network to be the only thing they rely on. And this is because relying on networks implicitly means relying on IPs as your identity framework. But that's an insufficient choice if your architecture involves containers or serverless. What customers really need is strong authentication and context-specific authorization, for which you need good secrets management. For example, how do you distribute your credentials? How do you rotate them? How do you revoke them? And to do all of this, the security features, the load balancing, the observability, you need some kind of application layer on top of your service. You can accomplish this either with custom code or with an application layer proxy.
So, let's talk about the application portion of connectivity. For discovery, you could use Route 53; Cloud Map is another option. Staying with AWS services, API Gateway and ALBs provide you routing and auth capability, and these are fully managed services.
Now, when you move to containers, you get the basics out of the box, right? If you're using Kubernetes, you get service discovery with CoreDNS, and you get connection-level round-robin load balancing with kube-proxy. But if you want to build scalable, secure, and available applications, you may need request-level load balancing and traffic management. In addition, even within the same cluster, if you have multiple teams using it across multiple namespaces, you still need context-specific authorization and strong authentication. And telemetry is always important for day-2 operations. On top of that, if you want to connect to dependencies that live outside your cluster, you still need to implement some kind of application-layer networking again.
Similar to the previous slide, this could be an application load balancer that acts as an ingress controller. It is a fully managed solution and can route to targets across various namespaces in the cluster.
A second option is to use a service mesh.
With a service mesh, you have a centralized control plane and a distributed data plane, in which sidecars, your application layer proxies, live right next to your service workloads. A service mesh has a range of features and functionality, but it doesn't always lend itself well to non-containerized architectures. For example, how do you deploy a sidecar right next to a Lambda function? And do you really want to manage a distributed fleet of sidecar proxies next to every workload as you scale? It also reintroduces the challenge of managing a proxy, which was already solved with something like an ALB. On top of that, you still have the network; it didn't go away. Admins need to pick the right connectivity solution, and developers still need to act in close coordination with the network owner to ensure that they're operating in the right subnets, the right VPC, the right account, et cetera. Not having this close coordination makes it very difficult to build modernized applications.
This is because developers and admins fundamentally operate in different layers of the OSI model. Admins focus on layer three. Developers focus on layer seven.
To complicate matters further, when you need to secure your application, you need to build defense in depth across all the layers. However, the teams managing these layers are completely disjointed today. This creates a suboptimal situation wherein both the admin and the developer want the same outcome, but there can be unnecessary tension between the two, because choices made by the admin can adversely impact the developer, and vice versa.
To make this point more clearly, let's imagine a seesaw. You have your admin on the left, your developer on the right. Let's just say the person on top gets to make the choice. And then the person on the bottom is kind of weighed down by this choice. And so let's start with admin making the choice. And
from a connectivity standpoint, your admin might enforce strict requirements on which VPCs and subnets can be part of the application. This focus on consistency could create problems for the developer. For example, they may need to go through a rigorous ticketing process just to get started, or they may have to learn a whole bunch of networking and become that networking wizard they never wanted to be. Now let's flip the paradigm. Let's imagine that you give the developer infinite flexibility, so they can spin up in any VPC, in any account that they choose.
So this could potentially have a bit of an adverse effect on the admin.
For example, what can the network admin do if every developer in your organization goes ahead and allocates 10.0.0.0/16 for every single VPC? Or let's say you have an acquisition where the developers have allocated 10.0.0.0/16. Some of you like that. It's probably because you're an admin and you've dealt with this problem of IP overlap.
Others here might be curious about my keen interest in this one particular block.
But this dynamic is the problem. Having to think about services as a series of IPs is completely unintuitive for the developer. But not thinking about them as a series of IPs can cause IP overlap problems for the network admin. As a result, you have these two personas pulling at opposite ends of the same string, which causes unnecessary tension between the two, even though they're both working towards the same outcome for the organization.
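To make the overlap problem concrete, here is a quick sketch using Python's standard ipaddress module (the CIDR values are made up for illustration):

```python
from ipaddress import ip_network

# Two teams (or an acquisition) independently picked the same popular range.
team_a_vpc = ip_network("10.0.0.0/16")
team_b_vpc = ip_network("10.0.0.0/16")

# Overlapping CIDRs can't be peered or routed to each other directly,
# which is exactly the headache the network admin worries about.
print(team_a_vpc.overlaps(team_b_vpc))   # True

# A coordinated, non-overlapping plan avoids the problem.
team_c_vpc = ip_network("10.1.0.0/16")
print(team_a_vpc.overlaps(team_c_vpc))   # False
```

This is the check an admin effectively runs in their head before any VPC peering or Transit Gateway attachment, and it's the thinking Lattice aims to take off the developer's plate.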
And in spite of the best laid plans, it's possible that you wind up in this type of a situation as your application scales. Now let's flip over to security.
An admin would like complete awareness of every participant in the network, right? They want
the smallest possible exposure. Now, towards accomplishing this, they may limit the choice of platform. Now what does that do to the developer?
He or she or they are not empowered to make the right choice of compute for the service, which could impact their agility. Again, a suboptimal choice for your developer. Let's flip it again. Let's think about what the developer might want. It's not that developers don't care about security; they just want it to be easy. Let's say the developers choose to hardcode credentials to simplify authentication. This choice puts the admin on the back foot, right? Admins are responsible for the overall security posture of the organization, and the solution chosen by the developer does authentication, but it doesn't necessarily meet the security bar. So if you're an admin, you're not particularly thrilled with this choice. From a monitoring standpoint, the network admin wants to monitor all traffic in the network so that they can react to drops in availability, and so on. They need data from all the developers in a standardized format, because that helps them with triage. This could limit the developer's ability to use their favorite logging solution, the one tied to the preferred tools and scripts they'd want for their troubleshooting activities.
Now, let's talk about giving the choice to the developer again. Developers would like detailed request-level telemetry for their applications to troubleshoot current issues, key emphasis on current. They don't necessarily worry about how long something is stored.
On the other hand, admins worry about compliance. They would like historical logs.
And if a developer forgets to store their logs for enough time, you're again in a suboptimal situation for the org because your compliance need might go unmet.
So as you see across connectivity, security, and monitoring, there are scenarios that lead to suboptimal outcomes for either the admin or the developer.
And then there are situations where both admin and the developer are just weighed down by the current solutions. Imagine it's 2 a.m., there is an outage, your service owner logs in, looks at his or her or their telemetry, and they conclude it's the network. The admin pings the IPs that they're aware of, and then
they say, it's the service, right? This friction happens because the two don't share the same context, and that makes triage difficult.
So the ideal state is this: admins want the ability to empower developers without losing controls. They just want a centralized set of tools so that they can be successful in their roles. Developers want to focus on building services, not networks. They're interested in building business logic efficiently, not in learning a whole bunch of networking. With these challenges in mind, we've purpose-built VPC Lattice. It provides admins the tools and audit controls to secure their environment and meet compliance goals, while developers still get an easy way to connect, secure, and monitor their inter-service communication. For more on Lattice, I'm going to turn things over to Justin.
Thank you, Satya. How many people got the chills thinking about that 2 a.m. "it's a network problem"? I did. So,
I want to just dive right in. I think we've done a pretty good job of going through the background.
I think, looking at people's faces while I was sitting down here going through those pain points, I could see which side was the developer and which side was the admin. So I think it resonates, right? I want to take a minute to walk through some of the key capabilities of VPC Lattice. I'm sure you've all seen the recent announcement, and you've probably seen a couple of different blog posts that have come out, but I want to walk through what Lattice does before I go into the key components. I think that will help you understand how we're trying to bridge the gap between the admins and the developers. So the first thing I want to go over is that VPC Lattice addresses the combination of network connectivity and application layer proxy functionality. You heard Satya talk about this before: the network layer and the application layer. What we're doing with VPC Lattice is bridging those two things together, into a single platform. Lattice is a fully managed application layer networking service, and it provides that higher-level functionality: service discovery, load balancing, network connectivity, authentication, authorization, traffic management (meaning request-level routing, the things you would expect from an application layer load balancer), as well as observability. That helps solve some of the things we come across a lot. I'm going to go over use cases in a little bit; it won't be an exhaustive list, but it will be the ones we come across most often, the ones we were looking at when we started designing VPC Lattice. The other thing you heard Satya talk about was this idea of there being two different layers to security. You've got the network level controls and
you've got the application layer controls. When a lot of people talk about zero trust today, they talk about just the application level controls: strong authentication, context-specific authorization. And again, they don't hate the network; it's just that they don't want to use it as an identity framework. What we're trying to say here is, it's not an or, right? It's an and. We should make this simple enough that customers can have a defense in depth strategy without having to overthink it. I've been working in operations for a very long time, and I've made mistakes, right? When I think I understand what a policy says, I sometimes get it wrong. Having that layered approach is just that one extra safeguard. So we want to make it really, really simple to have both the application side and the network level controls. And we'll dive into some of that in a little bit.
In addition, the other key point is that VPC Lattice is a construct built into the VPC infrastructure. It's outside of user space: it's not an agent you install, it's not a sidecar proxy, and it's not something that requires you to reconfigure your application. Because it's built into the VPC infrastructure, we're able to provide a consistent approach across a wide range of compute platforms: instances, containers, and serverless. We
come across this all the time. It's actually really rare, especially the bigger the customer, though this happens with smaller customers too, not to have a, what would you call it, polyglot? A mixed compute environment, right? You've lifted and shifted some of your environment and it's still on EC2. You've modernized some of it and it's spread out between ECS and EKS. Some of your workload you've identified as a really good fit for serverless, and it's on Lambda or Fargate or something like that. It's all over the place, and then people realize: do I need to reinvent the wheel just to get the features and functionality that I want? So, consistency across compute: that was the other big piece we were trying to solve. Okay, so how does VPC Lattice do this?
There are four key components that we're introducing as part of VPC Lattice, and I'm going to walk through each one of them. A lot of this is really new, so I'll take you through it a little slower, okay? The first is services; then service networks, auth policy, and the service directory. Okay. Now,
what is a service? A service, as Satya touched on earlier, is just a unit of application. You can define it however you see fit. It's built up of listeners, rules, and targets. This is very similar to an application load balancer. I want to point out that it's not an application load balancer, but it's a very familiar construct that a lot of our customers know.
You can think of a listener as simply which ports the service is listening on. The rules are how you do request-level routing; this is an application layer proxy, so you get path-based routing, header-based routing, method-based routing, things of that nature. And target groups are a way for you to group like-minded things. So you can have rules that say: for /path1, go to this set of EC2 instances, and for /path2, go to this Lambda function. That way you get that kind of mixed environment. The target groups are how you group those targets, and they can be auto-scaling groups, IP addresses, instances, or Kubernetes pods. So it's a wide range of things that you can do there, and we'll go over some of that as well.
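The request-level routing just described, rules matching on a path and forwarding to different target groups, can be modeled with a tiny sketch (illustrative only; the paths and target group names are made up, and this is not the actual Lattice API):

```python
# A toy model of L7 rules: match on path prefix, forward to a target group.
rules = [
    {"path_prefix": "/path1", "target_group": "ec2-instances-tg"},
    {"path_prefix": "/path2", "target_group": "lambda-function-tg"},
]
DEFAULT_TARGET_GROUP = "default-tg"

def route(path: str) -> str:
    """Pick the target group for a request path; first matching rule wins."""
    for rule in rules:
        if path.startswith(rule["path_prefix"]):
            return rule["target_group"]
    return DEFAULT_TARGET_GROUP

print(route("/path1/orders"))  # ec2-instances-tg
print(route("/path2"))         # lambda-function-tg
print(route("/health"))        # default-tg
```

The point of the sketch is that the targets behind different rules can live on entirely different compute platforms, which is what gives you the mixed environment the talk describes.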
The next piece, a very key component, is the service network. You heard Satya talk about how modern applications look in the cloud today, and I don't want to anchor on the word microservices, because this doesn't just apply to new things.
People have adopted multi-account and multi-VPC strategies. Their applications depend on other things all over the place, and it might not even be things they own. It could be things in different accounts that they depend on, or partner solutions, or anything like that. But they're all over the place, and that creates a situation where the application boundary no longer maps to the network level boundary. So we wanted to create a new construct that allows a customer to put their services in one place, apply common policy, and then share that construct between accounts and VPCs to draw that new boundary: an application layer boundary. Now, it doesn't have to be just for an application; it could be for something like shared services. It's a new construct that lets you draw your own boundary, whatever makes sense for your company or your deployment. Again, we'll go into a little bit of this in a second. The third component
is auth policy. This is actually a really big feature, probably my favorite of the bunch; I come from the admin background. This is how you enforce authentication and authorization. With VPC Lattice, we've integrated with AWS IAM. What I mean by that is, of course, you still get the regular authentication and authorization over which APIs your developers are allowed to call, or which components your admins are allowed to control, and so forth. That regular IAM integration is there. The other piece, though, is that we're bringing IAM authentication and authorization to your own service-to-service communication. There are resource policies that you can apply at the service network level or at the service level (we'll go into that in a second), where the resources are your own services instead of AWS services. So it gives you IAM authentication, the tried and true system that handles over 400 million requests per second, for your own service-to-service authentication and authorization, including hands-off secrets management. If you have instance profiles for EC2 instances, IAM roles for service accounts, task roles for ECS, or Lambda function roles, then instead of dealing with credential distribution, it's hands-off, right? It's part of the IAM integration. So this
is what it looks like. These are just made-up examples. They are IAM resource policies, the same kind of policy you would put on an S3 bucket, and there are two places to put them. You don't have to follow my recommendation here, but my preference as an architecture pattern is this: on the service network, apply something pretty coarse-grained, because this is your guardrail. As an admin, you just don't want to end up on Hacker News, but you also don't want to get in the way of your developers. So you put a pretty generic policy on there that says something like: 100% enforce authentication, but allow all authenticated requests from my org ID. That's a pretty strong policy right there, because you're saying: unless you're authenticated, you can't get through this gate. First you have to authenticate; but as long as you're from my org ID, okay. Then the developer, and again, everybody wears different hats at different organizations, so depending on the size this could still be an admin applying the policy on the service, can apply fine-grained policy. It's going to be a combination of these policies; you have to get through both of them, right? The service policy could say something like: only allow specific principals with these additional context keys, only from my org ID, within these AWS accounts, and only if the principal tag equals such-and-such. So you can get very fine-grained authorization rules in here. On top of that, there's the L7 stuff: you can write rules like only allow GET requests where the query string parameter and the header match certain values, you name it. You can get into that fine-grained policy on the service side. I want to point out again that these are just resource policies, so you could apply the really fine-grained one on the service network as well if you wanted to. But here's the thing: the more complex you make your policy, the harder it is to understand, and the more likely you are to get it wrong. So the idea is that you can use these constructs to scope your network down, scope your policy down, and make it easy enough to understand that you actually know what's happening.
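As a rough illustration of the coarse-grained guardrail pattern described here, a service network auth policy might look something like the following sketch (hand-written for this summary, not copied from the talk's slides; the org ID is made up, and the exact action and condition keys should be verified against the VPC Lattice documentation):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "vpc-lattice-svcs:Invoke",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-example12345"
        }
      }
    }
  ]
}
```

Because only authenticated callers can satisfy the `aws:PrincipalOrgID` condition, this acts as the "gate" being described: anonymous traffic is rejected, anything from the org passes, and a finer-grained policy on the individual service can then layer on specific principals, tags, or L7 match conditions.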
Service directory. This is an account-level view of all of the services that you created or that were shared with you through Resource Access Manager.
And so, I've mentioned this a couple of times now: we integrated with Resource Access Manager. You can share two different types of resources: you can share service networks, you can share services, or both. That gives you a lot of flexibility depending on who you're dealing with. For example, say I'm dealing with an account that I don't necessarily trust, or maybe it's a partner solution, not that you don't trust your partners, but maybe you just don't want to share your whole service network with them. Maybe you would rather have them share their services, and then you attach those to your own service network. We'll talk about some of these design principles in a second. So, who does what? We've spent a lot of time here talking about personas: admins, developers. There's a whole
slew of things in between. But typically, if we're real, we can bucket them into these categories. You've got the "admin star," as we always joke: your Cloud Admin, your Network Admin, your Security Admin, your Cluster Admin. It's kind of the person that's trying to make the developer's life simple so that they can move really fast. Doesn't always happen that way. And then you've got the developer. Honestly, I like to call this the service owner a lot of the time, because often it might not be the developer.
It might be that the developer is dealing with GitHub and some sort of CI/CD pipeline, and some DevOps person is really being that service owner, setting up their blue-green deployment rules and all that kind of stuff. And so whoever that person is in your organization, that's who I'm going to be calling the service owner, okay, or the developer. Okay. So here's the flow.
You've got the admin. In my situation, I'm going to describe it to you the way I really like it. They're going to be the one that creates the service network. They're going to have their own centralized account, similar to how a lot of Transit Gateway architectures work today. I'm going to have my own centralized account where it's admin only, and I'm going to create the service network. As soon
as I create it, I'm going to choose if I want to enable authentication and authorization and define any of that kind of coarse-grained guardrails that I want. Stuff like
only allow authenticated requests from my org ID. I might also say there are a couple of VPCs where I'm okay if they don't even authenticate, maybe because they're just not ready to yet or something like that. So you can define that.
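As a rough sketch, that admin-side setup could be expressed as the parameter payloads you would pass to boto3's `vpc-lattice` client. The call and parameter names here (`authType`, `resourceIdentifier`, `policy`) are assumptions to verify against the current boto3 reference:

```python
import json

# Sketch of the admin flow: create a service network with IAM auth turned
# on, then attach a coarse-grained org-ID guardrail policy. These are the
# payloads, not live API calls; names are assumed from the boto3
# "vpc-lattice" client and should be verified.
def service_network_setup(name, org_id):
    guardrail = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "vpc-lattice-svcs:Invoke",
                "Resource": "*",
                # Coarse guardrail: any authenticated principal from my org.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }
    return {
        "create_service_network": {"name": name, "authType": "AWS_IAM"},
        "put_auth_policy": {
            "resourceIdentifier": name,
            "policy": json.dumps(guardrail),
        },
    }
```

Finer-grained rules would then live on the individual services, layered on top of this guardrail.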
You can also, and this is a pretty cool feature, set up logs with a destination of CloudWatch Logs, S3, or Kinesis Data Firehose. And what you're actually setting up is logs for every single service that you put in this service network, so that you don't have to track down the service owner and be like, hey, can you please give me your logs? I'm getting audited tomorrow, something like that. In one single API call, you can turn on logs for every single service that you put there and send them to the bucket, or the CloudWatch Logs destination, or Kinesis Data Firehose to forward to maybe a partner of your choice. So that's
what you do when you create the service network. You define access and monitoring. The
next step is that you associate VPCs and accounts. Okay, so this is where, you know, the service owner created the service. We're going to go over that in a second. But how you introduce connectivity is you actually take that service network and you associate it to VPCs. Okay, and I'm going to show you what that looks like in just a second here. That's typically an admin job. If you have multiple accounts in this situation, you're going to use Resource Access Manager to do this. The cool thing about integrating with Resource Access Manager is that it
allows you some pretty cool things if you're using an org. Of course, you don't have to use an org in this situation. You can share it directly to individual accounts. But with Resource Access Manager, you can share it up to your org and manage that within your OU structure and all the things that orgs already provide. So
it's a lot easier than kind of getting an Excel spreadsheet with every single account that wants to see this, especially if you're a very large company with lots of VPCs and accounts. The Service Owner, or the Developer, they're going to be the one that creates the service. They also get the choice to enable authentication and authorization if they want to. If they're doing their own
thing over the top, they can set the authentication and authorization to none. They don't have to use it. But you can also enforce that they can't even create a service unless they set this property. So there's a lot of really cool things to do there. Sorry, it looks like some of the wording got a little messed up there.
I'm not trying to play tricks on you. The next thing they can do is define traffic management policies. And what this is, is your request-level routing stuff. So if it's just a simple service, you just put in a default rule, and you say, forward traffic to the Lambda function. Maybe it's just a service that consists of a single Lambda function. That truly would be, in my mind, a microservice. But if you have a more complex or advanced scenario, where maybe you are transitioning from EC2 and you've identified a workload that really makes sense to go on Lambda, you could put in a rule that says, hey, send 90% of the traffic to EC2 and 10% of the traffic to Lambda. Try it out, see how it looks, and then fool around with it from there. And that's
how you do that with traffic management policies. The last thing the service owner does, and this can be the developer or the admin that does it, but in this situation what I'm showing is the service owner: they can actually associate their service to service networks. And you can do many. It doesn't have to be
one. It's not a one-to-one relationship. If there are a lot of different unique requirements that you need to share this around on, and your admin wants to divide out their service networks for shared services or PCI workloads or something like that, they can attach it to as many as they want. So kind of something for everyone. That's kind
of the idea here. Keeping the admins in control while providing a simple but flexible onboarding experience for the service owner. That's really the idea here. So we can try to avoid those 2 a.m. finger-pointing sessions where it always ends up being the network's problem. Okay,
so, again, how do you do it? Here's a typical walkthrough. Maybe it's a service that was developed on Lambda. Our integration with Lambda is basically, you know, you create a new trigger for the Lambda function. It can
be an existing Lambda function. That trigger type would be VPC Lattice, where you can create the service, give it a name, and get going. You create the service. At
that point, it will show up in that account's local service directory. Then, if your admin has already shared a service network with you, or you have created one yourself, you can associate that service you just created with the service network, and then associate the service network to the VPCs. That right there is the action, and yeah, this is the picture slide because I don't have any more transitions after this one. That's the action that basically turns on the network connectivity and the service discovery, where you can get going.
Everything in that VPC, and don't let this scare you, I have another slide to talk about this, everything in that VPC can now discover and connect to the services in that service network, as long as the auth policy and a couple of other properties we're going to talk about in a second allow it, okay?
Each VPC can associate with one service network, okay? And each service can associate with many service networks. So it gives you a little bit of flexibility there. If you have a really unique situation for a certain VPC, no big deal.
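Those association rules can be sketched as a toy model. This is illustrative only, not an AWS API; the class and method names are made up for the example:

```python
# Toy model of the association rules just described: a VPC associates with
# at most one service network, while a service can associate with many.
class ServiceNetworkDirectory:
    def __init__(self):
        self.vpc_to_network = {}       # vpc_id -> service network name
        self.service_to_networks = {}  # service name -> set of networks

    def associate_vpc(self, vpc_id, network):
        # Re-associating with the same network is fine; a second, different
        # network is rejected, mirroring the one-per-VPC rule.
        if self.vpc_to_network.get(vpc_id, network) != network:
            raise ValueError(
                f"{vpc_id} is already associated with "
                f"{self.vpc_to_network[vpc_id]}"
            )
        self.vpc_to_network[vpc_id] = network

    def associate_service(self, service, network):
        # A service may be placed into many service networks.
        self.service_to_networks.setdefault(service, set()).add(network)
```

So if you want a one-to-one mapping of service networks to VPCs, you register the same service into each of those networks.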
This is why services can associate to many service networks. If you wanted to create a one-to-one mapping of service networks to VPCs, you could totally do that. So it
gives you an ability to map it however you need to. Security controls.
All right, let's talk about them. This is my favorite topic.
So you've got network controls, okay? We're not getting rid of those. They're both important, defense in depth. So this is basically the ability to apply a security group on the VPC association, and you can do security group referencing inside the VPC. And this is how you do fine-grained network-level controls inside the VPC.
So in my previous slide, I said that when you turn on that service network for that VPC, everything in that VPC gets access. Obviously, that's not ideal in some situations. You don't want your bastion server to have access to it, maybe, or your external-facing services. So this is how you do it.
It's kind of neat because, even though it's a network-level control, it's kind of like a new-age network-level control, because it works kind of like tagging, right?
I can put a security group on all the resources in the VPC that I want to provide access to, and then go to my service network association and say, everything with this security group, allow it in, right? So that's just a very easy way. And I think the customers that we've been talking to over the last couple of months and years, they really like this idea of using security groups in this way, because it gets them out of having to deal with IP addresses. But at the same time, they might not be ready to adopt an authentication strategy. Maybe they're just not able to touch anything. Maybe
they can't even put a sidecar or code change or anything, right? This gives them that ability to kind of have that network level control, get out of IP addresses, and make it work there. So that's the network level controls.
The second is the application-level controls. Again, we want both. We want network-level controls, and we want strong authentication, reliable authentication, and context-specific authorization. This is what we get with the auth policies.
As we talked about before, these are IAM resource policies that you can apply at the service network or at individual services. It gives you that really good flexibility.
You have to get through the security groups if you set those up, and then you have to get through the service network policy if you set that up, and then you've got to get through the service policy. Okay.
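That layering can be sketched as a chain of predicates that a request must clear in order: security group, then service network auth policy, then the service's own auth policy. This is just an illustration of the evaluation order, not how Lattice is implemented, and the field names are made up:

```python
# Each layer is a predicate over the request; the request is authorized
# only if every configured layer allows it (defense in depth).
def authorize(request, layers):
    return all(layer(request) for layer in layers)

# Hypothetical layers mirroring the talk's examples.
security_group = lambda r: r["source_sg"] == "sg-app"
network_policy = lambda r: r["org_id"] == "o-exampleorgid"
service_policy = lambda r: r["method"] == "GET"

layers = [security_group, network_policy, service_policy]
```

A layer you didn't configure simply isn't in the list, so it never blocks anything.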
So, to summarize, VPC Lattice gives you both network and application layer controls, defense in depth, and the really cool part is that with this model, it is consistent across instances, containers, and serverless. You're not totally revamping the world because you can't do mTLS with Lambda or something like that. It gives you that kind of way to get
that kind of control. You can understand your security posture, so on and so forth.
The other part that a lot of people don't talk about is scoping. We've talked
about this idea of being able to reduce your scope of impact, to reduce the amount of things you can connect to. Make tinier and tinier holes between your services. What this is allowing you to do is say, you only have access to the things that are in the service network, period, end of story. Cut off the VPC if you want. Provide access through this. And now you can enforce controls for every single thing that's going in and out of there very easily, in a centralized
way. Okay. Use cases. What are some of them?
Most of these use cases are just the things we've been hearing about from customers. I've been at Amazon for six and a half years now. It's really interesting talking to big customers, small customers. Some of the problems are the same. Other problems
are just a fact of how big somebody's gotten and it's kind of gotten unwieldy.
So the problems of acquisitions, things like that. But there's always the problem, and everybody seemed to recognize this when I saw people laughing while we were talking about overlapping IP space. These are things that just kind of always happen. And so when we were designing VPC Lattice, we took all these things into account and said, okay, how would we design a new service if we started from scratch, right? If
we really thought about this and designed it for the workloads that are made in the cloud, or are operating in the cloud, right? How do we do this? What would this look like? Okay, so the one that we come across all the time is multi-cluster Kubernetes. This doesn't have to be across multiple VPCs and accounts, but oftentimes it is. A very typical pattern for customers is to give their teams a Kubernetes cluster and also give them an account and also give
them a VPC. And as Satya was talking about earlier, when you're in cluster and it's just one cluster, it's really one of the easiest platforms to use. You just
kind of get up and going really quickly. You get the basics out of the box. All your service discovery is done. It's all good to go. It's as soon as you start to need that multi-cluster stuff, and especially if it's cross-account and cross-VPC, that things start to break down and it gets really hard. The admins have no visibility, the developers don't understand it, it's kind of just this really hard world. And so this was one of the primary things we wanted to address.
And so for Kubernetes, we spent a lot of time trying to figure out how to provide kind of that right user experience for cluster admins and developers, you know, for all the things that they like, you know, the abstraction from the underlying environment, and that consistent API so that they can be portable no matter where they're running.
We wanted to make sure that we weren't introducing this AWS-only thing that isn't portable.
We wanted them to be able to use their Kubernetes native, consistent, and abstracted API, and not have to make them kind of jump through hoops to use this service.
So it was a very, very important thing for us to do, and we worked really closely with the EKS team to figure this out. And so
for this reason, we chose to integrate with the Kubernetes native Gateway API. And if
you're not familiar, this provides kind of the same level of abstraction for Kubernetes users that is both flexible and portable. And we've built this with an AWS Gateway API controller for VPC Lattice. And this allows us to configure the Kubernetes services and define kind of more advanced traffic management rules
through the Gateway API, not some new custom resource definition that we're creating or anything like that. This is with the Kubernetes-native Gateway API.
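For illustration, the objects a cluster user writes might look roughly like this. The `gatewayClassName`, the API version, and the resource names are assumptions to check against the AWS Gateway API Controller documentation:

```yaml
# Sketch: a Gateway backed by the VPC Lattice controller, plus an
# HTTPRoute exposing a Kubernetes Service through it. Names and the
# gatewayClassName are assumed for the example.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: my-service-network
spec:
  gatewayClassName: amazon-vpc-lattice
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: checkout-route
spec:
  parentRefs:
    - name: my-service-network
  rules:
    - backendRefs:
        - name: checkout
          kind: Service
          port: 8080
```

The point is that these are standard Gateway API kinds, not AWS-only custom resources.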
What happens behind the scenes is that Lattice will actually go and configure your service network and register your services. As pods spin up and down based on your deployment configurations, the VPC Lattice controller will go and update all of this for you, so that your users are really interacting with that Gateway API and your Kubernetes clusters, and not necessarily having to bounce up to the VPC Lattice APIs.
This is a fairly new API. For those of you that are not familiar with it, something to think about is that it's kind of like an evolution to the Ingress API, how you expose your services today. Ingress worked really well for several years.
It just did not necessarily support a lot of the more advanced use cases that people liked, and it wasn't very flexible. And so the Gateway API was that evolution, allowing customers to get that more advanced capability without having to do a whole bunch of weird annotations and custom resource definitions, so on and so forth. So with this, the
Kubernetes users can pick and choose all the services in that cluster that they want to register. They'll register them with the service network, and behind the scenes, we'll set up all the load balancing, all the network connectivity, and so on to get these up and running. They will look very much like they're
right next to each other. Okay.
So, one of the big things with this and why this is important is because a lot of the time, one of the use cases we come across is customers saying, it's actually really painful to upgrade my clusters. I love everything I'm doing, but when I try to upgrade a cluster, it's like the world is ending. You
know, it's a very long maintenance window. And a lot of the time, they do like this idea of giving a kind of a sandbox to be able to light up a new cluster, get the service up and running, test that and make sure it's okay, and then fail over to it. The only challenge with that is, how do you actually do that without changing the client configurations and doing all kinds of
weird magic with DNS and doing new ingress rules and all this kind of stuff?
VPC Lattice is a really good fit for this, right? Because it sits outside of the cluster. It sits in the VPC. And so what you're able to actually do is configure a service that has a target group in one VPC, in cluster one, and you can configure another target group that's in the cluster in a different VPC.
VPC Lattice will fully handle all the other things. And so those clients that are on the left-hand side will be able to automatically discover and connect. They won't even notice it. You could weight the targets so that you're sending 10% of the traffic over, verify that it's up and running and it's okay. And then once it's good, you can put it to 100%, decommission the other cluster, delete the account, so on and so forth. So it kind of gives you a very nice way to transition there.
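That weighted shift can be illustrated with a toy weighted-choice function. The target group names are made up, and this just models the traffic split a rule defines, not Lattice's actual data plane:

```python
import random

# Toy illustration of weighted target-group routing: a rule sends 10% of
# requests to the new cluster's target group and 90% to the old one.
def pick_target_group(weights, rng):
    """weights: list of (target_group, weight). Weighted random choice."""
    total = sum(w for _, w in weights)
    point = rng.uniform(0, total)
    cumulative = 0.0
    for target_group, weight in weights:
        cumulative += weight
        if point <= cumulative:
            return target_group
    return weights[-1][0]

# 90/10 split between the old and new clusters' target groups.
weights = [("cluster-1-tg", 90), ("cluster-2-tg", 10)]
```

Bumping the second weight to 100 and the first to 0 is the "cut over" step; the clients never change.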
You can also, if you've identified that, hey, I've already gone through the effort to modernize this workload and it's moved to containers, and now I found out that it's a really good fit for Lambda, it's a serverless workload, you can do the same exact thing that we were talking about there. You can mix and match compute even behind the same service. And so it really kind of gives you that flexibility in
a consistent way. All right, another use case that I'd like to talk about and highlight, and this will be the last one I'll talk about for right now, is something we call tiny bubbles. I see a bunch of people here that I've been on calls with before, so you're probably familiar with me saying tiny bubbles. Customers like the idea of VPCs and accounts because it gives you that kind of easier policy. When you have something that's really big, your scope of impact is really big, right? And you've got to get your policy right. But if you make these things really, really small, then allow-star, even though that sounds terrifying, isn't as terrifying when it's so small. If you get your scoping right, it's a little bit better. I'm not telling you to do allow-star. That's definitely not my recommendation. But I'm saying you can think of it that way, where it's easier to understand a smaller scope of impact if you make these things tiny bubbles. Okay, so what does this look like with VPC Lattice?
So in this model, you give your service teams their own VPCs and accounts. At
this point, nothing gets in, nothing gets out. There's no connectivity, no internet gateways, no nothing. It's their own little bubble, and they can leverage it to get their services up and running. When they're ready, they can pick and choose the service resources that they want to share. It might not be everything in the VPC. Maybe it's just a couple of things. Maybe it's just the front end of their application. Who knows? Maybe it's multiple services. They pick and choose, and they can actually register it with VPC Lattice. As you remember, you can share these services and service networks with Resource Access Manager. So in this case, we're going to take all those services that were shared in those different accounts,
and we're going to share them with the centralized admin account. Now,
the admin, they've already done the work to create two different service networks. Again, this
could be a whole bunch of service networks, but it's whatever fits in their organization. Maybe they've got one generic one that's like a shared-services type of service network. Then they've got another one that's for PCI stuff. And this is the one that's got some really strong infrastructure policies on there, or auth policies, that are really fine-graining what has access to it. And only certain VPCs are going to get this one. But now that all the services have been shared, they can divvy these up and put the services into whichever service networks they want. Because maybe
there's a service that literally everybody needs, so it will go in the PCI one, and it will also go in the shared services one. So you can kind of pick and choose how you want to do that there. And then the admin, if it's a cross-account, use Resource Access Manager, share the service network this time with all those other accounts, and associate them with the VPCs. And now you have this kind
of really flexible connectivity pattern with very strong authentication and authorization turned on where you need it. So that's it, you know, people are connecting. Literally every single request that comes in and out has to go through this thing, and you've got a central place to put that policy on. It's a really nice way to share things. And it's completely abstracted what compute platform is behind the scenes. If somebody wants to update it later on, cool, go for it, have fun. The developer should be able to do that. Okay.
So, here are kind of the bullets of this architecture. This one can be an extreme one. Now, I'm not necessarily recommending you go and do this for every one of your microservices. There's something in between there, right? You find out what makes sense for you, what your dependency matrix looks like, and you carve it down into something that's reasonable so that you can reduce that scope of impact. Make your policy super simple. The harder you make it, the harder it is to get right. So: it's a central place to enforce authentication and authorization, a simplified auth policy with proper network scoping, and VPCs with unique access requirements can use their own service networks. So you can kind of divvy it up however you want to see it.
Okay. So, next steps. What do I do? You like what you heard.
Maybe you did, maybe you didn't. VPC Lattice is in preview.
To summarize, it's basically a way to help connect, secure, and observe your service-to-service communication.
It's not just for cross-VPC and cross-account connectivity. It can also be for a single account. But one of the main use cases would pretty much be that easier network connectivity, since it's network connectivity and an application-layer proxy. And you can do it in a consistent way across instances, containers, and serverless infrastructure. So,
If you are interested in kind of trying it out, it is in preview. If
you want to sign up, just go to our main page, the VPC product detail page. It's aws.amazon.com/vpc/lattice, and it'll take you to the signup form right there. It'll also have some FAQs and some extra product details and stuff like that for you to check out. So I want to thank everybody for joining us today. Thank you for being at re:Invent, and I'm really looking forward to the conversations and to taking a look at some of the architectures that you want to play around with. I want to help you with it. So, thank you. Oh, and one more thing.
We'll both be outside, because I know I didn't leave any time for questions.