
AWS re:Invent 2025 - A modern approach to application migration with Amazon VPC Lattice (NET309)

By AWS Events

Summary

Topics Covered

  • Replace VPN Firewalls with Lattice Policies
  • Lattice Solves Overlapping IP Problems
  • Lattice Enables IPv6 Without NAT Gateways
  • Adopt Lattice Incrementally Before Big Moves
  • Lattice Fixes Shared VPC Scaling Issues

Full Transcript

- Hello everybody and welcome to a Modern Approach to Application Migration with VPC Lattice.

I'm joined here today by my good friends.

By the way, I am Jamie and I'm joined here today by my good friends Yecine and Ryan.

And they'll be going and talking to us a little bit more about using Lattice and how you can use it to not only upgrade and modernize your application, but also to upgrade and modernize your infrastructure as well.

So let's start out with our agenda.

So the first thing we're gonna do is we're gonna basically talk about what we're starting with.

And the easiest way to show you how to use or to modernize with Lattice is to kinda show you in a practical application with an actual architecture.

Funny enough, when Yecine and Ryan and I were building this architecture, we couldn't stop building it well-architected, 'cause it's been beaten into our heads so many times.

So we had to add some things in there to say, "Okay, here are some things that you can improve."

So then after that, we're gonna talk about what needs to change and why it needs to change.

And then from there, we're gonna work in how Lattice helps like, you know, what is it, how does it help, and some fundamentals.

I do understand that this is a 300-level session, but we'll be getting into some fundamentals for some folks that have not used Lattice.

How many of you have used Lattice?

Okay, good. So like a good half spread.

So for some of the others we can kind of catch up on real quick on some of the fundamentals and describe why we're choosing Lattice to do some of the things that we're doing.

And then also we're gonna be talking about how we're gonna be modernizing with Lattice, like obviously.

And then the last bit is Ryan's story from Goldman Sachs and how he actually went through this and his practical application of modernizing with Lattice.

So let's talk about our current landscape.

When we're talking about anything in business, right, especially when it comes down to IT, everything comes down to a business requirement.

Those business requirements are always started with some need of the business, right?

Business could be growing or it could be a change you might be bringing on a new line for manufacturing or a new product that you wanna release.

The other thing too is that we hope that all these businesses grow and as they grow, so do your requirements, right?

And then the other thing too is all this stuff has to happen within a certain timeframe.

So like we need to pick the best tools that help us get what our business needs as quickly as possible.

So what that generally means is that we tend to adopt more and more things, right?

So obviously as we grow, so does the number of requirements, and as I mentioned, requirements never come one after the other sequentially, they're always coming at you all at once.

And then lastly, what we all have done, everyone here has ever built an architecture or built an application, knows that things build on top of themselves.

It's not like you scrape things away and they get to build new all the time.

So when we're putting together this presentation, knowing these things, we are like, "How do we address these with Lattice?"

So let's look at our current landscape and what we're gonna be talking about.

So here I have an architecture that we had made where we have a provider VPC that's connecting via VPN 'cause they need bi-directional communication over the internet to our front door, right?

Pretty much to where things come in, right?

We have our backend, which is in VPC 2, that's where we wanna modernize from a monolithic application to containers.

We have bi-directional private link that's going to one of our acquisitions.

How many of you use bi-directional private link?

Have you heard of that?

Yeah, you've got a couple.

It's real fun to maintain, right?

Because there's a lot of moving parts and I mean fun sarcastically.

We also have our IPv6 offering.

A lot of folks who generally deal with healthcare in the US or government, this is a requirement.

And then we have our hybrid solution, right?

We've got a DX going from our acquisition to a database we wanna protect.

And then we have our firewall.

So going through this, we're gonna kind of list out all of the things we want to do.

Our partners, they love the bi-directional, but they hate that every time we do an add, move, or change, we have to, you know, give them the keys, change IP addresses, all that.

So we wanna change that.

The next thing we wanna do is we wanna have a better connection strategy for our acquisition.

The bi-directional private link works for going to specific services, but as the companies merge more and more, they need to add more services.

And that's not quite easy to do with a bi-directional setup because you'll have to add more listeners, you have a max of 50 on an NLB, and it can be quite cumbersome for any, again, (indistinct) changes.

And then next we wanna grow our IPv6 offering.

We have to add more things to that offering.

But as you saw in our previous architecture, we're using private NAT and a bunch of things to go from our IPv4 environment to our six environment.

And again, if we have to grow that, that connectivity becomes a bottleneck and becomes a problem.

And then next we wanna make sure that the mainframe on-prem only talks to our VPC 2.

We can of course do that infrastructure-wise with security groups and whatnot, but we also want to know if there's a better way, and as you know, if you're doing things infrastructurally, all those hops and all those pieces need to change.

It'd be nice if we can change it in one spot.

We also want to use our containers.

This is the big meat, right?

Where we wanna upgrade those monolithic EC2 instances in VPC 2 to containers.

And then of course with our acquisition, it needs on-prem access to our database.

There's gotta be a better way than having a separate Direct Connect.

Yecine's gonna take us through VPC Lattice and how it helps us solve these problems. - Thank you, Jamie.

So before we start asking, answering why Lattice will help, let's do a small refresher on what Lattice is.

So here we see on the screen what VPC Lattice is; we call it an application networking service.

And it connects, monitors, and secures communication between services and resources.

And you see here on the screen, you've got a variety of compute types it supports, from traditional EC2 instances to containers and even serverless.

It also supports compute outside of AWS in a hybrid scenario.

It also supports databases and we can do those communications over multiple protocols.

And the last but very important is it also allows you to enforce your security requirements while keeping the monitoring and observability.

So now let's take a look at the different building blocks that Lattice offers.

The first thing, the core of the service is the services.

So here you've got your application with any type of compute it supports and you will put that application into one or multiple target groups.

Once you've got that, you have this logical group, which is what we call a service.

And that service basically allows you to expose your compute as an endpoint, and then you'll be able to apply different routing rules, load-balancing options, and auth policies.

So that's the first building block.
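To make that first building block concrete, here's a rough boto3-style sketch of the request shapes involved: a target group, a service, and a listener. Every name, ID, and field choice here is an illustrative placeholder, not a definitive recipe; check the current VPC Lattice API before relying on any of it.

```python
# Sketch of the request payloads you might pass to boto3's VPC Lattice client
# when wiring compute into a Lattice service. All names and IDs are placeholders.
# A real script would do something like:
#   client = boto3.client("vpc-lattice")
#   client.create_target_group(**target_group_request)

target_group_request = {
    "name": "orders-tg",               # placeholder name
    "type": "INSTANCE",                # EC2 targets; other types cover IPs, Lambda, ALB
    "config": {
        "port": 8080,
        "protocol": "HTTP",
        "vpcIdentifier": "vpc-0123456789abcdef0",  # placeholder VPC ID
    },
}

service_request = {
    "name": "orders",
    "authType": "AWS_IAM",             # enforce IAM auth policies on the service
}

# The listener ties the service endpoint to the target group(s); this is where
# routing rules and load-balancing weights live.
listener_request = {
    "serviceIdentifier": "svc-placeholder",
    "name": "http-listener",
    "protocol": "HTTP",
    "port": 80,
    "defaultAction": {
        "forward": {
            "targetGroups": [
                {"targetGroupIdentifier": "tg-placeholder", "weight": 100}
            ]
        }
    },
}
```

The `authType` of `AWS_IAM` is what makes the auth policies shown later in the session apply to this service.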

The second one is the resources.

So you might create one or many services, which I just described.

And you've got the second type, which is the application resource.

And here basically it's where you put all your TCP-enabled destinations.

It could be an Amazon RDS, it could be a DNS name or even an IP address.

And to configure that application resource, you will attach what we call a resource configuration.

Next is the accounts.

So with VPC Lattice, you might have your service and resources in one or many accounts.

And here you see that we've got a bunch of services and resources in account C.

And you might have those services being consumed by either the same accounts or different accounts.

And all of this cross account communication is supported and they will talk through what we call the VPC Lattice Service Network.

So as I said, the concept of providers and consumers.

And I'd like to first clarify what we mean by that.

So here on the screen you see there are four services, and what we call a provider is a service or a resource that you provide to the service network, by either associating the service to the service network or creating a resource gateway.

So those are what we call providers.

On the other hand, for the consumers, they'll be the VPCs and the service endpoints that are associated to the service network and they will be able to consume the services that we have exposed.

Another point I'd like to touch on and clarify, because Jamie and I get this question a lot, is how does VPC Lattice compare to Transit Gateway?

And while they're very different services, they can both live very happily together.

But because we get that question a lot, I think it's good to clarify the differences between those services.

So I'm sure most of you know what Transit Gateway is by now, but when we talk about Transit Gateway, we really talk about what we call a core networking service.

And it's the service that allows you to connect all your VPCs and hybrid connectivity and create that central hub of networking.

But we're mostly staying at layers three and four.

On the other hand, with VPC Lattice, we are more on the layer seven.

That's why we call it application networking; even though it can support any TCP-based destination, it's more of a managed service that simplifies the communication between those services and resources.

And you have that extra security, networking and observability built in.

That's why we call that the application networking service.

So that's the main difference between the two.

Another difference is also with a pricing comparison.

So here I wanted to show you, if you fully replace a Transit Gateway with VPC Lattice, what the cost would be.

So we start with the Transit Gateway model and for every application, you will have one load balancer.

So if we have four load balancers for a certain amount of traffic, it comes down to roughly $1,300 per month.

If you want to replace that architecture with VPC Lattice, then you will need to create one service network and four services, and then that can come down to $750 per month.

So that's to give you a comparison point between the two services.
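To be clear about how a comparison like this is built, here's the arithmetic shape of it. The unit rates below are made-up placeholders, not AWS prices, so the totals will not match the $1,300 and $750 figures from the slide; only the structure of the calculation does.

```python
# Illustrative arithmetic only: the unit rates below are invented placeholders,
# NOT actual AWS pricing. This just shows the shape of the comparison in the
# talk (four NLBs behind a Transit Gateway vs. one service network + services).
HOURS = 730  # approximate hours per month

# --- Transit Gateway model (placeholder rates) ---
nlb_hourly, tgw_attach_hourly, per_gb = 0.0225, 0.05, 0.02
traffic_gb = 10_000
tgw_cost = (4 * nlb_hourly * HOURS          # four load balancers
            + 4 * tgw_attach_hourly * HOURS  # four TGW attachments
            + traffic_gb * per_gb)           # data processing

# --- Lattice model (placeholder rates) ---
svc_hourly, lattice_per_gb = 0.025, 0.025
lattice_cost = (4 * svc_hourly * HOURS       # four Lattice services
                + traffic_gb * lattice_per_gb)

print(f"TGW model:     ~${tgw_cost:,.0f}/month")
print(f"Lattice model: ~${lattice_cost:,.0f}/month")
```

For a real decision you would plug in the current published rates for your region and your actual traffic volume.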

And I'll let Jamie talk about our migration strategy.

- Okay, thanks, Yecine.

So now let's get back to that architecture and actually start doing stuff now that we've got some of the building blocks that Yecine has told us about.

So bringing back the architecture just again to remind you of what we're gonna talk about.

And now let's get into it.

So the first one was partners were complaining about the VPN, right?

What can Lattice do to help us with the VPN?

Now we know that Lattice is multi-account, right?

So it's a fit there.

We also know that Lattice is bi-directional, so there's a fit there too.

And it also has inherent security.

So where do we start?

We gotta create our Lattice network first, right?

So let's just kind of take a look at that.

So one key piece for sharing Lattice across accounts is RAM, right?

So our resource access manager will allow us to share our service network.

So the first thing we gotta do is create it.

We create our service network and then we associate our VPCs to the service network.

And that's gonna be in account one.

The second step is we're gonna share this service network, and accounts B and C will go ahead and share their services, and we can then connect them together, and Lattice will then, based on policy, which we'll get into in the presentation, watch that traffic as it goes back and forth.

And then lastly, we have to create the services, right?

And that's where you create and share.

So you create the services.

On our side, we're gonna associate the service network.

We've already shared our service network out on their side, they're gonna create their services and join it to the service network and share it to us.

So if we go back to our architecture, right, I'm gonna go ahead and just concentrate on the front door and I'm gonna create my Lattice service network and I'm gonna go ahead and move that in.

Now the thing that's important to note here is I didn't remove the VPN right away.

I don't need to. I can do both.

And we can get people going and customers on board and partners on board without ripping and replacing what they're used to.

There's no shock to the system.

And you also notice that the partner right now has a firewall and that firewall is to protect the traffic going back and forth.

But because we have policies, those firewalls aren't needed.

So we've got a couple of places that we're gonna put policies.

And then we generated some generic policies here.

So you know, don't copy and paste them.

These are just examples.

But what we're doing here is we're saying, "Okay, we're gonna accept traffic, right, from our front door.

And our front door has to have a token that says 'front door' before we're gonna allow that traffic to go through.

We're only gonna allow them to do a get and we're only gonna allow them to do a get on three different paths because we wanna restrict it down."

We're a partner, we want to be secure.

The next place that we would put a policy is on the service network.

Now normally, we always say service network, you wanna make it as coarse-grained as possible, right?

This is the place you wanna fiddle with as little as you can, because the blast radius is wider.

But in this case, because we're sharing our service network out to a partner, we're gonna require a couple of tokens before they can talk on the service network.

And if we add more services, which we will, you just add more tokens, right?

So you can make it as simple or as complex as you need.

It really comes down to your business use case.

And then lastly of course our front door, we're gonna do the exact reverse that we did for our partner.

So we're gonna say, "Okay, here's the certain URLs you can do.

You can only do a get.

And if you're talking to me, you have to have the Partner1 token."

So we can lock all that stuff down.
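Putting the quoted rules together, a service-level auth policy along these lines might look like the sketch below. The paths and ARNs are placeholders, and modeling the "token" as an IAM principal tag is an assumption on my part, one possible way to express it.

```python
import json

# Hedged sketch of a fine-grained Lattice service auth policy like the one
# described: allow only GET, only on three specific paths, and only from
# callers carrying the right identifier. Paths, ARNs, and the tag key are
# placeholders; the principal-tag "token" is one possible modeling, not
# necessarily what the speakers used.
auth_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "vpc-lattice-svcs:Invoke",
            "Resource": [
                "arn:aws:vpc-lattice:us-east-1:111111111111:service/svc-placeholder/orders*",
                "arn:aws:vpc-lattice:us-east-1:111111111111:service/svc-placeholder/status*",
                "arn:aws:vpc-lattice:us-east-1:111111111111:service/svc-placeholder/health*",
            ],
            "Condition": {
                "StringEquals": {
                    "vpc-lattice-svcs:RequestMethod": "GET",
                    "aws:PrincipalTag/caller": "front-door",  # the "token"
                }
            },
        }
    ],
}
print(json.dumps(auth_policy, indent=2))
```

The reverse policy on the front-door service would have the same shape with the `Partner1` identifier in the condition instead.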

Once that's all up and set and ready to go, things can go ahead and start talking, right?

So we'll go ahead and add this to our architecture.

So now we're starting to slowly clean things up and we'll move on to the next.

So the next piece we wanna talk about is policy hierarchy.

I just mentioned about coarse grain and fine grain.

Let's just kind of put this all in perspective in one spot, right?

So the service network, that's where you want coarse grain policies, right?

That's where you want to be able to change the major things.

Some of the examples you might see in our public documentation would be you have to be a part of my organization to talk, right?

For us what we're doing is we're saying you have to have specific tokens to talk because we're sharing that out.

And then the services are where you get more fine-grained, as you saw, right?

Only allowing a GET, only going to these specific services.

And that's it.
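As a sketch of the coarse-grained end of that hierarchy, here's the organization-membership style of service network policy the public documentation example alludes to; the org ID is a placeholder.

```python
import json

# Hedged sketch of a coarse-grained service-network auth policy: any caller
# must belong to your AWS Organization. The org ID is a placeholder.
service_network_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "vpc-lattice-svcs:Invoke",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}
            },
        }
    ],
}
print(json.dumps(service_network_policy))
```

In the scenario from the talk, the condition would instead check for the specific tokens being shared with the partner, since the service network itself is shared outside the organization.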

And then of course we've got the resource configuration.

I'm gonna go back one for a second.

Resource configurations, they are a little bit different, right?

So when you add a resource to a service network, everything on that network has access to it.

So you're gonna rely on your services to go ahead and tighten down who can and cannot talk as well as your service network.

Okay. Now moving on.

Accidental button click.

So let's talk about acquisition connectivity.

And I hand it back to Yecine.

- Thank you, Jamie.

So let's go back to our scenario with our acquisitions, and you know, our leadership asked us to simplify our strategy when we acquire a company.

How many in this room, you know, have worked somewhere where you had to consume services from or expose services to a VPC that had the same IP range?

Like how many of you had this?

Yeah, the famous overlapping IP problem.

And there's nothing wrong with dealing with overlapping IP.

The only thing here is that because we want to consume services and expose our own, we create what we call a bi-directional private link.

And again, there's nothing wrong with using that architecture, it's just that you can see how, at scale, the operational overhead might get really hard to manage.

So let's see how we can use VPC Lattice to help with this.

And you'll see that VPC Lattice handles overlapping IPs perfectly fine, and I'm gonna walk you through exactly how the service works so you understand how it deals with that.

So when you have your service network, once you attach your VPC to that service network, it gets a VPC Lattice link local ENI and that ENI gets an IP from the 169.254 address range, which is the link local address range.

Once this is done, the route table get also an entry with that IP address pointing to Lattice.

And then the final step is you create a service, you expose it and it will use a DNS name.

Here you see on the screen, it's the default generated name.

You can also use a custom DNS name.

So what will happen is, once you do the DNS resolution for the service, it will point to the 169 IP address.

It won't point to the IP in the other VPC, and that's how you handle the overlapping IP.

Basically VPC Lattice makes that problem a non-issue.
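The resolution behavior described above can be sanity-checked with a few lines of Python; the addresses are illustrative.

```python
import ipaddress

# Lattice hands the consumer VPC a link-local ENI, so a client resolving the
# service's DNS name gets an address in 169.254.0.0/16 rather than anything in
# the provider VPC's address space. That is why overlapping VPC CIDRs never
# collide. The concrete addresses below are illustrative only.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

resolved = ipaddress.ip_address("169.254.171.1")      # what the service name resolves to
provider_cidr = ipaddress.ip_network("10.0.0.0/16")   # provider VPC (may overlap the consumer's!)

assert resolved in LINK_LOCAL         # routed to Lattice via the link-local route entry
assert resolved not in provider_cidr  # never the provider VPC's real address space
print("service resolves into link-local space; overlapping CIDRs are a non-issue")
```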

So we're gonna apply this and update our architecture.

So here we've got our VPC 2 and the acquisition and we already have the service network like Jamie created.

So what we're gonna do the same way we've done on the first step, we're gonna create our acquisition service and our backend service.

Once this is done, we associate that to the Lattice Service Network.

And here again, we leave all the other components in place, we can do our testing, everything, make sure that it works perfectly fine.

And once we are happy with the results, we can remove all the components and simplify the architecture.

And the same way we've used the policy to replace the firewalling before, we're gonna apply the same concept here.

So from the backend here, it's the same policy type.

We're gonna allow the acquisition to talk to us with the get request on specific path and we'll use the acquisition token.

And to make this work, we'll also need to update our service network policy to allow the token to talk, 'cause remember, you need both the service network policy and the service policy to allow the communication to work.

So we've done that on the backends.

We can do the same on the acquisition service, same thing, same story.

Allowing the backend to talk to us using a token on various paths.

Another question is, "I've got a policy that talks from the backend to my acquisition.

Can the backend still talk to my front door?"

And the answer is yes.

All you need to do is you need to update or edit your policy and add the new section that will handle that communication.

And that's a very powerful feature because now your team, when they build the service, they can edit their policy and the security requirement can grow organically.

All they have to do is edit the old policy to add the new requirements, and that will work.
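That "grow the policy organically" step is literally just appending a statement to the policy document. A tiny sketch, with placeholder tag values:

```python
# Sketch of growing an auth policy organically: the backend already allows the
# acquisition, and we append one more Allow statement so it can also talk to
# the front door. Tag keys and values are placeholders.
def add_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of the policy with one more statement appended."""
    return {**policy, "Statement": policy["Statement"] + [statement]}

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "vpc-lattice-svcs:Invoke",
            "Condition": {"StringEquals": {"aws:PrincipalTag/caller": "acquisition"}},
        }
    ],
}

policy = add_statement(
    policy,
    {
        "Effect": "Allow",
        "Action": "vpc-lattice-svcs:Invoke",
        "Condition": {"StringEquals": {"aws:PrincipalTag/caller": "front-door"}},
    },
)

assert len(policy["Statement"]) == 2  # both communications now allowed
```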

So now we've got our backend talking to our acquisition through the Lattice Service Network.

We also got our front door service talking to our backends.

Everything works perfectly fine.

So let's add this piece to our architecture.

So here we have dealt with the partner service, we've done also the front door and our backend and acquisition.

And there is another piece that now we will tackle is the V6 workloads.

So that's the new requirements.

Now, as Jamie said, when you deal with healthcare or government customers, they have this requirement and you need to talk to them over IPv6.

And the way we do it now is we're using that private NAT gateway to do the translation between V6 and V4.

And we want to simplify this.

So the same way, as I said, Lattice handles overlapping IPs, it also handles the IPv4 to IPv6 communication.

So it also becomes a non-problem.

And here we're gonna do the same as we've done before.

We create our V6 service, we add it to our service network.

And once we're happy with everything, we can remove that private NAT gateway because it's no longer needed.

And then we have working V6 workloads.

I'm not gonna show you the policy here, but it's the same concept that we've shown you until now.

We adjust the service network policy and the service policy to make everything work.

So now that it's done, we've got our V6 talking to our backends, everything is fine.

Let's add this last piece to our architecture.

So now we've modernized the partner service, the VPC, the front door, backends, the acquisition and the V6 workloads.

So let's see what comes next.

- So I promised that we would talk about hybrid as well as modernizing our application.

Our big piece in the middle, right, the monolithic to EKS.

Let's tackle the hybrid.

And you'll notice that as Yecine and I are going through this whole thing, what we're doing is we're kind of picking a lot of low-hanging fruit, right?

We're getting Lattice in the front door, we're getting it on our network.

And we're starting to use it and see the capabilities.

And I strongly recommend that as the best approach to start adopting Lattice: look for these low-hanging-fruit things that we're doing, and do those before you do the big move.

'Cause as you'll see, that is a lot easier afterwards.

So let's talk about our hybrid.

So we wanna make sure that our mainframe only talks to our backend services and we want to give it a path using VPC Lattice.

We like the idea that Lattice looks over our traffic.

That service network switches the tokens every 15 minutes, right?

Everything that we know for sure using IAM is secure because that's how we even log into our AWS.

We wanna take advantage of that and not have to change all the little pieces.

So as you can see here, I've got my Direct Connect, I've got my Transit Gateway and we already have our existing Lattice network.

Now the thing to note is that at no time is the Transit Gateway or the Direct Connect going to disappear.

Lattice suddenly did not get the ability to go right to your on-prem.

So we still need it.

And this goes back to that point that Yecine was mentioning where Lattice and Transit Gateway work well together.

And this is one of those examples.

So for one of our options to connect our database in, you'll remember that Yecine was talking about the resources.

So we're gonna use one of those pieces for controlling resources called service network endpoints.

So we put in a service network endpoint, which, just like PrivateLink, grabs a local IP address, right?

It's actually gonna grab a range of IP addresses, but it grabs an IP address that's local to that subnet.

And then those EC2 instances, all they have to do is talk to that particular service network endpoint to gain access to those services.

So if we're gonna apply this concept of using the service network endpoints with our mainframe, we're gonna go ahead and adopt that.

Now remember, that gets us onto that network; we're gonna be well architected and we're gonna put a service network endpoint in both of our subnets.

But as you can see, the initiation of the traffic is gonna come from our mainframe.

It's gonna go through Transit Gateway.

Transit Gateway has its connections already into our VPC and it has its endpoints and it's gonna flow through the service network endpoint.

Now, I would've made it flow through both service network endpoints, but then this slide would be a mess.

So we're just showing it through one, but in fact it's going through both.

And now we've adjusted our policies and got our communication all set up, with our backend's policy saying, "Okay, we're gonna allow traffic from this IP address, from our mainframe."

We're gonna add that to our architecture.

And again, at no time does our Transit Gateway go away, it stays, Direct Connect stays.

But now we have a path and we know that our mainframe is going to and through our backend service.

Now granted, there will still be a couple of pieces because this is going over Direct Connect and Transit Gateway where we need to put in some security, security groups and things of the like, right?

But it's a lot less now that we have to change and if we wanna add more of our services in Lattice to go ahead and have that mainframe talk to it, we just need to go to those individual services, edit their policies and say the mainframe can now talk and it'll be able to talk immediately.
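As a sketch of what that mainframe allowance might look like as a policy statement: note that the condition key and CIDR here are assumptions for illustration. The right way to match an on-prem source depends on how the traffic enters the service network, so treat this as the shape of the rule rather than a recipe.

```python
import json

# ASSUMED SHAPE ONLY: restrict the backend service so that, of the on-prem
# callers, only the mainframe's address is allowed in. Both the condition key
# (a generic source-IP match) and the CIDR are illustrative placeholders, not
# a verified Lattice recipe.
mainframe_statement = {
    "Effect": "Allow",
    "Principal": "*",
    "Action": "vpc-lattice-svcs:Invoke",
    "Resource": "*",
    "Condition": {
        "IpAddress": {"aws:SourceIp": "192.0.2.10/32"}  # placeholder mainframe IP
    },
}
print(json.dumps(mainframe_statement))
```

Adding more services for the mainframe later means editing each of those services' policies with a statement like this, which is the "change it in one spot" benefit from earlier.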

So the last piece we have, not the last, but the second last piece that we have is the modernization of our application.

Now we've already adopted Lattice, I've already told you guys that we're going to, you know, do all the low-hanging fruit and make this part as easy as possible.

So let's see what that looks like.

So I have my VPC, right?

And I'm gonna go ahead inside my VPC, I'm gonna create another service.

Now this doesn't have to be in the same VPC; as Yecine told you, this works well with overlapping IPs, and I've seen some instances, say if you have to upgrade an EKS cluster, like how we force you to upgrade every six months, where some folks have IPs baked into that and it's very difficult for them to go ahead and run it side by side.

So if you don't have like an extra database in this, we threw this little wrinkle in here just so we can talk about it.

Just create another VPC, right?

Whole new VPC, create a new cluster if you wanna do your cluster upgrade, and then weight it and say, "I wanna send 40% of my traffic to one, and 60% of my traffic to my new one."

And then when we're happy, again, both of them are being used at the same time, at no time are we ripping the bandaid off and you know, basically giving our customers a negative experience, we can just go ahead and consolidate and remove the older one.
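The weighted cutover described above boils down to one forward action with two weighted target groups. Identifiers are placeholders, and shifting traffic later is just updating the weights.

```python
# Sketch of the weighted cutover: one Lattice listener forward action splitting
# traffic 40/60 between the old monolith's target group and the new EKS
# cluster's target group. Identifiers are placeholders; this dict would be the
# "defaultAction" (or a rule action) in a listener create/update request.
weighted_forward = {
    "forward": {
        "targetGroups": [
            {"targetGroupIdentifier": "tg-monolith-ec2", "weight": 40},
            {"targetGroupIdentifier": "tg-new-eks",      "weight": 60},
        ]
    }
}

weights = [tg["weight"] for tg in weighted_forward["forward"]["targetGroups"]]
assert sum(weights) == 100  # all traffic accounted for
```

Finishing the migration is just setting the weights to 0/100 and eventually removing the old target group, with no hard cutover moment.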

So granted, I do appreciate what's involved in going from, you know, a monolithic EC2 setup to microservices and clusters.

There's a lot of application work there, but a lot of times there's a lot of burden on us networking folks to also help that along.

But again, because I'm using Lattice and I'm doing the whole low-hanging fruit thing, this was really a non-issue for me.

So we'll go ahead and add this to our architecture.

And now that we're looking at this part, we can see that we've now modernized our partner, acquisition, we've got our V6 workload, we've now fully modernized and we have a path of doing continuous upgrades to our backend services.

And we can do the same thing if we wanted to to our front end, but we have to be a little careful because the front end service also has that internet access coming in, right?

So that could be a little bit more disruptive.

And each of those pieces, we've shown you a way to adopt Lattice side by side with your existing configuration so that you can go ahead and cut over as you need.

Not immediately.

It's not like, "Okay, I've had this new service, everyone's on this now, we're gonna be doing this, you know, right before my kids' birthday or something."

And then something goes wrong and you know, you get yelled at.

I'm speaking from experience.

So then what about our database?

Remember on that other side we have something about our database connecting to our acquisition.

Yecine's gonna take us through that solution.

- Thank you, Jamie.

Yeah, what about our database, AWS's databases?

So our acquisition needs to access that on-prem database and right now they have a separate Direct Connect connection.

They want to keep new things separate, they don't want the acquisition to get access to the rest of the network.

So they've done, you know, their own cooking there to have their dedicated Direct Connect.

But we want to change that and make things a little bit easier, and eventually remove that second Direct Connect connection.

So now we're gonna use the piece I was mentioning before, which is the resource gateway, and see how we're gonna apply it here.

So before going a little bit further, let's take a look at what is the resource gateway.

And here, the yellow parts are really your ingress points for the traffic.

And then the backends will be defined in your resource configuration.

So it can be either public DNS name or it can be RDS database or even an IP address.

So once you create the resource gateway, you attach a resource configuration where basically you define what is the backend of that ingress points.

And remember, when you add an RDS instance, as the database scales up and down, the resource configuration gets automatically updated and you don't need to change it, but please keep in mind, if you use IP addresses, then you will need to do that work yourself.

It won't necessarily be automatically updated.

So once you've got that resource configuration, you'll be able to connect to that resource gateway either by using a resource endpoint, which is basically your PrivateLink-style access to the resource gateway, or through the Lattice Service Network.

And your clients can then connect to the service network, either by associating the VPC to Lattice or by using the service network endpoints.

So let's have a look at how we can, you know, break down the different steps to update that architecture.

So as we said, we've got our database, I'm gonna create my resource gateway and create my resource configuration, where I will define the IP address of my database.

Once this is done, I can attach my resource gateway to my Lattice Service Network.

And from that point on, I do not need that secondary Direct Connect connection.

And here we can make sure that the acquisition VPC does not have access to any other resources on-premises, because that resource gateway only defines the backend on that side.

So here it can only connect to the database, it cannot access anything else even though it's using the Direct Connect connection.
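The wiring just walked through might look roughly like this as request payloads. All identifiers, the port, and the on-prem IP are placeholders, and the exact field names should be checked against the current VPC Lattice API before use.

```python
# Sketch of the resource-gateway wiring, as boto3-style request payloads.
# IDs, subnets, the port, and the on-prem address are placeholders. The key
# idea: the resource configuration pins the gateway to exactly one backend,
# which is why the acquisition can reach the database and nothing else over
# this path, even though it rides the existing Direct Connect.
resource_gateway_request = {
    "name": "onprem-db-gateway",
    "vpcIdentifier": "vpc-placeholder",
    "subnetIds": ["subnet-a-placeholder", "subnet-b-placeholder"],
}

resource_configuration_request = {
    "name": "onprem-db",
    "type": "SINGLE",
    "resourceGatewayIdentifier": "rgw-placeholder",
    "protocol": "TCP",
    "portRanges": ["1433"],                         # placeholder database port
    "resourceConfigurationDefinition": {
        "ipResource": {"ipAddress": "10.20.30.40"}  # placeholder on-prem address
    },
}
```

Attaching that resource configuration to the Lattice Service Network is the step that makes the secondary Direct Connect redundant.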

And here you see how the traffic flows between the acquisition backends all the way down to the on-prem database.

So let's add this piece to the architecture.

So here, you know, if we summarize what we've done so far, we showed you the starting architecture, which was very functional, but we had different problems: overlapping IP addresses, V6-to-V4 translation, modernizing our backends.

And we showed you like how you can modernize those step by step.

And here to arrive to that final architecture, we've got everything in place and you still see the transit gateway.

So as I said before, Lattice does not necessarily need to fully replace the Transit Gateway.

You might still need that for another type of communications.

But now everything talks through the Lattice Service Network.

We use our policies to enforce our security requirements.

And now I'll hand it over to Ryan who's gonna explain to you how he's using VPC Lattice.

- All right.

Hi everyone, I'm Ryan McDonough, I'm with Goldman Sachs and I lead the technology for our managed continuous deployment platform that we call Fast Track.

Now a number of my colleagues have presented Fast Track at different AWS events, but we primarily focus on what Fast Track does in terms of evaluating policy as code to enforce our security baselines.

But one thing we haven't really covered is how we perform networking with this platform.

And today we wanted to cover how we do networking in Fast Track and how we're now leveraging VPC Lattice to enhance our network capabilities.

But first, let's talk about what Fast Track is.

So it is our primary continuous deployment platform for getting applications into AWS.

We launched this in 2021 to really improve developer productivity and reduce the number of manual reviews we've had on applications.

So to achieve this, we took a shift left approach in how we do policy evaluation and front loaded all this into pre-deployment.

So our applications are all authored in AWS CDK.

And pre-deployment, we synthesize that CDK application to CloudFormation.

And from there, our guardrails evaluate policy against the generated CloudFormation templates.

So at this stage, we're looking at things like, you know, does an S3 bucket use a KMS key?

Is the key owned by Goldman Sachs? Is the bucket publicly accessible or not?

But there are some resources that we can't really cover pre-deployment, right?

So we do have to layer in things like SCPs and RCPs and even permission boundaries to fill in some of the gaps.
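As a rough illustration of the guardrail idea (not Goldman Sachs' actual rules or tooling), a pre-deployment check over a synthesized CloudFormation template might look like this, with the template treated as a plain dict:

```python
# Minimal sketch of policy-as-code evaluated against a synthesized template.
# The two rules here (KMS-encrypted S3, no public ACLs) are illustrative only.

def check_s3_buckets(template: dict) -> list[str]:
    """Return a list of guardrail violations found in the template."""
    violations = []
    for name, res in template.get("Resources", {}).items():
        if res.get("Type") != "AWS::S3::Bucket":
            continue
        props = res.get("Properties", {})
        rules = (props.get("BucketEncryption", {})
                      .get("ServerSideEncryptionConfiguration", []))
        # Rule 1: the bucket must use SSE-KMS
        if not any(r.get("ServerSideEncryptionByDefault", {}).get("SSEAlgorithm")
                   == "aws:kms" for r in rules):
            violations.append(f"{name}: not encrypted with a KMS key")
        # Rule 2: the bucket must not grant public access via its ACL
        if props.get("AccessControl") in ("PublicRead", "PublicReadWrite"):
            violations.append(f"{name}: bucket is publicly accessible")
    return violations

template = {"Resources": {"Logs": {"Type": "AWS::S3::Bucket", "Properties": {
    "AccessControl": "PublicRead"}}}}
print(check_s3_buckets(template))
# → ['Logs: not encrypted with a KMS key', 'Logs: bucket is publicly accessible']
```

In the real platform this sits between `cdk synth` and `cdk deploy`: an empty violations list lets the pipeline proceed, anything else fails the deployment.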

Now for networking, we've used a collection of VPCs that are shared to user accounts.

And so let's take a look at what the workflow looks like.

So when a user requests an environment, this is all self-service by the way.

Anybody can come in and request a new environment.

And when we do this, we provision two accounts.

There's one account that is your service account and this is the target of the CDK application.

So all resources in that CDK application are gonna be deployed to the service account.

The other piece is your pipeline account.

And this pipeline account is gonna be exclusive to that service account, and it's going to run an AWS CodePipeline that deploys your application to the service account, right?

Both these accounts are associated with an OU in our organization.

And once that's done, everything is set up and we provision things like roles for break-glass access and detective controls.

And additionally, we also do some housecleaning, like disabling Regions that we're not gonna support and removing the default VPC.

So once the accounts are ready, users can get to work and start pushing code to Git.

And when this happens, the pipeline gets kicked off and then we run the guardrails.

This process runs a standard CDK synthesis and deploy.

But in between there, we evaluate our guardrails and if the guardrails pass, then we allow the deployment to proceed.

If it doesn't, then we fail the deployment.

So our guardrails not only check the configuration of CloudFormation resources, but we also use them to gate different resource types.

So for example, if AWS introduces something brand new, we have to go through a review process and we eventually onboard it.

But one resource type that we've blocked up until now is the creation of VPCs.

So you have to wonder if we are blocking VPCs, how do we do networking?

So for different application teams or business units, depending on the granularity you choose, we create a VPC and we share this to user accounts using VPC sharing.

If you're unfamiliar with VPC sharing, this is a mechanism where you can take a VPC and define it in one account and then share that VPC through RAM to a number of participant accounts.

You're gonna do this against an OU and that VPC can get shared to all participants in that OU.

And if you wanna know more about VPC sharing, you can look at The Routing Loop, where Jamie and Alex have a really good, in-depth discussion of it.
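As a hedged sketch of that sharing step: the owner account shares the VPC's subnets through AWS RAM to an OU, and every account in that OU becomes a participant. The ARNs below are placeholders, and the payload maps onto boto3's `ram.create_resource_share` call:

```python
# Sketch of VPC sharing: the owner account shares subnets via AWS RAM to an OU.
# All ARNs and names here are placeholders.

def build_subnet_share(share_name, subnet_arns, ou_arn):
    """Request payload for boto3.client('ram').create_resource_share(...)."""
    return {
        "name": share_name,
        "resourceArns": subnet_arns,       # the shared VPC's subnets
        "principals": [ou_arn],            # an OU ARN shares to all its members
        "allowExternalPrincipals": False,  # stay inside the organization
    }

share = build_subnet_share(
    "team1-shared-vpc",
    ["arn:aws:ec2:us-east-1:111111111111:subnet/subnet-0abc"],
    "arn:aws:organizations::111111111111:ou/o-example/ou-ab12-cdef3456",
)
# With credentials: boto3.client("ram").create_resource_share(**share)
```

Because the principal is an OU rather than individual account IDs, accounts provisioned into that OU later pick up the shared subnets automatically, which is what makes the self-service onboarding flow work.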

So we chose this for a few reasons.

One of the things that we want to do with Fast Track is to make the cloud a bit more approachable.

And some of our users, they have really great expertise in AWS and cloud in general.

But for some teams this is a new experience, and we don't want to inundate them with having to learn about CIDRs and other networking concepts that they may be unfamiliar with.

A lot of these users are gonna be coming from an application-focused background and having them define their own VPCs and networks is a daunting task.

So this really helps simplify that onboarding experience.

And for our needs, it allows us to control the network and the perimeter and everything that goes along with it.

So we felt this really fit our needs really well.

So the way the shared VPC works is we define a VPC for teams and we define two network zones.

We have a routable subnet, and this is basically a set of subnets that uses IPs from our own network.

And with the right firewall rules, this will allow you to reach endpoints that are deployed into that subnet.

And then the internal subnets are, you know, not routable from our on-premise environment, but this is where we're gonna provision things like your AWS private endpoints, and we also define all the endpoint policies and make sure that those are in alignment with our risk controls and things like that.

And we make sure that only GS-managed resources can be accessed, and only by GS principals.

And then like most organizations, there's gonna be services like your SDLC endpoints, identity, HR, EMDB, all those types of services.

And a lot of these services predate Fast Track.

So they're exposed via private link.

So what we do is we create these endpoints and we configure all the DNS configurations for these endpoints so that when users get on board, these things are there for them.

So when they provision their accounts from the workflow that we just talked about, they're gonna be assigned to the same OU as the shared VPC, and then through RAM, these subnets are now visible in each user's account.

Now what's interesting about this is when you log into the console (remember, we delete the default VPC), this is gonna appear like a VPC that's in your local account.

But the nice thing about this is all your connectivity is pre-configured for you outta the box.

And all you need to do is know what your VPC is.

Now working with a shared VPC is pretty much just like working with a VPC that you've defined on your own.

There are some limitations.

You're not the owner of that VPC, so you can't modify the network, you can't create a private hosted zone here.

You can't, you know, add or remove endpoints or create your own endpoints.

Now to some users that might be a step backwards, but in a lot of cases this is fine, and it allows us to maintain control of the network, and it works rather well.

The only thing you have to know, again, is the VPC ID, and thankfully CDK has a nice mechanism to look up these VPC IDs and resolve them.

And again, you start working with it just like any other VPC and you can provision resources into it, no problem.

So we found that this model worked really well outta the gate.

It allowed us to really simplify the developer onboarding experience and have everything pre-configured for developers before they even provision their accounts.

Now you might be wondering why we're talking about VPC sharing in a talk about VPC Lattice.

Well VPC sharing isn't without its challenges, there are gonna be some workloads that simply aren't gonna work with shared VPC.

So as a financial institution, there are some workloads that are gonna need much stricter network isolation.

In this case a shared VPC isn't gonna cut it, but one of the bigger challenges you're gonna find with a model like this is resource management and IP starvation issues.

Because your routable IP space, it's a finite resource.

So you have to manage this.

And these types of issues are a bit of a slow burn.

They're not gonna happen out of the gate, they happen over a longer period of time because when you provision these VPCs in this manner, you're doing it at a point in time with the information you had at the time that you set it up.

As time goes on, you know, as Jamie was saying, your business evolves, and you're building new things and adding things that weren't there when you created the VPC.

And so you run into things like we mentioned IP starvation, you've got noisy neighbor issues and then handling resource quotas gets to be a big challenge.

Now again, as things start to come online, if we go back to team one and team two, let's suppose we have a team three that maybe has a custom vendor integration, or that account represents an acquisition, and we need to punch holes in our endpoint policies to reach something external.

Now the question is you have to ask yourself is, "Do I want to open up these endpoint policies for all participants in that shared VPC or just one?"

And the more of those types of use cases you get, the gnarlier that endpoint policy becomes; it may not be a path you want to go down, and you really have to think about how that's gonna work.

And then finally is getting the right granularity is really tough, right?

Unless you're the application team, you're not gonna know how to right size this VPC and how many accounts you can associate with that VPC.

And you may think like, "Oh well, we can just switch the VPC ID."

It's really not that simple. It's really a one-way door.

And if you do need to move VPCs, this is now a migration.

So it's not as simple as you might think, but we think we have a better option.

But before we go there, let's go through some of the things that we want out of a next generation networking architecture.

So we already talked about the stronger network isolation requirements and we wanna avoid these resource contentions so we don't have people fighting for the same resources in the same VPC.

But then there are some other things we want to get out of this that we aren't able to capture today in the shared VPC model, like visibility into which endpoints are being exposed to which consumers across different accounts.

And we still wanna retain this simplified developer experience for networking so that enterprise services that we have already plugged into our shared VPC can be used just as easily as they were in that model.

So let's take a look at what this looks like with Lattice.

So now we've got our account, we've provisioned these two accounts here and now we have no more shared VPC.

So now there's nothing in here.

We update our guardrails to allow the creation of VPCs, and we also introduce an AWS CDK construct that builds this VPC in the shape we'd prefer teams use.

But one thing that we do not permit is the creation of things like a NAT gateway or an internet gateway, so that these VPCs don't have unintended access to the internet.

And in addition to that, we leverage block public access, which makes sure that even if someone was able to circumvent our guardrails, we don't have the ability to get out on the public internet unless we've explicitly done it.

And this is done at the organization level.
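A minimal sketch of that guardrail, assuming templates are evaluated as plain dicts; the blocked list here is illustrative, not the platform's actual set:

```python
# Sketch of the guardrail described above: reject any synthesized template that
# tries to create internet egress resources. The blocked set is illustrative.

BLOCKED_TYPES = {
    "AWS::EC2::NatGateway",
    "AWS::EC2::InternetGateway",
    "AWS::EC2::EgressOnlyInternetGateway",
}

def find_blocked_resources(template: dict) -> list[str]:
    """Return the names and types of any forbidden resources in the template."""
    return [f"{name} ({res['Type']})"
            for name, res in template.get("Resources", {}).items()
            if res.get("Type") in BLOCKED_TYPES]

template = {"Resources": {"Igw": {"Type": "AWS::EC2::InternetGateway"},
                          "App": {"Type": "AWS::ECS::Service"}}}
print(find_blocked_resources(template))  # → ['Igw (AWS::EC2::InternetGateway)']
```

The guardrail is the first line of defense; as the talk notes, organization-level Block Public Access is the backstop if anything slips past it.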

So now we have these VPCs that have overlapping CIDRs, but how do we get access to the services?

So these shared services that we have that were previously using Private Link, we've now updated these to expose them as resource configurations.

And in another platform managed account, we create what we call a shared services service network.

And this is basically a service network that's gonna contain all these services.

And so our construct then can associate these VPCs with this shared services network and now we have the ability to reach these services but from two different VPCs that have overlapping CIDRs.
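That association step might be sketched as follows; the IDs are placeholders, and the payload corresponds to boto3's `create_service_network_vpc_association` on the `vpc-lattice` client:

```python
# Sketch of associating a team VPC with the shared services service network.
# IDs are placeholders; with credentials, the payload would be passed to
# boto3.client("vpc-lattice").create_service_network_vpc_association(...).

def build_vpc_association(service_network_id, vpc_id, security_group_ids):
    """Request payload linking one team VPC to a Lattice service network."""
    return {
        "serviceNetworkIdentifier": service_network_id,
        "vpcIdentifier": vpc_id,
        # Security groups control which workloads in the VPC may reach Lattice
        "securityGroupIds": security_group_ids,
    }

assoc = build_vpc_association("sn-sharedservices", "vpc-0team1", ["sg-0abc"])
```

Because reachability comes from this association rather than from routing, two team VPCs with overlapping CIDRs can both consume the same shared services, which is the point being made above.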

One thing is really important to note here, and this is something that is a relatively new feature of resource configuration, is we don't need to create hosted zones in each of these VPCs to map the correct custom domain name to each service.

And that is not gonna be a responsibility of the individual team accounts.

Instead the resource configuration can declare its stated custom domain name and without any interaction from team one and team two accounts, they can resolve those services on the same FQDN, right?

So we don't need to do any of the things that we're doing with Private Link and the name that is being associated to these services is declared by the service owner and not by the application teams. Now I wanna talk about Private Link because there is some confusion about, you know, do I have to convert everything to Lattice if I'm using Private Link?

And the answer is no.

Private Link and Lattice work happily side by side.

So in this case we have, you know, team two has a vendor relationship and that vendor is using Private Link.

A lot of vendors are still using Private Link and they're not gonna switch to Lattice just because we wanted them to, right?

So here this works just fine; Lattice and Private Link work great side by side.

But I wanna point out something in this model that we think is a little better than what we had with the shared VPC model.

If we took that same vendor connectivity into the shared VPC, we'd be exposing it to all participants in that VPC, right?

So in this case we wanna make sure that only team two has access to that endpoint and not anyone else.

So now our problem is solved.

Now let's suppose that team one and team two want to connect to each other, you know, maybe with a bi-directional Private Link, right?

Yeah we could go that route, but instead what we can do is we can create a new account that is gonna define a service network that is exclusive to the application team.

And this is independent from the shared services network.

So using a service network endpoint, they can add in an additional service network.

You might think, "Well, why can't they just use the shared services service network?"

And the reason is that we have different permissions on how we share the shared services service network versus team one's.

With the shared services service network, we control the permissions on how RAM shares that service network.

So the only users able to associate services or resource configurations with that service network are going to be the platform service teams, right?

So your SDLC team, your identity team, they'll be able to associate services with that service network, but general consumers won't be.

So this isn't a place where we want everybody in the organization to just dump services, right?

If you need to have your own service network, you can spin one up, define what accounts need to access it and now that is a private service network that is for your work group, right?

Instead of having an entire VPC.

So one of the challenges that we had in onboarding this is how do we get all these service teams to leverage VPC Lattice, right?

We don't want to say, "Hey everybody, here's a new work item for you, go do a bunch of stuff," right?

But thankfully it's pretty easy.

So if we take a look at your high-level Private Link architecture, you have your compute resource, which is your service, and there's gonna be a Network Load Balancer and a Private Link service, right?

So how do we get this to work with a resource configuration?

So one thing we don't wanna do is we don't want to disrupt the existing Private Link customers, right?

Those are still gonna exist and we don't have to do anything.

But what we can do is create a resource configuration and we can create a resource configuration from an ARN, we can create it from an IP address, but we can also use a domain name.

And so in this case we simply take the domain name from the NLB, and this is the AWS-assigned FQDN.

And we use that to create a resource configuration, and now we can associate that resource configuration with the shared services network. And instead of having all of these accounts request to be allowlisted for every single Private Link endpoint, we can give everyone in the same OU path, or in the entire organization, access to the shared services network.

So this really helps these use cases where you have a service that you wanna share in a one-to-many fashion and you've got a very large number of accounts.
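A hedged sketch of that conversion: the resource configuration points at the NLB's DNS name instead of an IP. The names below are placeholders, and field names may differ slightly across SDK versions:

```python
# Sketch of fronting an existing Private Link NLB with a resource configuration.
# Instead of an IP resource, the definition uses the NLB's AWS-assigned DNS name,
# so existing Private Link consumers are left untouched. Values are placeholders.

def build_nlb_resource_config(name, nlb_dns_name, gateway_id, port="443"):
    """Request payload for a DNS-based Lattice resource configuration."""
    return {
        "name": name,
        "type": "SINGLE",
        "protocol": "TCP",
        "portRanges": [port],
        "resourceGatewayIdentifier": gateway_id,
        "resourceConfigurationDefinition": {
            "dnsResource": {
                "domainName": nlb_dns_name,  # the NLB's assigned FQDN
                "ipAddressType": "IPV4",
            }
        },
    }

cfg = build_nlb_resource_config(
    "identity-service",
    "my-nlb-0abc.elb.us-east-1.amazonaws.com",
    "rgw-0abc")
# With credentials: boto3.client("vpc-lattice").create_resource_configuration(**cfg)
```

The key design point is that nothing about the NLB or the existing Private Link service changes; the resource configuration is an additional front door, shared once to the OU instead of allowlisting accounts endpoint by endpoint.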

So what do we get out of all this? We now have a much stronger network isolation story.

We don't have issues with resource contention or IP starvation, right?

All those resources are exclusive to the owning account and we've got a really nice story around how we handle cross account access.

So we can use Lattice, we can use resource configurations and we can still use Private Link when needed, right?

And these one-to-many use cases are very easy to solve now, right?

We don't have to allowlist a bunch of individual accounts everywhere.

But I think the really important thing is that we now also have a very good story around simplifying the developer experience when we onboard.

Because just like with the shared VPC model, we have a mechanism now where someone can bring up a VPC and all their connectivity needs are met outta the box, at least the majority of them, not the bespoke ones.

But this is a nice simple way to get this done and maintain isolation.

So I don't want this to be a shared VPC versus Lattice conversation; they're different things.

We really think that shared VPC has applicability for specific use cases, but at scale it can really present some challenges.

So we've been pretty happy with how this has worked out so far and we're looking forward to seeing more with Lattice down the road.

I think that wraps it up and I think we'll take questions at the door 'cause they're shutting us down to turn the room over for tomorrow.

So if you have any questions we can take it over there.

- And thank you again, everybody.

Please remember to fill out the survey.

It's important to let us know how we're doing and we'll see you at the door.

- Thank you.

(audience applauding)
