SwiNOG#40 | netlab: bringing the joy back to virtual networking labs | Ivan Pepelnjak | ipSpace.net
By SwiNOG - Swiss Network Operators Group
Summary
Topics Covered
- Skip Lab Setup drudgery
- Intent-Driven Labs Instant
- Multi-Vendor Labs Effortless
- Scale to 60-Device Topologies
Full Transcript
So, please welcome Ivan.
[Applause] Thank you.
Um, I have 30 slides, and they gave me 25 minutes. So I'll skip most of the stuff on the slides; it's there just for your reference when you get the presentation afterwards. I'll skip this one. The important part is at the bottom: I built too many labs in my lifetime. I hate building labs.
So when I get a question like this one (and it's based on a true question I got on my blog)... I was talking about unnumbered interfaces. You know, the idea where the interface addresses are equal to the loopback IP address, so you save address space, and you're running OSPF on that. And someone was pontificating: I wonder what would happen, and whether that is vendor-specific, and yada yada yada. And I'm like, dude, did you even think about trying this out in the lab? This should be trivial. And then I remembered that labbing in the old days was like this. Do you really want to go there and set all this up just to test that one feature? Nope.
Today we are lucky because everything is virtual, and we get things like this one (that's CML from Cisco), but you also have GNS3, you have EVE-NG, you have tons of open-source tools that allow you to run the virtual machines and build a lab with your mouse. And then, when you're done clicking and dragging cables across that screen and you start things up, you still haven't done anything, because you've solved only the first part of this puzzle.
You have the topology, and then you have to type in the IP addresses, and you get them wrong, and then you have subnet mismatches. Then you have to type in the BGP neighbors, and you get them wrong because you forgot the IP addresses, and the sessions don't come up, and then you're like, oh, three hours later I have a running lab. By that time I am at this stage:
I'll have a beer.
And now imagine the land of unicorns, or, as they call it, infrastructure as code, or, if I were working for a vendor, intent-driven networking.
So my intent is to have a lab with two nodes, R1 and R2. On the right-hand side is the actual topology file that my tool is processing. We'll make them Arista containers, which means that the default device will be EOS and the provider will be containerlab. Oh, and they're running OSPF, so we bring in the OSPF configuration module, and that module does all the magic. We need two links between them, so we have the R1-R2 link twice. And we want some stub links so that we get some IP prefixes. Those are the links. And oh, by the way, the point-to-point links are unnumbered: the addressing on point-to-point links for IPv4 is not a prefix. True means unnumbered; False means don't use it.
Then you save the file, execute one single command, and you get a running lab.
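For readers who want to try the two-router lab described above, here is a hedged reconstruction of what such a topology file might look like, wrapped in a small Python helper that saves it. The key names (`defaults.device`, `provider`, `module`, `addressing.p2p.ipv4`, single-node stub links) reflect our understanding of netlab's documented topology format, not a copy of the slide; verify them against the netlab documentation. The helper function `write_topology` is hypothetical.

```python
from pathlib import Path

# Reconstructed two-router OSPF lab; check key names against the netlab docs.
TOPOLOGY = """\
defaults.device: eos      # Arista EOS on every node
provider: clab            # run the nodes as containerlab containers
module: [ ospf ]          # bring in the OSPF configuration module

addressing:
  p2p:
    ipv4: True            # True = unnumbered IPv4 on point-to-point links

nodes: [ r1, r2 ]

links:
- r1-r2                   # first link between R1 and R2
- r1-r2                   # second, parallel link
- r1                      # stub link on R1 (gives us a prefix to advertise)
- r2                      # stub link on R2
"""


def write_topology(path: str = "topology.yml") -> Path:
    """Save the topology file (hypothetical helper)."""
    target = Path(path)
    target.write_text(TOPOLOGY)
    return target


if __name__ == "__main__":
    print(TOPOLOGY)
```

After saving the file with `write_topology()`, the single command from the talk, `netlab up`, run in the same directory, starts the lab.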
I'm not crazy enough to do a live demo, so this is a video, but it is in real time; it's not sped up. The one thing we're waiting for is Arista to start. FRR starts way faster, but Arista has a lot of things to do. I don't blame them; they're fast. So first it created the containerlab topology file and the Ansible stuff. Then it started the containers, and now we are running an Ansible playbook that builds and deploys the configurations: first the initial configurations and then the OSPF configurations. And it's done. The lab is up and running. This is real time.
So I connect to it. I had to cut 30 seconds of the video while OSPF set up the adjacencies, and then I can do show ip ospf neighbors. You can see they're up. I can do show ip route.
There's the answer.
So that guy could have produced his answer in literally one minute, after spending half a day installing the tool.
This is what's going on behind the scenes. We had to supply the topology file. Then the tool (we've been working on it for three or four years now, so it's pretty feature-rich) does the data model transformation from the high-level "intent", in double quotes, to the device data model: I'm running OSPF on this interface, and I have an IPv4 address on this interface. Then we use the configuration templates for different devices to translate that into device configs. So you can do the same lab with Arista today, with Nexus OS tomorrow, with Nokia SR OS the day after, then with SR Linux, and the day after that with Junos.
We handle the configs. You tell us what you want to have, and we do it. Oh, and you can also mix them: one Arista, one Junos box, one XR box, you name it.
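The "one data model, many device templates" idea above can be illustrated with a toy example. This is a sketch, not netlab's code: netlab uses Jinja2 templates, while this version uses the standard library's `string.Template`, and the data model layout (`ifindex`, `ospf_area`), the interface naming patterns, and the simplified OSPF commands are assumptions chosen for the example.

```python
from string import Template

# Hypothetical device-neutral data model for one node, roughly what a
# transformation step might produce: interfaces identified by index.
NODE = {
    "name": "r1",
    "interfaces": [
        {"ifindex": 1, "ospf_area": "0.0.0.0"},
        {"ifindex": 2, "ospf_area": "0.0.0.0"},
    ],
}

# Per-platform interface naming and OSPF interface templates (simplified).
IFNAME = {"eos": "Ethernet{}", "frr": "eth{}"}
TEMPLATES = {
    "eos": Template("interface $ifname\n   ip ospf area $area\n"),
    "frr": Template("interface $ifname\n ip ospf area $area\n"),
}


def render_ospf(node: dict, device: str) -> str:
    """Render the same intent using the chosen platform's template."""
    tmpl = TEMPLATES[device]
    return "".join(
        tmpl.substitute(
            ifname=IFNAME[device].format(intf["ifindex"]),
            area=intf["ospf_area"],
        )
        for intf in node["interfaces"]
    )


if __name__ == "__main__":
    for device in ("eos", "frr"):
        print(f"! {device}")
        print(render_ospf(NODE, device))
```

Swapping the platform changes only the template lookup; the data model stays the same, which is what makes mixed-vendor labs cheap to support.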
We use two tools to prepare the virtual infrastructure and start the workload: Vagrant with KVM and libvirt, or containerlab for Docker containers. So we can use virtual machines or containers. And the most important thing is that we do all the heavy lifting for you. Say you want to check whether BGP add-path functionality works on Arista. Oh, you need addressing, you need the interfaces, you need OSPF, you need BGP neighbors, you need all that. We do all that. You just add the one command you want to test.
So instead of wasting three hours setting things up plus five minutes figuring out whether it works, you skip the first three hours.
And as you can see, we do pretty much everything apart from IP multicast. I hate multicast.
Of course, we know that every network is special. So you can have custom configuration templates: if you always use the same thing, you can build a template, and it gets deployed on top of what we deploy, so you end up where you want to be. Or we have plugins. These are Python modules that can do anything.
For example, once I wanted IP anycast functionality, where multiple routers advertise the same loopback IP address from the same autonomous system, so the AS path is the same. And then my tool would go like: oh, you are in the same autonomous system, I will set up IBGP between you, because that's how we do things, right? But not in this case. So I wrote a plugin that cuts off the IBGP sessions after the transformation is done. It's online in the GitHub repo, so you can go and check it out.
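Here is a rough idea of what such an anycast plugin does. netlab plugins are Python modules exposing hook functions that run during the topology transformation; we call the hook `post_transform` here, but treat its name and signature as assumptions to check against the plugin documentation. The data layout (a per-node `anycast` flag and `bgp.neighbors` entries with `name` and `type` keys) is invented for illustration; the real plugin in the netlab repository will differ.

```python
def post_transform(topology: dict) -> None:
    """Remove IBGP sessions between nodes flagged as anycast routers.

    Assumed (hypothetical) layout: topology["nodes"][name]["bgp"]["neighbors"]
    is a list of dicts with "name" and "type" ("ibgp" or "ebgp") keys.
    """
    nodes = topology["nodes"]
    anycast = {name for name, data in nodes.items() if data.get("anycast")}
    for name in anycast:
        bgp = nodes[name].get("bgp")
        if not bgp:
            continue
        bgp["neighbors"] = [
            nbr
            for nbr in bgp.get("neighbors", [])
            # keep EBGP sessions and IBGP sessions to non-anycast routers
            if not (nbr["type"] == "ibgp" and nbr["name"] in anycast)
        ]


if __name__ == "__main__":
    lab = {
        "nodes": {
            "a1": {"anycast": True, "bgp": {"neighbors": [
                {"name": "a2", "type": "ibgp"},
                {"name": "x1", "type": "ebgp"},
            ]}},
            "a2": {"anycast": True, "bgp": {"neighbors": [
                {"name": "a1", "type": "ibgp"},
            ]}},
        }
    }
    post_transform(lab)
    print(lab["nodes"]["a1"]["bgp"]["neighbors"])
```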
This is the current state of the tool as of one month ago; we added a few things in the meantime.
These are the platforms that we support. So, we ran out of ideas.
Now, there is one vendor missing from this list that's pretty popular in some places. It starts with H. It's not missing because I hate them or anything; they just don't have an easily downloadable virtual machine. So if you don't care, I don't care. Thank you.
Apart from that, we have more or less everything you could ever dream of. Plus, we have some generic daemons, like BIRD and a DNS daemon. If someone wants to bring some other routing daemon, talk to me. KVM and Docker are supported. We did support VirtualBox, but no one was using it, so we stopped.
Or you can configure hardware labs if you wish, in which case the only extra things you have to specify are the actual wiring and the interface names.
For virtual labs, we know what the interface names will be. We even insert bogus interfaces into the virtual machines so that the interface names come out right. If you want Ethernet1, Ethernet2, and Ethernet5, we silently insert Ethernet3 and Ethernet4 so that the numbering works out.
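The interface gap filling described above boils down to padding the list of requested interface indexes. This is an illustrative sketch only: the function name `fill_interface_gaps` and the placeholder marking are hypothetical, and netlab's actual implementation differs. The reason for the padding is that virtual NICs are numbered sequentially, so reaching Ethernet5 requires something in slots 3 and 4.

```python
from __future__ import annotations


def fill_interface_gaps(wanted: list[int]) -> list[tuple[int, bool]]:
    """Return (ifindex, is_placeholder) pairs from 1 up to max(wanted).

    Indexes that were not requested are marked as placeholders, i.e. the
    bogus interfaces added only to keep the numbering aligned.
    """
    used = set(wanted)
    return [(i, i not in used) for i in range(1, max(used) + 1)]


if __name__ == "__main__":
    for idx, placeholder in fill_interface_gaps([1, 2, 5]):
        print(f"Ethernet{idx}" + (" (placeholder)" if placeholder else ""))
```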
And the last thing: you can combine containers and virtual machines in one lab. So you could run FRRouting and Arista in containers and then add a Nexus OS virtual machine to that. You could do everything in virtual machines, but it would be way slower and use more resources.
On the addressing front, we do everything and then some. We assign addresses automatically, or you can use static addresses, change the pools we assign from, use unnumbered interfaces, or whatever.
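Automatic address assignment from pools, including the unnumbered option from the opening example, can be sketched as below. This is a simplified illustration in the spirit of netlab's addressing, not its actual code. The pool prefixes (10.0.0.0/24 for loopbacks, 10.1.0.0/16 carved into /30s for point-to-point links, 172.16.0.0/16 carved into /24s for LAN and stub links) mirror what we believe are netlab's defaults, but treat them as assumptions.

```python
from __future__ import annotations

import ipaddress


def assign_addresses(
    nodes: list[str],
    links: list[tuple[str, ...]],
    unnumbered_p2p: bool = False,
):
    """Return (loopbacks, per-link interface addresses).

    A link with two nodes is point-to-point; one node is a stub link;
    more nodes form a multi-access LAN.
    """
    loopback_pool = ipaddress.ip_network("10.0.0.0/24").hosts()
    p2p_pool = ipaddress.ip_network("10.1.0.0/16").subnets(new_prefix=30)
    lan_pool = ipaddress.ip_network("172.16.0.0/16").subnets(new_prefix=24)

    loopbacks = {node: next(loopback_pool) for node in nodes}
    link_addrs = []
    for link in links:
        if len(link) == 2 and unnumbered_p2p:
            # Unnumbered: each interface borrows its node's loopback address.
            link_addrs.append({n: f"unnumbered ({loopbacks[n]})" for n in link})
            continue
        subnet = next(p2p_pool if len(link) == 2 else lan_pool)
        hosts = subnet.hosts()
        link_addrs.append({n: f"{next(hosts)}/{subnet.prefixlen}" for n in link})
    return loopbacks, link_addrs


if __name__ == "__main__":
    lo, addrs = assign_addresses(
        ["r1", "r2"], [("r1", "r2"), ("r1", "r2"), ("r1",), ("r2",)]
    )
    print(lo)
    for entry in addrs:
        print(entry)
```

Drawing from shared pools is what removes the "subnet mismatch" class of mistakes: both ends of a link always get addresses from the same subnet.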
On the data plane, we do VLANs, VRFs, VXLAN, MPLS, segment routing (both variants), and tunnel interfaces. On the routing front, we can configure all major routing protocols plus static routes (the new thing this month is discard static routes), plus route redistribution between anything and anything, with or without route policies.
Route maps, prefix filters, AS-path filters: I was bored one day and wrote that stuff, and then other people implemented it on about ten different platforms. So you can now do route maps on Cisco, Arista, FRRouting, Junos, and a few other platforms. Anycast gateways if you want them, or VRRP.
In MPLS, we do LDP and BGP-LU, we do L3VPNs, we do 6PE, we do SR-MPLS. Oh, by the way, you can run L3VPN over MPLS, over VXLAN with EVPN, or over SRv6.
And in EVPN, because we have a crazy contributor, we support all the crazy stuff that the vendors are selling. You can run IBGP over OSPF or IS-IS like sane people, or you can go EBGP-only, or you can go IBGP over EBGP like one particular vendor is recommending. And you can test how these things work together in a multi-vendor environment, and surprisingly, they work. I was so surprised.
This is an old snapshot of the documentation, so you get a little glimpse of what's supported. The whole documentation would be an hour-long presentation.
If you want to connect to the outside world, we support ways to connect the virtual machines or the containers to it. So you can even connect your virtual network to an actual outside BGP feed, for example: connect it to your exchange point and take the BGP feed from the exchange point if you wish. And it's really easy to start external tools. If an external tool is in a container, you just tell us how that container is started; we can create volumes and all that, and the container is started immediately after the lab is started. And if you tell us how to build the config file for the tool, we do that too.
So we integrated SuzieQ, which is an observability platform. Graphite is a user interface, so you can point and click on your routers if you prefer that to opening terminal sessions. Edgeshark is a beautiful thing that does packet capture on a Linux server and then transports the captured packets over HTTP to your laptop, where it creates a virtual interface that you connect Wireshark on the laptop to. So you're running everything on a Linux server somewhere (I'm running it at home), and you can capture on your laptop sitting in a coffee shop somewhere.
Someone did Cisco NSO.
Then you can create reports. You can create graphs. Well, we don't draw graphs ourselves; there are tons of open-source tools on the market that do graphs. So we just create the graph description file, you feed it to one of those tools, and it creates a beautiful picture of your network.
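To make the "graph description file" idea concrete, here is a stand-alone sketch that turns a list of links into Graphviz DOT text. netlab has its own graph output; this function (`to_dot`, a hypothetical name) only illustrates the division of labor: the lab tool emits a text description, and an external renderer such as Graphviz draws the picture.

```python
from __future__ import annotations


def to_dot(nodes: list[str], links: list[tuple[str, ...]]) -> str:
    """Build a Graphviz DOT description of the lab topology."""
    lines = ["graph lab {"]
    lines += [f'  "{n}" [shape=box];' for n in nodes]
    for i, link in enumerate(links, start=1):
        if len(link) == 2:
            a, b = link
            lines.append(f'  "{a}" -- "{b}";')
        else:
            # Stub or multi-access link: draw the segment as a small point.
            seg = f"lan{i}"
            lines.append(f'  "{seg}" [shape=point];')
            lines += [f'  "{n}" -- "{seg}";' for n in link]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(["r1", "r2"], [("r1", "r2"), ("r1", "r2"), ("r1",)]))
```

Saving the output as `lab.dot` and running `dot -Tpng lab.dot -o lab.png` would render it with Graphviz.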
Oh, and automated validation: you can write tests that are executed whenever you wish. I'm using that in the BGP exercises I wrote, for example. This is your exercise, this is what you have to do, and instead of logging into devices and running show commands, you just run a script, and it validates whether you did everything you were supposed to do.
Plus, if your server has too much memory, you can run multiple lab instances on the same server. Some software development companies do that: all developers share the same server with the same images, and everyone runs their own labs for development purposes.
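The exercise-checking workflow described above can be pictured as comparing expected state with observed state. netlab has its own validation framework (we believe it is driven by the `netlab validate` command and tests defined in the topology); the function below is a hypothetical, self-contained stand-in, and the `expected`/`observed` dictionaries are made-up structures rather than real device output.

```python
from __future__ import annotations


def validate_bgp(
    expected: dict[str, set[str]],
    observed: dict[str, dict[str, str]],
) -> list[str]:
    """Return failure messages; an empty list means the exercise passes.

    expected maps a node to the peers it should have; observed maps a node
    to {peer: session_state} as collected from the devices.
    """
    failures = []
    for node, peers in expected.items():
        for peer in sorted(peers):
            state = observed.get(node, {}).get(peer, "missing")
            if state != "Established":
                failures.append(f"{node}: BGP session to {peer} is {state}")
    return failures


if __name__ == "__main__":
    expected = {"rtr": {"x1", "x2"}}
    observed = {"rtr": {"x1": "Established", "x2": "Active"}}
    for msg in validate_bgp(expected, observed) or ["All checks passed"]:
        print(msg)
```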
We even had to implement staggered device start for someone, because if you start 50 virtual machines at the same time, everything collapses. They all burn four CPUs while they boot, and you never have 200 cores, but they burn maybe half a CPU when they're idle. So if you start 10 at a time and wait, then start 10 more and wait, then 10 more, you can easily start a lab with 50 or 60 network devices.
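The arithmetic behind staggered start, using the talk's rough figures (about 4 CPUs per booting VM, about 0.5 CPU per idle VM): starting 50 VMs at once needs 50 × 4 = 200 CPUs, while batches of 10 peak when the last batch boots, at 10 × 4 + 40 × 0.5 = 60 CPUs. The sketch below assumes each batch finishes booting before the next one starts; it is a back-of-the-envelope model, not netlab's implementation.

```python
from __future__ import annotations


def batches(devices: list[str], size: int) -> list[list[str]]:
    """Split the device list into start-up batches of at most `size` nodes."""
    return [devices[i:i + size] for i in range(0, len(devices), size)]


def peak_cpu(total: int, size: int,
             boot_cpu: float = 4.0, idle_cpu: float = 0.5) -> float:
    """Estimate peak CPU demand, assuming each batch finishes booting
    (and drops to idle) before the next batch starts."""
    peak = 0.0
    for started in range(0, total, size):
        booting = min(size, total - started)
        load = booting * boot_cpu + started * idle_cpu
        peak = max(peak, load)
    return peak


if __name__ == "__main__":
    print(peak_cpu(50, 50))  # all at once: 200.0
    print(peak_cpu(50, 10))  # 10 at a time: 60.0
    print(peak_cpu(60, 10))  # 60 devices, 10 at a time: 65.0
```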
And at the end, I'll show you a topology that someone built; it was mind-blowing.
All right. What do you need to deploy this stuff?
Well, today we recommend running it on a Linux machine. It can be a physical server. For example, I bought myself this small brick for €,000 with eight AMD Ryzen cores and 64 GB of memory. You just mail-order it, it arrives, and you install Python on it. We recommend Ubuntu. People are running this on all sorts of Linux distros, but if you come along saying it doesn't work on, I don't know, some other distro, we can't help you. If you're saying it doesn't work on Ubuntu, then it's my problem.
You have to install some other software, but it's all open source and all free, so the cost to get this up and running is just the hardware cost.
If you want to run this on your laptop: on Windows, you now have the Windows Subsystem for Linux, and it works on that. On a Mac, you use Multipass from Canonical. It starts an Ubuntu virtual machine on the Mac, so you can run the labs on your laptop.
Anyone using a Mac with Apple silicon? It works. You're limited in what you can run in that virtual machine because it's an ARM CPU, and an x86 virtual machine will not run on an ARM CPU. But Arista has an ARM container, SR Linux has an ARM container, and FRRouting is open source anyway. So you can run three different devices on your Apple laptop, which is good enough to test complex things like VXLAN with EVPN.
For everything else, you have to register somewhere and download something, and then you get a virtual machine that boots and expects you to type on the console port. That's not good, so you have to get past that stage: you have to get to a point where the virtual machine runs DHCP on one interface and has an SSH server enabled, and then we can take over. So you have to build the Vagrant boxes, and getting the images ranges from no hassle (Nokia SR Linux and VyOS), to a little hassle (Juniper), to reasonable hassle (Arista, Aruba, Cisco, and Cumulus), to "you have to beg" or "you have to know a developer who might give you something."
Okay. And then you still have to write that topology file, which brings me to another unicorn service, because no presentation would be complete without mentioning AI these days.
So you can vibe-code your way to a netlab topology file. I actually tested this prompt, and it works better if you include URLs that point ChatGPT to the documentation examples. It generates something, and of course it's wrong. You save that into the topology file and start the lab, the tool complains, you copy-paste the error messages back to ChatGPT, and five iterations later you have something suboptimal, but amazingly, it works. It was able to generate a topology file with two routers and two links running OSPF. How cool is that?
On a more serious note, we have hundreds of different examples. They're all on GitHub, so you just browse, find something that is pretty close to what you need, and go from there.
If that doesn't work, just open a discussion and we'll help you out.
Back to reality. What do people think about this? Well, getting started is an interesting problem for most people, because it requires a significant investment in tooling and building the environment. But once people start using it, there's no going back.
People who invested the time and started using it were pretty enthusiastic, and this is the biggest topology I've seen so far. This guy needed the staggered device start. It's also a mixture of vendors: Juniper, Arista, and Cisco, with both Cisco routers and Nexus OS in there. The blue boxes are the routers, and the edge boxes, I think, are the Linux hosts they need for the workload.
The other thing I'm doing is using this to build lab exercises. (Is this starting? Okay, good.) I built a number of BGP lab exercises, and the beauty is that you can run them in GitHub Codespaces. It's a free service: you're limited to 30 hours per month, and you get two CPUs and 4 GB of memory. You start the thing, and it's all in the browser, although there's a plugin for Visual Studio Code if someone wants to use that.
Here I have the lab exercises. This is tied to BGP, and this is the readme for the lab exercise. Okay, that's boring stuff; you don't want to see that. The beauty is what happens once we start the whole thing. These are the instructions saying change to this directory (we're already there) and then run netlab up. This is running in some container somewhere, free of charge. As long as you can fit it into two CPUs and 4 GB of RAM, you're good. The lab has started, and now you can connect to the devices and start working on it.
Right?
So if you like it, here are a few things you can do to help. This is totally free, open source, and community-driven; everything is on GitHub. No hooks attached. The one hook is that eventually I will ask you to upload usage statistics. It will not be enforced: you will be prompted, and you can do it or not. But I would love to know what people are doing and which modules and devices they're using. Even if you don't want to use it, spread the word; it doesn't hurt. Thank you. Use the tool, ask questions. You can always fix documentation. You can always write a blog post. You can always find bugs and report them. You can always fix bugs, or you can add new functionality.
So if anyone wants to write a module: I think Babel runs on FRRouting, so you could do that. Or if anyone wants to tackle IP multicast, you're most welcome.
OSPF areas are done; we just did that last week.
All right, the documentation is online. I write blog posts going into specific details, hints, and guidelines. The source code is online. All the examples are online. Plus, there are 200 more integration tests, so if you don't find something in the examples, go to the integration tests; you might find it there. Two projects I did with this are the BGP labs and the IS-IS labs. Again, no strings attached: all open source, all on GitHub. Use them.
To reach me, these are the coordinates. If you want to discuss something or report bugs, I would appreciate it if you could go to GitHub and open a discussion or an issue so that it's tracked and archived. But obviously, if you want to send me an email, there's my email. And now I guess I'm totally out of time.
Which means: find me somewhere during the break, and you can ask me whatever you wish.
No, wait. Thank you. We actually do have time for a couple of questions. Oh, and there's one back there.
You made me walk all the way to the front just to come back.
There you go. One question, I probably didn't... So, Sil from open fact. Okay, one question: are there plans to support SLX? I don't think I've seen it, and this country is an Extreme country in some of the things I've seen.
vSRX, like the Junos stuff?
SLX, as in Extreme Networks.
Oh, okay. Do they have a virtual machine that you can download? Talk to me. And by the way, you're always welcome to submit a PR, but yeah, I would love to have that.
Okay, another question, I think. Sorry: Martin Winter, from FRRouting. So, first question: I think you just forgot to mention it, but I assume you have a generic switch device in there too, which you can add?
Well, yes and no. We do two things. One: if you connect three things together, we use a Linux bridge between them, so we emulate a thick coax cable with a Linux bridge. Or you can say: I want to emulate this multi-access segment with a bridge device, and then you specify which device you want to use for that bridge. That device has to have a VLAN implementation so that we can actually turn on layer-2 switching, and then you can use anything you wish as that generic switch device. It can be Linux running in a container. It can be FRRouting, although you don't need FRRouting for the control plane because it's just layer 2, but you know, it's nicely packaged. Or if someone wants to use, I don't know, a Cisco layer-2 image or an Arista image for that switch, they can do that.
Okay. So the other question is: do you have any options for route simulation, if I want to inject, say, 10,000 BGP routes?
Oh yeah, there's a blog post on how to use this with bgpipe. You use some external tool; we are like Linux: we give you the infrastructure, and then you get creative. There are tools that can insert BGP routes, like bgpipe, so you just start a Linux node with the bgpipe image, configure it, and inject whatever you wish.
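For a route-injection test like the 10,000-route question above, the first step is a list of distinct test prefixes. The sketch below only generates those prefixes with Python's standard `ipaddress` module; the default base range (100.64.0.0/10, RFC 6598 shared address space, which holds 16,384 /24s) is an arbitrary choice for the example, and the input format bgpipe or any other injector expects is not shown here, so check that tool's documentation. The function name `make_prefixes` is hypothetical.

```python
from __future__ import annotations

import ipaddress
from itertools import islice


def make_prefixes(count: int, base: str = "100.64.0.0/10",
                  length: int = 24) -> list[str]:
    """Return `count` consecutive /`length` prefixes carved out of `base`."""
    net = ipaddress.ip_network(base)
    prefixes = [str(p) for p in islice(net.subnets(new_prefix=length), count)]
    if len(prefixes) < count:
        raise ValueError(f"{base} holds only {len(prefixes)} /{length} prefixes")
    return prefixes


if __name__ == "__main__":
    routes = make_prefixes(10_000)
    print(len(routes), routes[0], routes[-1])
```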
But yeah, maybe I should... talk to me.
Yeah, I think this is something for the break. So, thank you very much. Thank you.
[Applause]