How to Automate AI Data Center Deployments at Scale with the Infrahub AI DC Solution (Webinar Recap)


Apr 26, 2026

OpsMill Co-founder and CEO Damien Garros and Infrahub Product Manager Wim Van Deun recently held a webinar to showcase the new AI/DC solution, a reference implementation that demonstrates how to use Infrahub to automate the full lifecycle of a large-scale AI data center.  Watch the webinar recording here, and access the full transcript below.

 

Full Transcript:

Hi everyone. Thank you for joining us for this new webinar. I’m Damien Garros, co-founder and CEO of OpsMill, and today I’m with Wim Van Deun, our product manager for Infrahub. We’re going to talk about how to automate AI data center deployments at scale.
It’s very much a topic of the day. We’re seeing a lot of interest in that. A few months ago, you probably remember, we did another webinar where we had two members of the team, Alex and Mikhail, explain how we deployed really large data centers for our customers.
This is building on top of that. We’ve been improving the process, and we’ve been improving a lot of things on the product as well. More importantly, this time we actually packaged everything and made it public so that everyone can reuse it. So that’s what we’re going to talk about today.
And Wim is going to show you a live demo and all of that actually is already available online with a lot of other things we’re going to talk about. In terms of agenda, we’ll have a quick introduction on what we’re trying to solve, the problem we’re seeing with our customers, with the market.
Then we’ll move into this AI data center fabric solution, explain what it is, how to build the fabric with it, do the live demo, manage the day zero, day one situation. And then we’ll go back and actually also cover the day two and extend the capability of the fabric and show how we can do that with this solution.
So when we say automating AI data center, what we’re seeing is today there’s many organizations that are building their own physical data centers to host their large AI clusters. And usually we’re seeing the same requirements that they want to move as fast as possible because this hardware costs so much that every week that this system is not in production, it’s money that is lost.
And so there’s really this need to move super fast. And because it’s such a massive system, it is really complicated to do without automation, so automating those systems makes a lot of sense. Now, automating the first deployment is super important. But if you’re not able to have a system that will also manage the day two maintenance, then it becomes very complicated, because we usually see these systems being deployed in phases, as everyone is trying to go as fast as possible.
So it’s super important to manage the first one but also the day two maintenance as we call it. And the third aspect is there’s usually not just one site. There’s multiple halls or data halls or pods or however you name it or sites in some cases.
Making sure we have the ability to do a consistent deployment across all of those sites is also very important, which brings this idea of having a design and being able to reuse it across different systems. So that’s what we’re addressing with this solution.
So Wim, what are the challenges that we’re seeing on the customer side?
What we have seen for customers that were already automating these workflows is that, with the more traditional tooling, the logic was typically hard coded.
They were creating hard coded one-off scripts, and within those scripts they were typically embedding the design logic of how a data center should be built. Now there was one big challenge with that: if you have a site that is going to be slightly different, it’s going to be really hard, because you’re going to have to create a one-off script just for that particular site, to address the specific thing that you deploy differently in that specific site.
So people would create another version of that script to do another type of deployment which creates a lot of technical debt and it becomes hard to maintain. The second challenge that we see is customers maintaining the lifecycle of a deployment of such a data center.
So let’s say that your data center has to evolve. There’s a new variant of the design that you want to deploy. And the problem with the scripting approach that we had before is that when the logic is hard coded within that script and you have produced an instance of a data center you have lost all of the inputs that you initially fed into that script.
So the design becomes decoupled from the implementation, which makes it hard to evolve your data center later on or to manage its lifecycle. Which typically also means that day two operations would need to be rebuilt on an ad hoc basis.
So you even add more scripts to do those things, which increases the technical debt even more. Then another challenge that we have seen in the field was that people were running into all kinds of trouble.
For example, multiple people would operate on the same data center simultaneously, and two users would allocate the same IP address for two different workflows, which created conflicts.
And those conflicts were typically discovered way down in the process when we were actually deploying and it would create incidents.
I think this one especially. Sorry to interrupt but yeah I think this IP allocation, I’ve seen it happen many times when everything is stored in Git and then you have two branches that are very isolated.
I definitely heard a lot of people complain about that. So it’s a good one. And another thing we could probably add here is the lack of APIs: especially for people that are using Git-heavy automation platforms, they don’t have the APIs to programmatically interact with those systems and do the day one provisioning or day two maintenance.
It’s probably also something we could add as a fourth challenge here.
What we see there is definitely that a lot of the data is indeed not stored in the database but sometimes in YAML files.
Which means that multiple workflows have to implement some parsing logic for that data, and it becomes hard to maintain. And every workflow had to parse or interpret the data in a slightly different way.
And a lot of that we can solve with a proper database and a proper query language.
Okay. So we hope that a lot of people on this webinar are already familiar with our product called Infrahub. But just to be sure we want to give you a quick introduction about what Infrahub really is.
And Infrahub is really a data management platform for infrastructure. But now if we think even further where we see industry moving, where everything becomes more agentic driven, you can think of Infrahub as an infrastructure knowledge graph that can help you with your agentic workflows as well.
So what is so special about Infrahub is first of all we built it on a graph based data platform or a graph database. And it comes out of the box with a built-in version controlling feature and a schema engine.
And that schema engine is particularly important because everything in Infrahub is schema driven. You will hear people say a lot about the flexibility of the schema within Infrahub. What that really means is that you can define in Infrahub what data models you exactly need for your business use cases.
There’s a lot of pre-built schemas that we have available that you can use or that you can completely adapt to your use cases. With the version controlling feature also comes branch awareness. So very similarly to working in Git, we have a way to create isolated environments at the database level where you can prepare changes in an isolated environment without affecting the production data or the data that maybe some of your colleagues are working with.
What that allows us to do is also generate diffs between what a state change or a future state of your network should look like compared to the actual state. And we have a capability to run a complete CI pipeline on any of the data changes or schema changes that were made in the database, making sure that the changes were prepared correctly and that they comply with your business standards.
Last but not least, Infrahub also provides automation capabilities within the platform. We call those Generators which we will cover more in depth later during this presentation or webinar and Transformations which allow you to consume the data in Infrahub and produce artifacts that are tracked together with the state of the data.
Think of artifacts this way: the most common artifact that we would see in the networking industry is probably a device configuration, but it could also be a cabling plan or something like that.
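To make the artifact idea concrete, here is a minimal Python sketch of a Transformation-style renderer that turns device data into an intended configuration. The data shape, field names, and the `render_config` helper are all illustrative assumptions, not the solution’s actual code:

```python
# Minimal sketch of a Transformation-style artifact: rendering an intended
# device configuration from data queried out of Infrahub. The dict shape and
# field names here are hypothetical, not the solution's actual query result.

def render_config(device: dict) -> str:
    """Render a tiny intended-config artifact for one device."""
    lines = [f"hostname {device['name']}"]
    for intf in device["interfaces"]:
        lines.append(f"interface {intf['name']}")
        lines.append(f"  description role:{intf['role']}")
        if intf.get("ip"):
            lines.append(f"  ip address {intf['ip']}")
    return "\n".join(lines)

leaf = {
    "name": "leaf-a21-1",
    "interfaces": [
        {"name": "Ethernet27", "role": "uplink", "ip": "10.1.0.1/31"},
        {"name": "Loopback0", "role": "loopback", "ip": "10.1.255.1/32"},
    ],
}
print(render_config(leaf))
```

Because the artifact is derived purely from the data, regenerating it after a data change always keeps the configuration in sync with the tracked state.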
So then what is this AI data center solution that we are talking about?
So it’s really a reference implementation that we want to give to our customers to show how you can build AI data centers with Infrahub. It’s based on the learnings that we have made with multiple of our customers based on the problem statements that we started with.
So what this solution will provide is a design driven network automation approach for large scale AI data center fabrics in Infrahub. So what we will be showing you today is how we can build a five stage Clos topology where we are going to start from a high level design input and we’re going to generate all of the devices and all of the bits and pieces that are going to make up our network fabric.
And then we will show you how the solution can be used to also perform day two operations, such as expanding the capacity of the fabric.
So maybe it would be good to explain: what is the packaging? Is it a completely new version of Infrahub, or what is the solution exactly? Could you quickly talk about that?
We have a slide at the end as well which covers that in more depth.
But in general there is a Git repository that we make public in which we have shipped all of the resources that you need to be able to spin this up yourself. It’s not a dedicated version of Infrahub.
We just use the standard Infrahub product and on top of that we are going to import schemas from a repository. We’re going to provide an example data set, and then all of the tooling around it to be able to build a data center.
So we could have in the future solutions for many different use cases. That’s just one of them. That’s what you’re saying, we just use that as an example?
Yes. So I think the AI data center solution is one of the solutions. We’re doing it based on what we see from our customers.
Based on the demand from our customers, the problem that our customers are facing. But we plan on adding multiple other types of solutions that will cover other use cases. That being said, the solution specifically is also a way for us to package and show you maybe some of the best practices on how you can approach such a challenging workflow.
Whereas we target this specifically as an AI data center solution, the approach that we take is definitely applicable to other types of use cases. So first and foremost any data center based on Clos topology, running EVPN and VXLAN can reuse this.
Right. But also the patterns that we’re going to talk about in a bit are applicable in other scenarios like I don’t know, the service provider world or even campus networking for enterprise networks.
And so if someone wanted a slightly different design, they could also use this one as a starting point and customize it or just make it their own. Like if someone doesn’t want to do the Super Spine for example and just wanted a three layer leaf spine, that would be also a good starting point.
Exactly, yes. So that’s the beauty of the platform in the flexible schema and just the overall flexibility within this product is that you can take this solution and modify it exactly to your scenario.
Perfect, thanks.
So just to give you a little bit more in depth view of what the fabric will look like that we are going to build, it’s going to be a five stage Clos topology.
In the end the goal is going to be that from a very high level design, in which you are going to capture what we want our data center fabric to look like, we are going to produce six Super Spine switches, and then we will have multiple pods in our fabric and each pod can be of a certain type or role so they can have a specific function.
Within those pods we are going to deploy the Spine switches and then we will have racks that are part of a pod in which one or more leaf switches will be deployed. Then after all of that we will also generate all of the cabling in between those devices, start assigning IP addresses and all of those things that would technically make this fabric work.
We have been talking about design driven network automation already before but what do we really mean with this design driven network automation approach? So you heard us already talk about design inputs.
So the goal is that we are going to create a design that can be consumed by other people in our organization that have to provide the minimum amount of inputs that we need to be able to generate a standardized implementation of that design.
So in our case the design input is going to be some specific properties that we need to capture for the fabric. For example what type of hardware are we going to use for the Super Spine layer?
The number of Super Spines we want to deploy, the cabling strategy that you want to use towards the pods or the Spines, which supernet or prefixes you want to use. Then very similarly we will have to provide design inputs for the pods that we want to create in the racks.
Also we’re not really going to go too much in depth into that but there’s another high level design input that we’re going to use which are device templates. So we basically created a set of templates that represent the devices that we can use in our network which can then be referenced in other design inputs such as the fabric, the pod and the rack.
So even though we have a fairly standardized design, you can see that by altering those design inputs you can create a lot of flexibility in the amount of pods that you need in the fabric or the amount of racks that you need within a specific pod for your fabric.
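As a rough illustration of what those design inputs could look like as data, here is a sketch using plain Python dataclasses. In the actual solution these inputs are schema-defined objects stored inside Infrahub; the class and field names below are assumptions for the example:

```python
# Illustrative shape of the high-level design inputs: a fabric definition
# referencing device templates, a supernet, and a list of pod designs.
# All names here are hypothetical, not the solution's actual schema.
from dataclasses import dataclass, field

@dataclass
class PodDesign:
    name: str
    role: str                 # e.g. "cpu", "gpu", "storage"
    spine_count: int
    spine_template: str       # reference to a device template
    rack_count: int

@dataclass
class FabricDesign:
    name: str
    superspine_count: int
    superspine_template: str
    supernet: str             # prefix to allocate from, e.g. "10.0.0.0/8"
    pods: list = field(default_factory=list)

fabric = FabricDesign(
    name="fabric-a",
    superspine_count=6,
    superspine_template="superspine-switch",
    supernet="10.0.0.0/8",
    pods=[PodDesign("pod-a2", "cpu", spine_count=4,
                    spine_template="spine-switch", rack_count=8)],
)
```

Varying only these few inputs (pod count, rack count, templates) is what lets one standardized design produce many differently sized fabrics.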
Then the next step is that we’re going to take that high level design input which is going to be fed into what we call a Generator. Think of a Generator as business logic.
It’s going to be a Python script that is run on the platform that takes the design inputs and is going to produce all of the devices that we need, the interfaces, a cabling plan, IP addresses, the prefixes, the routing that we need, all of the configurations and documentation.
In the end the output should be that we have a network fabric represented in our infrastructure data management platform that is ready to be deployed.
So Wim, I think this concept of design is sometimes the harder part to understand. Is this list of inputs fixed, or how hard or easy would it be to change?
For example, I remember in a previous life an important parameter for us was the level of oversubscription. If I had another very important parameter like that as an input to my pods, something my business cares about, how would that work?
Could I add something like that, for example?
Yes. So I think this is one of the nice things about Infrahub. So remember that we have talked about the flexible schema of Infrahub and that everything is schema driven or let’s call it a schema architecture.
And that really allows us to capture both the design and the technical objects within our schema or data model. So what that means is that the design objects that we use can be modeled within Infrahub and because we have a flexible schema you can adapt design inputs that you want to use based on your business use case.
So for example if you would want to define a certain oversubscription that you want to use at the pod layer then you would be able to modify the design layer schema for the pod to add an attribute that would allow you to define what level of oversubscription you would use.
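As a rough sketch of what such a schema extension could look like in YAML, assuming a hypothetical `DesignPod` kind (the node kind, attribute name, and exact file layout here are illustrative; check the Infrahub schema documentation for the real format):

```yaml
# Hypothetical schema extension adding an oversubscription attribute to the
# pod design model. Kind and attribute names are illustrative only.
extensions:
  nodes:
    - kind: DesignPod
      attributes:
        - name: oversubscription_ratio
          kind: Text
          optional: true
          description: "Desired oversubscription at the pod layer, e.g. 3:1"
```

Once loaded, the new attribute appears on every pod design object, and the pod Generator can read it to size the uplinks accordingly.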
So if we look at it again it’s the same representation as the previous slide but in a slightly different manner. So within Infrahub we will store both the design input which you can define using the schema based on whatever you need of which we are going to create objects which are then going to be fed into the business logic layer which is the Infrahub Generator which is then going to produce all of the technical objects that we need that are also defined within the schema, so your network devices, the interfaces, the cables, the VLANs, the prefixes and IP addresses.
So that’s a really important aspect. And if we get back to the problem statement that we used before is that for managing the lifecycle of your data center, one of the biggest challenges was that the design input was lost after the initial build, and therefore we couldn’t easily lifecycle or update to a new design.
We actually solved that by storing the design intent within the database linked to the actual implementation that you have created, which allows us to do those day two operations later on in a good way.
So how is this implemented? What we will be defining, again, is what our network fabric looks like, what our network pods are going to look like, and what our racks are going to look like.
In the previous slide we just showed one Generator, but in essence it’s going to be three different Generators. So the three different places where we process our business logic. So our business logic is a big set of rules and instructions that we want to use to create the implementation of the data center.
But we have really started to divide that into different layers, and each of those layers is then going to produce a specific output. So the technical data that we need. Now what is really important to understand is that we have divided this up into layers for two main purposes.
Well, actually three. So one is the maintainability. So it makes it easier for us to evolve, for example, the pod Generator without necessarily having to touch the fabric Generator so we can more easily maintain that business logic into smaller chunks.
A second thing is that we want this process to be able to build the data center fabric quite fast. We don’t want to be waiting hours and hours for all of this data to be generated. And by splitting up the actual generation part into multiple layers, we can parallelize some of them.
So for example, the fabric Generator is going to be one specific instance because we’re just going to generate one fabric. But when we talk about the pods, you could already see that within a fabric we are going to have multiple pods and each pod will need to be generated.
If we wouldn’t split this up into multiple sections, then we would have to build those pods sequentially. And by splitting that up into its own Generator we can actually parallelize and speed up the generation process.
Similarly, we do that for the rack Generator. So the rack Generator has to process even more racks, right? And based on this sequencing we can actually generate multiple racks in parallel quite quickly.
And last but not least, we don’t want to cover it too much in this section yet but this also allows us to do those day two operations later on in a much easier way.
For example, if we would just want to add a new pod, we don’t want to be regenerating the whole fabric from scratch again which could be a quite lengthy process. But we can just start at the pod Generator level. Another important aspect that we want to cover is that Infrahub provides the capability to sequence those Generators as you want.
So you can build a workflow like we did in this example, where when you trigger off the fabric Generator because you have defined a new fabric it will automatically, once the fabric Generator has finished its job, it will actually trigger the pod Generator to generate all of the pods that have been defined within that fabric.
And once the pod Generator is finished, it is also going to automatically trigger the rack Generators to run for all of the racks within a specific pod. So that allows you to scale this solution quite easily for massive data centers, which is definitely something that we see more and more with these new AI workloads. At the end of it, once the solution has generated all of the output, we will also see that we start to generate what we call artifacts. So artifacts in this case are going to be the intended device configurations for each device: the Super Spines, the Spines, the leaves.
And we’re also going to create a cabling plan, which is going to be a document that we can send out to the people that physically install the data center so that they know exactly between which devices and on which interfaces we need to build connections so that you can hand off a full package to be able to build out the data center in a physical way.
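The layered, automatically triggered pipeline described above (one fabric Generator, then pod Generators in parallel, then rack Generators in parallel) can be sketched with plain `asyncio`. Real Infrahub Generators are Python classes run by the platform itself; this stand-alone version, with made-up generator functions, only illustrates the sequencing and parallelism:

```python
# Stand-alone sketch of the three-layer Generator workflow. Each stage waits
# for the previous one, while work inside a stage runs in parallel.
import asyncio

async def generate_fabric(fabric: str) -> list[str]:
    await asyncio.sleep(0)          # placeholder for real generation work
    return [f"{fabric}-pod{i}" for i in range(1, 4)]

async def generate_pod(pod: str) -> list[str]:
    await asyncio.sleep(0)
    return [f"{pod}-rack{i}" for i in range(1, 3)]

async def generate_rack(rack: str) -> str:
    await asyncio.sleep(0)
    return rack

async def build(fabric: str) -> list[str]:
    pods = await generate_fabric(fabric)                                   # stage 1
    rack_lists = await asyncio.gather(*(generate_pod(p) for p in pods))    # stage 2, parallel
    racks = [r for rlist in rack_lists for r in rlist]
    return await asyncio.gather(*(generate_rack(r) for r in racks))        # stage 3, parallel

racks = asyncio.run(build("fabric-a"))
print(len(racks))  # 3 pods × 2 racks = 6
```

The same structure is what makes day two expansion cheap: a new rack only requires re-entering the pipeline at stage 3.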
Excuse me. So that’s going to bring us to the first part of the demo.
Within this part we’re mainly going to focus on the day zero or the day one. So how do we create a high level design within Infrahub? And then how do we start generating the data center fabric from this?
So let’s move this out of the way. So we have an Infrahub instance running here which has been prepared already. So as you can see on the left we already have a few menu options here where it shows the different schemas that we have loaded within the system.
So by default this would be completely empty. So we have already preloaded the Infrahub here with two fabric definitions. And this is basically the high level definitions that we have talked about.
So in here, let’s assume that a network engineer who was tasked to build this fabric came in and created a new fabric definition. It got a particular name, and the engineer has defined, within the constraints defined in our design, how many Super Spine switches are deployed.
Then we have defined an interface sorting method for the connections from the Super Spine switches to the Spine switches. So these are the things where you can basically modify what you want your design to look like, within the constraints of your design definition.
We have also selected a template called Super Spine Switch which is the definition of our device template. So basically what it’s going to look like. So if we look here for the Super Spine switches we can see that for this particular type of Super Spine switches we have defined the amount of interfaces that it has.
And each interface has been pre-allocated with a specific role. For example the loopback interface or the interfaces that connect us to the Spine layer. So let’s move back to the fabric so that’s what we capture as high level intent on our fabric.
Then within the fabric we have defined four units which we can use to scale it, and in this case we have done that using the concept of pods. So if we look at a pod, the high-level design that was captured there by our network engineer includes the name of the pod and the amount of Spine switches that we want to deploy within the pod.
Then there is the role: what type of function is all of the compute that we deploy within this pod going to be used for? In this case, this is a CPU pod.
But we could also have other types of pods as you need them. We also have interface sorting methods here which is going to influence how the devices will be cabled. The device template that we are going to use for the Spine switches and it is linked back to the parent fabric.
So Wim the design template, maybe we can look at that for a minute. Could it be for example if I had a specific Arista switch that I want to use or if I have a Cisco or Juniper switch, should I create for example a template for each type of hardware that I want to use?
Yes. It could be many things, actually. What we typically see with our customers is that sometimes they use the same type of switch, but for a different purpose.
In this case here we can see that we also created some specific Dell switches, and we’re using exactly the same model. But the Spine switch interface layout, and what I mean with that is the roles that we use here to define where the interfaces are going to connect to, is different from the layout of the same device model used as a Super Spine switch.
So here as well, are there any restrictions? Can I use any type of hardware with that?
Yes, yes. There’s no specific restrictions on that. So we are not focusing on specific vendors.
It’s completely vendor agnostic. And in essence it’s object templates. So we’re creating a template here for a device, but you can create object templates for any type of schema that you define within the system, where that makes sense for your use case.
So for example we haven’t done it in this solution but you could easily create a template for a pod as well so that when you create a new pod you can select the template of the pods that you want to use that then can be adapted within your design constraints quite easily.
But yeah object templates can be used for any type of model that you define within the system.
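As a loose illustration of the object-template idea, here is a sketch where a template predefines interfaces and their roles, and instantiating it stamps out a concrete device. Infrahub implements object templates natively; the structures and names below are illustrative assumptions:

```python
# Sketch of instantiating a device from an object template: the template
# carries the interface layout and roles, the instance only adds a name.
import copy

SPINE_TEMPLATE = {
    "interfaces": [
        {"name": "Ethernet1", "role": "fabric"},   # towards Super Spines
        {"name": "Ethernet2", "role": "fabric"},
        {"name": "Ethernet3", "role": "leaf"},     # towards leaf switches
        {"name": "Loopback0", "role": "loopback"},
    ]
}

def device_from_template(name: str, template: dict) -> dict:
    # Deep-copy so instances can diverge without mutating the template.
    return {"name": name, "interfaces": copy.deepcopy(template["interfaces"])}

spine = device_from_template("spine-a2-1", SPINE_TEMPLATE)
```

This is also why the same hardware model can appear under two templates: the Spine and Super Spine layouts differ only in the roles assigned to the interfaces.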
Just to be clear, you could have, for example, a pod or a rack with specific vendors, and you could easily have a multi-vendor environment here.
There are technically no constraints forcing you to use a single vendor across the whole fabric.
Absolutely. You could use a multi-vendor. You could build a multi-vendor environment with this. Yes, absolutely. Yep.
No restrictions on that at all.
Does that answer your question?
Yeah, that was perfect. Thank you.
Thank you. So then, within a pod, we have defined racks. Here as well, we haven’t done it in this particular example, but you could easily create object templates for the different types of racks that you want to deploy and what their layout should be.
Here we haven’t really done that. It was a manual input let’s say. So for a rack, we try to capture what the type of the rack is going to be. So in our case you can either be a compute or a storage rack.
We give it a name, some specific technical data. So for example here we define as well the amount of leaf switches we want to deploy. But in real world use cases this can be expanded.
For example there might be a mixture of leaf switches that you want to deploy combined with maybe an out-of-band switch or a console server or all of that. You can really extend the schema and define what you want to capture in this high level definition of the rack. We link it to the type of switch we want to deploy as a leaf switch and then it gets linked into the organizational structure that we use, linked back to the pod so that covers the high level definition inputs.
Of course there’s much more. So we have here defined all of the racks that we want to be deployed. And as you can see, today there are no devices within this rack. In fact, there are no devices within this solution at all.
As we will see as well, the IP prefix section is nearly empty. Yes, we have already defined the 10.0.0.0/8 supernet that we want to use in our organization, but there are no further subdivisions yet, and no IP addresses have been allocated out of it.
So now the goal is that from that high level definition we are going to create our fabric. So the first thing we are going to do, which is very typical to Infrahub, is before we start creating a change, we are going to create a new branch.
So let’s call this branch fabric A. So this now creates an isolated environment. So as you can see here, creating a branch is really quick. There’s no data copying or anything like that that happens.
This is really a construct that lives at the database level. It’s one of the strengths of Infrahub, and one of the reasons we use a graph database, because it lets us introduce this notion of branching quite quickly.
So now, while this branch is active, whenever I make a change here, nothing will affect the main branch. All of the changes that I will now start making are effectively happening within that specific branch.
So now that we have defined our design we can actually start creating the implementation. So what we are going to do here is trigger the Generator for a fabric. So we’re going to click on run and we’re going to select a specific target.
In this case we want to run the Generator for fabric A. So that task has now been kicked off and should complete fairly quickly. So let’s have a look at the running tasks.
So right now we can see, as we defined in the slides, that the fabric Generator is running for fabric A, which is going to create all of the Super Spine switches that we have.
And I forgot something, I guess.
No, okay, it’s fine. So we can see now that the task to generate fabric A has completed and it actually started to kick off automatically the Generators for the pods.
So it started to generate pod A2, pod A3, pod A1. But we can also already see that the generation of pod A1 effectively failed, and that is intentional. Because pod A1 is the pod in which we want to capture our Super Spine switches or the fabric level switches.
And the Generator is not intended to run on this specific type of pod, hence why it failed. So by now the pod Generators have already finished and started to kick off all of the rack Generators that are all running in parallel now.
So we’re doing layer by layer. That’s what you’re saying. Like we build the Super Spine first and then we build the first layer of the pods, the Spines and then the racks.
Correct. So remember that there were multiple reasons why we did that.
So I think one of them is the scalability. As you can see, this all happened quite fast. All of the data by now has been generated, and all of the tasks have been completed. That’s one of the reasons; we saw that all of those rack Generators were running in parallel.
It was not a sequential process from the rack layer perspective, but we were able to parallelize at the rack layer. And another important aspect is that we can reuse this rack Generator just to create a new rack or to expand the fabric in a day two fashion later on.
So all of the tasks have been completed and if I now go to my devices, we can see that it has effectively generated 25 devices for us. So at the top we see the Super Spine switches.
Then we see the Spines for each specific pod: pod one, pod two, pod three, pod four. And then all of the leaf switches that go into specific racks.
So if we look at a specific device, we can see that we have automatically generated a hostname for it. Each device got assigned a role to it. We started to assign an IP address to it and it’s been linked into where it’s going to be positioned within the fabric.
So we can see here that this device is going to be positioned within rack A21 and rack A21 is part of pod A2. If we go and look at the interfaces of this device, we will see that a lot of the interfaces are still inactive, which is intentional because we can see here at the role that these are effectively interfaces to which we’re going to connect our compute to.
But if we look back at the uplink interfaces, let’s say, then we can see that they have been flagged as active and we can see that they’ve been used to connect to the Spines. We can also see that they have a specific MTU set to them.
An IP address in a /31 prefix has been allocated, and we can effectively follow where this particular interface is connected. If we click on that, we will see that interface Ethernet 27 on this leaf switch is connected to interface Ethernet 1 on the Spine switch for the correct pod. From there we can navigate to the Spine switch itself, which also has its particular interface layout, and we can see that the full fabric has basically been generated.
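The leaf-to-spine cabling pattern just described can be sketched as a small function that pairs leaf uplink ports with spine downlink ports. The port-numbering convention below (leaf uplinks starting at Ethernet27, spine downlinks at Ethernet1) mirrors what the demo shows, but the function itself is a simplified stand-in, not the solution’s actual cabling strategy:

```python
# Sketch of a full-mesh leaf-to-spine cabling plan for one pod: every leaf
# connects to every spine, uplink N on a leaf going to spine N, and each leaf
# landing on a successive downlink port of the spine.

def cable_pod(leaves: list[str], spines: list[str]) -> list[tuple]:
    """Return (leaf, leaf_intf, spine, spine_intf) tuples for one pod."""
    plan = []
    for leaf_idx, leaf in enumerate(leaves):
        for spine_idx, spine in enumerate(spines):
            plan.append((
                leaf, f"Ethernet{27 + spine_idx}",   # leaf uplink ports
                spine, f"Ethernet{1 + leaf_idx}",    # spine downlink ports
            ))
    return plan

plan = cable_pod(["leaf-a21-1", "leaf-a21-2"], ["spine-a2-1", "spine-a2-2"])
```

A deterministic function like this is also what makes the generated cabling plan reproducible across pods that share the same design.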
If we now look back for example at the IPAM data, we can see that all of the available address space to us has been allocated in a specific logic that was defined in our Generator.
So here we can see that a /16 was allocated to our specific fabric, according to our business rule. It has a specific role attached to it, and that address space has then been further divided into multiple other prefixes.
So for example, we can see here a /31 that has been allocated for a specific pod, to provide IP connectivity between the Spine layer and the Super Spine layer.
And if we look within that prefix we should see that specific IP addresses have been allocated as well.
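The allocation logic just described can be sketched with Python's standard `ipaddress` module: carve the fabric prefix into /31 point-to-point links and take both usable addresses from each. The 10.0.0.0/16 value is a placeholder, not the prefix used in the demo:

```python
# Sketch of the IPAM logic: a fabric /16 carved into /31 point-to-point links.
import ipaddress

fabric = ipaddress.ip_network("10.0.0.0/16")   # placeholder fabric allocation
p2p_links = fabric.subnets(new_prefix=31)      # lazy generator of /31s

link = next(p2p_links)                         # first Spine <-> Super Spine link
spine_ip, superspine_ip = link.hosts()         # a /31 has exactly two usable hosts

print(link)           # 10.0.0.0/31
print(spine_ip)       # 10.0.0.0
print(superspine_ip)  # 10.0.0.1
```

In Infrahub this is handled by resource pools and the Generator's allocation rules, but the arithmetic is the same: deterministic carving means the same design always yields the same addressing.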
So it's really cool: all of that was generated by the Generator, and none of it existed before. How it was designed was captured in the Python scripts of those Generators.
And that's what defines all this business logic, as we say.
Correct.
Okay. I think we have 10 more minutes left, and I know there was a lot of additional cool things you wanted to cover, so let's speed it up.
So just to show you: if we go back to the main branch, we don't see any of those IP allocations there yet.
So now we have prepared that change within our branch and we want to propose this implementation and bring it into the main branch or into production. So in Infrahub we do that with what we call a proposed change. A proposed change is very similar to a GitHub pull request.
So basically we’re going to propose the fabric that we have added and we want to ask our organization for a review. So anyone within the organization can now come in and basically review the change that we are proposing, the implementation of this fabric.
So the first thing that we will notice is that the data diff should be showing up here. Sorry, with this streaming my laptop becomes a little bit slower.
So we can see here now an actual data diff of what happened by triggering those Generators. So we can see here all of the devices that have been added. And if we scroll down we will see all of the IP addresses and IP prefixes that have been added.
The whole change that was done by this Generator can now be reviewed. We also talked about this before: within this proposed change, a CI pipeline has been triggered, which is now going to validate all of the changes within this branch.
So basically, from a data perspective, at the database layer and at the schema layer, we are going to check that everything is correct and that no mistakes were made. We haven't done it in this demo, but we can also define our own user-level checks here that would validate the complete fabric against rules you define for specific compliance.
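As a rough sketch of what such a user-level check might look like (this is not the actual Infrahub check API, just the idea of validating generated data against a compliance rule, here a required jumbo MTU on fabric uplinks):

```python
# Hypothetical compliance check: every uplink interface must carry jumbo MTU.
REQUIRED_MTU = 9214  # assumed value for illustration

def check_fabric_mtu(interfaces):
    """Return a list of violations; an empty list means the check passes."""
    errors = []
    for intf in interfaces:
        if intf["role"] == "uplink" and intf["mtu"] != REQUIRED_MTU:
            errors.append(f"{intf['device']} {intf['name']}: mtu {intf['mtu']}")
    return errors

interfaces = [
    {"device": "dc1-a21-leaf01", "name": "Ethernet27", "role": "uplink", "mtu": 9214},
    {"device": "dc1-a21-leaf02", "name": "Ethernet27", "role": "uplink", "mtu": 1500},
]
print(check_fabric_mtu(interfaces))  # one violation reported
```

Wired into the CI pipeline, a non-empty result would block the proposed change from merging, exactly as described above.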
If those checks fail, the proposed change is blocked from being merged. The other important aspect is that the CI pipeline also triggered artifacts to be generated: from all of the data we have now generated using those Generators, we can generate the full device configuration for each device in our fabric.
So here we're looking briefly at a specific device configuration for a leaf device. We can see the hostname has been configured, interfaces are getting configured, etcetera. If we scroll down, we will see that the configurations of the Spine switches have been generated as well, and the Super Spines, and there is a fabric cabling plan too.
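Artifact rendering is essentially taking the structured data the Generators produced and turning it into text. A dependency-free sketch of the idea (Infrahub renders artifacts from templates or transforms; the device data below is made up):

```python
# Simplified artifact-rendering sketch: structured data in, config text out.
device = {
    "hostname": "dc1-a21-leaf01",
    "interfaces": [
        {"name": "Ethernet27", "ip": "10.0.0.1/31", "mtu": 9214},
        {"name": "Ethernet28", "ip": "10.0.0.3/31", "mtu": 9214},
    ],
}

lines = [f"hostname {device['hostname']}"]
for intf in device["interfaces"]:
    lines += [
        f"interface {intf['name']}",
        f"   mtu {intf['mtu']}",
        f"   ip address {intf['ip']}",
    ]
config = "\n".join(lines)
print(config)
```

Because the config is derived from the same data that was reviewed in the proposed change, what gets deployed is guaranteed to match what was approved.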
So if we move back to a device quickly, we will see that it now has a specific artifact attached to it. Here we can see the startup configuration, which we can download from here or inspect further.
Also, on the fabric object we can find a cabling plan artifact, which shows us a nicely rendered CSV that we can download from here as well, to send to the cabling team.
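A cabling plan artifact boils down to a table of A-side and Z-side ports. A minimal sketch with made-up link data, using only the standard library:

```python
# Sketch of a cabling-plan artifact: one CSV row per physical link.
import csv
import io

links = [
    ("dc1-a21-leaf01", "Ethernet27", "dc1-a2-spn01", "Ethernet1"),
    ("dc1-a21-leaf01", "Ethernet28", "dc1-a2-spn02", "Ethernet1"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["a_device", "a_interface", "z_device", "z_interface"])
writer.writerows(links)
print(buf.getvalue())
```

Since the same cable data drives both the device configs and this CSV, the cabling team and the configuration can never drift apart.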
So what else could you do with an artifact? Could you do a diagram of the pods, for example, or something more visual?
Yes. So we support generating SVG images so you can generate diagrams or other forms of documentation.
For example, what we have seen is customers automatically generating configuration documents for a specific device or a specific fabric in Markdown format, which also get nicely rendered within the UI.
So there’s multiple types of artifacts that you can generate.
All of the documents that you need to generate from the data that you have captured within this infrastructure data management platform can be created. So let’s merge this proposed change.
It should be quite fast as well.
I think we have five more minutes, maybe. I don't know if we'll be able to cover everything. I think we wanted to cover the day two. Yes. And maybe also how we could integrate with AI. Which of the two do you think is more interesting to show?
I think we can combine the two. So what we want to perform is a day two operation, right? So we have our initial design. We generated the day one or the day zero of what our fabric should look like.
And now we're 10 weeks after the initial go-live, and the business comes back to us and says, hey, we need to add a new rack. Hopefully it's just one, but it could be 10 or whatever.
So how do we use this same implementation logic to add capacity to this fabric? The nice thing about dividing everything up into layers is that we can quite easily reuse the same logic and the same designs that we built previously.
We can leverage them to do the lifecycling of this data center fabric. So we're going to define a new rack with its specifics and then just run the rack Generator to extend the fabric. What that will automatically do is create two new leaf devices, and it will detect that these are leaf switches that need to be deployed within the existing fabric.
So because of all of the data that we have nicely captured within the solution, we can now determine which Spines we should be uplinking these leaf devices to. We can start allocating IP addresses from the specific pool that was allocated to the pod to which this rack is going to belong.
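The day-two logic just described, finding the pod's Spines, picking the next free port on each, and allocating /31s from the pod's point-to-point pool, can be sketched as follows. All names, prefixes, and the port-tracking scheme are hypothetical:

```python
# Hypothetical day-two sketch: uplink a new leaf into an existing pod.
import ipaddress

pod = {
    "spines": {"dc1-a2-spn01": 3, "dc1-a2-spn02": 3},   # ports already in use
    "p2p_pool": ipaddress.ip_network("10.0.8.0/22"),    # pod's point-to-point pool
    "p2p_used": 12,                                     # /31s already allocated
}

def uplink_new_leaf(pod, leaf):
    """For each existing spine: pick the next free port and the next free /31."""
    links = []
    subnets = list(pod["p2p_pool"].subnets(new_prefix=31))
    for spine, used_ports in pod["spines"].items():
        prefix = subnets[pod["p2p_used"]]               # next unallocated /31
        pod["p2p_used"] += 1
        pod["spines"][spine] = used_ports + 1
        links.append((leaf, spine, f"Ethernet{used_ports + 1}", prefix))
    return links

new_links = uplink_new_leaf(pod, "dc1-a23-leaf01")
for link in new_links:
    print(link)
```

Because the pool state and port usage already live in the data platform, the day-two run allocates from exactly where day one left off instead of starting from scratch.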
And then in the end we will see that the configurations get generated, including the configuration of the Spine switches that need to be updated, as well as the cabling plan. The initial plan was to show you that in the UI first, but we haven't talked about AI enough yet.
So we have a demo where we want to showcase how we can use AI with Infrahub. We have invested a lot of time lately in looking at how AI can empower us to do even more with our platform.
And what we have seen is that a lot of our customers are looking into that as well. And we want to show what we have been building that allows you to create a more agentic workflow on top of Infrahub.
We haven't covered too much of it yet, but in the end it comes down to an MCP server that we have created, which is soon going to ship together with Infrahub and which allows agents to communicate with Infrahub.
And on top of that we have created a skill that allows us to define a workflow for the agent to execute. What this allows us to do right now is trigger a skill, which I called add rack, that is going to execute this day-two workflow for us.
So it’s going to take a little bit of time because the system will need to determine exactly what it needs to do. It’s going to use the MCP server to understand what schemas have been loaded into the system, what existing objects already exist within the system.
So the agent will need to understand that there is a fabric, and it's going to present us with a few questions that we need to answer for it to fulfill the task of creating a rack. In the backend, it's going to leverage all of the things we have already shown you.
So the agent is going to use the MCP server to create a new rack object, which will be linked to the right other objects such as the pod and the fabric, and then it's going to trigger the rack Generator for that specific rack and open a proposed change for us.
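Very loosely, the agentic flow is a chain of tool calls over the same operations shown earlier. The sketch below is neither the MCP protocol nor the Infrahub MCP server API, just a stand-in registry of hypothetical tools to show the shape of the workflow:

```python
# Illustrative only: hypothetical "tools" an agent might chain together.
def create_rack(name, pod):
    """Stand-in for a tool that creates the rack object in the data platform."""
    return {"kind": "DcimRack", "name": name, "pod": pod}

def run_rack_generator(rack):
    """Stand-in for a tool that runs the Generator and opens a proposed change."""
    return f"proposed-change/add-rack-{rack['name'].lower()}"

TOOLS = {"create_rack": create_rack, "run_rack_generator": run_rack_generator}

# The agent gathers inputs from the user, then dispatches tool calls:
rack = TOOLS["create_rack"](name="A23", pod="A2")
print(TOOLS["run_rack_generator"](rack))  # proposed-change/add-rack-a23
```

The important property is that the agent never writes configuration directly: it only drives the same Generators and proposed-change review gate a human operator would.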
So you mentioned an MCP server. I don't know if everyone's familiar with that. Is it something that ships with Infrahub? Is it publicly available? Where can we find this MCP server?
The MCP server is not yet part of Infrahub.
But we have published a repository, Infrahub MCP, where you can find the MCP server, along with links to our documentation that tell you how to install it, how to set it up, and what functionality it provides.
That being said, the MCP server has been a big focus for us lately, and we are working heavily on improving it. Some of the functionality we're using in this demo is not yet publicly available, but it will be really soon.
And as we said before, in the 1.10 release, which is the next release cycle we're starting, it's going to be bundled with Infrahub.
So again my laptop is a bit slow and the workflow is taking much longer right now because of that.
Yeah, it's always the trick with AI demos: they're really cool, but they're a bit slow.
Could you do the same from the UI, if you wanted to? Not that we have to, but the agent is just another interface; the same could be done in many different ways, if someone doesn't want to use an AI agent, for example.
Yes, exactly. What we see is that a lot of our customers want to get to this notion of a service catalog, where they offer services to the rest of the organization. In this case, building data centers could be a service you offer to the rest of your organization.
And what we typically see is that those customers will build a service portal where anyone can go in and basically order a new data center from the networking team: they provide the inputs, and that triggers those workflows from the portal.
The downside, of course, is that that workflow needs to be completely coded, and one of the powers of AI is that we can define in more human language exactly what our workflow should be.
I’m not sure what is going on.
I think we’re at time so maybe we’ll have to do another webinar just on that.
I know. There's also a live stream coming up in a couple of weeks just on the MCP server and the skills. But that's the joy of a live demo.
But I think it's always better to do a real demo, even if from time to time the demo god is not with us. It's weird, because it's saying that it cannot reach, but it is reading from it. So, I'm sorry.
No worries. Were there one or two closing slides before we wrap up? Or should we check if there are any questions from the audience?
So far, no questions. So I imagine it was clear.
Or maybe they were already looking at the GitHub repo while we were presenting.
That could be the case. As we talked about before, this solution that we have shown you is packaged and freely available to everyone.
You can find the complete code, the schemas, the artifact definitions, everything that we have used today, in a GitHub repository under the OpsMill organization, probably solution-aidc. The whole workflow, how to set it up, and how it works, in a lot more detail than we were able to cover today, has been captured within our documentation.
So go there to find any additional information you may need. Within the repository we have the schemas, the device templates, example data, and the Generators: everything we have shown you today. If you want to read more about Infrahub in general, I would advise you to go to our website, opsmill.com, where you can find more information.
Here's also the link to our public repository, where you can start browsing the Infrahub code if you'd like, and the link to our general documentation website. And last but not least, we have been talking a lot about the design-driven network automation approach here.
Last year at NANOG, we did a presentation covering in more depth what design-driven network automation means and what its different characteristics are. You can watch that YouTube video to learn more about it. And that is it.
Perfect.
Thank you, Wim.
Thank you. I hope that gives everyone a really good path to get started and play with the project. So thank you, everyone, for joining. Sorry we ran a little bit over, but thank you.
Thank you, everyone.
Jennifer Tribe, OpsMill Head of Content Marketing

Jennifer Tribe | Journalism-trained content marketer with a knack for translating complexity into plain language. Alumna of Auvik Networks and Packet Pushers, now swimming happily in deep network waters as Head of Content Marketing at OpsMill. Espresso-fueled Canadian who was using em-dashes long before that digital upstart. Proponent of the Oxford comma.
