Don’t Warm the Wires: Avoiding Incidents Caused by Change When Implementing Network Automation

Posted on May 17, 2024

When you make a network change, where do you get your information from?

Often, network engineers rely solely on memory and previous hands-on understanding of the network when making changes. Here’s a scenario I heard from Michael Wynston, Director of Network Automation & Architecture at Fiserv, on our recent Packet Pushers episode:

“When an engineer needs to do something, add a new VLAN to a switch, you SSH to the 10 switches that you know of. And you find the VLAN number that’s not being used. Excellent.

“Well, there are 12 switches. And that’s just in that one pod. And, oh, by the way, the VLAN you picked happens to be a transit VLAN that you didn’t see on those 10 switches, because it’s only on these two.

“And now that you’ve added it to those other 10, those other 10 become transit, you created a spanning tree loop, and happy day, the wires are warm.

“That’s the kind of mistake that engineers make. And it’s called an incident caused by change. And there’s no way, with over 20,000 devices, any engineer can say, without knowing, that they actually know anything.”

How to Avoid Incidents Caused by Change

As networks scale up, these kinds of incidents can become more common unless teams adopt new strategies and tools. The example above came up as part of a conversation around automation and how engineers source needed information. In Fiserv’s case, with over 20,000 devices in their network, there’s literally no way a network engineer can hold the information required to avoid incidents in their head — and the same is true for the majority of organizations, especially as networks continue to expand.

How do you get around this?

It’s all about creating processes you can trust, where systems of record are always authoritative and are automatically updated when changes are made so they are always trustworthy. Engineers need the ability to access information they know is accurate. That doesn’t mean a single source of truth — as we’ve covered previously, attempting to build a single source of truth is not only difficult and time-consuming, but also creates more opportunities for data inaccuracy because you’re duplicating data from one source to another.

Instead, the focus must be on ensuring different systems of record are always accurate, always up-to-date, and always easily accessible to any network engineer, automated process, or network or IT system that must source authoritative data to execute a change.

Introducing Automation Means Standardizing How Processes Interact With Systems of Record

The push to introduce more automation in networking is driving teams to rethink how they interact with systems of record to make network changes. Automated network changes and self-serve delivery aren’t really automated or self-serve if a network engineer still needs to manually source information. The challenge is that information gathering for manual changes is often ad hoc and can vary from one network engineer to the next.

Inconsistent information gathering already creates incidents when teams do things manually. But it’s a problem that has been, in some ways, tolerated for a while now. However, as we move further toward an automation-driven model for networking, we need to ensure we’re addressing how processes can source authoritative data about all network devices, domains, cloud environments, etc.

Coming back to the example above, a trustworthy system of record for current VLAN allocation could have prevented a nasty spanning tree outage. If the network engineer had been able to reference a system of record they knew contained accurate information, they could have queried it for the next unused VLAN, or looked up the VLAN they thought was unused and found that it was already allocated as a transit VLAN. Either way, they would have avoided the incident caused by human error.
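To make that lookup concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory system of record: the `VLAN_RECORD` dictionary, the switch names, and the VLAN numbers are all invented for illustration, and a real system of record would be queried through its own API instead.

```python
# Hypothetical system of record: VLANs allocated on every switch in one pod.
# Switch names and VLAN numbers are illustrative, not from any real network.
VLAN_RECORD = {f"sw{n:02d}": {10, 20, 30} for n in range(1, 11)}
VLAN_RECORD["sw11"] = {10, 20, 30, 40}  # VLAN 40 is a transit VLAN
VLAN_RECORD["sw12"] = {10, 20, 30, 40}


def allocated_vlans(record, switches=None):
    """Return every VLAN ID in use on the given switches (default: all)."""
    names = record.keys() if switches is None else switches
    used = set()
    for name in names:
        used |= record[name]
    return used


def next_unused_vlan(record, start=2, end=4094):
    """Return the lowest VLAN ID in [start, end] unused anywhere in the record."""
    used = allocated_vlans(record)
    for vlan_id in range(start, end + 1):
        if vlan_id not in used:
            return vlan_id
    raise ValueError("no unused VLAN ID in the requested range")


# The engineer's partial view: only the ten switches they knew about.
known = [f"sw{n:02d}" for n in range(1, 11)]
partial_view_free = 40 not in allocated_vlans(VLAN_RECORD, known)  # looks free
full_view_free = 40 not in allocated_vlans(VLAN_RECORD)            # it is not
```

Checked against only the ten known switches, VLAN 40 looks free. Checked against the full record, it does not, and `next_unused_vlan(VLAN_RECORD, start=40)` skips past it.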

And even when authoritative systems of record do exist, network engineers are often working under intense pressure and time constraints, which brings me back to the automation point. In a team that’s trying to keep up with manual activity, an engineer might decide, “I’ll just verify what I can use from the router I’m already changing, because I’m already accessing it.” That approach can lead to outages too. But if processes are automated, and every process has access to authoritative systems of record, queries happen quickly as part of an automated workflow.

This is why standardizing how processes interact with systems of record is so critical to success as networking continues to evolve.

Your Orchestration Platform Must Integrate With Every System of Record

All of this is to say, if you’re investing in automation and orchestration with the goal of orchestrating workflows that can drive network changes end-to-end, you need the orchestration solution to be able to interact with every system of record in your environment.

Itential offers a few key advantages over other platforms on the market that make our platform the ideal solution here. Itential’s integration model is based on consuming APIs, and the platform can auto-generate integrations with other systems based on API documents. It’s a vendor-agnostic approach that allows teams to incorporate any system that exposes an API into an Itential orchestration workflow. This allows network engineers to build workflows that always both pull information from systems of record and update those systems when changes are made, ensuring no “rogue” changes happen and keeping the systems of record accurate and up to date.
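As a rough illustration of a workflow that both reads from and writes back to a system of record, consider the Python sketch below. It is not Itential’s API: the `SystemOfRecord` class, `add_vlan_workflow`, and the `push_config` callback are hypothetical stand-ins, and a real workflow would also roll back any devices that were already configured when a failure occurs.

```python
class SystemOfRecord:
    """Hypothetical stand-in for an authoritative VLAN database."""

    def __init__(self, allocations):
        self._allocations = dict(allocations)  # vlan_id -> purpose

    def is_allocated(self, vlan_id):
        return vlan_id in self._allocations

    def allocate(self, vlan_id, purpose):
        # Refuse to hand out a VLAN that the record says is already in use.
        if vlan_id in self._allocations:
            raise ValueError(
                f"VLAN {vlan_id} already allocated: {self._allocations[vlan_id]}"
            )
        self._allocations[vlan_id] = purpose

    def release(self, vlan_id):
        self._allocations.pop(vlan_id, None)


def add_vlan_workflow(record, vlan_id, purpose, switches, push_config):
    """Reserve the VLAN in the record, then push it to every switch.

    If any push fails, the reservation is released so the record does not
    claim a change that never completed. Device rollback is out of scope.
    """
    record.allocate(vlan_id, purpose)  # check and update in one step
    try:
        for switch in switches:
            push_config(switch, vlan_id)
    except Exception:
        record.release(vlan_id)
        raise
    return vlan_id


# Usage with a fake push function that only logs what it was asked to do.
record = SystemOfRecord({40: "transit"})
pushed = []
add_vlan_workflow(record, 41, "app-tier", ["sw01", "sw02"],
                  lambda switch, vlan: pushed.append((switch, vlan)))
```

Because the record is checked and updated inside the workflow itself, a request for VLAN 40 fails before any switch is touched, and a successful request leaves the record matching what was pushed.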

In addition, Itential allows for multi-domain orchestration across all of your network infrastructure. As a result, the central platform that interacts with every system of record is also the only platform network and infrastructure teams need to use to orchestrate change processes.

The third major point I want to highlight is Itential’s ability to expose its API for consumption by other platforms and users, which lets end users, pipelines, and platforms self-serve every orchestrated network change.

This is crucial: in a manual, ticket-driven world, network changes aren’t standard. Sometimes a network engineer will query the right system of record; other times they won’t. Someone might forget to update a system of record when a change is made. Someone might skip steps when making a change if it’s late in the day and it’s urgent. All of this causes more risk.

If, instead, the only way a network change can be made is for someone to request the same orchestrated service, wherever that’s being exposed, then it’s not possible for “rogue” changes to be made, it’s not possible for human error to create issues with systems of record, and it’s not possible for anyone to skip steps.
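One way to picture “nobody can skip steps” is a service whose step list is fixed when the service is defined, so every requester runs the same sequence. The Python sketch below is only an illustration under that assumption: `make_service`, the three step functions, and the request dictionary layout are hypothetical and do not represent any product’s API.

```python
def make_service(steps):
    """Bundle an ordered sequence of steps into one callable service.

    Callers get a single entry point; they cannot reorder or omit steps.
    """
    steps = tuple(steps)  # freeze the order at definition time

    def run(request):
        completed = []
        for step in steps:
            step(request)  # any exception aborts the remaining steps
            completed.append(step.__name__)
        return completed

    return run


def check_record(request):
    # Step 1: consult the (hypothetical) system of record first.
    if request["vlan_id"] in request["allocated"]:
        raise ValueError(f"VLAN {request['vlan_id']} is already allocated")


def apply_change(request):
    # Step 2: stand-in for pushing the change to the network.
    request["applied"] = True


def update_record(request):
    # Step 3: write the change back so the record stays accurate.
    request["allocated"].add(request["vlan_id"])


add_vlan = make_service([check_record, apply_change, update_record])

# Every requester (a person, a pipeline, another platform) calls the same service.
audit = add_vlan({"vlan_id": 41, "allocated": {40}})
```

A request that conflicts with the record raises at the first step, so the change is never applied and the record is never touched.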

Let’s revisit the story from the beginning. In Fiserv’s case, they were able to use Itential to reduce the number of different pathways engineers take to make changes while simultaneously delivering network services across the entirety of their organization in a standardized manner. This has reduced the risk of human error and lowered the number of incidents they face, even as their network continues to expand and they evolve toward a fully self-serve model. You can listen to this Packet Pushers episode to hear the full story.

Tags: End-to-End Automation Network Orchestration Source of Truth

Rich Martin

Director of Technical Marketing ‐ Itential

Rich Martin is the Director of Technical Marketing at Itential. Previously, Rich worked at several networking vendors as both a Pre-Sales Systems Engineer and a Systems Engineering Manager, but he started his career with a background in software development and Linux. He has a passion for automation in the networking domain, and at Itential he helps networking teams get started quickly and move forward successfully on their network automation journey.


Related Content

Demo

Integrating NetBox as a Source of Truth for Network Automation

Blog

You Need a Single Source of Trust, Not Truth, for Network Automation

Podcast

Packet Pushers: From Automation to Orchestration for a FinTech Network