Sundays at my house are always a bit frantic, with a number of chores that need to get done to prepare for the week ahead. Several of these chores are time-consuming affairs, like the laundry. Each cycle takes about 90 minutes to complete. Between our two teenage children, my wife, and me, we generally have at least four cycles to run. While the laundry is running, we are able to do other chores. Sometimes we divide up the chores and everyone does their part in parallel. When we get this right, we are usually finished by early afternoon. It makes no sense to do this any other way: life is short, and Sundays are precious.
When we are all working in parallel, we often have to consider things that an individual would not. My daughter and son cannot both vacuum their bedrooms at the same time, because we have only one vacuum. The laundry machines must never sit idle, or everything takes longer to finish. Basically, it's more complicated when we are all working together, but the payoff is having the afternoon free.
The same can be said for network automation. Running things in parallel can be more complicated, but the payoff is completing things more quickly and becoming more productive.
Working in Parallel in the Itential Automation Platform
Itential provides a way to execute a child workflow in a parallel loop. It works by passing an array of data into the child job and then selecting parallel execution. The workflow engine will then run the child job on all items in the array at roughly the same time. You can see two examples in this step-by-step demo.
Running child jobs in parallel can make a workflow finish much faster, which matters a great deal for a process that must run frequently and at scale. It's a useful tool for network teams who want their workflows to be as efficient as possible. The trade-off is increased complexity, but learning the best ways to build and run parallel jobs in your own environment can add a lot of value.
We are often asked about the limits of parallelism in Itential's platform. The usual question is: "How many jobs can run at the same time?"
The question is a bit too simplistic, which makes it hard to answer. It's like asking how big a house should be. Well, it depends.
How many tasks are you running in parallel? What are those tasks doing? Are they making calls to external systems? Are they upgrading device OS versions or configurations? Are they CPU-intensive? Does the external system limit concurrent calls (it probably does)? Is the automation time-bound, meaning it has to complete within a certain number of minutes? Is a large amount of data being passed through the child automation? Are there parts of the automation, like metrics reporting, that don't need to run in parallel at all?
In other words, to provide a good answer, we need to understand many details of the workflow. We need to know how many windows, bedrooms, and bathrooms your house will have, and whether we are building it on a plain or a mountaintop.
The workflow engine will do its best to accommodate the parallel execution. If you give it a list of 10,000 items, it will try to run all of them at roughly the same time. The side effect of this approach is that it might exhaust the server. In our experience, it is better to put some boundaries on the parallel execution: for example, run 100 at a time and iterate through the list in groups of 100.
The Power of Parallel Jobs
Many of our customers are successfully using parallel child jobs in loops to do some extraordinary things. None of them do this in an unbounded manner; they all run their parallel jobs in batches. I'm aware of one customer that runs tens of thousands of show commands on hundreds of devices. They have to do this during a maintenance window, so the automation has to be as fast as possible. Their automation batches the devices, then batches the commands, using parallel loops inside parallel loops, and completes in about 10 minutes. It took them many weeks of experimentation to find the right batch sizes. Their batches are not uniform in size either: some hold 100 items, some 10, some 1. It was an engineering effort.
When faced with a time constraint, every task in the automation needs to be scrutinized. Define exactly what needs to run in parallel and parallelize only that. The customer above started their project with three requirements: the workflow must run many commands, persist the output of those commands, and generate a report of the results. After much discussion, and with an open mind, the customer realized that only the running of the commands needed to happen in parallel. The rest could happen afterward. In the end, their automation looked very different from their initial whiteboard sketches. They had to think differently about the problem to build the most efficient solution.
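The split between a parallel phase and a sequential phase can be sketched in Python. This is an illustration under assumptions, not the customer's workflow: `run_show_command`, `persist`, `build_report`, and `run` are hypothetical helpers, the dictionary `store` stands in for a database, and a thread pool with `max_workers` plays the role of a bounded parallel loop.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_show_command(job: tuple[str, str]) -> dict:
    """Hypothetical stand-in for sending one show command to one device."""
    device, command = job
    return {"device": device, "command": command, "output": f"<{command} on {device}>"}


def persist(results: list[dict], store: dict[str, str]) -> None:
    """Save every result once; `store` is a stand-in for a real database."""
    for r in results:
        store[f"{r['device']}:{r['command']}"] = r["output"]


def build_report(results: list[dict]) -> str:
    """Summarize the run once, after all commands have finished."""
    devices = {r["device"] for r in results}
    return f"{len(results)} commands across {len(devices)} devices"


def run(devices: list[str], commands: list[str], store: dict[str, str], max_parallel: int = 100) -> str:
    jobs = list(product(devices, commands))
    # Phase 1 (parallel, bounded): only the command execution fans out.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run_show_command, jobs))
    # Phase 2 (sequential): persistence and reporting run once, afterwards.
    persist(results, store)
    return build_report(results)


if __name__ == "__main__":
    store: dict[str, str] = {}
    print(run(["r1", "r2", "r3"], ["show version", "show ip route"], store, max_parallel=4))
    # 6 commands across 3 devices
```

Only `run_show_command` runs inside the pool, so each parallel slot does just the work that has to be concurrent, mirroring the customer's conclusion that persistence and reporting could wait until the commands were done.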
Best Practices for Parallel Jobs
Currently, there is no concurrency throttling built into the system — the workflow developer has the freedom to build workflows that suit their needs, as well as the responsibility to build workflows that do not exhaust the server. These practices can help:
- A good approach is to build a JSON Schema Transformation (JST) that turns the input into an array of arrays, with the length of the inner arrays exposed as a parameter so that the batch size is easy to change and the JST is easy to reuse.
- Make sure that the child job ALWAYS finishes, error or not, so you don't end up with parallel automations that are only half completed. After the parallel task, do some "accounting" to identify any errors.
- Use as few tasks as possible — JST can help consolidate tasks.
- Beware of adapter calls to external systems. They introduce variables, like network latency, that are outside of your control. Also consider whether the external system can support a large number of concurrent requests.
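Two of these practices, batching into an array of arrays and making the child job always finish, can be sketched together. In the platform the batching would be done by a JST; the Python `to_batches` function below is only an analog of that transformation, and `child_job`, `run`, and `account` are hypothetical names. The `-bad` suffix is an invented way to simulate an unreachable device.

```python
import asyncio
from typing import Any


def to_batches(items: list[Any], size: int) -> list[list[Any]]:
    """Turn a flat list into an array of arrays; `size` is the one knob to tune."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


async def child_job(device: str) -> dict:
    """Hypothetical child job that always returns a result, error or not."""
    try:
        if device.endswith("-bad"):  # simulated failure, for illustration only
            raise ConnectionError(f"{device} unreachable")
        await asyncio.sleep(0)  # real work would happen here
        return {"device": device, "status": "success"}
    except Exception as exc:  # report the error instead of aborting the batch
        return {"device": device, "status": "error", "error": str(exc)}


async def run(devices: list[str], size: int) -> list[dict]:
    results: list[dict] = []
    for batch in to_batches(devices, size):
        results.extend(await asyncio.gather(*(child_job(d) for d in batch)))
    return results


def account(results: list[dict]) -> dict:
    """The 'accounting' step after the parallel task: count and list failures."""
    failed = [r["device"] for r in results if r["status"] == "error"]
    return {"total": len(results), "failed": failed}


if __name__ == "__main__":
    print(to_batches(["r1", "r2", "r3", "r4", "r5"], 2))
    # [['r1', 'r2'], ['r3', 'r4'], ['r5']]
    results = asyncio.run(run(["r1", "r2-bad", "r3", "r4"], size=2))
    print(account(results))
    # {'total': 4, 'failed': ['r2-bad']}
```

Because `child_job` catches its own exceptions, one failed item cannot cut a batch short, and every failure surfaces in a single place once the parallel work is done.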
So what is the recommended number of parallel runs when using parallel loops? The answer is whatever best fits your needs. We've worked with many customer teams to find the right balance for successful parallelism in their environments, and it can take some iteration to get there.
Start with a reasonable number of parallel runs as a goal (100 is common). If that satisfies all your business needs, great! If it does not, we may need to apply some engineering and examine the automation, the environment, and the goals.
When & Why to Use Parallel Jobs
The promise of parallel automations is that things will get done faster. This is certainly true, and when done right, the time savings can have a significant impact on business-critical IT processes. However, this approach does not come for free. In a low-code environment, the temptation is to build automations quickly and make them parallel recklessly. Be thoughtful about using the parallel feature, and be open-minded too. Parallelism will likely require thinking about the automation differently, but teams that get it right are rewarded with a nice payoff. Almost as nice as a free Sunday afternoon.