Automation purists will likely tell you that the best systems remove the human element completely: the goal of true automation is to eliminate the need for human interaction.
While automating the toil carried out by our teams is a key strategic goal, at bet365 we’ve found that it’s nearly always a balancing act. When dealing with technical architecture and ecosystems as complex as ours, it’s both necessary and preferable to keep humans in the loop.
A good example is ASAP, our automated server and application provisioning system. When the system was first conceived, it was intended to remove all human interaction from the provisioning request process, and for good reason. Before its development, the server request workflow was driven by manually entering configuration details into a lengthy request form. Because the data collection was so complex, the process was time-intensive, required lots of checking and coordination, and was prone to errors.
To inject some urgency into the process, the mantra was always to get as much done as you could, as soon as you could. All too often this meant starting provisioning before the project requirements were clear. Typically, when the form was filled out, the requester wouldn’t have all the information needed. As a result, the request usually had to be modified and added to as the project advanced.
Things would inevitably change as the project ran its course, so necessary configuration changes were commonly made late in a project’s cycle. This often meant the final requirements were not met, the system didn’t work as required, and rework was nearly always necessary. The design process was fluid and complicated; form filling was static and uncollaborative.
It got to the point where the time taken was becoming unacceptable. We needed a way to complete provisioning within a few hours of the request. When looking at how that could be achieved, the biggest challenge was how to take the complexity of data collation and automate it.
The Challenge of Automating Complexity
The initial idea was to serve users with a form where they could fill in a few fields, click ‘Go’ and the system would take care of the rest. However, we found the platform was too complex to enable this type of self-provisioning.
To get a feel for the kind of complexity we face, consider the following. Our platform is hosted entirely on-premises in our own data centres. Each year, we provision thousands of servers, each of which must be built, configured, installed with an OS, patched, placed on the network, and installed with our products. The platform is vast, comprising over 1,000 products, 20,000 servers, several globally distributed data centres, and a complex network topology that must remain secure and compliant with diverse regulatory requirements.
When addressing the creation of a fully automated provisioning system, our key challenge was how to deal with the vast array of configuration data scattered across a varied, complex, and sizeable product base.
Before you can add anything new, you have to consider the existing topology. Any change to the underlying infrastructure can have an undesirable impact if not configured correctly. Therefore, requests for new servers need to be closely managed to ensure that each new environment connects seamlessly with the existing platform.
There was also additional complexity that arose from our desire to automate. When a process is manual, you can tolerate a certain amount of variation that you can’t when you’re automating. For automation to work, the input must be correct and complete at the start of the process, and the outcome must be the same every time.
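To make the correct-and-complete requirement concrete, here is a minimal Python sketch of a validation gate that refuses to start provisioning until a request passes every check, and that turns a valid request into the same ordered list of steps every time. It is an illustration under stated assumptions, not ASAP’s implementation: the ServerRequest fields, the KNOWN_DATA_CENTRES and KNOWN_ZONES values, and the validate and build_plan functions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical request shape; field names are illustrative, not ASAP's schema.
@dataclass(frozen=True)
class ServerRequest:
    product: str
    data_centre: str
    os_image: str
    network_zone: str
    cpu_cores: int
    memory_gb: int

KNOWN_DATA_CENTRES = {"dc-uk-1", "dc-us-1"}  # hypothetical values
KNOWN_ZONES = {"web", "app", "db"}           # hypothetical values

def validate(req: ServerRequest) -> list[str]:
    """Return every problem found; an empty list means the request is complete."""
    problems = []
    for name in ("product", "data_centre", "os_image", "network_zone"):
        if not getattr(req, name).strip():
            problems.append(f"{name} is missing")
    # Only check membership for values that were actually supplied.
    if req.data_centre.strip() and req.data_centre not in KNOWN_DATA_CENTRES:
        problems.append(f"unknown data centre: {req.data_centre}")
    if req.network_zone.strip() and req.network_zone not in KNOWN_ZONES:
        problems.append(f"unknown network zone: {req.network_zone}")
    if req.cpu_cores <= 0 or req.memory_gb <= 0:
        problems.append("cpu_cores and memory_gb must be positive")
    return problems

def build_plan(req: ServerRequest) -> list[str]:
    """Refuse to start unless validation passes; the same input always yields the same steps."""
    problems = validate(req)
    if problems:
        raise ValueError("; ".join(problems))
    return [
        f"allocate VM ({req.cpu_cores} vCPU, {req.memory_gb} GB) in {req.data_centre}",
        f"install OS image {req.os_image}",
        "apply patches",
        f"attach to network zone {req.network_zone}",
        f"deploy product {req.product}",
    ]
```

For example, ServerRequest("example-product", "dc-uk-1", "linux-base", "app", 4, 16) passes validation and produces a five-step plan, while a request with an empty os_image is rejected before any step runs. Collecting all problems at once, rather than failing on the first, suits a design conversation where gaps are fixed together.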
In summary, we determined that the design process should begin with dialogue and collaboration, not form-filling. Through the design process itself, the design team, not the requester, could then determine the correct data and feed it into ASAP.
Building a Strong Foundation
Thankfully, the automation work had begun some time earlier. Our platform teams had already started to produce APIs for their areas. This meant we had a good starting point for creating an orchestration layer that called into the platform APIs with the right data at the right time.
Rather than expecting requesters to work out the configuration data and manually enter it into a form, we began by challenging ourselves to find sources of required data in the existing platform.
We realised that we could use a combination of CMDB data and existing release configuration, together with lookups from the platform APIs. As we developed the orchestration workflow, it highlighted areas that still required manual intervention. Fortunately, all the teams involved were committed to automation and the initiative gained momentum as confidence grew.
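The idea of sourcing configuration from existing systems rather than from a form can be sketched in Python as follows. The three stub sources stand in for the CMDB, release configuration and platform API lookups; their names, the REQUIRED_FIELDS list and all values are hypothetical. Each required field is taken from the first source that supplies it and recorded with its origin, and any field no source can supply is reported back, which is one way an orchestration workflow could surface the areas that still need manual intervention.

```python
from collections.abc import Callable, Mapping

# A source maps a product name to the configuration fields it knows about.
Source = Callable[[str], Mapping[str, str]]

# Hypothetical stubs standing in for a CMDB, release configuration and platform APIs.
def cmdb(product: str) -> Mapping[str, str]:
    return {"owner_team": "team-a", "network_zone": "app"} if product == "example-product" else {}

def release_config(product: str) -> Mapping[str, str]:
    return {"os_image": "linux-base", "service_port": "8443"} if product == "example-product" else {}

def platform_api(product: str) -> Mapping[str, str]:
    # e.g. a network team API returning an address range for the product
    return {"subnet": "10.0.0.0/24"} if product == "example-product" else {}

# Hypothetical list of fields a build needs.
REQUIRED_FIELDS = ("owner_team", "network_zone", "os_image",
                   "service_port", "subnet", "load_balancer_pool")

def resolve(product: str, sources: list[tuple[str, Source]]
            ) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Fill each required field from the first source (in list order) that has it.

    Returns (resolved, unresolved): resolved maps field -> (value, source name)
    so every value is traceable; unresolved lists fields no source supplied,
    i.e. the places that still need a human.
    """
    resolved: dict[str, tuple[str, str]] = {}
    for source_name, source in sources:
        data = source(product)
        for key in REQUIRED_FIELDS:
            if key not in resolved and key in data:
                resolved[key] = (data[key], source_name)
    unresolved = [key for key in REQUIRED_FIELDS if key not in resolved]
    return resolved, unresolved
```

With the sources ordered as cmdb, release_config, platform_api, resolving "example-product" fills five fields and leaves only load_balancer_pool unresolved. Ordering the sources gives an explicit precedence when two systems disagree, and keeping the source name alongside each value makes it easy to see where a setting came from during design review.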
Putting the System to the Test
The project has been a resounding success and has given us the speed, consistency, and predictability we were looking for. Servers don’t need to be built months before they’re needed. Instead, we can build them Just-in-Time – when the project requirements are clear. We can start the collaboration, but we don’t need to close the file until the very last moment.
This gives us the best chance of getting the build right first time, without it being subject to later change. It encourages collaboration throughout the project lifecycle, and it means we don’t have to keep going back to make amendments. We’ve reduced rework and thankless manual activity, which means we can focus on the goals rather than the toil.
Since implementing ASAP, provisioning can be done in as little as an hour. Within a few weeks, we built thousands of servers in a new data centre in the US, which really put the fledgling system to the test. Critically, we’ve taken the pressure off our people and put concrete measures in place to ensure they always get exactly what they need, when they need it.
To ASAP or Not To ASAP
We now have an ASAP-first mentality. Most VM builds can already go through ASAP and where they can’t, we’re challenging the reasons why. For example, we are working on including aspects of physical server provisioning.
However, despite our advances in developing standards and guardrails, we have realised it is still important for the requester to interact with someone from the DevOps team to help translate the requirement into a platform design that feeds ASAP. This ensures the platform is protected while product teams get what they need.
Article written by Steven Briggs, Head of DevOps, at bet365