Software development pipeline - Design flexibility

Wednesday, October 31, 2018

The fourth property to consider is flexibility, i.e. the ability of the pipeline to be modified or adapted without requiring large changes to the underlying pipeline code and services.

A pipeline should be flexible because the products being built, tested and deployed with that pipeline may require different workflows or processes to complete all the stages in the pipeline. For example, building and packaging a library requires a different approach than building, testing and deploying a cloud service. Additionally, the different stages in the pipeline require different approaches. Build steps will in general be executed by a build system that returns the results synchronously, whereas test steps might run on a different machine from the process that controls them, so those results might come back via an asynchronous route. Finally, flexibility in the pipeline also improves resilience: in case of a disruption, an adaptable pipeline allows services to be restored through alternate means.

A flexible pipeline is achieved in the same way flexibility is achieved in other software products: by using modular parts, standard inputs and outputs, and carefully considered design. Some appropriate options are:

  • Split the pipeline into stages that take standard inputs and deliver standard outputs. There might be many different types of inputs and outputs but they should be known and easily shared between processes and applications. There can be one or more stages, e.g. build, test and deploy, which are dependent on each other only through their inputs and outputs. This allows adding more stages if required.
  • Allow steps or stages in the pipeline to be started in response to a standard notification. That allows each step to determine what information it needs to start execution. Additional information can be downloaded from the appropriate sources upon receiving a notification. This approach allows notifications to be generic while steps can still acquire the information they need to execute. Additionally, having pipeline steps respond to notifications makes it very easy to add new steps to the process, because a new executor only has to be instantiated and connected to the message source, e.g. a distributed queue.
  • If a stage consists of multiple, dependent steps, then it should be easy to add and remove steps based on the requirements. In these cases it is generally preferable for the stage to execute one or more scripts, as scripts are easier to extend than services. As with the stages, steps should ideally use well-known inputs and produce well-known outputs.
  • Inputs for stages and steps are for instance:
    • Source information, e.g. a commit ID
    • Artefacts, e.g. packages, installers, zip files etc.
    • Metadata, i.e. additional information attached to a given output or input, e.g. build or test results
  • Outputs generated by stages and steps are for instance
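
To make the notification-driven approach from the list above more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Notification` fields, the `Bus` class, the topic names such as "commit.pushed" and the two executor functions are all hypothetical, and an in-process `queue.Queue` stands in for the distributed queue a real pipeline would use.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Callable, Dict, List, Optional


@dataclass
class Notification:
    """Generic message: what happened, and for which commit (hypothetical shape)."""
    topic: str                                            # e.g. "build.completed"
    commit_id: str                                        # source information
    artefacts: List[str] = field(default_factory=list)    # artefact locations
    metadata: Dict[str, object] = field(default_factory=dict)


Handler = Callable[[Notification], Optional[Notification]]


class Bus:
    """In-process stand-in for a distributed queue."""

    def __init__(self) -> None:
        self._queue: Queue = Queue()
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, notification: Notification) -> None:
        self._queue.put(notification)

    def run_until_idle(self) -> List[str]:
        """Deliver notifications until none are left; return the topics seen."""
        seen: List[str] = []
        while not self._queue.empty():
            note = self._queue.get()
            seen.append(note.topic)
            for handler in self._handlers.get(note.topic, []):
                result = handler(note)
                if result is not None:
                    self.publish(result)
        return seen


def build_stage(note: Notification) -> Notification:
    # A real executor would fetch the sources for note.commit_id here.
    return Notification("build.completed", note.commit_id,
                        artefacts=[f"lib-{note.commit_id}.zip"])


def verify_stage(note: Notification) -> Notification:
    # A real executor would download note.artefacts and run the tests.
    return Notification("test.completed", note.commit_id,
                        artefacts=note.artefacts,
                        metadata={"tests_passed": True})


bus = Bus()
bus.subscribe("commit.pushed", build_stage)
bus.subscribe("build.completed", verify_stage)
bus.publish(Notification("commit.pushed", "abc123"))
print(bus.run_until_idle())
```

In this sketch the stages depend on each other only through the notification they receive and the one they emit. Adding a deploy step would be one more `subscribe` call on "test.completed", with no change to the existing executors.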

Flexibility of the workflow can be further improved by making sure that the artefacts generated in the pipeline are not created, tested and deployed in a single monolithic process, even if the end result should be a single artefact. In many cases artefacts can be assembled from smaller components. This approach improves the workflow for the development teams because smaller components can be created much more quickly, and assembling a larger piece from components is in general quicker and more flexible than regenerating the entire piece from scratch. In many cases only a few components need to be recreated, which both saves time and allows much of the process to be executed in parallel.
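
As a rough illustration of assembling an artefact from components, the following Python sketch rebuilds only the components whose sources have changed and reuses cached outputs for the rest. All names here (`fingerprint`, `build_component`, `assemble`, the in-memory `cache`) are hypothetical, and the "build" is a trivial stand-in for a real compile or packaging step.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(source: bytes) -> str:
    """Content hash used to decide whether a component needs rebuilding."""
    return hashlib.sha256(source).hexdigest()


def build_component(name: str, source: bytes) -> bytes:
    # Stand-in for a real compile or package step.
    return b"built:" + name.encode() + b":" + source


def assemble(
    sources: Dict[str, bytes],
    cache: Dict[str, Tuple[str, bytes]],
) -> Tuple[bytes, List[str]]:
    """Return the assembled artefact and the names of the rebuilt components."""
    rebuilt: List[str] = []
    parts: List[bytes] = []
    for name in sorted(sources):
        digest = fingerprint(sources[name])
        cached = cache.get(name)
        if cached is None or cached[0] != digest:
            # Source changed (or never built): rebuild just this component.
            cache[name] = (digest, build_component(name, sources[name]))
            rebuilt.append(name)
        parts.append(cache[name][1])
    return b"\n".join(parts), rebuilt


cache: Dict[str, Tuple[str, bytes]] = {}
sources = {"core": b"v1", "ui": b"v1"}
_, rebuilt = assemble(sources, cache)
print(rebuilt)  # both components are built the first time

sources["ui"] = b"v2"
_, rebuilt = assemble(sources, cache)
print(rebuilt)  # only the changed component is rebuilt
```

Because each component is built independently of the others in this sketch, the rebuilds could also be handed to separate workers; that is left out here to keep the example short.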

The exact implementation of the pipeline determines how flexible and easy to extend it will be. Given that the use and implementation of pipelines vary quite a lot, it is hard to provide detailed implementation guidance; however, some standard suggestions are:

  • Keep the build part of the pipeline described in scripts, given that scripts are, in general, easier to adapt. By pulling the scripts from a package, e.g. a NuGet or NPM package, it is quick and easy to update to a later version of these scripts. An additional benefit of keeping the process in scripts is that developers can execute the individual steps of the pipeline from their local machines. That allows them to ensure builds and tests work before pushing to the pipeline, and provides a means of building things if the pipeline is not available.
  • Any part of the process that cannot be done by a script should be backed by a service that is available to both the pipeline and the developers executing the scripts locally. Examples are test systems and steps such as certificate signing, which requires the certificates to be present on the current machine, something that might not be possible on every machine. For any services that should only be provided to the build server, e.g. signing, the scripts should allow skipping the steps that need the service.
  • For stages that execute scripts, e.g. the build stage, jobs can be automatically generated from information stored in source control. This makes it easy to update the actions executed by these stages without requiring developers to perform the configuration manually.
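
The suggestion that scripts should allow skipping steps that need a build-server-only service could look roughly like the Python sketch below. The step names, the `--skip` flag and the `run` function are hypothetical choices made for this illustration, and the steps themselves only return a label instead of doing real work.

```python
import argparse
from typing import Callable, Dict, List, Sequence


def compile_sources() -> str:
    return "compiled"       # stand-in for invoking the real build tool


def run_unit_tests() -> str:
    return "tested"         # stand-in for invoking the real test runner


def sign_artefacts() -> str:
    return "signed"         # needs a signing service only the build server has


# Steps run in the order they are listed here.
STEPS: Dict[str, Callable[[], str]] = {
    "compile": compile_sources,
    "test": run_unit_tests,
    "sign": sign_artefacts,
}


def run(argv: Sequence[str]) -> List[str]:
    """Run all steps except the skipped ones; return the names that ran."""
    parser = argparse.ArgumentParser(description="Run the build steps.")
    parser.add_argument(
        "--skip",
        action="append",
        default=[],
        choices=sorted(STEPS),
        help="step to skip; may be given more than once",
    )
    args = parser.parse_args(argv)

    executed: List[str] = []
    for name, step in STEPS.items():
        if name in args.skip:
            continue
        step()
        executed.append(name)
    return executed


# On the build server every step runs; a developer without access to the
# signing service would pass "--skip sign".
print(run([]))
print(run(["--skip", "sign"]))
```

The same script then serves both the pipeline and a developer's local machine, with the difference between the two expressed only through the arguments.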

As a final note, one should consider how the pipeline will be described. It is easier to reason about a pipeline if the entire description of that pipeline is stored in a single file, ideally in source control. However, as the pipeline evolves and more steps and stages are executed in parallel, it will become increasingly difficult to capture the entire pipeline in a single file. While harder to reason about, it is in the end simpler and more flexible to let the pipeline layout, i.e. the stages, the steps and their order, be determined by the executors that are available and listening for notifications. That way it is easy to change the layout of the pipeline.
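
When the layout is determined by the executors rather than by a single file, it can still be reconstructed for inspection. The Python sketch below assumes, purely for illustration, that each executor declares the notification it listens for and the one it emits; the `Executor` type, the executor names and the topic names are hypothetical. It uses `graphlib.TopologicalSorter` from the standard library, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, NamedTuple, Set


class Executor(NamedTuple):
    name: str
    listens_for: str   # topic that starts this executor
    emits: str         # topic it publishes when done


def derive_layout(executors: List[Executor]) -> List[str]:
    """Order executors so each one comes after the producers of its trigger."""
    producers: Dict[str, List[str]] = {}
    for ex in executors:
        producers.setdefault(ex.emits, []).append(ex.name)

    # Map each executor to the set of executors that must run before it.
    graph: Dict[str, Set[str]] = {
        ex.name: set(producers.get(ex.listens_for, [])) for ex in executors
    }
    return list(TopologicalSorter(graph).static_order())


executors = [
    Executor("deploy", "test.completed", "deploy.completed"),
    Executor("build", "commit.pushed", "build.completed"),
    Executor("unit-test", "build.completed", "test.completed"),
]
print(derive_layout(executors))
```

Registering another executor that listens for "build.completed" changes the derived layout without any central description having to be edited, which is the flexibility described above. The order of executors that can run in parallel is not meaningful in this sketch.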

And with that we have come to the end of this journey into the guiding principles of designing a build and release pipeline. There are of course many additions that can be made with regard to the general design process, and even more additions for specific use cases. Those, however, will have to wait until another post.

Edits

  • December 3rd 2018: Fixed a typo in the post title