At Devoxx UK 2015, Steve Smith presented how Continuous Delivery is performed within Atlassian. After the talk, we had the chance to discuss the details of his presentation further and ask him a few questions.
InfoQ: In your talk you mentioned that when adopting continuous delivery you faced both technical and organisational challenges. Can you discuss some of those challenges?
Steve: Actually, when we adopted continuous delivery we faced three kinds of issues: technical, organisational and regulatory.
Beginning with the technical, there are the obvious ones: setting up the servers, provisioning the machines, creating all the necessary scripts to automatically deploy the relevant changes, etc. But there are other, more subtle challenges that arise from the need for richer communication. Once you create an environment where changes can be promoted to production at the click of a button, you need to make sure all the different teams keep fluid and open communications so the increased pace of change doesn’t catch anybody off-guard. Atlassian is a fairly distributed company, with offices in Sydney, Vietnam, Amsterdam and San Francisco, and the traditional communication channels weren’t rich enough. We needed a means of communication that let people not simply exchange messages, but also build rapport and create relationships. So investing in technologies that easily allow for things like high-quality videoconferencing, image sharing and group chats was essential.
Another problem related to distributed teams is working across timezones. When you work in the same timezone and you need to collaborate with a peer, even if they are located at a different office, you can always start a videoconference and share your screen. But when people work in different timezones you need tools that allow you to collaborate asynchronously. You can use a pull request to ask somebody to review a change for you, but this needs to come with tools that let you explore the changes easily and provide feedback in an integrated manner. And all these tools need to be integrated with each other so as to provide a seamless workflow.
This leads to the organisational changes. People in general are very wary of change. When developers get used to the idea of spawning new services, sysadmins become concerned about all the changes to the network this may imply. When developers start deploying containerised applications (using Docker or similar), sysadmins worry about the software that may be included within the container and the overall effect it can have. Similarly, when sysadmins ask developers to create their applications in a particular manner, or to refrain from using particular technologies, developers frown upon the lack of freedom. Continuous delivery only works if everybody is on board; the changes needed to come out of cooperation between the different teams.
Finally, we faced a number of regulatory issues. Atlassian promotes a culture of “being the change you seek”, which encourages people to push for the change they want to see. This helped people break through technology barriers and encouraged human interaction, although it also raises questions such as who manages risk or who is the ultimate business owner of a change. Atlassian processes large numbers of customer orders and we have to manage risk around that, which means that regulations like SOX and PCI have influenced our internal processes. Duties need to be appropriately segregated: the same person that changes the software cannot be the one that pushes the change to production, so the continuous delivery service needs to include controls that allow specific roles to validate at specific checkpoints.
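As a rough sketch of the kind of control Smith describes (the names and data structures below are hypothetical, not Atlassian's actual tooling), a deployment gate could refuse to promote any change whose production approver is also its author:

```python
# Hypothetical segregation-of-duties check for a deployment pipeline.
# All names here are illustrative; this is not Atlassian's actual tooling.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    issue_key: str            # ticket in the issue tracker, e.g. "OPS-1234"
    author: str               # who wrote and committed the change
    approver: Optional[str]   # who signed off at the release checkpoint, if anyone

def can_promote_to_production(change: Change) -> bool:
    """A change may only be promoted if someone other than its author approved it."""
    if change.approver is None:
        return False                          # no sign-off recorded at the checkpoint
    return change.approver != change.author   # duties must be segregated

# The author cannot approve their own change; a second person can.
assert not can_promote_to_production(Change("OPS-1234", "alice", "alice"))
assert can_promote_to_production(Change("OPS-1234", "alice", "bob"))
```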
InfoQ: Can you discuss how the tight integration of issue tracking, VCS, and CI can add value for development teams?
Steve: The idea of creating flows is important. The issue tracking system creates a flow identifier for everything that happens to a particular request, allowing anybody to follow its status all the way from inception to production. By integrating the issue tracking system with VCS and CI, whoever raised the ticket can see what is happening with it: when development starts, when there are issues with it and what is being done about them, when it’s being deployed, etc. They become more informed and therefore happier with the progress.
This also has a positive side-effect for developers: since stakeholders can find the information they need when they need it, there is no need to poke developers, sysadmins, or anybody else for it. This is a widely appreciated pattern; for instance, most cloud services include a status page that shows the state of each system so as to keep everybody informed.
In our case, the issue tracking, version control and CI systems were all Atlassian products, which means they were tightly integrated from the outset. However, they integrate with each other through publicly available RESTful APIs, which means any other product could work just as well, provided it makes the relevant calls to the relevant endpoints.
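To make that integration point concrete, here is a minimal sketch of how any tool in the pipeline could follow a ticket through a public REST API. The endpoint shape follows the standard Jira REST API; the instance URL, issue key and credentials are placeholders:

```python
# Minimal sketch: reading an issue's workflow status over a public REST API.
# The endpoint shape follows the standard Jira REST API; the instance URL,
# issue key and credentials below are placeholders.
import requests

JIRA_BASE = "https://jira.example.com"   # placeholder instance URL

def issue_status(issue_key: str, auth: tuple) -> str:
    """Return the current workflow status of an issue, e.g. 'In Progress'."""
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/issue/{issue_key}",
        params={"fields": "status"},
        auth=auth,          # (username, password/token) pair
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["fields"]["status"]["name"]

# Any system in the pipeline (CI, deployment tooling, dashboards) can follow
# the same ticket the stakeholder sees:
# print(issue_status("OPS-1234", ("user", "api-token")))
```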
InfoQ: Using feature branches is at the core of the methodology that you presented. What do you think are the downsides of this approach compared to an all-in-master strategy?
Steve: If you have multiple features that are interlocked, it can become very complicated. If they depend on each other or require changes to each other, that creates problems. However, when this happens, it probably means they are actually the same feature and the two issues should be merged together.
Sometimes the interdependency has a more technical nature. If the exposed APIs aren’t opaque enough, people can make assumptions about the internal workings of other systems, creating artificial interdependencies. This makes following good programming practices key, so as to make sure technical dependencies don’t grow into business dependencies that prevent separate features from being developed in parallel branches.
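A small, hypothetical illustration of that point (the service and method names are invented for the example): a feature branch that reaches into another component's internals becomes coupled to it, while one that codes against the published interface can evolve independently.

```python
# Hypothetical example: depending on a published interface rather than internals.

class PaymentService:
    """Public contract: only charge() is part of the API."""

    def charge(self, account_id: str, amount_cents: int) -> bool:
        return self._submit({"account": account_id, "amount": amount_cents})

    def _submit(self, payload: dict) -> bool:
        # Internal detail: free to change (different gateway, a queue, etc.)
        return payload["amount"] > 0

# Fragile: a feature branch calling _submit() directly depends on internals,
# so a refactor on another branch breaks it and forces the work to be serialised.
# Robust: calling only charge() lets the two features ship from parallel branches.
def checkout(service: PaymentService, account_id: str, total_cents: int) -> bool:
    return service.charge(account_id, total_cents)
```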
Finally, you need to keep in mind that when you create a feature branch you are diverging from the main line, so you need to rebase or pull from the master branch often to make sure your feature branch doesn’t become stale, otherwise the final merge can be challenging.
InfoQ: As you pointed out during your talk, implementing continuous delivery and running tests in feature branches comes at a cost that may put some organisations off. What arguments can teams present to justify the investment in the necessary infrastructure?
Steve: This comes back to the very idea of testing: tests are code that is written but never makes it to production, which some people initially perceived as waste. However, over time a number of studies have demonstrated the value of testing.
The same argument can be applied to continuous delivery: on every non-trivial software project the development cost is going to be the highest, and therefore making sure the software works justifies the cost of the infrastructure needed for all the testing. The resources needed to run the tests are usually cheaper than developer time and the impact of not finding bugs in time, which makes them a worthwhile investment. This is why many CI systems include the capability to scale up on demand using cloud resources: the extra cost of temporarily scaling up to satisfy peaks of activity is usually worth it.
At the end of the day, this added cost is simply taking the agile philosophy to the next level: provide tight feedback loops and deliver as soon as possible.
InfoQ: What are the next challenges that we will face in the continuous delivery space?
Steve: The people who are doing continuous delivery now are the people on the bleeding edge of technology; CD is only now beginning to become common. The main problem may be selling it internally: it takes a long time to set these things up and it can be difficult to convince more conservative managers of the value of CD. It’s all about education: telling people that it is going to benefit them, adapting the message to each role (sysadmins, stakeholders, risk managers, etc.), and making them feel it’s their own idea. You can’t impose these things, or do it just because everybody else is doing it.
Large organisations need to understand that every CD system will be different, and finding the commonalities to make this a repeatable process may be hard. There may also be trade-offs that make it not beneficial for everybody, so each particular case will need its own analysis. It needs a culture where you react to change, as opposed to setting everything in stone from the beginning.
The set-up process may be aided by containerisation technologies, although they come with their own challenges. The main problem will be security, particularly in terms of managing and keeping the software within the containers up to date. Traditionally, sysadmins were responsible for this, but now that developers can create containers the responsibility spreads to other teams too. Ownership needs to be redefined, maybe shared.
On the other hand, technologies used to set up large numbers of machines, like Puppet and Chef, are managed through configurations that look more and more like code. This means that sysadmins will have to adopt some software development practices. And, if business analysts are going to be part of the continuous delivery process by validating what goes to production and when, they will have to become acquainted with the tools that developers use.
This means everybody in the organisation will have to become a bit of a generalist, and companies will have to encourage cross-functional training to achieve this. Pull requests can help with this: each department or role can ask another to review and approve a change before it goes ahead.
Steve Smith has worked at Atlassian for over 8 years, both as a sysadmin and as a developer. He now works out of Atlassian's Amsterdam office, focusing on high availability, continuous delivery and platform migration issues. He is also a regular public speaker and blogger, and likes to discuss continuous delivery on Atlassian's development blog.