One of the talks that gained the most traction at DevOps Enterprise Summit 2016 in London was the story of how HM Revenue and Customs (HMRC), the UK's tax authority, applied DevOps and Continuous Delivery principles to move from a bureaucratic culture to frequent delivery of digital tax services, learning and adapting from incremental successes and occasional failures.
InfoQ reached out to Lyndsay Prewer, one of the co-presenters, to dig deeper into how this journey started, where the agency is today and what the main challenges have been so far.
InfoQ: Can you tell us a bit more about your current role?
Lyndsay Prewer: I’m a Delivery Lead at Equal Experts, working with HMRC Digital. I lead a product team in the delivery of features that improve the security of, and ease of access to, HMRC’s digital services. As such I’m often co-ordinating work across several teams, helping them to deliver service improvements in small, regular increments.
InfoQ: How and when did you first hear about DevOps?
Prewer: Prior to this role, I was helping a private sector organisation that went from one release a year to weekly releases. This was a huge step towards establishing a Continuous Delivery culture, but there was still a long way to go. Development and Operations were still separate groups, with separate leaders, and this significantly hampered further progress. That was when I first heard about DevOps, and also read The Phoenix Project. We knew that establishing a DevOps culture was the key to improving our Continuous Delivery, but it proved a very tough nut to crack.
InfoQ: How did DevOps get started at HMRC Digital? What were the first steps taken and why?
Prewer: Before HMRC Digital’s existence, HMRC outsourced all its IT, and the nature of the contract led to a heavyweight delivery process: think weighty documents, long lead times, infrequent releases and periodic change freezes. When HMRC Digital was conceived, it started small, with just one team focused on delivering IT solutions in small, regular increments. HMRC lacked this kind of expertise and culture, so experts were brought in from other organisations to help things get started. HMRC provided the business knowledge and direction; the Government Digital Service provided a framework with a focus on user needs (and helped cut through bureaucratic red tape); and Equal Experts, followed by other consultancies, provided the skills and expertise required to iteratively design, build and deliver solutions, using agile, lean and DevOps practices.
InfoQ: Was it mostly a grassroots movement? Or was there a top-down understanding that DevOps was needed?
Prewer: HMRC knew it needed to change, but didn’t know how. The experts brought in knew that successful organisational change is enabled by proving that the new approach delivers value. Thus, the founding team used agile, lean and DevOps practices to deliver small changes, early and often, into production. A good example of this was the first online Tax Credits Renewal service, which was built in just eight weeks. It enabled significant numbers of users to renew their tax credits online, removing the need for them to use the less convenient and more costly telephone and paper channels. This was just one of many successful deliveries that gave HMRC the confidence to invest further in HMRC Digital.
A fundamental part of the organisation’s growth has been the building of a Platform-as-a-Service (PaaS), enabling product teams to have autonomy and ownership, via enhanced DevOps capabilities. When a new product team comes onto the platform, they immediately get the benefit of an automated build and deployment pipeline. In addition, the services they build can leverage a range of platform services (such as authorisation, auditing, monitoring and alerting). This frees each team up, so that the team building a service is also responsible for running it in production, following Amazon’s “you build it, you run it” principle.
InfoQ: Which DevOps initiatives are currently going on at HMRC Digital? Did they involve organizational changes?
Prewer: Each year, HMRC’s online traffic grows exponentially from December up to 31st January, which is the deadline for filing Self Assessment tax returns. For this year’s Self Assessment peak, HMRC Digital made their platform multi-active, by adding a second cloud provider. The second provider was selected in October 2015 and operational by December, all with minimal impact on the product teams and their services. A key factor in the speed, success and minimal impact of this significant infrastructural change was the design of HMRC Digital’s PaaS, and the autonomy of each product team. There are fifty product teams, and only two WebOps teams that support the platform. WebOps solely focus on the platform infrastructure, leaving the product teams to focus exclusively on their own services.
InfoQ: How is HMRC Digital planning to disseminate (or already disseminating) DevOps across the larger organization?
Prewer: HMRC Digital’s PaaS incorporates excellent monitoring and alerting tools (the ELK stack, plus Sensu and PagerDuty). Tools alone are of no help if teams don’t use them. Training over fifty teams in how to best configure and use these tools was quite a challenge. The approach to solving this was:
1. A service catalog that maps microservices to teams.
2. Automated setup of monitoring and alerting tools, for each team and their microservices.
3. Regular internal blog posts and show-and-tell sessions that highlighted how different teams were using these tools to better support their services and ultimately their users.
The first two steps made it trivial for teams to access pre-defined Kibana and Grafana dashboards for each of their microservices. They also still have the freedom to create their own dashboards.
In addition, alerts are now automatically sent to each team for each of their microservices, once a given error threshold is reached. The threshold can be customised per microservice, allowing each team to tailor the tools to their own needs.
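As an illustration of the catalog-driven alerting described above, here is a minimal Python sketch. It is not HMRC Digital's implementation: the `ServiceCatalog` class, the `alerts_for` function, the default threshold of 10 and the service and team names are all hypothetical, and a real setup would hand the resulting alerts to tools such as Sensu and PagerDuty rather than return a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical default: number of errors per check window that triggers an alert.
DEFAULT_ERROR_THRESHOLD = 10


@dataclass
class ServiceCatalog:
    """Maps each microservice to its owning team and an optional custom threshold."""

    owners: Dict[str, str] = field(default_factory=dict)
    thresholds: Dict[str, int] = field(default_factory=dict)

    def register(self, service: str, team: str, threshold: Optional[int] = None) -> None:
        """Record which team owns a service; a threshold override is optional."""
        self.owners[service] = team
        if threshold is not None:
            self.thresholds[service] = threshold

    def threshold_for(self, service: str) -> int:
        """Return the per-service threshold, falling back to the default."""
        return self.thresholds.get(service, DEFAULT_ERROR_THRESHOLD)


def alerts_for(catalog: ServiceCatalog, error_counts: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return (team, service, errors) for every catalogued service at or above its threshold."""
    alerts = []
    for service, errors in sorted(error_counts.items()):
        if service in catalog.owners and errors >= catalog.threshold_for(service):
            alerts.append((catalog.owners[service], service, errors))
    return alerts


if __name__ == "__main__":
    catalog = ServiceCatalog()
    catalog.register("example-frontend", "team-a")               # uses the default threshold
    catalog.register("example-backend", "team-b", threshold=3)   # team-specific threshold
    for team, service, errors in alerts_for(catalog, {"example-frontend": 5, "example-backend": 3}):
        print(f"alert {team}: {service} reported {errors} errors")
```

Because the catalog is the single source of ownership, adding a microservice to it is enough for its team to start receiving alerts, and the override keeps tuning in the team's hands.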
InfoQ: Which other cultural or technical challenges have DevOps initiatives faced at HMRC Digital?
Prewer: HMRC Digital deploys to production multiple times per day, and has a platform running over 300 microservices. This is all in the context of a legacy estate that has multi-month release cycles, change freezes around peak business events, and lengthy end-to-end testing schedules.
We frequently have to plumb our digital services into legacy back-ends, and so there has been some friction as the two different cultures meet. Our journey over the last three years has seen this friction reduce though, as we’ve learnt better ways of working together. Multi-month release cycles have been shortened for frequently changing systems, change freezes are no longer the norm, and the use of stubs and contracts has reduced the need for end-to-end testing.
InfoQ: The "2016 State of DevOps Report" suggests that investing in DevOps and Continuous Delivery practices leads to faster, more reliable delivery of business value. Do you agree? And if so, have you come across concrete examples in your organization backing up that claim?
Prewer: I agree 100%. From the outset, we’ve designed and delivered services in an iterative manner. We strive to get the smallest thing into production as early as possible, then iterate from there. A good example of this is the “Tax Account Router”. This was a microservice introduced in November 2015, ahead of our peak business event in January 2016 (the Self Assessment deadline). Its purpose was to route all Self Assessment users to the appropriate landing page, depending on what type of user they were (i.e. Business, Individual or Agent).
The first version we put into production (six weeks from the team starting) didn’t actually route any users; it simply generated metrics around how many users it would have routed. This allowed us to validate its behaviour with zero impact on real users. We then added routing, but behind a throttle, so we could again validate with minimal impact. This is an example of how each release added a small change, minimising risk and waste. We’re able to do this because our build and deployment pipeline allows code changes to be in production within a few hours. This is following normal practice and procedure – not a case of cutting any corners!
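The rollout pattern described above (record what would have happened first, then route a throttled share of users) can be sketched as follows. This is an illustrative Python sketch, not the actual Tax Account Router: the class name, the landing page paths, the metric names and the percentage-based throttle are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical landing pages per user type; the real paths are not given in the interview.
LANDING_PAGES = {
    "business": "/business-account",
    "individual": "/personal-account",
    "agent": "/agent-services",
}
DEFAULT_PAGE = "/account-home"  # hypothetical page users reach when not routed


class TaxAccountRouterSketch:
    """Records every routing decision, but only acts on a throttled share of them."""

    def __init__(self, throttle_percent: int = 0, rng: random.Random = None):
        if not 0 <= throttle_percent <= 100:
            raise ValueError("throttle_percent must be between 0 and 100")
        self.throttle_percent = throttle_percent
        self.rng = rng or random.Random()
        self.metrics = Counter()

    def route(self, user_type: str) -> str:
        target = LANDING_PAGES.get(user_type, DEFAULT_PAGE)
        # Always record the decision, so behaviour can be validated before anyone is routed.
        self.metrics[f"would-route:{target}"] += 1
        # randrange(100) yields 0..99, so 0 percent never routes and 100 percent always does.
        if self.rng.randrange(100) < self.throttle_percent:
            self.metrics["routed"] += 1
            return target
        self.metrics["passed-through"] += 1
        return DEFAULT_PAGE


if __name__ == "__main__":
    shadow = TaxAccountRouterSketch(throttle_percent=0)  # first release: metrics only
    for user_type in ["business", "agent", "individual", "agent"]:
        shadow.route(user_type)
    print(dict(shadow.metrics))
```

With the throttle at 0 the sketch behaves like the first release, generating metrics without routing anyone; raising the percentage step by step corresponds to the later throttled releases.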
InfoQ: What challenges and roadblocks lie ahead in HMRC Digital’s DevOps journey?
Prewer: DevOps has enabled HMRC Digital to make daily improvements, in production, to its digital services. This is great for the business and its users, but presents the challenge of how to keep the rest of HMRC in sync with the nuances of the services being provided.
HMRC Digital has made great improvements in its service delivery by forming small, focused product teams that join up business, development and operations expertise. The next step is to improve the integration and feedback loop within the broader HMRC organisation.