Continuous Delivery and the Four Principles of Low-Risk Software Releases

   

1. Jez, it’s been a year since we’ve talked, it’s been a year since the Continuous Delivery book came out. How have things been going in the past year?

It’s been an exciting year, I think. When we wrote the book we thought we were going to write a very boring niche book about build and release management which hardly anyone would read, and I set an expectation with my wife: we’ll make $5,000 out of this book over its entire life span if we’re lucky. And it’s been wildly successful. I think purely by accident a number of things came along at the same time: the Lean startup movement, all the stuff around devops, stuff to do with the cloud. All these different movements have kind of resonated with each other and with the wider community and with some of the issues we’re seeing in the wider world, and that has just led it to become part of the new normal, I guess. If you look at companies in Silicon Valley who are doing continuous deployment, releasing every good build: Facebook has just moved to releasing twice a day, and Amazon, Google, Etsy and so forth are famously doing multiple releases a day. So this is kind of seen as the normal way to do things now, and we were just really lucky when the book came out that this wave kind of came along and engulfed everything.

Michael: So, it was timing.

Yes, it was good timing, which had nothing to do with our planning.

   

2. So, let’s just refresh our audience here on Continuous Delivery. Can you just give us CD in a nutshell?

Yes, sure. So the point of Continuous Delivery is that your software should always be in a releasable state, right from the beginning of the project. When you start writing code you should aim to be able to release just the very first feature all the way out to a production-like environment, and from then on you should prioritize keeping the system releasable over building new features. Your ideal is that you should be able to do a push-button release of the system to users on demand and that, as a result, IT is no longer the constraint on what can be built. Instead you’re coming up with new ideas: “what should we build next, oh, maybe we should try this idea, what is the smallest piece of work we can do to test this idea, we’ll code that up, we’ll push that out, we’ll get some feedback and work out what to build next”.

   

3. Ok, great. So, putting that into context with some of the other movements that are going on right now, Cloud being one of them, Big Data architecture is another one, how is CD affected by or how does it affect these technologies?

Well, I think it’s a natural fit with technologies that aim to use the Cloud, because the obvious area of applicability for CD is websites. It’s not the only area, I guess we’ll come to that a little bit later, but for a lot of websites it makes sense to be in the Cloud because you get this horizontally scalable architecture for free, and it takes care of a bunch of your nonfunctional requirements. Continuous delivery is a really good fit there, because if you’ve architected your system correctly then you should be able to do push-button deployments on demand in a low-risk way, and that idea of developers pushing into the cloud naturally fits with the way the…

Michael: And Cloud doesn’t complicate that process at all?

No, it actually makes it easier, because the whole point of CD is that everything should be automated: the provisioning, the configuration management, the management of the infrastructure. This is the whole idea of infrastructure as code: you should be able to check the configuration into version control and spin up servers with a known good configuration from version control. That’s how the cloud works, right? So, assuming you design your system with that in mind, CD fits perfectly with that paradigm.

   

4. That’s the whole thing, having preplanned. Ok, and then Big Data architecture, how does that work?

I think that’s kind of orthogonal. I guess the area where you see CD and Big Data coming into conflict is more traditional organizations who do data warehousing. Data warehousing is one of these parts of IT that has been quite resistant to Agile in general; a lot of data warehousing systems are still very waterfall in the way they’re planned and designed. So CD kind of collides with that in the same way that Agile is colliding with the traditional warehousing community. Ken Collier, who’s a ThoughtWorker, has just written a book on Agile data management and he’s addressing some of this stuff.

Michael: Ok. So you’re saying the database community is not Agile, they’re more waterfall.

Yes, traditionally, and I think especially in the traditional RDBMS world, with the idea of managing database roll-forwards and rollbacks in a fully automated way rather than having the DBAs manually running scripts. That’s always been a sticking point for people adopting continuous delivery. But I guess what I would say is that most of the companies that are building the NoSQL data stores are more in touch with the Agile method of doing things. A lot of those people come from the Agile community, and so they kind of built automated deployability and roll-forwards and rollbacks into the technology. And that technology doesn’t actually require you to make schema changes as much. That’s part of the point of NoSQL: you don’t have to have a schema, you’re building the structure into the data structures on the application side rather than having all this complexity in the database itself.

   

5. Great, got it. So, last year we talked a fair amount about test driven development, and I believe your latest blog post poses the question: does TDD really make for higher quality software?

Right. This is inspired by a book by a guy called Laurent Bossavit; he wrote a book called The Leprechauns of Software Engineering, on Leanpub. It’s a brilliant book because he takes all these cornerstones of Agile development, the cost of change curve, TDD, and these other pillars of Agile methodologies, and looks at the evidence behind them. He actually goes and traces back the citations to find the original data, and he shows that in most cases the data is actually from a small sample, often computer science students, or it was derived from projects where the variables weren’t properly controlled, and the results have been generalized way beyond the range in which they’re valid. So, he basically blows a hole in all this stuff and says that all this evidence people cite, that Agile methodology is better because of this data, is probably not true, and most of it is probably just leprechauns, this imaginary “yes, we must be better because of this stuff”.

So I think that what he is saying is not that we should give up and throw up our hands and say we’re Agile because that’s our religion; he’s saying we need to make a bigger effort to be more scientific about the way we evaluate our practices. And that’s really difficult in IT because you can’t run randomized controlled trials, it’s just impossible to control variables in these complex systems. So, he writes a big rant about this and then there’s a section of the book which is “what are we going to do next?” And that’s blank so far. But I think that’s the challenge we have to confront: we can’t go around saying “oh, Agile is better because of these studies” when the studies are actually just hot air. So, either we admit this is just religion, which I think is not the right way, or we need to be more methodologically sound about what we do. So that’s something I am personally interested in as well. I come from a background of physics, theoretical physics, not proper physics.

   

6. So, let’s see. ALM, application, not Agile, life cycle management: it’s been around for a long time but it’s getting a lot of airplay again, perhaps because of continuous delivery. What’s the relationship here between ALM and CD?

I think with the traditional life cycles in ALM, if you look at frameworks like ITIL or the more phased methodologies, I mean you’ve seen the V model and all these models of life cycle management, I think CD is basically saying “don’t do that”. The Agile life cycle model should actually be much simpler than that, and the reason is that it’s just too slow to develop software using that life cycle. There’s a bunch of studies that show that the biggest source of waste in software development is stuff that’s built and never used: over 50% of features that are built are rarely or never used. So let’s not do that. And I think continuous delivery and the Lean startup movement are saying: listen, when you develop a product you have an idea, a hypothesis that if I build this it will be extremely valuable to x group of people. Actually building the whole software to test the hypothesis is a really expensive way to test it, so let’s find a cheap way to do it. Build in small increments, build the smallest amount of stuff you need to test the hypothesis, which is much smaller than the amount you need to actually deliver the real functionality, and then iterate based on that. And that’s just a much more efficient way to deliver software that’s actually valuable to people, because it allows you to stop doing whatever you’re working on if it’s not actually valuable.

So that obviously has an impact on the whole software delivery life cycle and an impact on every part of the organization. James Whittaker released a book called Testing at Google, I think last year, and he has this provocative statement that exploratory testing is dead. I don’t think he’s got it exactly right, but his point is that when it takes you weeks or months to deliver bug fixes, you need to be much more careful about exploratory testing. You need to test all the edge cases before you put something out, because if there is a bug it’s going to take you weeks or months to fix it, and doing emergency bug-fix releases in the traditional process is actually risky. So his point is: “well, listen, if you’re doing continuous delivery and you’re always pushing things out into the cloud and you can get a fix out in minutes or hours, you don’t need to test every possible use case, you just need to make it really easy for someone to report a bug and then you can get a fix out in an hour or a day or so, at the worst”. So, that changes the way the testers need to behave, it changes the skills that people need to have, it changes the way the life cycle works. This is true of analysis too: if you’re talking about validating the features you’re building, you need to build data measurement into the features, so when you put the features out you can get the data. Who actually used this feature? Was it valuable? That needs to be part of the analysis effort. So this stuff affects the whole way that we think about how teams should behave, the skills that are required and the life cycle in general.
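The idea of building measurement into a feature can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `UsageTracker` class, the `export` feature name and the user IDs are all hypothetical, and a real system would send these events to an analytics store rather than keep them in memory.

```python
from collections import Counter


class UsageTracker:
    """Hypothetical in-memory tracker; stands in for a real analytics backend."""

    def __init__(self) -> None:
        self.events = Counter()  # feature name -> number of uses
        self.users = {}          # feature name -> set of distinct user IDs

    def record(self, feature: str, user_id: str) -> None:
        """Call this from inside the feature's own code path."""
        self.events[feature] += 1
        self.users.setdefault(feature, set()).add(user_id)

    def adoption(self, feature: str, total_users: int) -> float:
        """Fraction of all users who have used the feature at least once."""
        return len(self.users.get(feature, set())) / total_users


tracker = UsageTracker()
tracker.record("export", "u1")
tracker.record("export", "u1")  # same user again: counts as an event, not a new user
tracker.record("export", "u2")

export_adoption = tracker.adoption("export", total_users=4)
```

With numbers like these per feature, the question "who actually used this, was it valuable" can be answered from data after each release.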

   

7. Ok, you mentioned teams. In your consulting, are you involved with team development, do you get involved in that part of the Agile process, are you working with Scrum teams or things like that or are you more high level, just talking about the overall pipeline?

I haven’t done any kind of real work in terms of actual hands-on coding or direct consulting for a while now. My most recent real, proper role was as a product manager for Go, so that was working on a small team of ten people. The joy of working at ThoughtWorks is we hire smart people, that’s our whole kind of modus operandi, so on those teams, when you are working with people that are really smart, even people who are junior and have less experience perhaps, you can get rid of a lot of the training wheels. The thing is, it’s not necessarily about smartness, it’s about what you know. So for people who aren’t used to Agile development, the whole Scrum thing can be kind of important, as I said, training wheels if you like, and then progressively, as people get more experienced and they get to know things better, they take off the training wheels and you can start to optimize your process for just getting software out really fast. ThoughtWorks consulting has definitely done work with Scrum teams and helping organizations move towards continuous delivery; there’s been a bunch of stuff we’ve done around that.

Michael: Right. So, your role has kind of changed since I talked to you last, you think of yourself as more of an educator, an evangelist?

Those are very polite words. I think of myself mainly as kind of a ranter, and I like to write things when I get cross.

   

8. Cool. Talking about some other things you’ve written, earlier in the year I picked on this one blog that was Four Principles Of Low-Risk Software Releases. What are the four principles?

So, firstly I would say you want to decouple deployment and release; they’re two separate things. This is a problem for the people translating my book into Portuguese, two of my Brazilian colleagues, because in Portuguese it’s the same word for deployment and release, and actually they mean two different things. You could be deploying into production all the time but not actually releasing those features, which means making them available to users. Chuck Rossi, the release manager at Facebook, has this video where he says that all the features in Facebook that are going to be released in the next six months are already in production, you just can’t see them yet. So this is why we’re deploying all the time, but it’s a business decision when you are going to release the feature, meaning making it available to users.
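One common way to separate the two is a feature toggle: the code ships to production switched off, and releasing is just flipping the switch. The sketch below is a minimal illustration, not how Facebook does it; `FeatureToggles`, `render_homepage` and the `timeline` feature are hypothetical names, and a real toggle store would normally live in configuration rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store (assumption: real ones read from config)."""

    released: set = field(default_factory=set)

    def release(self, feature: str) -> None:
        """The business decision: make an already-deployed feature visible."""
        self.released.add(feature)

    def is_released(self, feature: str) -> bool:
        return feature in self.released


def render_homepage(toggles: FeatureToggles) -> list:
    sections = ["news_feed"]
    # The timeline code is deployed, but users only see it once it is released.
    if toggles.is_released("timeline"):
        sections.append("timeline")
    return sections


toggles = FeatureToggles()
before = render_homepage(toggles)  # deployed, not yet released
toggles.release("timeline")        # release without a new deployment
after = render_homepage(toggles)
```

Nothing is redeployed between `before` and `after`; only the toggle state changes.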

And obviously deploying into production continuously massively reduces the risk of performing deployments, kind of counterintuitively, for one of the other reasons in the article, which is that when you release frequently, the amount of stuff you’re releasing is much smaller. That makes it much easier to find what the problem is if something goes wrong: what’s changed since the last release? Well, it was just a few hours ago, so only a small amount of stuff could have changed. It also means that actually fixing the problem is quicker, and you get lots of practice, so if you practice deployment all the time, it becomes really easy. So reducing batch size is the second principle. And then another principle is that you want to be incremental about the release process. You don’t want to release the whole system all at once, bang, into production. You have a graph of dependencies: you have dependent services, you have dependent databases, you have static web files. What you want to do is incrementally release the new versions of each of those things before you put the main app into production, and you can use techniques that we’ve talked about in the book, like blue-green deployments, to do deployments of new versions in an incremental way as well.
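Blue-green deployment keeps two identical environments, with only one of them live: the new version goes to the idle one, gets checked there, and only then does traffic switch over. The following is a rough sketch of that switch under simplifying assumptions; the `Router` class, the version strings and the `smoke_test` callback are hypothetical, and in practice the "router" is a load balancer or DNS entry.

```python
class Router:
    """Hypothetical router that sends all traffic to one of two environments."""

    def __init__(self) -> None:
        self.versions = {"blue": "1.0", "green": "1.0"}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, smoke_test) -> bool:
        """Install on the idle environment; switch traffic only if checks pass."""
        target = self.idle
        self.versions[target] = version
        if not smoke_test(target, version):
            return False    # live environment was never touched
        self.live = target  # the actual cut-over is a single switch
        return True

    def rollback(self) -> None:
        """Point traffic back at the previous environment, which still runs the old version."""
        self.live = self.idle


router = Router()
deployed = router.deploy("1.1", smoke_test=lambda env, version: True)
live_after_deploy = router.live
router.rollback()
live_after_rollback = router.live
```

Because the old environment stays intact until the next deployment, rolling back is as cheap as switching over.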

So, the final one is that what you want to do is optimize for resilience. Resilience is one of these words that I think is really important. John Allspaw, who coauthored the Web Operations book and is director of engineering at Etsy, talks about resilience a lot, in all kinds of systems, whether it’s the space shuttle or medical devices; he looks at resilience to failure. For some reason I got into looking at the Apollo program and the computers that were built for it, and the software was kind of buggy. MIT was building the software and they realized they weren’t going to be able to get rid of all the bugs, because obviously it was a really tight deadline. So what they did was they made sure that if there was a bug and the system shut down, it would reboot and come back to a known good state. They made the system really resilient. And there’s the quite well-known story about when the lunar lander was coming down onto the Moon: they ran out of CPU cycles because there was a bug in the radar and it kept sending interrupts to the computer, and the computer basically ran out of CPU. But they built it so that the low priority tasks would get thrown out of the run queue, so essentially the computer reprioritized and said “well, these are the important things and I’ll keep doing these, and this thing which keeps sending me interrupts I’m going to deprioritize off the run queue”. As a result they were able to make a go decision to land on the Moon, because they built the system in a resilient way.
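The general idea in that story, keep the important work running and shed the rest when the machine is overloaded, can be sketched in a few lines. This is an illustration of the principle only and makes no claim about how the Apollo guidance computer was actually implemented; the task names, priorities, costs and the budget are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int  # lower number = more important
    cost: int      # made-up CPU units needed per cycle


def schedule(tasks, budget):
    """Admit tasks in priority order; shed any task that no longer fits the budget."""
    running, shed = [], []
    used = 0
    for task in sorted(tasks, key=lambda t: t.priority):
        if used + task.cost <= budget:
            running.append(task.name)
            used += task.cost
        else:
            shed.append(task.name)
    return running, shed


tasks = [
    Task("radar_interrupts", priority=9, cost=25),
    Task("guidance", priority=0, cost=40),
    Task("display_update", priority=5, cost=20),
    Task("engine_control", priority=1, cost=30),
]
running, shed = schedule(tasks, budget=100)
```

Here the total demand is 115 units against a budget of 100, so the least important task is dropped and everything else keeps running.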

And I think that’s really important: you need to design your systems for resilience, but you also need to design your process for resilience, and your organization for resilience. Resilience is about risk in the downside sense, things going wrong, but there is also upside risk, which is being able to take advantage of opportunities; I guess the word that’s better for that is adaptability. Jim Highsmith talks about adaptive organizations and adaptive leadership. So resilience and adaptability are kind of two sides of the same coin, depending on whether you’re talking about downside or upside risk, and that’s how you create great organizations. Great organizations are always learning and they are always adapting to the events that are coming from around them, from competitive pressures, from changes in the market fundamentals, and they are able to leverage that and arbitrage the opportunities to create great new software and take advantage of that. So I think resilience and adaptability are really important concepts, not just in software but also organizationally.

   

9. Cool. So, we were talking about differences between delivery and deployment, which can be confusing to some people, especially when looking at mission critical kinds of applications, working in a hospital, working on the Apollo program, so it’s probably an important point to make that delivery can be achieved without constantly deploying, right?

Yes, absolutely, that’s really important. That’s one of the distinctions we made in the book that we had no idea would be really important. If you consider these mission critical things, you don’t want to be doing A/B testing on flight software, you don’t want to tell people on a Boeing “here’s a new version, we think it will work better, see if you crash as often. Oh, you crashed more? Oh, we don’t want to release that version to the rest of the public”. There are some things that you don’t want to do in mission critical systems. But I think continuous delivery, focusing on keeping the system releasable all the time, is something you can do in any situation. It’s not a mission critical example, but the HP LaserJet firmware team are releasing a book this year, a guy called Gary Gruver is releasing a book called [A Practical Approach to Large-Scale Agile Development: How HP Transformed LaserJet FutureSmart Firmware]; if you search for Gruver Agile on Amazon you’ll get this book they’ve written about how they moved to continuous delivery in firmware. What they essentially had was all these branches, different branches for different versions of the product, and they couldn’t spend any time developing new features because they were spending all their time doing testing and merging those branches. Then they moved to continuous delivery. They did really aggressive automated testing to make sure that if they put a new change in they would test it on logic boards; they actually have physical logic boards to mirror the systems that the firmware was running on, and every change they would put on the logic boards and test. And that’s really powerful: it gives you the ability to see the latest version of the software in a working state and have a good idea of the quality of that system, because you have automated tests that validate whether the latest build is actually good.

That is really important in mission critical systems: you want to know, if I make a change, what’s the effect of that change on the production system, in a production-like environment. So, that is really the focus of continuous delivery. Continuous Delivery says you want to know the effects of the change, we want to have automated validations so we can get feedback as fast as possible, we want to make sure we catch problems as early as possible, and if there is a problem we want to stop the line. We talk about focusing on releasability over functionality. So, if the system is not in a releasable state, you stop building new features. And all these things are actually about reducing risk. This is one of the key points for us: our background is working on systems where you’d be there for the whole weekend because the releases were so risky. That’s really where we were coming from: how do we reduce the risk, how do we bring some of these problems forward in the development process? One of the most memorable lines from the book is “if it hurts, do it more often and bring the pain forward”. If something really painful is going to happen at the end of your process, how can you find ways to test for that as early as possible and try to find these problems early on in the delivery process? I cannot think of any situation where that would not be applicable, and it’s more applicable in mission critical systems. If you speak to people who build these kinds of things, they have sophisticated automated testing systems, they’re testing changes, and they are prioritizing releasability, because that’s how you validate that the system is actually going to work in these situations.

Michael: Ok. By the way you mentioned feedback, super important, right, at all stages and it’s something that can be overlooked.

Absolutely, and there’s feedback at all these different levels. As a developer you want to get feedback on the changes you’ve made, did I break any tests; you want to get feedback on whether this system is deployable; but you also want to get feedback from users, was what we did valuable. So there are all these different feedback loops and you really have to pay attention to them, and the paying attention is important. I mean, it’s one thing to get the feedback, it’s a completely different thing to act on that feedback. You go to organizations and the build light is red and no one cares, they’re still developing crappy code and checking it in because, whatever. That’s a really big problem, and that’s one of the things that motivates the whole devops movement, because when you work in silos it’s really easy to optimize for your own personal productivity. Developers like working on branches because it lets them develop really crappy code and declare dev complete really fast, rather than focusing on how can I build a valuable, high quality system. So actually making sure people get that feedback is quite a big deal in some places. And again, continuous delivery is about getting that feedback as soon as possible but then also making sure you act on it. That’s a cultural problem, and these problems are the most difficult ones to fix.

If you have a culture where developers don’t care what happens in production, then it’s going to be really hard for you to develop high quality software. And most of these organizations don’t even have that information, so getting the information is one thing, but actually making people care about it is another thing. And again, one of the effects of continuous delivery is that the stuff you code is in a production-like environment much more quickly, and so you actually start to care about it a lot more if you’re more in contact with it, if the cycle time is shorter. So it’s kind of a way of making people care more about what they are doing, because they can see the effects of it. There’s this video by a guy named Bret Victor called Inventing on Principle, brilliant video, and what he talks about is that if you’re creating something you need to be in touch with what you’re creating, you need to see what it is you’re building, because that’s how you discover all the cool things that you didn’t think about which make the products amazing and special and brilliant. And so that idea of getting the feedback loop really tight is essential to creating something that’s a good product.

   

10. Good. One final question for you, last year when we talked devops was kind of the, I think you described it as the anti-movement, it hadn’t quite gelled yet. Has it gelled now, is it forming, what’s happening with devops in relationship to CD?

It’s interesting. I went to Devops Mountain View a few weeks ago and there was a talk on the future of devops, and Patrick Debois and a bunch of other founding people were at the meeting. There were a couple of attempts early on to define a manifesto for devops; I was involved in one, it was wildly unsuccessful, and I think for a good reason. The movement is still resisting this kind of thing and everyone is worried about being co-opted by the vendors, which has already started to happen: people are already starting to talk about the devops solution for vendor X and the devops solution for vendor Y. So there’s always this tension around wanting to actually be real and be about what’s actually going on on the ground and trying to solve problems. What people are worried about is what happened with Agile, where, and you see it even with Agile conferences, they become less and less dominated by people who are highly experienced and more and more dominated by vendors. I work for a vendor, we are part of the problem, much as we resist it. So how do we prevent this from happening with devops? I think Patrick and the rest of the people involved in this have continued to take the view that we shouldn’t have a manifesto and we shouldn’t define it, and we should just continue to be about sharing practices and principles and tools that work for us and about being a community of people who share ways to improve the way we build and run software systems. That’s what it should be about.

Michael: Great. So, thanks for coming by today, we enjoyed having you.

It’s been a pleasure, thank you for having me.

Michael: Thanks, Jez.

Dec 24, 2012
