At Agile Testing Days 2015, Wouter Lagerweij talked about how rebuilding a legacy system instead of refactoring it helped a team adopt agile practices like test-driven development, automated testing, and continuous delivery. His talk is based on his blog post "Don’t Refactor. Rebuild. Kinda."
InfoQ interviewed Lagerweij about what makes refactoring so difficult, whether rebuilding software is less risky than refactoring, and how continuous delivery fits with rebuilding software. InfoQ also asked him for advice on rebuilding vs. refactoring software.
InfoQ: Can you explain what makes refactoring so difficult?
Lagerweij: Refactoring is one of those practices that is fairly easy to do as long as you do it. The same thing is true of unit-testing for example. As long as you consistently write tests for your code, or clean up design issues in your code, in small steps, it’s not difficult.
The longer you let things slide, though, the harder it becomes to pick them up again. This is why everyone is always talking about "technical debt". It’s not usually actual technical debt as Ward Cunningham coined it (see Ward Explains Debt Metaphor), though. That assumes that it was a conscious decision to delay improving the current design. That’s not what usually happens. What happens is a simple hygiene problem, one that spirals out of control.
Not that the software industry is the only place where these things happen. Just ask some Lean practitioners active in the healthcare industry. Getting surgeons to consistently wash their hands before surgery decreases patient complications by half, but even with active evangelizing and structured checklists it is still difficult to get compliance.
So, since many teams do delay refactoring, we have many messes, or "legacy codebases". And since those teams are almost invariably not experienced in refactoring (else they would have been doing it), they are certainly not up to the task of fixing those messes.
And by the time such teams realize they should do something about their "debt", they’re stuck with a system in which it is very hard to learn the skills needed to improve that system! When small changes can have unpredictable results in different parts of the system, it’s not conducive to a positive learning experience in refactoring. And any developer knows that adding unit tests to an existing, tightly coupled system is a difficult and unpleasant experience.
I guess what I’m saying is that if you wait until you’re in such trouble to begin learning these skills, there’s a very good chance you’ll not be very successful in applying them. And that means there’s an even bigger chance that you will give up before mastering them.
InfoQ: In your opinion is rebuilding software less risky than refactoring? Can you explain this?
Lagerweij: Honestly, if you have a team of people that knows how to attack a legacy system, refactoring will always be the better option. It’s less risky, and has less overhead than any type of rewrite.
But those skills are unfortunately still quite rare. And if you don’t have people in the team that have done that sort of work before then all you will do is slow down further. You’ll increase the level of frustration in the development team as well as in the business.
And there are many organizations that find themselves in trouble this way. They simply don’t have the expertise in house to fix their technical issues. They can’t get the new functionality built that they need to be competitive. Finding even a few experienced people is difficult, and is often a case of "too little, too late".
In that situation a rewrite can be more attractive.
InfoQ: Can you give some examples showing how teams can rebuild software and do continuous delivery? What makes this a good combination?
Lagerweij: One of the advantages of a rewrite is that you’re starting fresh. That means you can make sure you do it right this time. Of course, most of the time, you don’t. As I said earlier, the best way to deal with things such as testing and refactoring is to do them continuously. But if you never managed to do that before, why would it suddenly work now?
I discussed this in my talk, where I described the deal that we had made with the teams I was working with. This time, we would use Test Driven Development (and also Behavior Driven Development), we would keep the code clean, and we would ensure that we always had 100% unit-test coverage. And because they would no longer be held back by their legacy system, they knew that they would be able to do that.
We also agreed that, to make sure we didn’t get tempted to loosen the reins, we would do full continuous deployment from day one. That means that every time a developer pushed code to GitHub, it would be automatically built, tested, and deployed all the way to production. This gives a wonderful focus on keeping quality high at all times. You can’t delay those tests, since pushing untested code will break the build and get the whole team on your back. But you also don’t want to skip the test (or write a non-checking unit test just to fool the coverage numbers) because you might actually break production. Yourself. Demonstrably so.
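As an illustration of the "non-checking unit test" mentioned above, here is a minimal Python sketch. The functions apply_discount and broken_discount and both test helpers are hypothetical names invented for this example; they do not come from the project discussed in the interview. The first test executes the code, so a coverage tool would count those lines as covered, but it asserts nothing, so a bug passes unnoticed. The second test actually checks the result.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical production code: reduce price by the given percentage."""
    return price * (1 - percent / 100)


def broken_discount(price: float, percent: float) -> float:
    """Same function with a deliberate bug: it adds instead of subtracting."""
    return price * (1 + percent / 100)


def non_checking_test(discount) -> None:
    # Runs the code, so its lines count as covered, but verifies nothing.
    discount(200.0, 25.0)


def checking_test(discount) -> None:
    # Runs the code and verifies the result.
    assert discount(200.0, 25.0) == 150.0


def catches_bug(test, discount) -> bool:
    """Return True when the given test fails for the given implementation."""
    try:
        test(discount)
    except AssertionError:
        return True
    return False
```

With continuous deployment, only the second kind of test stands between the buggy version and production: catches_bug(non_checking_test, broken_discount) is False, while catches_bug(checking_test, broken_discount) is True.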
There’s also the simple psychological effect where no one really cares about letting test coverage drop from 2.1% to 2.0%, but the whole team will ask for an explanation when it drops from 100% to 99.9%.
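A coverage requirement like this can be enforced mechanically as a step in the build. The following is a minimal sketch in Python, not the setup used by the teams in the interview: the function name coverage_gate and its parameters are assumptions for illustration, and a real pipeline would take the line counts from its coverage tool.

```python
def coverage_gate(covered_lines: int, total_lines: int,
                  required_percent: float = 100.0) -> bool:
    """Return True when measured coverage meets the required threshold.

    Hypothetical build step: a pipeline would fail the build (and so
    block the deployment) whenever this returns False.
    """
    if total_lines == 0:
        # Nothing to cover yet, so there is nothing to fail on.
        return True
    percent = 100.0 * covered_lines / total_lines
    return percent >= required_percent
```

With the default threshold, coverage_gate(999, 1000) fails the build, which is the 100% to 99.9% drop that the whole team notices. A team that only requires 2% would see coverage_gate(20, 1000, required_percent=2.0) pass without anyone asking questions.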
You still have to go through a learning process. You won’t magically be better at anything by starting over. It just creates a situation with a bigger chance of success.
InfoQ: What’s your advice to teams that are considering rebuilding software instead of refactoring it?
Lagerweij: First, don’t try and rebuild everything before releasing. Find a way to integrate your new system with the old and to provide value to the users from the new system as soon as possible. Something like the strangler pattern or branch by abstraction can help here. If you don’t do this, your project will either get canceled at some point, or go on forever, but will certainly never end well.
Second, stop fooling yourself. Using a process that demands discipline, like continuous deployment, can feel like putting an additional burden on yourself on top of the external demands on the project. But that discipline is what will help you to avoid falling into the same traps as before and writing a completely new legacy system. It’s what will force you to learn new skills, such as continuous refactoring, how to test drive your code, how many different types of tests are needed to feel in control, how to automate your deployments, how to deal with monitoring and error handling. All of these are so much more relevant when the path to production is short and direct.
And third, and maybe most importantly, involve the customer! Even more than usual, with a rewrite the temptation is to take the answer "make it do what the old system did", and not have any further need for customer interaction. But we need to know what the customer needs now. We also need to know what he doesn’t need anymore. We are all familiar with those 80/20 rules about features being used. We need to test that, and use that to deliver what actually has value today. You see, refactoring doesn’t just need to happen at the code level; reviewing your requirements, business processes and even business models can be just as important.
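The strangler pattern mentioned above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the implementation from the talk: StranglerFacade, legacy_system, new_invoicing and the feature names are all hypothetical. The idea is that a thin facade sits in front of both systems and routes each feature to the new system once it has been rebuilt, while everything else still goes to the legacy system.

```python
from typing import Callable, Dict

# A handler takes a request payload and returns a response (both plain strings here).
Handler = Callable[[str], str]


class StranglerFacade:
    """Routes each feature to the new system once migrated, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, feature: str, handler: Handler) -> None:
        # Register the rebuilt implementation for a single feature.
        self._migrated[feature] = handler

    def handle(self, feature: str, payload: str) -> str:
        # Fall back to the legacy system for anything not yet rebuilt.
        handler = self._migrated.get(feature, self._legacy)
        return handler(payload)


def legacy_system(payload: str) -> str:
    # Stand-in for the old system.
    return f"legacy:{payload}"


def new_invoicing(payload: str) -> str:
    # Stand-in for one feature rebuilt in the new system.
    return f"new:{payload}"


facade = StranglerFacade(legacy_system)
facade.migrate("invoicing", new_invoicing)
```

Here facade.handle("invoicing", "order-1") is served by the new code, while facade.handle("reporting", "order-1") still reaches the legacy system. Each call to migrate moves one more feature over, so users get value from the new system long before the rebuild is complete.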
Of course, it’s still going to be a lot of work, but if you stick to these principles, you can have clean code, a learning team and happy customers while you do it.