Now or Never: the Ultimate Strategy for Handling Defects

I’ve heard this question asked and re-asked so many times during training programs, coaching engagements, and small and large meetups.

The question is: “How do we handle defects in Scrum?”

My answer is simple. It might look a little scary at first, but I’ve used it many times and I’ve never seen problems because of my approach. So keep reading.

The first thing to figure out is whether a defect arose from a user story of the current sprint or from work done in the past (during previous sprints).

If a defect relates to functionality developed in this sprint, we should follow our definition of “Done” (DoD). If it says “all known defects should be fixed” then we should fix our error. I should admit, though, that I haven’t seen such a strict DoD in real life.

Practically, as we know, quality is a mutual agreement between the Development Team and the Product Owner (PO), so there will always be some agreement on what will be fixed and what will not. It might be something like “all critical bugs should be fixed” or “all defects with severity above major should be fixed” and so on. It is not likely that the Development Team and PO will agree on the policy right from the very beginning; this is usually a mutual, iterative process. The PO may guide the Development Team at the beginning by explaining from a business perspective which defects should be fixed and why and what should be left alone at the moment. The Development Team may in turn help the PO understand the consequences of not fixing an issue.

The PO, for example, might tell the developers to ignore a small UI issue that does not affect functionality but just doesn’t look good. The PO knows that the whole UI is undergoing user testing and may significantly change in the future. The Development Team, for example, may discuss with the PO two possible solutions for some user story. The PO is leaning towards a faster/simpler solution but the Development Team explains that the choice will not scale — it won’t be able to handle 30 times the current traffic, which would require a complete rework. Another example could be browser support. More or less every web product has explicit or implicit rules of supported browsers and versions. A problem that appears in the latest version of Chrome definitely should be fixed within the sprint, while the same problem appearing only in Opera 10 (six years old now) would never be fixed, so there is no point even in logging it.

Over time, the PO and Development Team build a shared understanding of the boundary of quality, and it becomes clear whether a defect falls above or below this line. The developers must fix bugs above the line and need to do that within the sprint. In other words, the Product Backlog item cannot be considered done until this defect is corrected (as stated in the DoD).

Now comes the scary part: what if a defect sits below the quality boundary? That means that we are not going to fix it right away.

We might have heard of organizations that relist such a defect on the Product Backlog or on a separate bug log for later work. (The two are more or less the same, and the difference is not relevant to this article.)

My scary suggestion is simple: delete the defect or, more accurately, close it with a resolution like “won’t fix”. Deletion may be too radical for teams that haven’t done it in the past and, especially, for QA professionals. In such cases, I suggest logging the resolution to allow the potential to return to the bug in the future (which sometimes happens).

But let’s be realistic. The time the team needs to fix a defect in the future will conflict with the time required for new feature development and all other work. If it is not worth fixing a defect right now, it’s not likely that we will find the time to return to it later. Also, it becomes more and more expensive (difficult) over time to correct the problem because the team loses context.

I’ve never seen old, relatively unimportant bugs (so unimportant that we do not fix them right away) take priority over new features. Such a bug will never be fixed. And that is okay. Our goal is not to fix all known defects but to deliver a valuable product to our customers. The PO’s responsibility is to maximise the impact of the Development Team’s work and build an awesome product, not a product without known defects.

Now, what should we do if a new defect appears that derives from some functionality we’ve built in the past? I would start with a short analysis of why we missed it in the first place. This is an opportunity to improve — if we learn why we missed it, we can prevent similar problems in the future.

As for the bug itself: if we can quickly solve it and it will not impact the forecast for our sprint, then there’s nothing to think about, right? We fix it and move on. If the defect is of considerable size, we must ask a simple question: does the defect matter enough that we should fix it in the next sprint? In other words, does it rise above our quality threshold? If so, let’s put it on the top of the backlog. If it doesn’t matter enough to make the next sprint’s backlog, we should just delete it. If we’re not going to fix it as soon as possible (in the next sprint), we won’t ever fix it.

I know that this suggestion might sound weird. Believe me, it’s not that bad. Actually, instead of believing me, just try it out after reading my story about it further below.

Let’s be honest with ourselves. Let’s stop using the backlog (or bug log or any queue for that matter) as a trash can. Having a longer queue of issues will increase the average lead time of our system. We could say that a backlog isn’t just a “first in/first out” queue and manage it that way, but managing our bug log (reviewing, re-sorting, triage meetings, and so on) demands time and energy. In my experience, the benefits of these activities with long bug logs are overrated. Just stop doing it.
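The lead-time claim is consistent with Little’s Law, which says that for a stable queue the average lead time equals the average number of items in the queue divided by the average throughput. Here is a minimal Python sketch of that arithmetic. The function name `average_lead_time` and the throughput of 10 bugs per sprint are assumptions made up for illustration; the queue sizes echo the story further below (about 590 bugs before the cleanup and about 50 after).

```python
def average_lead_time(items_in_queue: float, throughput_per_sprint: float) -> float:
    """Little's Law: average lead time = average queue size / average throughput.

    The result is expressed in sprints, because throughput is given per sprint.
    """
    if throughput_per_sprint <= 0:
        raise ValueError("throughput must be positive")
    return items_in_queue / throughput_per_sprint


# Hypothetical throughput: the team closes 10 bugs per sprint.
before_cleanup = average_lead_time(590, 10)  # 59.0 sprints
after_cleanup = average_lead_time(50, 10)    # 5.0 sprints
print(before_cleanup, after_cleanup)
```

With the same throughput, a queue roughly 12 times shorter means a bug waits roughly 12 times less on average. The law only holds for long-run averages in a reasonably stable system, so treat the numbers as an order-of-magnitude illustration.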

If a bug is critical enough but we haven’t fixed it, it will remind us about itself, so don’t worry about that. Just recently, one of my teams had such a case. They knew about the problem, which appeared rarely in unpredictable situations. After a quick analysis, the team decided it was not important enough (below the line) because of its infrequency and closed the issue. However, the bug reappeared several weeks later under different conditions. The team then realised that the bug was more important than they initially thought and decided to investigate the problem, and fixed it.

I’ve seen projects with long lists of open defects. Whenever I start to work on such a project, I usually select all problems that have spent some threshold (like six months) without modification and close all of them. Often, this is 80% to 90% of all defects. Some new teams and especially QAs might be a little uncomfortable with such a move but I personally do this so that there is somebody to blame later on (just in case).

Again, it’s very important to rigorously follow our DoD.

Depending on the industry, environment, and the system we are developing, our DoD will vary. Compare a mobile game that involves pushing objects into other objects with some geometry calculations with, say, medical equipment. Obviously, the two DoDs would be radically different. However, the logic of processing and prioritizing remains the same. For example, in a banking system, we could not tolerate any errors related to rates, credit, debit, and other calculations. All such defects are substantial and sit above that imaginary quality line. Security issues would be there as well. However, that doesn't mean that we should fix everything. A graphical problem on the transaction screen of the online banking system might not deserve action. There is always a line — or, there should always be a line. If you are fixing all problems that appear, my guess (and all my experience confirms that) is that you are missing an opportunity to do something more valuable.

Let’s quickly discuss what to do if our defect is an urgent and critical production problem. It’s a no-brainer — we should fix it. Such a case doesn’t often happen (and if it does, it is important to investigate the root causes).

I’ve summarized the few possible scenarios in one decision tree:
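The same scenarios can also be written down as a small function. What follows is a rough Python sketch of the rules described in this article, not a reproduction of the chart. The names `Defect`, `Action`, and `decide`, as well as the boolean fields, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FIX_NOW = "fix immediately"
    FIX_IN_SPRINT = "fix within the current sprint"
    TOP_OF_BACKLOG = "put on top of the backlog for the next sprint"
    WONT_FIX = "close as won't fix"


@dataclass
class Defect:
    from_current_sprint: bool           # found in a story of the current sprint?
    above_quality_line: bool            # PO and team agree it matters enough
    urgent_in_production: bool = False  # urgent and critical production problem
    quick_fix: bool = False             # small enough not to affect the forecast


def decide(defect: Defect) -> Action:
    # Urgent, critical production problems are a no-brainer.
    if defect.urgent_in_production:
        return Action.FIX_NOW
    # Defects in current-sprint work: the DoD decides.
    if defect.from_current_sprint:
        if defect.above_quality_line:
            return Action.FIX_IN_SPRINT
        return Action.WONT_FIX
    # Defects from past work: a quick fix is simply done.
    if defect.quick_fix:
        return Action.FIX_IN_SPRINT
    # Otherwise it is the next sprint or never.
    if defect.above_quality_line:
        return Action.TOP_OF_BACKLOG
    return Action.WONT_FIX


print(decide(Defect(from_current_sprint=False, above_quality_line=False)).value)
```

Note that no branch leads to "keep it in the queue for later". Every defect is either fixed soon or closed, which is the whole point of the approach.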

I should also mention one comprehensive strategy that simplifies the flow chart a bit. Teams agree with the PO on having some time during a sprint (like 10% of the available time) to fix bugs. It’s an easy and straightforward plan. Of course, the same logic more or less still applies — important stuff (above the line) gets fixed and unimportant stuff does not — but this way, at least, we no longer think that fixing a defect means that something else is pushed off the Sprint Backlog. Once we’ve used up this time buffer, however, we may fall back to the process described above.

These ideas might sound a little chilly, so I’ll illustrate them with several examples from my experience. The first one is about dead bugs.

Several years ago, I started working with several teams on a project that had about 590 defects tracked in the bug tracker. We searched for all bugs below high priority that had not been touched for six months or more. That search returned about 540 defects or 90% of the queue. I suggested deleting them all. After a quick discussion, we closed them all with the resolution “won’t fix”.

As with most modern bug-tracking systems, that action notified all people who were subscribed to these defects: reporters, assigned people, watchers, etc. For three out of the 540 bugs, somebody got back to us to explain why we could not simply close it and why the problem needed fixing. Three out of 540!

It was awesome. Now we knew what was important, so we reopened those three bugs. The bug queue became 10 times smaller, which meant that it was possible to go through all bugs in a meaningful way. I should also note that after three sprints, the teams had pared the list down to 20 to 30 bugs and have kept it that way ever since because they stuck to the process described above. Teams added defects that were significant enough (above the line), and the developers worked on them within a sprint or two. Defects that the team had no chance to look at in the near future, which in the past would have been added to the growing queue, were no longer added. As a result, the queue became alive and manageable, with nothing in it more than two months old.
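The search used in this story (bugs below high priority that had not been touched for six months or more) could be sketched roughly as follows. This is an illustration only. The `Bug` record, the priority scale in `PRIORITY_ORDER`, and the 183-day cutoff are assumptions, and a real bug tracker would have its own query language for the same thing.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical priority scale, lowest first.
PRIORITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Bug:
    key: str
    priority: str
    last_updated: date


def stale_low_priority(bugs: list[Bug], today: date, max_age_days: int = 183) -> list[Bug]:
    """Return bugs below 'high' priority that were not touched for max_age_days or more."""
    cutoff = today - timedelta(days=max_age_days)
    high = PRIORITY_ORDER.index("high")
    return [
        bug
        for bug in bugs
        if PRIORITY_ORDER.index(bug.priority) < high and bug.last_updated <= cutoff
    ]


bugs = [
    Bug("A", "low", date(2023, 1, 15)),     # old and unimportant: candidate
    Bug("B", "high", date(2023, 1, 15)),    # old but high priority: keep
    Bug("C", "medium", date(2024, 6, 1)),   # recently touched: keep
]
candidates = stale_low_priority(bugs, today=date(2024, 7, 1))
print([bug.key for bug in candidates])  # ['A']
```

Everything the function returns is a candidate for closing with “won’t fix”. The notifications sent by the tracker then act as the safety net for the few bugs that still matter to somebody.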

This story illustrates one of two main takeaways that I’d like this article to provide: don’t keep long lists of issues from the past, fooling yourself that you will take care of them in the future. If you haven’t done it for years, you won’t ever do it.

Another case illustrates how quality is a mutual agreement between the PO and the Development Team. I was coaching new product development at a mobile startup. Most of the engineers on the Development Team had an enterprise background and initially had quality standards that were much higher than the PO’s. As a result, they fixed a lot of defects that they found during sprint development.

The PO ran tests with the users every sprint. Users constantly came up with new ideas during the tests, which caused significant changes. The PO expected and appreciated that.

There was a conflict of perspective between the PO and Development Team. In these circumstances, it made no sense at all for the team to fix tiny UI details. They would have to rebuild everything under development several times in the quest for a product that was easy to use and pretty enough from a UX perspective, so many of the fixes would end up wasted work.

The first step in resolving this conflict between the PO and the Development Team was that the Development Team stopped fixing such defects — but the developers continued to rigorously report all of them in the bug tracker, which grew enormously as a result. It took a few more iterations for the developers to realise that these issues sometimes quickly became obsolete or redundant or that they would never be fixed.

Over time, collaboration between the PO and Development Team brought about a mutual understanding of what should be fixed and what should not. Yes, there were times when the Development Team hadn’t logged something important because they thought it was unimportant — but discussion with the PO about misunderstandings matters more than any mistake itself. That discussion helps to build a common understanding of this imaginary quality line, and these two sides continued to refine that understanding.

To summarize, it is “now or never”. We should fix a problem right away or keep silent until the end of the universe, or at least until the next time we hear about it, which may allow us to reassess our decision.

Keeping backlogs and other queues short and concise will let us focus on important stuff and will prevent us from overburdening our brains with too many things to worry about.

Last but not least, fixing any defect requires more time and energy than preventing it in the first place. Investing time in improving (or introducing) engineering practices may solve the root cause of this article’s initial question: “How do we handle defects in Scrum?” In my experience, test-driven development takes a long time to master but, once mastered, prevents a lot of potential problems. Actually, any X-driven development practice will preempt problems. Using acceptance-test-driven development or behaviour-driven development and having our “engineer” write cases with our “QA” would be a great example of investing in future quality, something that would prevent future defects.

What do you think of the subject? How do you handle a long list of defects in your project? Let me know in the comments, or feel free to ask a question there or by contacting me directly.

About the Author

Kirill Klimov is a generalising specialist in building human systems using modern management frameworks (agile/lean/kanban) so that companies can deliver maximum value to their customers and employees have a meaningful, enjoyable professional life. Kirill is an author at the non-profit project Agile Quote (@AgileQuote), which provides readers with daily inspirational quotes on agile software development. Quotes come from on-the-ground practitioners and speakers in the agile community and contain links to resources. Kirill is co-author of Who is agile in Ukraine, a book of stories and personal reflections on the journeys of people who stumbled on agile in Ukraine, are from Ukraine, or are somehow connected to Ukraine. You can track Kirill on Twitter as @f0g or via LinkedIn.
