I have been in charge of implementing Lean Software Development in a software vendor house for about 2 years. During this time I have been coaching a large team throughout the development of two successive versions (let’s call them V2 and V3) of our enterprise solution.
We have gradually implemented seven major changes in our organization that have helped our R&D department to remove waste from our software development process with encouraging results. This essay is about implementing these seven changes, the results we obtained and what we have learned during the journey.
Context: teams and project
70 people were involved in the new version of our enterprise solution, split into the following teams:
- Development : 6 scrum teams, each developing one application of our solution,
- Integration : in charge of doing end to end integration testing,
- Functional Analysts : by and large the Product Owners,
- Functional Architects : in charge of functional consistency throughout the different applications,
- Technical architects,
- UX designers
- Continuous Integration and Release team
Framing the problem
For the initial V1 version (the one we developed using our standard development process, i.e. before implementing Lean Software Development) we initially planned a duration of 12 months: 10 months of actual feature development and 2 months of code freeze / Integration Testing.
It actually took 12 months to develop the features and deliver them to Integration Testing: a two-month delay. But then we had to go through 11 months (instead of 2) of Integration Testing / stabilization before being able to release. Not to mention another 7 months to deliver two Service Pack versions.
From a cost-per-activity perspective, we didn’t have major issues with the actual development schedule or cost (20% delay). We had one with Integration / maintenance: instead of the 2 months scheduled for stabilization, it took us 11 months. The 7 months of Service Packs were not even charged to the release: they came out of a general maintenance budget.
From a Lean perspective (i.e. based on customer’s point of view), here is how the problem is defined:
- A development planned for 12 months took 23 months to be delivered to the customer (30 months if you take the Service Packs into account). The problem is an 11-month delay, which has four main consequences:
  - Delay: we are late so the customer is unhappy
  - Cost: the company has to pay for an extra 11 months of resources to work on the release
  - Knock-on delay: while spending effort on release N, we don’t work on release N+1, which will be late as well
  - Income: time to cash is 11 months longer and we are accumulating software stocks, which represent investment by the company.
Figure 1: Problem in terms of delay
- Most of this time and effort is spent on waste: reworking the delivered software to fix the bugs during stabilization phase and service packs
Figure 2: Problem in terms of waste
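The figures behind this framing can be checked with a few lines of arithmetic. The sketch below only restates the durations quoted above; the constant and function names are mine, chosen for illustration.

```python
# Durations in months, taken from the V1 description above.
PLANNED_DEVELOPMENT = 10
PLANNED_STABILIZATION = 2
ACTUAL_DEVELOPMENT = 12
ACTUAL_STABILIZATION = 11
SERVICE_PACKS = 7


def v1_lead_times():
    """Return planned vs. actual V1 lead times, in months."""
    planned = PLANNED_DEVELOPMENT + PLANNED_STABILIZATION
    delivered = ACTUAL_DEVELOPMENT + ACTUAL_STABILIZATION
    return {
        "planned": planned,
        "delivered": delivered,
        "delay": delivered - planned,
        "with_service_packs": delivered + SERVICE_PACKS,
        # Overrun of the development phase alone, as a percentage.
        "development_overrun_pct": 100
        * (ACTUAL_DEVELOPMENT - PLANNED_DEVELOPMENT)
        // PLANNED_DEVELOPMENT,
    }


if __name__ == "__main__":
    for name, value in v1_lead_times().items():
        print(f"{name}: {value}")
```

Seen per activity, the development overrun is a modest 20%. Seen from the customer's side, the same project is 11 months late.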
What the People said
Some excerpts from the many interviews we did at the beginning of the project:
It’s difficult for R&D to do something simple. A feature that has value for the customer but that is easy to develop does not interest anyone in there.
No one in R&D knows what the customer does with the product. Marketing finds 20 bugs in 5 minutes on first versions delivered by R&D.
Responsibilities are diluted throughout the organisation, so strategies are individual, for survival, rather than collective, towards performance.
Cause analysis: understanding our misconceptions
The major misconceptions we made during this development process were:
- Developing features is our main objective, as features are what customers value
- If there are a few bugs here and there that’s OK as we’ll cope with them during stabilization phase
- 2 months are enough to freeze the code and deliver quality software.
As mentioned above, most of these come as a result of our cost per activity perspective which made us tend to believe that our software development process was acceptable: it was our maintenance that was bad.
As a result, the teams were rushing to develop features without focusing much on quality, thinking « we’ll have 2 months to fix all the bugs and that will be plenty ».
What happened during code freeze:
- Many bugs were hiding other bugs further down the path
- Fixing bugs introduced regressions as some part of our software had a low coverage of tests
- As a result of the two points above, we were not able to predict when the code would be good enough to be released. It actually took 11 months.
- We had to develop 2 service packs in the following 7 months.
Based on this description of the problem as observed during V1, the rest of this essay is about what we have put in place while implementing Lean Software Development on the following two versions.
Our hypothesis
As we implemented the Lean Software Development approach, we took a different perspective.
Our hypothesis was: poor quality in the first place harms our lead time. On V1 it resulted in much rework and an 18-month (11 + 7) delay.
The Standard
The Standard in Lean is pretty straightforward in terms of quality: you don’t pass an item to the next step of the process if you know that there is a quality issue.
Sounds simple. Yet there is a difference between simple and easy, as Jeffrey Pfeffer and Robert Sutton explain in The Knowing-Doing Gap: it is difficult to actually implement in everyday practice.
Understanding the process
In terms of software development, understanding the process means that one has to be clear on the following three things to really be able to harness the process:
- The unit of work that goes through the process
- The actual process – with the different steps.
- The interfaces between steps: how we can tell that the level of quality of the unit of work is OK and that it can move to the next step
The Unit of work is the User Story (see dedicated section below).
The process was split into 3 major phases (each being subdivided into a few steps):
- Functional analysis: the definition of a User Story.
- Development and test
- Integration Testing
Lastly we defined our interfaces and the OK/NOK conditions:
- A User Story does not go into development unless it is Ready To Develop, a status given by Functional Analyst, Functional Architects (in charge of our solution overall functional consistency), UX designer, developers and testers (embedded in the team)
- A User Story does not go into Integration unless it is Done, which means there are zero bugs on the functional scope (i.e. not only is the nominal case OK, as we used to require, but all tests defined by the tester embedded in the team are executed successfully) and both the Functional Analyst and the UX rep are OK with the User Story
- We don’t go to release unless there are no bugs left on end to end process integration testing
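The three OK/NOK conditions above can be read as simple gate checks. The following is a minimal sketch under my own assumptions: the class, field and role names are hypothetical, and we did not use any such tool on the project (the rules lived on the boards).

```python
from dataclasses import dataclass, field

# Roles that must agree before a story is Ready To Develop (from the first rule).
READY_APPROVERS = {
    "functional_analyst",
    "functional_architect",
    "ux_designer",
    "developer",
    "tester",
}
# Roles that must accept a story before it is Done (from the second rule).
DONE_APPROVERS = {"functional_analyst", "ux_rep"}


@dataclass
class UserStory:
    title: str
    ready_approvals: set = field(default_factory=set)
    done_approvals: set = field(default_factory=set)
    open_bugs: int = 0
    tests_total: int = 0
    tests_passed: int = 0


def is_ready_to_develop(story):
    """Gate 1: every role involved has given its approval."""
    return READY_APPROVERS <= story.ready_approvals


def is_done(story):
    """Gate 2: zero bugs, all tester-defined tests pass, FA and UX accept."""
    # Assumption: a story with no tests defined cannot be considered Done.
    all_tests_pass = story.tests_total > 0 and story.tests_passed == story.tests_total
    return (
        story.open_bugs == 0
        and all_tests_pass
        and DONE_APPROVERS <= story.done_approvals
    )


def can_release(open_integration_bugs):
    """Gate 3: no bugs left on end-to-end integration testing."""
    return open_integration_bugs == 0
```

The point of writing the gates this way is that each one is a yes/no answer: a story with a single open bug, or a single missing approval, simply does not move.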
Seven changes to remove waste
Once we had defined the rules, we needed to provide the team with the means to apply them. We did seven things that really helped them in doing so:
- User Story as unit of work
- Daily Follow-up
- Visual Management
- Functional Analysis formal follow-up
- Embedded tester
- End of sprint code freeze
- Stop the Line
User Stories
I have noticed many times that while doing process design, people focus solely on the actual different steps of the process; they don’t spend enough time in thinking about the unit of work that goes through it.
I am a big fan of User Stories: they are small and simple, they can easily be estimated (refer to the INVEST acronym) and, most importantly, they are described from a User perspective, which makes the value visible.
Just agreeing on this unit of work helped us a great deal in aligning the team’s efforts with customer demand. Rather than pushing system capabilities to the user so she can work out how to use these capabilities to get the job done, User Stories pull development from customer demand. This converges exactly with one of the main points raised by lean thought leader Daniel T. Jones during the 2013 edition of the Lean IT Summit: moving IT strategy from a push mode driven by technical capabilities to solutions pulled by customer demand.
It may just sound like an intellectual semantic twist but it is a major change in the way software engineers work. Rather than defining from inside out, the system is defined with user stories from outside in as depicted in the following illustrations:
IEEE 830 Software Requirements Specifications : the system should do this and that and this as well
User Story : As a {Role} Ben can do {action} and obtain {result as business value}
Implementing User Stories has proved to be very challenging, most importantly with the software engineers as it has completely changed their perspective on their daily work.
To succeed with the implementation, the following have been critical:
- Train people in User Stories
- Coach functional analysts to help them define the right user stories
- Initiate functional analysis sessions for all roles (development, tests, UX) to contribute in User Story design and make it Ready To Develop.
There are some technical areas where User Story design is not easy. In those cases we challenged the software engineers to see whether they could reframe their thinking and integrate technical solutions into User Stories, from the user perspective. When they could not (which hardly ever happened) we let the technical specification approach take precedence. But we succeeded in changing the teams’ default behavior to designing with User Stories.
Daily follow-up
Initially, like many other software development teams, we thought we were harnessing our process because we were holding long and formal (read: obedience and power games) weekly meetings. Having managers around the table and formal minutes written made us believe we were controlling our project. But guess what: we did not control anything, as the 18-month delay of version V1 proved.
In a first step (V2), we applied Scrum: all teams held a 15-minute morning Scrum meeting and we scaled it up to project level.
- 9:30: Scrum meeting for each of the six application development teams, with developers but also testers and functional analysts
- 9:45: Project Scrum with all Scrum Masters, Functional Analysts, the test lead, the lead architect and managers from the different function teams (teams were component driven, e.g. Business Layer, Application Layer, Web Layer, etc.)
But then it became clear that we were missing information. A business layer developer from Scrum team A had little opportunity to coordinate actions with another business layer developer from Scrum team B. Similarly, a tester in Scrum team A had no chance to share information about what she was doing with a tester from team B. This resulted in duplicate work, coordination issues, etc.
So in a second step (version V3) we decided to add a supplementary morning Scrum for function teams between the team Scrums and the project Scrum. The new schedule was then as follows:
- 9:30: Scrum meeting for each of the six application development teams, with developers but also testers and functional analysts
- 9:45: Function Scrum. Each function team (Business Layer, Application Layer, Test) has a meeting to reconcile information from the different application teams. This helped weave the information together from a matrix perspective: application vs. function.
- 10:00: Project Scrum with all Scrum Masters, Functional Analysts, the test lead and managers from the different function teams
At the end of the last meeting, at 10:15 AM, everybody (including the Project Manager) has a clear understanding of where the project stands, what the main problems are, who is in charge and what the odds are for the team to meet the objectives of the sprint.
Figure 3: Scaling the daily follow-up to the whole project
In all fairness, this is what I wanted to achieve initially, as I had seen this organization in Lean From The Trenches, the book by Henrik Kniberg. But I thought that this would provoke pushback from the team, as it may have looked like a killer in terms of communication overhead. The great thing is that the teams saw the benefit of daily meetings as of version V2. After failing to organize weekly 1-hour meetings, then 30-minute bi-weekly meetings to tackle their function issues, they naturally decided to move to 15-minute daily meetings.
So rather than imposing it beforehand, I nudged the team so they reached that conclusion by themselves. It took one release and six months more than if we had imposed that solution from the start, but as this is the team’s suggestion and not the coach’s, it now lasts.
Visual Management
At the end of V1, we questioned different people to find out if they knew where project information was stored. Their answers showed that they did not know where it was, and that during V1 they could not tell what the project status was or what they had to do every day to fix it.
In order for people to know where they stand with the project, we went for visual management at project level. As with the daily follow-up, we scaled the Scrum visual board approach up to project level. But rather than following up on tasks, as we do in the application team Scrums, we followed the status of our process unit of work: the User Story.
We had a big board (pictured below) where each Scrum Master could share the status at the User Story level with the Functional Analyst, Project Manager, UX, Release Manager and Lead Tester.
Figure 4: Development board
Each line represents one application team and each column a different step of our process. The objective is to move User Stories from left to right fast, with a great level of quality. Note the small labels at the top: these are the explicit rules for moving tickets from one phase to the next. Note also that, from a technical point of view, this is a Push flow from the upstream process, not a Pull flow whereby work is pulled by customer demand (the standard Lean approach to flow).
Each external problem (i.e. one that can’t be dealt with by the Scrum team during the morning Scrum because of external dependencies) is raised and dealt with by whoever can help at project level. Thus, issues are addressed on a daily basis, which prevents teams from getting stuck. Problems are visually identified by pink Post-its with the name of the person in charge of the action. For some reason, it looks like people take much faster action when the problem is flagged on a board with their name on it than when it is in a mail in some minutes that nobody reads anyway. Go figure ;-)
In a first step, for V2, we only followed the development process on the board.
In a second step as of V3 we have introduced supplementary boards:
- For function teams for their respective daily follow up meetings: Business Layer, Web Layer, Integration. The unit of work here is the component being modified (development) or the Test campaign for integration team
- For functional analysis follow-up. More on this in the following section
Functional Analysis formal follow-up
As we were implementing the consolidated daily Project Scrum focusing on development, it appeared during V2 that we had many issues related to User Stories that were entering the Development phase without actually being Ready To Develop. Some had pending UX issues; some could not be properly estimated by the teams as they didn’t have enough time to think about them beforehand. As a result there were many exchanges back and forth between the Scrum Master and the Product Owner during development, and many bugs raised because User Stories had not been understood. In other words: much waste.
So for V3 we formalized the Functional Analysis phase, aiming at formally bringing User Stories to the Ready To Develop status. This came in two forms.
First we had a weekly Functional Analysis session for each team with all types of participants (beyond the Functional Analyst): developers, testers, UX engineers and the Functional Architect. The objective of this meeting was to address a limited number of User Stories and bring them to the Ready To Develop status. All the issues that could be addressed internally were dealt with during this meeting. Any external dependencies (cross-functional consistency, UX) were then dealt with by Functional Architects and UX respectively.
Then, on top of the development board, we set up a second board dedicated to Functional Analysis, so that we could discuss this in the morning Project Scrum. As a result, the 10 AM Project Scrum was not only giving a clear consolidated status on User Story development for the current sprint but also on User Story analysis being prepared for the next one.
Figure 5: Functional Analysis board (User Stories Design)
Again, there is one line per application, one column per step in the process and all rules to move the User Story from one step to another are made explicit. The objective is to bring User Stories to the Ready to Develop column on the right for those to be efficiently dealt with at development stage.
Embedded Tester
In a first step (version V2) we set up cross-functional development teams (see the article Bug Fixing Vs Problem Solving; I won’t elaborate here on the benefits). In order to improve quality and the efficiency of the team, we decided for version V3 to embed one or two testers inside each development team.
Up until then, the testers were in their own area getting one release every week and running tests on those while mostly interacting with developers via mail.
Having the tester seated amongst the developers was a major change. The major role of these embedded testers was to:
- Participate in the functional analysis stage and give their approval for the Ready To Develop status of each User Story, which means that the tester understands the User Story and knows how she will test it
- Write the tests that will make sure the User Story quality is good and it is DONE
- Provide a status of the testing (issues, bugs, etc.) both during the morning daily Scrum of the team and in the functional Integration Team Scrum (right after the Application Team Scrum)
- Execute tests every day on given releases to find bugs as soon as possible and have them fixed
- Coordinate the validation of the user story by functional analysis and UX rep
This has worked wonders for many reasons. In retrospect I think the major one has been that developers witnessed the consequences of their bad habits related to quality every day. We have a great story related to this.
In one team, the tester (call him Terry) had to take two weeks of holiday and there was no tester left who could be assigned to this team. The developers knew that they could not complete their sprint without one, so they self-organized to solve this issue. They picked one of them (let’s call him Bob) to replace Terry. Bob spent 2 days with Terry prior to his departure to understand the actual standard of work (how to write a correct test, how to execute it, what the important points of control are, etc.). He then did the work for 2 weeks and spent another day handing over upon Terry’s return.
During his time as a tester, Bob learned the hard way how irritating it was to get an incorrect release delivered, and he developed a deep understanding of how frustrating Terry’s job could be just because of developers’ inconsistencies. Back in his developer job and enlightened by his two-week experience, Bob then submitted many improvement proposals to his fellow developers to ease Terry’s daily work.
Beyond quality improvement, it was inspiring to see that testers were much more engaged as they were involved throughout the whole User Story lifecycle, from definition to final integration. They no longer see themselves doing the dirty work at the end of the value chain line.
The best thing to come out of this integration was what the Test Manager told me: “I was reluctant at first to have my people spread geographically across all these teams. But it was the right thing to do: we are now much more engaged and efficient”.
End of Sprint code freeze
We wanted to understand why teams were releasing software with known bugs to Integration Testing phase. What most Scrum Masters told us: because we don’t have enough time and we have to deliver value through features.
We asked the Scrum Masters and teams what would help them deliver features without bugs, and their answers converged on « having an extra week ». So we decided that in our 4-week sprints, the last week would be fully dedicated to code freeze, to make sure the team had the means to:
- deliver User Stories without any bugs and
- dedicate enough work to design and make next sprint user stories Ready To Develop from their end.
Practically, this means that a team of 5 developers was only taking 5 developers x 5 days x 3 weeks = 75 days of estimated work per sprint (out of their 100 days availability).
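The capacity arithmetic above generalizes to any team size. Here is a small sketch of it; the function and parameter names are mine, and the defaults simply encode the 4-week sprint with one freeze week described above.

```python
def sprint_capacity(developers, days_per_week=5, sprint_weeks=4, freeze_weeks=1):
    """Return (available_days, committed_days, reserved_share) for one sprint."""
    available = developers * days_per_week * sprint_weeks
    # Only the weeks before the code freeze are filled with estimated work.
    committed = developers * days_per_week * (sprint_weeks - freeze_weeks)
    reserved_share = (available - committed) / available
    return available, committed, reserved_share


if __name__ == "__main__":
    available, committed, reserved = sprint_capacity(developers=5)
    print(f"{committed} of {available} days planned, {reserved:.0%} reserved")
```

For a team of 5 this gives 75 planned days out of 100, i.e. a quarter of the sprint capacity reserved for quality and for preparing the next sprint.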
I know this is not lean, as it is just spreading small waste phases throughout the different sprints, as opposed to concentrating them at the end of the development as in V1. But we had to compromise, as there were many changes in the way the team works: it was a bit of give and take.
It was a hard battle with management who (rightly, from the activity-driven cost analysis perspective) argued that the cost of development had all of a sudden increased by 25%. We had to explain that we wanted to check our counter-measure and see whether it would benefit our total lead time.
The need to have zero bugs for User Stories to be DONE has made the process much more demanding in terms of rigor and quality. The code freeze week has allowed the team to achieve this objective.
This DONE definition has also helped in making the issues the teams were facing with their optimistic estimates visible. Soon they started to adjust and make more relevant estimates for each user story.
As the team improved in terms of estimates and became better organized for Functional Analysis, we removed the end of sprint code freeze week from our sprints, which are now 3 weeks long. The team is now mature in terms of quality. This step can be seen as scaffolding on the way to building predictability.
Stop the line
The Stop the Line described here is not the real Lean Stop The Line (aka Andon), where you stop the chain and ask the manager for support at the first issue. Our Stop The Line was just meant to instill in application development teams the idea that continuing to develop new User Stories while there are pending bugs is bad. So the rule was that development had to stop if there was more than one major bug open per developer. In a team of 6 with one tester, 5 bugs is the limit at which development stops to focus exclusively on bugs.
Again this is not a pure lean rule as you would stop as soon as there is one bug but, again, we had to put some water in our wine for it to be accepted by teams.
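Expressed as a check, our relaxed Stop the Line rule looks like the sketch below. The names are hypothetical. I follow the worked example (5 developers, development stops at 5 open major bugs); if you read the rule as "strictly more than one bug per developer", pass `inclusive=False`.

```python
def development_must_stop(open_major_bugs, developers, bugs_per_developer=1,
                          inclusive=True):
    """Return True when the team should drop new stories and fix bugs."""
    if developers <= 0:
        raise ValueError("a team needs at least one developer")
    limit = developers * bugs_per_developer
    # inclusive=True: stop as soon as the limit is reached (the worked example).
    # inclusive=False: stop only once the limit is exceeded.
    if inclusive:
        return open_major_bugs >= limit
    return open_major_bugs > limit


if __name__ == "__main__":
    # Team of 6 with one tester, hence 5 developers.
    for bugs in (4, 5):
        print(bugs, development_must_stop(bugs, developers=5))
```

A pure Lean rule would correspond to a limit of one bug for the whole team, whatever its size.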
This had a tremendous effect on quality. Since the bug backlog was almost empty when we arrived at the integration code freeze phase, we magically met the deadline. We still had some Non Value Added activity but at least we were meeting the deadline.
Check
We have gradually implemented these 7 organization changes throughout two versions and we could see the result in terms of delay, not to mention that the Product Owners were pleased with the result.
Prior to getting into the results, it might be worth showing how these changes have been gradually introduced throughout the Lean Software Development (LSD) implementation:
Figure 6: Gradual implementation of changes throughout the versions
The results are depicted in the diagrams below.
Quality
Figure 7: Evolution of quality
The evolution of the quality ratio (number of bugs remaining at release time divided by the number of man-days of development) indicates that the combination of embedding the testers in the team and applying the Stop The Line principle had tremendous results in terms of quality.
Note that a real Lean approach would have tested the two counter-measures separately to fully understand the impact of each on quality performance.
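For readers who want to track the same indicator, the ratio is straightforward to compute. This is only a sketch with a hypothetical function name; the numbers in the usage lines are made up for illustration and are not our project data.

```python
def quality_ratio(bugs_remaining_at_release, development_man_days):
    """Bugs still open at release time per man-day of development."""
    if development_man_days <= 0:
        raise ValueError("development effort must be positive")
    return bugs_remaining_at_release / development_man_days


if __name__ == "__main__":
    # Illustrative values only: 30 open bugs for 1500 man-days of development.
    print(quality_ratio(30, 1500))
```

Normalizing by development effort is what makes versions of different sizes comparable: a lower ratio means fewer known bugs shipped per unit of work.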
Delay
Figure 8: Evolution of delays
What the people said
Just as we did at the start of the project, we held many one-on-one interviews at the end of this version. We talked with every single person involved in the project.
With this version, we have learned that it is possible to deliver software on time with an excellent quality level.
We have been using the product extensively during a whole day without finding a bug. This builds trust with R&D.
The consistent integration of all applications makes it very easy for us Marketing to tell a great story to our customers.
It makes us grow to see that we have a global vision of the project, everyday, and to see how we will make things happen.
The team has continually improved. We have adapted the process to our specific needs and this has allowed us to solve many problems.
The fact the whole team is involved on User Stories from design to test and integration allows everybody to take ownership of what we develop. This is very motivating.
Show Me the Money
Whenever we describe a return of experience to Michael Ballé, the famous French lean expert always asks: “So what? Show me the money?”
The starting point was 100% late on version V1. Had we done the same with V3, a 7-month project involving 70 people, the same development effort would have cost (70 x 7 =) 490 man-months more. Therefore, we saved 490 man-months, which is about 40 FTE (Full Time Equivalent). The very same software has cost the organization 40 FTE less than it would have 2 years before. I let you convert an FTE into your own currency and salary range. For our company this is a saving of slightly over 2.5M€. For each version.
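The conversion can be sketched as follows, so you can plug in your own figures. The function name and the optional `yearly_cost_per_fte` parameter are mine; I deliberately leave the cost unset, since it depends on your currency and salary range.

```python
def avoided_effort(people, months_avoided, yearly_cost_per_fte=None):
    """Convert an avoided delay into man-months, FTE-years and, optionally, money."""
    man_months = people * months_avoided
    fte_years = man_months / 12
    saving = None if yearly_cost_per_fte is None else fte_years * yearly_cost_per_fte
    return man_months, fte_years, saving


if __name__ == "__main__":
    man_months, fte_years, _ = avoided_effort(people=70, months_avoided=7)
    print(f"{man_months} man-months, about {fte_years:.1f} FTE-years")
```

With 70 people and 7 months this gives 490 man-months, a little under 41 FTE-years, which I round to 40 FTE in the text.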
Time-to-cash is also impacted. When you deliver software on time rather than 7 months late, the money generated by the sales is in your company’s account 7 months earlier. It also makes a difference.
Act – what we have learned
The most important take-away from this project is that ensuring quality at every step has spectacular effects on reducing the waste in your process, removing delays and on improving team engagement. The implementation of our Stop the Line process has proved instrumental in making our whole process more efficient.
The second take-away is that making the process visible helps a great deal in building collective ownership of the project by the team. This helps not only in making problems visible but also in transforming the dynamics of the team. When all the User Stories are blocked in the Testing column, it is great to see all developers spontaneously offering their help with the test effort, something that would have seemed impossible months before.
The third take-away is that a chain of short consecutive stand-up meetings is a great way to share a clear status of the project while aligning all the project teams on common priorities. Besides, it has also helped in building trust among the teams. I remember a Scrum Master coming to the project status meeting every day for a week saying that his team had not been doing any new development as they were fixing bugs because of the Stop The Line rule. It takes a lot of courage to say that, but also benevolence from other people in the team to hear it without commenting or lamenting the lack of quality. When this happened, I felt very proud of the whole organization as I knew we were on the right track: the culture had changed from blame shifting to problem solving.
The fourth take-away is that change management is a slow process and it requires patience, active listening and benevolence from the change agent. What I have found to work best has been one-on-one 30-minute coaching sessions with all the different roles: Scrum Masters, Test Leads, Functional Analysts, Project Leaders, Managers. When there is a change initiative there is a lot of anxiety (refer to the seminal work by Edgar Schein on that topic). You must offer a space for people to relieve their anxiety, and these one-on-ones have worked extremely well on our project.
Next Steps
In retrospect, I think we have missed two things at the very heart of a Lean system in that project: pull flow and daily problem solving using PDCA with teams and A3 thinking with managers.
Pull flow organizes the work according to customer demand: no User Story progresses to the next step unless the next step of the process is available and capable of addressing it. Though we had some boards to manage our User Stories, this was not a Kanban. We still had a push flow with no WIP limit and no signal from downstream to control the flow. This led to situations where testers were drowning in developed User Stories to validate.
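To make the difference concrete, here is a minimal sketch of the pull mechanism we did not have: a story only moves when the downstream step has room under its WIP limit. The `Step` class, the `pull` function and the limits used are hypothetical illustrations, not a description of our boards.

```python
from collections import deque


class Step:
    """One column of the board, with a cap on work in progress (WIP)."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.stories = deque()

    def has_capacity(self):
        return len(self.stories) < self.wip_limit


def pull(upstream, downstream):
    """Move the oldest upstream story only if downstream can take it."""
    if not upstream.stories or not downstream.has_capacity():
        return None  # Nothing to pull, or downstream is saturated: work waits.
    story = upstream.stories.popleft()
    downstream.stories.append(story)
    return story


if __name__ == "__main__":
    development = Step("Development", wip_limit=3)
    development.stories.extend(["US-1", "US-2", "US-3"])
    testing = Step("Testing", wip_limit=2)
    for _ in range(3):
        print(pull(development, testing))
```

In this run the third pull is refused because Testing is full. With such a limit, developers would have had an explicit signal to help with testing instead of piling up more stories in front of the testers.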
Despite the results, there are still challenges left and further impediments to remove: some room for improvement, as it always is with Lean, but the team is now on the right track.
About the Author
A Lean IT Coach at Operae Partners, Cecil Dijoux is an IT professional with 25 years of international experience. Passionate about 21st century Management (Lean, Agile, Enterprise 2.0), Cecil blogs in French and English about organizational cultures in an interconnected world. One of his blog articles has been discussed by The New York Times online and Read Write Enterprise. Cecil also happens to be an international speaker and the author of "#hyperchange- petit guide de la conduite du changement dans l'économie de la connaissance", a French downloadable e-book on change management in the knowledge economy.