"DevOps: A Software Architect's Perspective" is possibly the first long-form work looking at DevOps from an architectural perspective. That is relatively surprising considering many of the capabilities that organizations pursuing DevOps wish to achieve: availability, scalability, short time to detect incidents, short time to repair incidents, reduced cycle time. All of which require some level of architectural involvement.
The authors adopt a pragmatic view of DevOps as a set of practices that reduce cycle time and increase application quality. The book then focuses on the architectural thinking that should take place in enterprises that have embarked on a DevOps journey (for example, designing appropriate monitoring systems, choosing a deployment approach and pipeline, or applying feature toggles in practice) and on how their internal organization supports or hinders the adoption of practices leading to the stated DevOps goals.
The deployment pipeline is the place where the architectural aspects and the process aspects of DevOps intersect.
There is an underlying assumption in the book of a separation of responsibilities between dev and ops teams and of fragmented ownership of the stack components. Consequently, the book covers roles, responsibilities and processes involved in DevOps and Continuous Delivery implementations, which apply mostly to large organizations. Smaller companies that rely on cross-functional teams and lightweight processes to move faster might find those areas less useful than the technical architecture descriptions.
The architecture of the system being developed acts as the implicit coordination mechanism [between teams].
Microservices are presented as a model system architecture for DevOps, with a deep analysis of the pros and cons, including Conway's Law.
The DevOps goal of minimizing coordination among various teams can be achieved by using a microservice architectural style.
Cost/benefit analyses of deployment options, monitoring, testing and support are all covered in depth in the book, especially for microservices (for example, should multiple services be deployed into the same VM to reduce latency, at the risk of issues due to deployment race conditions?).
The authors acknowledge that DevOps is highly contextual and organization-dependent; thus 20% of the book describes real case studies on migrating to microservices, standardizing continuous deployment pipelines and supporting multiple datacenters in production.
Overall, the most important takeaway from this book is probably that it makes explicit the need for sound architectural decisions about deployment and operational activities. Architects need to be as concerned about infrastructure management and the deployment pipeline as about the actual system architecture.
InfoQ talked with Len Bass, one of the book's co-authors, to understand the motivation for this book, what looking at DevOps from an architectural perspective means, DevOps education, microservices and more.
InfoQ: Why did you feel the need to write about DevOps from an architectural perspective?
Len Bass: I am a system designer and I wondered what changes I needed to make in my designs to support DevOps practices. Existing books do not deal with techniques such as using feature toggles to manage version incompatibility or having the system verify its pedigree on initialization, yet these are techniques that help reduce errors and support DevOps practices. The architectural perspective focuses on the system being built and its reflection of organizational and management practices. Conway's Law, for example, would say that microservice architectures must be developed by small teams (as is the practice in many organizations). This is a reflection of organizational thinking on the design of a system that was not well covered prior to our book.
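To make the feature-toggle technique Bass mentions concrete, here is a minimal sketch (our illustration, not the book's; the toggle store, names and formats are hypothetical) of a toggle guarding a new, backward-incompatible code path until every consumer has been upgraded:

```python
# Minimal feature-toggle sketch: guard a new, backward-incompatible
# serialization format until all consumers have been upgraded.
# The toggle store here is a plain dict; a real system would read it
# from a configuration service so it can be flipped centrally.

TOGGLES = {"use_v2_order_format": False}  # flipped at rollout time

def is_enabled(name: str) -> bool:
    """Look up a toggle; default to the old, safe behavior."""
    return TOGGLES.get(name, False)

def serialize_order(order: dict) -> dict:
    if is_enabled("use_v2_order_format"):
        # New format: only emitted once every consumer understands it.
        return {"version": 2, "items": order["items"],
                "total_cents": order["total"]}
    # Old format stays the default during the mixed-version window.
    return {"version": 1, "items": order["items"], "total": order["total"]}

if __name__ == "__main__":
    order = {"items": ["book"], "total": 3999}
    print(serialize_order(order))          # v1 while the toggle is off
    TOGGLES["use_v2_order_format"] = True  # flip once the fleet is consistent
    print(serialize_order(order))          # v2 afterwards
```

The point is that the incompatibility is managed by configuration rather than by a risky all-at-once deployment: old and new versions can coexist until the toggle is flipped.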
InfoQ: In the book you give a goal-based definition of DevOps as opposed to a values-based one. Why?
Len Bass: A definition should satisfy three criteria: it should include things you want to include, exclude things you do not want to include, and enable a decision as to whether a particular phenomenon is in or out. The value-based approach is weak on the last point. Looking at a practice, how do I know whether it satisfies certain values or not?
InfoQ: One might argue that achieving goals is a natural consequence of living by a consistent set of values, so shouldn't we focus on those first?
Len Bass: I agree that understanding values is important; I would just not say it is definitional. Having a definition that specifies the goals and then asking whether the values support those goals seems to me a better way to proceed. It gives you something to measure the values against.
InfoQ: DevOps focuses strongly on collaboration and, to a certain extent, promotes full-stack ownership (including the ops environment and tools) by team members. Is there a place for enterprise architecture/architects in this setting? Do you still need particular skills to make adequate architectural trade-offs in a cross-functional team?
Len Bass: Enterprise architects set up organization-wide constraints that individual systems must live within. Some decisions should not be delegated to individual teams (defining the schema of mission-critical databases, for example). One goal of enterprise architects is to promote interoperability among the different systems in an enterprise. Ideally, these constraints are only the ones necessary for interoperability and consistency of business processes, but these constraints will exist.
A team needs to have the skills to decide on trade-offs. Whether these skills are embodied in a single individual or are shared among different members is not as important as that the skills exist. The essence of a cross-functional team is that it covers different perspectives and the members have different skill sets.
InfoQ: How much leeway is there for emergent design and adaptation (frequently found in fast-paced development/delivery) when architecting things like an adequate monitoring system and metrics?
Len Bass: Activities that have system-wide impact should occur within the constraints of the enterprise. There are two types of monitoring and metrics that are relevant to a team. One is the monitoring and metrics of their particular code (service, subsystem, whatever). The team needs to have leeway to do the right thing in this case. The other type of monitoring and metrics are those with business relevance: transactions per second, latency, reliability, orders per second, etc. Monitoring needs to present the business-relevant metrics and also provide the ability to drill down to determine which code segments are contributing what to those metrics. When developers deploy a new version, they should first ensure that the business metrics are not affected (or are affected in predicted ways) and then that their particular new deployment is behaving well.
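A small sketch may help illustrate the two-level view Bass describes: business-level metrics that can be drilled down by code segment. This is our illustration, not from the book; a production system would use a metrics library, and the metric and segment names below are hypothetical.

```python
# Two-level monitoring sketch: business metrics (what the business sees)
# plus a per-code-segment drill-down (who is contributing what).
from collections import defaultdict
from statistics import mean

# samples[(business_metric, code_segment)] -> list of latency samples (ms)
samples = defaultdict(list)

def record(business_metric: str, code_segment: str, latency_ms: float) -> None:
    samples[(business_metric, code_segment)].append(latency_ms)

def business_view(metric: str) -> float:
    """Aggregate across all code segments: the business-relevant number."""
    points = [v for (m, _), vals in samples.items() if m == metric
              for v in vals]
    return mean(points)

def drill_down(metric: str) -> dict:
    """Per-segment breakdown: which code segments contribute what."""
    return {seg: mean(vals) for (m, seg), vals in samples.items()
            if m == metric}

record("order_latency", "inventory_service", 40.0)
record("order_latency", "payment_service", 180.0)
print(business_view("order_latency"))  # overall latency the business sees
print(drill_down("order_latency"))     # payment_service stands out
```

After a deployment, a developer would first check that `business_view` is unchanged (or changed in predicted ways), then use `drill_down` to confirm the newly deployed segment is behaving well.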
InfoQ: How much effort should organizations aiming at continuous delivery expect to spend on their deployment pipeline and associated requirements (e.g. automated provisioning, configuration, test automation and parallelization, etc)?
Len Bass: One of my gripes about the DevOps movement is that it is all advocacy with very little mention of the costs. I am an editor of a special issue of IEEE Software that will appear in March, and one of the papers in this issue identifies its costs as four people full-time for a year plus the ongoing costs of supporting tools. This is only one data point, but I would like to see cost as a first-class citizen in the presentations and writings about DevOps. The more such data points we have, the better we will be able to answer your question and the easier it will be to get management buy-in for the adoption of DevOps practices.
InfoQ: What are, in your opinion, the most important quality attributes for a deployment pipeline? And who should be in charge of implementing them and verifying that those attributes are met? Development teams? Operations teams? A platform team?
Len Bass: The relative importance of different quality attributes is a business decision. Certainly, performance, reliability, and security are important to most organizations, and the deployment pipeline should be monitored to give metrics about those attributes. The effort involved in establishing and maintaining the pipeline is another important attribute. Which one is most important will depend on the business an organization is in.
Who is in charge of the pipeline is also organizationally dependent. How large the current organization is and what the current responsibilities of various groups are will affect the choice of who is in charge. How much autonomy the development teams have will also affect it. In some top-down organizations, the development teams operate under very specific guidance. In other organizations, development teams have much more autonomy. One of the things I have seen is that, as a result of reorganizations and the normal aging of systems, teams inherit code that they did not develop. The standard software engineering issues of documentation come up in this situation. If people move between teams, then uniformity in the tools being used makes it easier for them. This is one of those questions where there is no best answer; it depends on the particular organization.
InfoQ: Do you still see many organizations treating build, deploy & test architectural requirements like reliability and performance as second-class citizens?
Len Bass: I see some (mature) organizations that use architectural patterns, sometimes supported by libraries, for things like reliability and performance. I see three types of organizations, although I can't tell you how many fall into each category. There are the mature organizations that have their act together when it comes to architecture and its treatment; these include the poster children for DevOps as well as some that are not so public. There are organizations that are trying to move to better use of architecture in their development, and there are those that do not have a clue.
InfoQ: Would you say the IT industry is suffering from the legacy of application-oriented computer science degrees, as opposed to a system-oriented view reflective of today's complex socio-technical systems?
Len Bass: Certainly I hear complaints (mostly indirectly these days) from industrial managers that students do not have the right set of skills. The academic business is a very conservative business and it is difficult to change curricula. I think that complaining about the lack of skills at DevOps conferences is totally preaching to the choir. These types of complaints need to be taken to places where academics congregate (e.g. the Conference on Software Engineering Education and Training) and people need to get involved with the development of reference materials such as the SWEBOK (Software Engineering Body of Knowledge) and the ACM model curricula.
InfoQ: You are possibly one of the first college professors to have introduced DevOps and Continuous Delivery as part of a graduate computer science program. How do these topics fit in with the rest of the curriculum and what have been the reactions so far?
Len Bass: My students are quite enthusiastic and that is a pleasure to see. My course is about half oriented toward the theory of DevOps (as expressed in our book) and half oriented toward tools. The students built a deployment pipeline as their running assignment using mostly open source tools. The Master of Software Engineering at Carnegie Mellon University has a complementary course focused on building Internet scale systems and dealing with issues such as scalability, reliability, and performance in that context.
InfoQ: Do you think these topics should be included even earlier, at the undergraduate level?
Len Bass: Students should be taught topics such as version control and configuration management from their first undergraduate courses. Topics such as architecting for Internet-scale systems and architecting for DevOps practices require an architecture course as a prerequisite, so senior undergraduate courses might be the correct place to introduce them. Topics usually start out as graduate-level courses and then migrate lower into the curriculum.
InfoQ: Going back to the complexity of today's highly distributed and interdependent systems, the book proposes microservices as the architecture model that best aligns with DevOps goals. What approach would you recommend for moving to microservices and what are some common pitfalls that you see on that path?
Len Bass: Migrating to microservices is a hot topic today. It should be done incrementally. Find commonalities that exist in multiple different systems and implement them as microservices. Then refactor the applications to take advantage of the microservices. This will enable an organization to gain familiarity with the practices associated with microservices and the platform requirements assumed to support them. It will also enable the organization to determine the costs associated with such a migration. Many organizations are not yet familiar with the cloud and this incremental approach enables gaining that kind of familiarity as well.
InfoQ: In the book you warn about deployment race conditions between microservices and other potential inconsistencies that need to be managed as part of the operational overhead implicit in a microservices architecture. What are some of the trade-offs companies should be thinking about before committing to a microservices model?
Len Bass: Microservices can be structured either flat or deep. Flat structures reduce the reuse associated with a system but promote performance. Deep structures introduce overhead from all of the message passing but support reuse. The extent to which performance is an issue depends on the business; deep microservices are probably not appropriate for high-transaction systems. As I said before, committing to a microservices model should be done incrementally so that its appropriateness for your business can be assessed.
Most systems are hybrids of different architectural styles. Microservices can be factored out from existing systems, and a hybrid style results. Do the redesign in increments, and before each increment decide on the benefits and whether they are worth the costs.
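As a back-of-the-envelope illustration of the flat-versus-deep trade-off Bass describes (the per-hop and per-service costs below are entirely hypothetical), a few lines of Python show how end-to-end latency grows with the depth of a call chain:

```python
# Sketch of the flat vs. deep trade-off: latency grows with the depth
# of a synchronous call chain, while a flat fan-out stays roughly
# constant. The costs are made-up numbers for illustration only.

HOP_MS = 5.0    # assumed network + serialization cost per service call
WORK_MS = 10.0  # assumed work done inside each service

def chain_latency(depth: int) -> float:
    """Deep structure: services call each other in sequence."""
    return depth * (HOP_MS + WORK_MS)

def fanout_latency(width: int) -> float:
    """Flat structure: one caller invokes services in parallel.
    Independent of width when the calls are truly parallel and
    equally fast; bounded by the slowest call."""
    return HOP_MS + WORK_MS

for n in (2, 5, 10):
    print(f"deep chain of {n}: {chain_latency(n):.0f} ms "
          f"vs flat fan-out of {n}: {fanout_latency(n):.0f} ms")
```

This is why a deep structure, whatever its reuse benefits, is a hard sell for high-transaction systems: every added layer pays the hop cost again on every request.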
InfoQ: Microservices ownership allows teams to work autonomously (a corollary of Conway's Law), but when problems arise from interactions between services, isn't there a risk of (re-)introducing a blame culture?
Len Bass: Cultural changes in organizations are difficult to make. Any time there is interaction between two teams there is potential for finger pointing. Teams and organizations must be vigilant about the manner in which postmortems are performed. A transition mechanism, although not suitable for the steady state, is to have trained people chair the postmortems so as to avoid destructive types of behavior. This doesn't scale, so it isn't suitable for the steady state, but once a no-blame culture is embedded in an organization, it becomes difficult to change, as with other behavior patterns.
InfoQ: A curious remark in the book, given the ever-growing number of vulnerabilities, is that making developers responsible for the pager is not the best solution in terms of time to detect and repair a system attack, as attacks require skills usually beyond their application knowledge. Any recommendations on how to make a better, more informed assignment of on-call duties?
Len Bass: A lot depends on the source of the pager being activated. A pager activated based on particular application metrics makes sense to send to the developers of that application. A pager activated as a result of system-wide metrics needs to go to someone who has system-wide knowledge. I don't think there is any one solution for who gets pager duty, and I guess I would advocate multiple pagers depending on the source of the alarm.
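The "multiple pagers" idea amounts to routing each alert by its source. Here is a minimal sketch of that routing (our illustration; the team names, alert fields and sources are hypothetical):

```python
# Route an alert to the on-call rotation that matches its source,
# rather than sending everything to one rota.

ROUTES = {
    "application": "owning-dev-team",  # app-specific metric tripped
    "system": "platform-on-call",      # system-wide metric tripped
    "security": "security-on-call",    # suspected attack: needs
                                       # skills beyond app knowledge
}

def route_alert(alert: dict) -> str:
    """Pick a pager based on the alert's source; fall back to a
    team with system-wide knowledge for unknown sources."""
    return ROUTES.get(alert["source"], "platform-on-call")

print(route_alert({"source": "application", "metric": "checkout_errors"}))
print(route_alert({"source": "security", "metric": "failed_login_spike"}))
```

In practice the same routing idea is what alert-management tools implement; the key design decision is classifying alerts by source at the point where they are raised.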
InfoQ: Finally, why did you feel the need to include several case studies? What surprised you the most in these case studies?
Len Bass: I have always liked case studies, going back to when I was writing architecture books. They are a way of providing concrete details in a particular context. So much of engineering is context dependent, and it is difficult to express the various contexts that influence a decision in a general text. I would rather have had one more case study, but it didn't work out.
I am always surprised, when I do case studies, by the competence of the people with whom I interact. People are serious, dedicated, and very knowledgeable about their domains. This was true in the architecture case studies and it is true for the three case studies in the DevOps book. This makes doing case studies a pleasure for me.
About the Book Author
Len Bass has over 50 years of experience in the software business, including 25 years at the Software Engineering Institute (SEI) at Carnegie Mellon University and three years at National ICT Australia. He has just published the book DevOps: A Software Architect's Perspective, which joins his two award-winning books on software architecture, Software Architecture in Practice (3rd edition) and Documenting Software Architectures: Views and Beyond (2nd edition). Len has written numerous papers in computer science and software engineering on a wide range of topics and has done software development and research in multiple domains, such as scientific analysis systems, embedded systems, and information systems.