
Book Review and Q&A - The Art of Scalability

The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise, 2nd Edition, by Martin L. Abbott and Michael T. Fisher, is a book on scaling organisations and their products to cope with web-scale growth. Scale has technical and architectural implications, but it also needs to be dealt with at the organizational level. The goal of the book is to show the reader how to organize technology, people and processes into a virtuous circle, a path of continuous improvement toward scalability.

The recently published second edition adds real-world cases from Apple to Spotify and picks up topics that have gained importance since the first release, such as cloud and DevOps, so InfoQ took the opportunity to speak to the authors.

InfoQ: Scalability is seemingly a technical problem. How did you come up with the idea to write this book from the organisational perspective?

We felt the same way 8.5 years ago when we started our firm. We were just going to help companies with their technology/product architectural needs. But time and time again we identified that while the symptoms of scale limitations manifested themselves within the product architecture, the causes were equally likely to be organizational or process in nature. As such, we started focusing on organizations, process and architecture to provide the greatest opportunity to keep problems from happening in the future.

InfoQ: So who should be reading your book in the first place, IT specialists or management?

We think the book is written to offer something for product focused engineers, engineering management, engineering executives, product managers and product executives. One of the premises of the book is that engineers need to be business savvy and that managers and executives need to be product/engineering savvy. The book is focused on those who want to create the best products - the same folks who find value in understanding both business and product development.

InfoQ: You write that people are the most important part of scaling a system. Can you please elaborate?

We often joke that the robot apocalypse as envisioned in movies like "The Terminator" has not yet happened. As such, people are the ones who define products and either explicitly or implicitly their scale limitations. They are, therefore, the most important part of the scale equation.

InfoQ: Apart from people what are the three most important aspects for scalability and why is this so?

  • Process: Having the right amount for your company for its maturity. Too much and you kill innovation. Too little and you invite disaster. The right amount of process forms the guide rails that help keep our product initiatives on the road even when we as humans make mistakes.
  • Architecture: Specifically ensuring the architecture can answer the questions of how does it scale and how does it fail? This last one is the one we miss most often. We design things to "work" - not to "fail". But when we fail to understand how our solutions break, they break unpredictably.
  • Organization Structure and Ownership: Building teams that align with and completely own the services they create both engenders higher levels of scalability and higher levels of innovation.

InfoQ: With broad availability of elastic clouds isn't scalability already solved for us by the cloud providers?

This is a common and dangerous misconception. Elastic clouds reduce the cost and time to market of expanding our solutions "on demand". They do not architect our solutions to scale or make them more highly available. Adding more servers, more capacity, etc - yes. Making a solution less likely to fail or more easily sharded into discrete components - no.

InfoQ: Given all these aspects and the fact that transforming an organisation for scale is a systemic endeavour, where should such a transformation begin?

At the top - with management, shareholder and stakeholder approval. No change of this magnitude will ever be successful unless the entire team agrees it must happen.

InfoQ: Companies currently not only search for a scalable architecture but at the same time demand a substantially shorter time to market. What is your suggested approach to achieve both?

Read our book :) Seriously - many of the solutions we offer to help companies scale also help decrease the time to market. Services development, for instance, helps teams own discrete components of an architecture cradle-to-cradle. Assuming that the teams do not conflict on ownership of a service, friction in development is decreased and time to market shortens.

InfoQ: You write about ITIL and DevOps. If you see these approaches on the opposite side of the agility dimension, what mix fits best for scalability?

Yes and No. ITIL and the ITSM have some powerful concepts embedded within them, such as the notion of the separation of problem and incident management. If taken to an extreme (see process above), it can be disastrous. But from a concept perspective, their thoughts in these areas are not only powerful but necessary. It is absolutely possible to take ITIL processes and embed them within DevOps. There are other areas where we think they are in stark contrast and where we favor the DevOps mindset. For instance, from the perspective of workflow management for everyday work (build a server, etc.) we are not big fans of ITIL processes.

Key points from the book

The book explores scalability from the perspectives of people, processes, and architecture.

People: The first part of the book is a short introduction on management. The authors emphasize just how important HR topics are and that the right person in the right job at the right time and with the right behaviors is essential to scale. They give concrete advice:

  • They suggest structuring your organization with the following points in mind:
    • The ease with which additional units of work can be added to the organization.
    • The ease of measuring organizational success and individual contributions over time.
    • How easily goals can be assigned and owned by groups, and whether the groups are empowered to deliver these goals.
    • How conflict will be handled within and between groups, and whether this will help or hinder the company mission.
    • How the structure will help innovation and time to market.
    • How the organizational structure will decrease the cost per unit of value created.
    • How easily work flows through the organization.
  • Very small teams do not provide enough resources to accomplish the priorities of the business, while overly large ones cause productivity loss and degraded morale; the authors suggest a span of 6 to 15 people per team.
  • Only spend 5% of your project management time creating detailed project plans and 95% of your time developing contingencies to those plans.
  • Focus on achieving results in an appropriate time frame, rather than trying to fit activities to the initial plan.
  • Scalability metrics should include measurements of availability, response time, engineering productivity and efficiency, cost, and quality.
  • Calculate the cost of outages and downtime to demonstrate the need for a business culture focused on scalability.
  • Technologists must take responsibility for putting scalability initiatives in terms that non-technology business leaders can understand.
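The advice to calculate the cost of outages can be illustrated with a little arithmetic. The sketch below is not taken from the book: it assumes revenue is lost uniformly for every hour of downtime, and the function names and the revenue figure are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime implied by an availability fraction such as 0.999."""
    return (1.0 - availability) * period_hours


def outage_cost(availability: float, revenue_per_hour: float,
                period_hours: float = HOURS_PER_YEAR) -> float:
    """Lost revenue, assuming every hour of downtime loses the same amount."""
    return downtime_hours(availability, period_hours) * revenue_per_hour


# Hypothetical business earning 20,000 per hour at "three nines" availability.
print(f"{downtime_hours(0.999):.2f} hours of downtime per year")  # 8.76 hours
print(f"{outage_cost(0.999, 20_000):,.0f} lost per year")         # 175,200
```

Expressed this way, an availability percentage becomes a figure that non-technology business leaders can weigh against the cost of a scalability initiative.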

Processes: The authors describe ITIL and CMMI, but go on to say that processes must fit the organization in order to avoid culture clashes or bureaucratic exuberance. They suggest letting the team determine the right amount of process.

As an example they cite capacity planning and headroom calculation: "It is important to know your headroom for various components because you need this information for budgets, hiring plans, release planning, and scalability project prioritization. Headroom should be calculated for each major component within the system, such as each pool of application servers, networking gear, bandwidth usage, and database servers." The authors recommend against planning to use more than 50% of the maximum capacity of any component. Depending on the type of component this number could be increased to 60%, but they consider a planned 75% load an absolute maximum.
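A headroom calculation along these lines might look like the sketch below. It is a simplified illustration, not the book's exact formula: it assumes linear growth, and the component figures, class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    max_capacity: float        # e.g. peak requests/second the pool can sustain
    current_usage: float       # current peak usage, same unit as max_capacity
    ideal_usage: float = 0.5   # planned ceiling as a fraction of max capacity


def headroom(c: Component) -> float:
    """Capacity left before the planned usage ceiling is reached."""
    return c.max_capacity * c.ideal_usage - c.current_usage


def months_until_exhausted(c: Component, monthly_growth: float) -> float:
    """Months until usage reaches the ceiling, assuming linear monthly growth."""
    if monthly_growth <= 0:
        return float("inf")
    return max(headroom(c), 0.0) / monthly_growth


# Hypothetical pools: one planned at 50% of maximum, one at 60%.
app_pool = Component("app servers", max_capacity=10_000, current_usage=3_500)
db_pool = Component("database", max_capacity=4_000, current_usage=1_800,
                    ideal_usage=0.6)

for pool in (app_pool, db_pool):
    print(pool.name, round(headroom(pool)), "units of headroom")
```

With growth of 250 requests/second per month, `months_until_exhausted(app_pool, 250)` gives six months, which is the kind of number that can feed budgets, hiring plans and release planning.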

Architecture: Here the authors summarize their 15 most adopted architectural principles:

  1. N + 1 Design. Never have less than two of anything, and remember the rule of three.
  2. Design for Rollback. Ensure you can roll back any release of functionality.
  3. Design to Be Disabled. Be able to turn off anything you release.
  4. Design to Be Monitored. Think about monitoring during the design phase, not after the design is complete.
  5. Design for Multiple Live Sites. Don't box yourself into one-site solutions.
  6. Use Mature Technologies. Use things you know work well.
  7. Asynchronous Design. Communicate synchronously only when absolutely necessary.
  8. Stateless Systems. Use state only when the business return justifies it.
  9. Scale Out, Not Up. Never rely on bigger, faster systems.
  10. Design for at Least Two Axes. Think one step ahead of your scale needs.
  11. Buy When Non-Core. If you aren't the best at building it and it doesn't offer competitive differentiation, buy it.
  12. Commodity Hardware. Cheaper is better most of the time.
  13. Build Small, Release Small, Fail Fast. Build everything small and in iterations that allow the company to grow.
  14. Isolate Faults. Practice fault-isolative design, implement circuit breakers to keep failures from propagating.
  15. Automation over People. Build everything to be automated, never rely on people to do something that a robot can do.
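Principle 14 mentions circuit breakers. As a rough illustration of the idea, and not the authors' implementation, the minimal sketch below stops calling a failing dependency after a number of consecutive failures and allows a trial call once a timeout has passed. The class name, thresholds and injectable clock are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout   # seconds to wait before a trial call
        self._clock = clock                  # injectable so it can be tested
        self._failures = 0                   # consecutive failures seen so far
        self._opened_at = None               # time the circuit was last opened

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of letting the failure propagate upstream.
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treat `CircuitOpenError` as a signal to serve a fallback.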

The first edition of the book popularized the AKF Scale Cube, a three-dimensional model for analyzing and solving scalability problems, where:

  • The x-axis represents the cloning of entities or data and an equal unbiased distribution of work across them. It tends to be the least costly to implement, but suffers from constraints in instruction size and data set size.
  • The y-axis represents separation of work biased by activity or data. It tends to be more costly than the x-axis but solves issues related to instruction size and data set size in addition to creating some fault isolation.
  • The z-axis represents separation of work biased by the requestor or person for whom the work is being performed. It tends to be the most costly to implement but very often offers the greatest scale. It resolves issues associated with data set size and may or may not solve instruction set issues. It also allows for global distribution of services.

The AKF Scale Cube has, for example, been applied in the NGINX Introduction to Microservices.
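The three axes described above can be sketched as three routing decisions applied to a single request. This is an illustrative toy, not code from the book: the topology, the host names and the choice of a SHA-256 hash for sharding are all assumptions.

```python
import hashlib
from itertools import cycle

# Hypothetical topology: service -> list of shards -> identical clones per shard.
TOPOLOGY = {
    "checkout": [["checkout-s0-a", "checkout-s0-b"],
                 ["checkout-s1-a", "checkout-s1-b"]],
    "search": [["search-s0-a", "search-s0-b"]],
}

# One round-robin iterator per (service, shard) pair for the x-axis.
_round_robin = {
    (service, index): cycle(clones)
    for service, shards in TOPOLOGY.items()
    for index, clones in enumerate(shards)
}


def shard_for(customer_id: str, shard_count: int) -> int:
    """z-axis: pick a shard from a stable hash of the requestor."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(service: str, customer_id: str) -> str:
    shards = TOPOLOGY[service]                       # y-axis: split by activity
    index = shard_for(customer_id, len(shards))      # z-axis: split by requestor
    return next(_round_robin[(service, index)])      # x-axis: spread over clones
```

Repeated calls such as `route("checkout", "bob")` always land on the same shard for that customer but alternate between its clones, which shows how the axes combine rather than compete.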

About the Book Authors

Martin L. Abbott is a founding partner at the growth and scalability advisory firm AKF Partners. He was formerly chief operations officer at Quigo, an advertising technology startup sold to AOL, where he was responsible for product strategy, product management, technology development, and client services. Marty spent nearly six years at eBay, most recently as senior vice president of technology, chief technology officer, and member of the executive staff. Prior to his time at eBay, Marty held domestic and international engineering, management, and executive positions at Gateway and Motorola. He has served on the boards of directors of several private and public companies. Marty has a B.S. in computer science from the United States Military Academy, has an M.S. in computer engineering from the University of Florida, is a graduate of the Harvard Business School Executive Education Program, and has a Doctor of Management from Case Western Reserve University.

Michael T. Fisher is a founding partner at the growth and scalability advisory firm AKF Partners. Prior to cofounding AKF Partners, Michael was the chief technology officer at Quigo, a startup Internet advertising company that was acquired by AOL in 2007. Before his time at Quigo, Michael served as vice president, engineering and architecture, for PayPal, Inc., an eBay company. Prior to joining PayPal, he spent seven years at General Electric helping to develop the company’s technology strategy and was a Six Sigma Master Black Belt. Michael served six years as a Captain and pilot in the U.S. Army. He received a Ph.D. and an MBA from Case Western Reserve University’s Weatherhead School of Management, an M.S. in information systems from Hawaii-Pacific University, and a B.S. in computer science from the United States Military Academy (West Point). Michael is an adjunct professor in the design and innovation department at Case Western Reserve University’s Weatherhead School of Management.
