
Obscuring Complexity


Key Takeaways

  • If done well, Model-Driven Software Development can partially obscure some complexity, but you will have to treat the generated source code as build artifacts and take ownership of the templates. Maintaining code-generating templates is a kind of meta-programming that most developers are not used to doing.
  • Twelve Factor applications can truly achieve lower complexity, but only when integrated with mature or stable (i.e. boring) data stores.
  • You can lower complexity in microservice orchestration by building limited, cross-cutting intelligence into the connectors, but be careful: too much intelligence creates the opposite effect.
  • You can write smaller applications when using a heavyweight framework, but beware that efficiency may suffer, that these kinds of services are harder to tune for performance, and that it may take longer to debug certain kinds of issues.
  • With reactive programming, you end up trading backend complexity for frontend complexity.
     

One of the most important things that software architects do is manage the complexity of their systems in order to mitigate release disruption while maintaining sufficient feature velocity. Systems can be simplified, but only by so much. Modern software must meet a lot of sophisticated demands in order to be competitive. When we cannot reduce complexity, we try to hide or shift it.

Software architects tend to manage that complexity with the following time-honored strategies:

  • They can decrease the size of their applications either by reusing generic frameworks or by using programmatic code generators.
  • They make it easier to scale out their systems by keeping close tabs on application statefulness.
  • They design systems that degrade gracefully under increasing load or partial outage.
  • Finally, they normalize workloads by moving to eventually consistent systems.

Let’s go into more detail on each of these strategies, and the historical context under which each was formed, in order to better understand the advantages and disadvantages of each approach.

For each strategy, I mention complementary example technologies for Java, JavaScript, Python, and .NET developers. These lists are by no means complete, so please accept my apologies if I don’t mention your favorites.

It’s Only a Model

MDSD, or model-driven software development, increases feature velocity because the developers save time by writing less boilerplate code. Instead, a model is specified, and a generator combines the model with templates of the boilerplate code to generate code that the developers used to have to write by hand.

My first exposure to MDSD was back in the days of CASE (computer-aided software engineering). It resurfaced when UML (unified modeling language) was at its peak. The issue with MDSD back then was that it was being pitched to generate all the code, meaning the type of model needed to capture all possible requirements was so complex that it was easier just to write the code.

MDSD is making a resurgence due to a technology called Swagger (new versions of the modeling specification are now curated by the OpenAPI Initiative), in which you specify a model just for what your APIs look like. The model and a set of templates are input to a generator, which outputs boilerplate code that surfaces the APIs. There are separate templates for code that produces and code that consumes the API being modeled, and starter templates are available online for just about any relevant technology stack.

For example, examine the Swagger templates for Spring Boot, which generate the REST controllers, Jackson-annotated request and response POJOs, and various application boilerplate. It is up to the developers to add the logic and implementation details (e.g. database access) needed to satisfy the requirements of each API. To avoid having engineers modify the Swagger-generated files, use either inheritance or dependency injection, as sketched below.
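For instance, if the generator emits a PetApi interface (a hypothetical name; the actual interface depends on your model), the hand-written code can implement it rather than edit it, and Spring can inject the implementation details:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RestController;

// Hand-written class implementing the hypothetical generated PetApi
// interface, so the generated files themselves are never modified.
@RestController
public class PetApiController implements PetApi {

    private final PetRepository repository; // hypothetical hand-written data access

    // Spring injects the repository; the generated code stays untouched.
    public PetApiController(PetRepository repository) {
        this.repository = repository;
    }

    @Override
    public ResponseEntity<Pet> getPetById(Long petId) {
        return repository.findById(petId)
                .map(ResponseEntity::ok)
                .orElse(ResponseEntity.notFound().build());
    }
}
```

When the model changes, only the generated interface changes; this class then fails to compile until the hand-written code catches up, which is exactly the feedback you want.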

How can MDSD obscure the complexity of your application code? It is tricky, but it can be done. The generator outputs the code that implements the API resources, so the developers don't have to worry about coding that. However, if you use the generator as a one-time code wizard and commit the output to your version-controlled source code repository (e.g. git), then all you did was save some initial coding time. You didn't really hide anything, since the developers will still have to study and maintain the generated code.

To truly obscure the complexity of this code, you have to commit the model into your version-controlled source code repository, but not the generated source code. You need to generate that output source from the model every time you build the code. You will need to add that generator step to all your build pipelines. Maven users will want to configure the swagger-codegen-maven-plugin in their pom file. That plugin is a module in the swagger-codegen project.
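A minimal plugin configuration might look something like this sketch (the version and paths are illustrative; inputSpec points at the committed model, and the generated sources land in the build directory so they are never committed):

```xml
<plugin>
  <groupId>io.swagger</groupId>
  <artifactId>swagger-codegen-maven-plugin</artifactId>
  <version>2.4.8</version> <!-- illustrative version -->
  <executions>
    <execution>
      <goals>
        <goal>generate</goal>
      </goals>
      <configuration>
        <!-- the committed model -->
        <inputSpec>${project.basedir}/src/main/resources/api.yaml</inputSpec>
        <language>spring</language>
        <!-- generated sources are build artifacts, not committed code -->
        <output>${project.build.directory}/generated-sources</output>
      </configuration>
    </execution>
  </executions>
</plugin>
```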

What if you do have to make changes to the generated source code? That is why you will have to assume ownership of the templates and also commit them to your version-controlled source code repository. These are mustache templates, which look like the code to be generated, with curly-brace-delimited substitution parameters and decision-branch logic sprinkled throughout. Template programming is a form of meta-programming that is quite advanced.
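To give a flavor of what owning the templates means, here is a schematic mustache fragment, not copied from the actual Swagger templates, that loops over the modeled operations and emits a Java method for each:

```mustache
{{#operations}}{{#operation}}
    // {{summary}}
    public ResponseEntity<{{returnType}}> {{operationId}}(
            {{#allParams}}{{dataType}} {{paramName}}{{#hasMore}}, {{/hasMore}}{{/allParams}}) {
        // implementation supplied via inheritance or dependency injection
    }
{{/operation}}{{/operations}}
```

The template reads like the Java it generates, but every change to it ripples into every service built from it, which is why senior developers usually end up owning the templates.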

In the end, the best you can hope for with MDSD is that it can obscure complexity for your junior developers, but at the expense of having your senior developers support those templates.

Model-Driven Software Development

Pros:
  • Smaller application size
  • Increased feature velocity

Cons:
  • More complicated build pipelines
  • Templates must be maintained

On the Origin of Computer Systems

In 2011, the folks at Heroku published a curated collection of best practices for writing modern, cloud-native, service-oriented software. They called it the Twelve-Factor App. For a better understanding of why these twelve factors truly reduce complexity, let’s briefly review how computer systems have evolved from simple, single-machine setups to complex clusters of connected virtual machines in a software-defined network.

For a long time, applications were designed to run on a single computer. If you wanted the app to handle more requests, then you had to scale the app up by installing it on a bigger computer. Systems evolved into two-tier applications where hundreds of users would run a specialized client program on their desktops that connected to a database program running on a single server.

The next step in the evolution of computing was three-tier systems where the client programs connected to an application server that would access the database server. Web applications replaced client-server applications because it was easier to deploy the client portion (assuming that everyone's computer had a modern web browser installed on it) and you could accommodate more users connected to the system. Scaling up (replacing one computer with a bigger computer) became less attractive than scaling out (expanding from one computer to many computers). In order to handle the additional user load, that single app server was scaled out to a cluster of computers running behind a load balancer. Database servers could be scaled out by techniques known as sharding (for writes) and replication (for reads). Back then, all of these servers were deployed either on the premises of the company that used them, or in a rented data center.

For about thirty years, the most viable option for database software was relational databases, also known as SQL databases because application code communicates with them via commands written in the structured query language. There are many great relational databases available to choose from. MySQL, MariaDB, and PostgreSQL are popular open source databases. Oracle and MS SQL Server are popular proprietary databases.

A proliferation of other options has become available in the past decade or so. There is now a category known as NoSQL databases, which includes wide-column databases such as Cassandra, key-value databases such as Aerospike, document databases such as MongoDB, graph databases such as Neo4j, and inverted indexes such as Elasticsearch. Even more recently, multi-model and distributed databases have gained some popularity. With multi-model databases, you can call both SQL and NoSQL APIs on a single database installation. Distributed databases handle sharding and replication without any additional complexity in the application code. YugaByte and Cosmos DB are both multi-model and distributed.

With the advent of the cloud, companies no longer had to employ engineers who knew how to assemble and cable in racks of computers or sign five-year lease agreements with computer manufacturers and managed hosting service providers. In order to truly realize these economies of scale, the computers were virtualized and became more ephemeral. Software had to be redesigned to more easily accommodate all these changes in how it was deployed.

Typical deployment where two microservices and a database are scaled out.

Applications that properly follow these twelve factors easily handle this proliferation of hardware with minimal complexity. Let's focus on factors 6 (processes), 8 (concurrency) and 9 (disposability).

You will be able to scale out your application more easily if the app is designed to execute as many stateless processes. Otherwise, all you can do easily is scale up. That is what factor 6 is all about. It is okay to cache data in an external cluster in order to speed up average latency or to protect the underlying database(s) from getting overwhelmed, but the cache should never contain any data that isn’t already in the database(s). You can lose the cache at any time without losing any actual data.
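A minimal cache-aside sketch of that principle, where Cache and ProfileDao are hypothetical stand-ins for your external cache client and your database access code:

```java
// Cache-aside reads: the cache only ever holds copies of database rows,
// so losing the cache loses no data, only some latency.
public class ProfileService {

    private final Cache cache;         // hypothetical external cache client
    private final ProfileDao database; // hypothetical data access object

    public ProfileService(Cache cache, ProfileDao database) {
        this.cache = cache;
        this.database = database;
    }

    public Profile lookup(String userId) {
        Profile profile = cache.get(userId);
        if (profile == null) {
            profile = database.findByUserId(userId); // source of truth
            cache.put(userId, profile, 300);         // expire after 300 seconds
        }
        return profile;
    }
}
```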

Factor 8 is about concurrency. These processes should be grouped in clusters such that each cluster of processes handles the same type of requests. Your software will be much simpler if these processes do not share any state other than through the database(s). If the processes share internal state, then they have to know about each other, which makes it harder and more complicated to scale out by adding more processes to the cluster.

Your application will be more responsive to changes in load and robust to destabilization if each process can quickly initialize and gracefully terminate. This is factor 9, disposability. Dynamic scaling gives you the ability to quickly and automatically add more processes to handle increased load, but that works only if each process doesn’t need to take a long time to start up before it is ready to accept requests. Sometimes a system will destabilize, and the quickest way to resolve the outage is to restart all processes, but that works only if each process can terminate quickly without losing any data.
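In Java, the disposability half of that advice can be sketched with a shutdown hook that drains in-flight work before the process exits (Server is a hypothetical wrapper around your HTTP stack):

```java
// Graceful termination: stop accepting new requests, finish the work
// already in flight, then exit, so a restart never loses data.
public final class Main {
    public static void main(String[] args) {
        Server server = new Server(); // hypothetical HTTP server wrapper
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            server.stopAcceptingRequests();
            server.awaitInFlightRequests(java.time.Duration.ofSeconds(30));
        }));
        server.start(); // should be ready to serve requests quickly
    }
}
```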

You will avoid many bugs and much brittleness if you architect your systems in such a way that the stream of inbound requests is handled by many processes running concurrently. These processes can be, but do not have to be, multi-threaded. They should be able to start up quickly and shut down gracefully. Most importantly, these processes should be stateless and share nothing.

Obviously, there is not a lot of demand for applications that cannot remember anything, so where does the state go? The answer is in the database, but database applications are software too. Why is it okay for databases to be stateful, when it is not okay for applications to be stateful? We have already covered that applications need to be able to deliver on a fairly fast feature velocity. The same is not true for database software. It takes a lot of engineering time, thought, and effort to get stateful done right at high load. Once you get there, you don't want to make a lot of big changes, because stateful software is very complex and easy to break.

As mentioned earlier, there is a large proliferation of database technologies, many of which are new and relatively untested. You can get some degree of usefulness out of a stateless application after only a couple of engineering months of effort. Some or most of those engineers can be recent graduates with little professional experience. A stateful application is completely different. I wouldn't bet on any database technology that didn't have at least two decades of engineering effort in it. (That’s engineering decades, not calendar decades.) Those engineers have to be seasoned professionals who are very smart and have lots of distributed computing experience and monster computer science chops. If you use an untested or immature database engine, then you will end up introducing additional complexity into your application in order to work around the bugs and limitations of the immature database. Once the bugs in the database get fixed, you will have to re-architect your application to remove the now unnecessary complexity.

Innovating Your Technology Stack

Pros:
  • Adopting new technology that is stateless can be fun and affords a competitive advantage with little risk.

Cons:
  • Adopting new technology that is stateful is very risky.
  • It will most likely increase complexity for your apps, instead of decreasing it.

It is a Series of Tubes after All

As systems evolved from a single application to clusters of interconnected applications and databases, a body of knowledge was cataloged to advise on the most effective ways that these applications can interact with each other. In the early 2000s, a book on enterprise integration patterns (or EIP) was published that more formally captured this body of knowledge.

Back then, a style of service interaction known as service-oriented architecture became popular. In SOA, applications communicated with each other through an enterprise service bus (ESB) that was also programmed to manipulate the messages and route them based on configuration rules that closely followed EIP.

Workflow engines, based on Petri nets, are a similar technology that is more business-focused. They were sold on the premise that non-engineers could write the rules, but they never truly delivered on that promise.

These approaches introduced a lot of unnecessary and unappreciated complexity, which caused them to fall out of favor. Configuration grew into a complex tangle of interconnected rules that became very resistant to change over time. Why is this? It’s the same issue as getting MDSD to model all requirements. Programming languages may require more engineering knowledge than modeling languages, but they are also more expressive. It’s a lot easier to write or understand a small, previously written snippet of code that handles an EIP requirement than to author a large and complicated BPMN model specification. Both Camel (an Apache project) and Mule (from MuleSoft, acquired by Salesforce in 2018) are ESBs that attempt to simplify their respective technologies. I hope that they succeed.

The reaction to ESB / workflow-flavored SOA became known as MSA, or microservice architecture. In 2014, James Lewis and Martin Fowler summed up the differences between MSA and SOA. With SOA, you had dumb endpoints and smart pipes. With MSA, you had smart endpoints and dumb pipes. Complexity was reduced, but perhaps by too much. Such systems were brittle and non-resilient (i.e. easily destabilized) during times of partial failure or degraded performance. There was also a lot of duplication across the separate microservices, each of which had to implement the same cross-cutting concerns, such as security. This is true (although to a lesser degree) even if each implementation simply embeds the same shared library.

What followed was the introduction of API gateways and service meshes, both of which are enhanced versions of layer 7 load balancers. The term “layer 7” is a reference to the OSI or Open Systems Interconnection model that was introduced back in the 80s.

When calls from the Internet or intranet are intended for microservices on the backend, they pass through an API gateway which handles features like authentication, rate limiting, and request logging, removing those requirements from each individual microservice. 

Calls from any microservice to any other microservice pass through a service mesh, which handles such concerns as bulkheading and circuit breaking. When requests to a service time out too frequently, the service mesh immediately fails future calls (for a while) instead of attempting to make the actual calls. This prevents the unresponsive service from causing the dependent services to also become unresponsive because all of their threads are waiting on the original unresponsive service. This behavior is similar to a bulkhead on a ship preventing a flood from spreading beyond one compartment. With circuit breaking, the service mesh immediately fails calls (for a while) to a service that has been failing most of its recent calls. The rationale for this strategy is that the failing service has become overwhelmed, and preventing calls to it will give it a chance to recover.
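A service mesh implements this behavior outside of the microservice code, but the mechanism is simple enough to sketch. Here is an illustrative (not production-grade) hand-rolled circuit breaker in Java:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

// After too many consecutive failures, fail fast for a cooldown period,
// giving the overwhelmed downstream service a chance to recover.
public class CircuitBreaker {
    private final int failureThreshold;
    private final Duration cooldown;
    private int consecutiveFailures = 0;
    private Instant openedAt = null;

    public CircuitBreaker(int failureThreshold, Duration cooldown) {
        this.failureThreshold = failureThreshold;
        this.cooldown = cooldown;
    }

    public synchronized <T> T call(Callable<T> remote) throws Exception {
        if (openedAt != null) {
            if (Instant.now().isBefore(openedAt.plus(cooldown))) {
                throw new IllegalStateException("circuit open; failing fast");
            }
            openedAt = null; // cooldown has elapsed, let a call through
        }
        try {
            T result = remote.call();
            consecutiveFailures = 0; // success closes the circuit
            return result;
        } catch (Exception e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = Instant.now(); // open the circuit
            }
            throw e;
        }
    }
}
```

In a service mesh, sidecar proxies apply logic like this to every outbound call, so none of the microservices have to.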

Deploying an API gateway and a service mesh.

API gateways and service meshes make microservices more resilient without introducing any additional complexity in the microservice code itself. However, they increase operational costs due to the additional responsibility for maintaining the health of the API gateway and/or service mesh.

MSA vs. SOA

Pros:
  • For EIP, code is simpler than configuration.
  • API gateways reduce duplication in implementing cross-cutting concerns.
  • Service meshes increase resiliency.

Cons:
  • ESBs make it harder to understand systems and predict behavior.
  • Systems that use workflow engines are more likely to become resistant to change over time.
  • API gateways and service meshes introduce additional operational costs.

March of the Frameworks

Another way to reduce the amount of code that developers have to write is to use an application framework. A framework is just a library of general-purpose routines that implement functionality common to all applications. Parts of the framework load first and end up calling your code later, an arrangement known as inversion of control.

As I mentioned earlier, relational databases were originally developed in the mid-70s and were so useful that they remained popular throughout the technology trends described above. They are still popular today, but using them in the web application world introduces a lot of complexity. Connections to relational databases are stateful and long-lived, yet typical web requests are stateless and short-lived. The net result is that multi-threaded services have to manage this mismatch using a technique known as connection pooling. Single-threaded applications are less efficient in this respect; therefore, they have to depend more on sharding and replication.
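For example, a multi-threaded Java service typically maintains a pool of long-lived connections that short-lived request threads borrow and return. A minimal sketch using HikariCP (the pool size is an arbitrary starting point, and one of the tuning knobs discussed later):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.util.function.Consumer;

// The pool keeps a few stateful, long-lived database connections that
// stateless, short-lived web requests borrow briefly and return.
public class Database {
    private final HikariDataSource pool;

    public Database(String jdbcUrl, String user, String password) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setUsername(user);
        config.setPassword(password);
        config.setMaximumPoolSize(10); // tuning knob: size to your load
        this.pool = new HikariDataSource(config);
    }

    public void withConnection(Consumer<Connection> work) throws Exception {
        // try-with-resources returns the connection to the pool on close
        try (Connection connection = pool.getConnection()) {
            work.accept(connection);
        }
    }
}
```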

Object-Oriented Programming became quite popular during the client-server era and has maintained its popularity since. Relational data does not fit into the object-oriented structure very easily, so object-relational mapping (ORM) frameworks were developed in an attempt to obscure this kind of complexity. Popular ORM frameworks include Hibernate, SQLAlchemy, LoopBack, and Entity Framework.
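For instance, with a JPA-compliant ORM such as Hibernate, a few annotations map an object to a relational table, hiding the SQL plumbing (the entity and table names here are illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// The ORM maps this object to a relational table and generates the SQL
// needed to load and persist it, hiding the object-relational mismatch.
@Entity
@Table(name = "customers") // illustrative table name
public class Customer {
    @Id
    @GeneratedValue
    private Long id;

    private String name;
    private String email;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}
```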

In the early days of web application development, everything was built in what later became known as the monolith. The graphical user interface or GUI (basically browser rendered HTML, CSS, and JavaScript) was generated server-side. Patterns such as MVC (model view controller) were used to coordinate GUI rendering with data access, business rules, etc. There are actually many variations on MVC, but, for the purpose of this article, I am lumping them all into the same category as MVC. MVC is still around, and popular modern MVC frameworks include Play, Meteor, Django and ASP.NET.

Over time, these kinds of applications became large and cumbersome, so large that their behavior was hard to understand or predict. Making changes was risky, and releasing new versions was disruptive because it was hard to test and verify the correctness of these overly complex systems. A lot of engineering time was spent rapidly fixing buggy code that got deployed without proper vetting. When you are forced to fix something quickly, you don't have the time to come up with the best solution, so poor-quality code slips in, with the intention of replacing it with good-quality code later on.

The answer was to split up the monolith into multiple components, or microservices, that could be released separately. The GUI code moved to what are now called SPAs (single-page applications), as well as native mobile apps. Data access and business rules were kept server-side and split up into multiple services. Popular microservice frameworks include Flask and Express. For Java developers, Spring Boot and Dropwizard (the latter built on Jersey) are the most popular.
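To see how small such a service can start out, here is a complete, runnable Spring Boot microservice (the endpoint is illustrative):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// A complete microservice: the framework supplies the embedded HTTP
// server, request routing, and response serialization.
@SpringBootApplication
@RestController
public class OrderService {

    @GetMapping("/health")
    public String health() {
        return "OK";
    }

    public static void main(String[] args) {
        SpringApplication.run(OrderService.class, args);
    }
}
```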

Microservice frameworks were originally simple, easy to learn, and exhibited behavior that was easy to understand and predict. Applications built on these lightweight frameworks got big over time due to the aforementioned complexity factors. The bigger an application becomes, the more it resembles a monolith. When they weren’t splitting big microservices into smaller ones, architects started looking for ways to reduce application size by hiding the related complexity in the framework. Opinionated software, annotation-based design patterns, and replacing code with configuration all reduced the number of lines of code in the applications, but made the frameworks more heavyweight.

Applications that use a heavyweight framework tend to have fewer lines of code and enjoy a faster feature velocity, but there are downsides to this form of obscured complexity. By their very nature, frameworks are more general-purpose than applications, which means that it takes significantly more code to do the same work. Though you have less custom application code, the actual executable, which includes the relevant framework code, is much larger. This means that it will take longer for the application to start up as all this extra code gets loaded into memory. All that extra unseen code also means that the stack traces (that get written to the application log whenever an unexpected exception gets thrown) will be a lot longer. A bigger stack trace takes an engineer more time to read and understand when debugging. 

At its best, performance tuning can be a bit of a black art. It can take a lot of trial and error to reach the right combination of connection pool sizes, cache expiration durations, and connection timeout values. This becomes even more daunting when you cannot see the code that you are trying to tune. These frameworks are open source, so you could study the code, but most developers don't.

Lightweight vs. Heavyweight Frameworks

Pros:
  • Lightweight frameworks are easier to debug and tune.
  • Heavyweight frameworks increase feature velocity and lower release disruption.
  • Heavyweight frameworks leave applications with less duplicated code.

Cons:
  • Lightweight frameworks require the devs to write more code.
  • Heavyweight frameworks take longer to start up and shut down.
  • Heavyweight frameworks usually mean accepting black-box code within the framework.

Eventual Consistency

Instead of synchronously processing each API request immediately, reactive systems asynchronously pass messages around among their internal subsystems in order to eventually process each API request.

It's hard to say when reactive programming was first introduced. The Reactive Manifesto was published in July 2013, but there were plenty of precursors. The pubsub, or publish-subscribe, pattern was first introduced in the mid-80s. Complex event processing, or CEP, briefly enjoyed some popularity in the 90s. The first article that I saw on SEDA, or staged event-driven architecture, was published near the end of 2001. Event sourcing is a recent variation on the theme of reactive programming. Reactive systems can be coded in the pubsub style or as message flows in domain scripting languages that resemble functional programming.

When a reactive programming system is distributed across multiple computers, there is usually (but not always) a message broker involved. Some of the more popular brokers are Kafka, RabbitMQ, and ActiveMQ. Recently, the Kafka folks released a client-side library called Kafka Streams.
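As a minimal sketch, publishing an event to Kafka from Java looks something like this (the broker address and topic name are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Fire-and-forget publish: the broker durably queues the event for
// whichever consumers eventually process it.
public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-123", "submitted"));
        }
    }
}
```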

Typical deployment for a distributed, fully reactive system.

ReactiveX is a very popular reactive framework with libraries for many different programming languages. For Java programmers, there are also Spring Integration, Spring Cloud Data Flow, Vert.x, and Akka.

Here is how architects use reactive programming to obscure complexity. Calls to microservices become asynchronous, which means that whatever was asked of the API doesn't have to be done by the time the call returns. This is also known as eventual consistency. It makes those microservices more resilient to partial outages or degraded database performance without introducing much additional complexity. You don't have to worry about the caller timing out and resubmitting while the original transaction is still running. If some resource is not available, then the work simply waits until the resource becomes available again. I will admit that it can be a challenge for junior developers to debug reactive programs (especially if coded in the pubsub style), but this is mostly because they are unfamiliar with the paradigm.
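A common way to express this in a Java microservice is to accept the request, enqueue it, and return immediately; OrderQueue here is a hypothetical wrapper around whatever broker client you use:

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// Eventually consistent endpoint: 202 Accepted acknowledges that the
// order was queued, not that it has been processed.
@RestController
public class OrderController {

    private final OrderQueue queue; // hypothetical broker client wrapper

    public OrderController(OrderQueue queue) {
        this.queue = queue;
    }

    @PostMapping("/orders")
    public ResponseEntity<String> submit(@RequestBody String order) {
        String trackingId = queue.enqueue(order); // returns before processing
        return ResponseEntity.status(HttpStatus.ACCEPTED).body(trackingId);
    }
}
```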

So, where did the complexity go? There is a lot of complexity in modern message brokers, but you are most likely going to just be able to use one of those and not have to write your own. Like any technology, they do have their own caveats, but they have very reasonable limitations.

For application development, the complexity was moved to the frontend. Eventual consistency might be wonderful for backend systems, but it is terrible for humans. You might not care when your vacation pictures reach all your friends in your social network, but if you are an enterprise customer negotiating an interconnected, multi-stage order, then you will want to know precisely when each part of your order gets submitted, validated, approved, scheduled, and eventually fulfilled.

In order for the GUI to accommodate that very human psychological need, it will need to notify the user when what was asked of the backend finally completes. Since the API call isn't synchronous, the frontend has to find out some other way. Polling the API for status updates does not scale well. That means the web browser or mobile device will need a stateful, long-lived connection over which it can receive updates from the backend without any prompting. In the old days, you could extend XMPP servers to do this. Modern web browsers have good support for WebSockets and server-sent events. Spring WebFlux, socket.io, and SignalR are three popular libraries that permit server-side services to communicate with client-side JavaScript in this manner.
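As a sketch of the server side of this arrangement, a Spring WebFlux endpoint can push notifications over a server-sent event stream, interleaved with the keep-alive heartbeats discussed next; OrderStatusPublisher is a hypothetical source of status events:

```java
import java.time.Duration;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

// Pushes status updates to the browser over one long-lived connection,
// interleaved with heartbeats so idle load balancers don't close it.
@RestController
public class NotificationController {

    private final OrderStatusPublisher statuses; // hypothetical event source

    public NotificationController(OrderStatusPublisher statuses) {
        this.statuses = statuses;
    }

    @GetMapping(path = "/notifications", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> notifications() {
        Flux<String> heartbeats = Flux.interval(Duration.ofSeconds(30))
                                      .map(tick -> "keep-alive");
        return Flux.merge(statuses.stream(), heartbeats);
    }
}
```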

Web browsers enforce limits on such connections, so the client application will need to share the same connection for receiving all notifications. Because most load balancers close idle connections, the application must account for that by occasionally sending keep-alive messages. Mobile devices are notorious for intermittent connection failures, so the client software needs reconnection logic. Also, there must be some mechanism by which the client application can associate each notification (there may be more than one) with the original API call, and some mechanism to determine the status of previous API calls for when the user returns to the application after being away.

Reactive Systems and Eventual Consistency

Pros:
  • Reactive systems are more responsive and resilient.
  • Reactive systems may decrease complexity on the backend, especially for data-intensive applications.

Cons:
  • Reactive systems may increase complexity on the frontend in order for the application to be emotionally satisfying to its users.
  • Highly scalable, distributed reactive systems increase operational costs with the adoption of a message broker.

Conclusion

From the early days of the first mainframes to the present day with the cloud, systems have grown in complexity, and software architects have found new ways to manage that complexity. When possible, reducing complexity without sacrificing capability is the best course of action. The Twelve-Factor App offers great advice on how to do that. With EIP, reactive systems, and eventual consistency, you might think that you are reducing complexity when you are actually just pushing it around to another part of the system. Sometimes you just have to hide the complexity, and there are plenty of model-based generators, frameworks, and connectors to help you do that, but there are both advantages and disadvantages to that approach. As we learned with Twelve-Factor apps and reactive systems, nothing increases complexity like statefulness, so be very wary and conservative when adding or increasing statefulness in your applications.

However they reduce, hide, or redistribute it, software architects will continue to manage complexity in order to keep delivering quality software more quickly in a world with ever-increasing demands for more functionality, capability, capacity, and efficiency.

About the Author

Glenn Engstrand is a software architect at Adobe, Inc. His focus is working with engineers to deliver scalable, server-side, Twelve-Factor-compliant application architectures. Engstrand was a breakout speaker at Adobe's internal Advertising Cloud developer's conference in 2017 and 2018, and at the 2012 Lucene Revolution conference in Boston. He specializes in breaking monolithic applications up into microservices and in deep integration with Real-Time Communications infrastructure.
 
