Key Takeaways
- Microservices are not a panacea; they have their place in modern architecture, but not every place.
- Understanding the business domain is vital for assessing whether a microservices-based architecture is an adequate choice.
- The Single Responsibility Principle is key to delineating the functional boundary of microservices.
- Like any other architectural style, microservices are governed by a set of tenets. It is important to understand how any deviation from these affects a given technology solution.
- Microservices must be considered within the broader contexts of distributed architecture and distributed computing.
In this article we examine the potential architectural fitness of microservices in the context of Master Data Management (MDM), and the challenges a microservices-based architecture may face when solving problem domains that require compute-intensive tasks, such as the calculation of expected losses on a portfolio of unsecured consumer credit.
We will begin by introducing a meta-model for business architecture, and use its elements and their relationships to define both business and problem domains. We will then introduce the Domain-Driven Design approach as a means to apprehend complex business domains, and assist with the creation of technology solutions.
Microservices, a new architectural paradigm, will then be presented, their anatomy described, and their benefits and drawbacks briefly articulated. The possible fitness of microservices to implement an MDM Data Hub will then be presented for consideration.
Finally, we will explore the conditions under which microservices-based architectures may or may not be suitable for computation-intensive applications involving large volumes of data.
Business Architecture Meta-Model
Figure 1 introduces a simplified business architecture meta-model, whose elements are used to define business domains throughout the document. The meta-model contains five elements, and a brief definition is provided for each of these building blocks.
Figure 1: Simplified Business Architecture Meta-Model
A Business Service exposes some functionality that fulfills a business need for a customer, whether internal or external to an organization. Business Capabilities indicate what an organization does to reach its business objectives, rather than how it does it. They are the top layer of the business architecture, belong to the strategic business domain, and are governed by the business principles of the organization. One or many business capabilities are usually leveraged to realize one or a collection of business services. A Business Capability is reified by many Business Functions, which in turn are implemented by many Business Activities; business activities are then organized into Business Processes. A Business Function is defined as a business behavior element that groups behavior based on a chosen set of criteria (typically required business resources and/or competences); a business capability contains one or more business functions. Similarly, a Business Process is defined as a business behavior element that groups behavior based on an ordering of activities, and is intended to produce a defined set of products or business services. Finally, a Business Activity represents the work performed within a business process. (Business activities may be either atomic or compound, but this distinction does not affect the topic at hand.)
Business capabilities, functions, processes and activities are performed by one or many roles (i.e., an individual or a team) in an organization.
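For readers who prefer code to diagrams, the sketch below expresses the meta-model's five elements and their relationships as Python dataclasses; the class and field names are our own illustrative assumptions rather than part of any formal standard.

```python
# Illustrative sketch of the five meta-model elements and their
# relationships. Names are assumptions, not a formal standard.
from dataclasses import dataclass, field
from typing import List

@dataclass
class BusinessActivity:
    name: str  # work performed within a business process

@dataclass
class BusinessProcess:
    name: str
    activities: List[BusinessActivity] = field(default_factory=list)  # ordered

@dataclass
class BusinessFunction:
    name: str
    activities: List[BusinessActivity] = field(default_factory=list)

@dataclass
class BusinessCapability:
    name: str  # the "what", not the "how"
    functions: List[BusinessFunction] = field(default_factory=list)

@dataclass
class BusinessService:
    name: str
    realized_by: List[BusinessCapability] = field(default_factory=list)
```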
Domain-Driven Design

Unlike those of the construction industry, software development methodologies are still very much in a state of flux. A number of them have emerged over the past 20 years, starting with the waterfall approach and leading to agile methods like Scrum, along with relatives such as Extreme Programming (XP) and the Rational Unified Process (RUP).
Domain-Driven Design (DDD) is the latest methodology available to software professionals for designing a piece of software that matches the mental model of a problem domain. In other words, Domain-Driven Design advocates modeling based on the practical use cases of the actual business. In its simplest form, DDD consists of decomposing a business domain into smaller functional chunks, possibly at either the business function or business process level, so that the complexity of both a business and problem domain can be better apprehended and resolved through technology. To this effect, Figure 2 illustrates how the elements of the earlier business architecture meta-model collaborate to form two business domains.
Because of the many documented implementation failures of Service-Oriented Architecture (SOA), or simply due to its natural evolution, a new architectural style has emerged in the past few years. While rooted in SOA, it integrates an important object-oriented design principle: the Single Responsibility Principle (SRP) sits at its heart. It proposes to build software by composing multiple stand-alone service applications, developed separately from the start with the help of bounded contexts: a DDD construct that represents a logical subset of one or many business capabilities providing business value from the perspective of a given role. The boundaries of that logical subset consequently define a business context in which the role is constrained to operate, as shown in Figure 2. The name of this architectural style is microservices.
Figure 2: Domain-Driven Design and Microservices – Conceptual View
James Lewis and Martin Fowler of ThoughtWorks define microservices as follows: “In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies”. This definition of microservices is important because it identifies some key tenets that are subsequently used to justify the position taken in this article. Figure 3 attempts to visually overlay these fundamental tenets over the anatomy of a microservice.
Figure 3: Anatomy of a Microservice
To illustrate Domain-Driven Design and microservices-based architecture, let’s look at an example centered on Transaction Cost Analysis (TCA). Figure 4 depicts a typical trading timeline, which provides an indication of where this analysis usually takes place.
An asset management firm may have the best investment models, but they are worthless until financial instruments are actually traded. The total performance of any financial portfolio is influenced by the selection and allocation of the underlying asset classes, the quality of its implementation, and other critical dimensions. Indeed, the costs associated with implementing a portfolio, through the purchase and sale of instruments, usually reduce a portfolio’s return.
Figure 4: Trading Timeline
Portfolio implementation costs can turn high-quality investments into moderately profitable ones, or low-quality investments into unprofitable ones. Investment managers must therefore manage transaction costs proactively, because lower transaction costs mean higher portfolio returns. By providing greater transparency into investment strategies and trading performance, TCA helps investment managers to lower their trading costs and determine the effectiveness of their portfolio transactions.
Using our earlier business architecture meta-model, TCA in its simplest form could be modeled as a business capability that is reified by three business functions, as depicted in Figure 5.
Figure 5: Transaction Cost Analysis – Simplified Business Domain

The Transaction Cost Measurement business function is usually influenced by both explicit and implicit costs, as summarized in Table 1 below.

Table 1: Explicit and Implicit Transaction Costs
The introduction of explicit and implicit costs suggests that the Transaction Cost Measurement business function can be further decomposed into, at least, two sub domains: Explicit Transaction Cost Measurement and Implicit Transaction Cost Measurement. For simplicity’s sake we will omit the DDD functional decomposition for the other business functions.
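To make the two sub-domains concrete, here is a minimal sketch of each kind of measurement: explicit costs as directly observable charges, and implementation shortfall as a common implicit-cost proxy. The field names and the simplified shortfall formula are illustrative assumptions.

```python
# Minimal sketch of explicit vs. implicit transaction cost measurement.
# Field names and formulas are simplified assumptions for illustration.

def explicit_cost(commission: float, fees: float, taxes: float = 0.0) -> float:
    """Explicit costs are directly observable charges on a trade."""
    return commission + fees + taxes

def implementation_shortfall(decision_price: float,
                             avg_execution_price: float,
                             shares: int,
                             side: str = "buy") -> float:
    """A common implicit-cost proxy: slippage between the price at the
    investment decision and the average price actually obtained."""
    sign = 1 if side == "buy" else -1
    return sign * (avg_execution_price - decision_price) * shares

# Example: buying 10,000 shares decided at $50.00, filled at $50.04 on average.
total = explicit_cost(commission=75.0, fees=12.5) + \
        implementation_shortfall(50.00, 50.04, 10_000)
print(f"Total measured transaction cost: ${total:,.2f}")  # $487.50
```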
Figure 6 proposes a basic context map for both the Explicit Transaction Cost Measurement and Implicit Transaction Cost Measurement business functions. In his seminal work on DDD, Eric Evans designates the Context Map as the primary tool for making context boundaries explicit; it shows microservices in the perspective of bounded contexts and business domains. In the TCA example, five possible microservices have been identified for each of the sub business functions.
Figure 6: Simplified Context Map for Transaction Cost Analysis
Microservices-based architecture offers a number of advantages. The modularity introduced by this paradigm offers ease and speed of development to match the pace of the business. Updating a subset of functionality is easy, since it is localized in one or perhaps a few pieces of software. The modular nature of microservices also inherently enhances security and fault isolation: if a given set of code is corrupted or compromised, it is adequately isolated from the remaining services, preventing the entire application from becoming unavailable.
However, microservices also exhibit many drawbacks, orchestration being a prime example. The coordination of a large number of microservices is likely to be challenging, especially when each microservice is responsible for its own interactions. Data storage is another aspect likely to create difficulties: because each microservice is responsible for its own persistence, data redundancy in an application composed of many microservices is likely, and can affect the way Master Data Management is traditionally performed. Reporting is affected for the same reason, since the data needed for a given report may be spread across the stores of many services. Finally, the compartmentalization of microservices can make integration testing a difficult proposition. These benefits and drawbacks should not be taken to be exhaustive; they are merely provided to complete our introduction to microservices.
The remainder of this article introduces two specific subject domains, exploring both the potential architectural fitness and the unsuitability of this new paradigm. The adequacy of a microservices-based architecture is first examined in the context of Master Data Management; its applicability is subsequently challenged in the context of compute-intensive tasks, such as total loss distribution calculations.
Master Data Management

Master Data Management is a vast domain, and a brief definition will help level-set. “Master Data Management is a framework of processes and technologies aimed at creating and maintaining an authoritative, reliable, sustainable, accurate and secure data environment that represents a single and holistic version of the truth, for master data and its relationships…”. Master data is defined as “…those entities, relationships, and attributes that are critical for an enterprise and foundational to key business processes and application systems.” Customer, State, Zip Code, Currency Symbol or Security Ticker are a few examples of such entities.
Functionally, MDM requires the active management of master entities by cleansing, enriching and discarding redundant data, as captured in the use case diagram below.
Figure 7: MDM Use Case Diagram - Simplified View
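A minimal sketch of these use cases follows: cleanse, enrich and de-duplicate a handful of master records. The matching rule and field names are deliberately naive assumptions, standing in for real entity-resolution logic.

```python
# Sketch of the core MDM use cases from Figure 7: cleanse, enrich, and
# de-duplicate master records. Matching on a lowercased name is a
# deliberately naive stand-in for real entity-resolution logic.
def cleanse(record: dict) -> dict:
    return {k: v.strip().upper() if isinstance(v, str) else v
            for k, v in record.items()}

def enrich(record: dict, reference: dict) -> dict:
    # Pull the authoritative state name from reference (master) data.
    record["state_name"] = reference.get(record.get("state_code", ""))
    return record

def deduplicate(records: list) -> list:
    seen, golden = set(), []
    for r in records:
        key = r.get("customer_name", "").lower()  # naive match key
        if key not in seen:
            seen.add(key)
            golden.append(r)                      # keep the "golden record"
    return golden

reference = {"CA": "California", "NY": "New York"}
raw = [{"customer_name": " Acme Corp ", "state_code": "ca"},
       {"customer_name": "ACME CORP", "state_code": "CA"}]
master = deduplicate([enrich(cleanse(r), reference) for r in raw])
print(master)  # one golden record for Acme Corp
```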
The technical implementation of MDM has continuously progressed over the years to better address the enterprise nature of this domain. This evolution was characterized by the incorporation of new architecture paradigms in order to address evolvability, maintainability, scalability and performance of MDM technology solutions.
Figure 8 visually retraces the architecture evolution of MDM Data Hubs, starting from client-server applications to a possible microservices-based architecture. The next few sections will review how both the monolithic and traditional SOA architectures were leveraged to implement Master Data Hub solutions. A microservices-based architecture is subsequently presented as a possible architectural alternative.
Figure 8: MDM Data Hub - Architectural Evolution
Before the advent of distributed architecture, and Service-Oriented Architecture in particular, Master Data Management was usually reified as part of a monolithic application. MDM functionality and its associated logic were conjoined with the functionality of the application that addressed a specific business or problem domain. Similarly, the persistence of MDM data was combined with the transactional domain of the system, as illustrated in Figure 9. For instance, a client-server trading application, written in either PowerBuilder 6 or Visual Basic 6, would typically store its MDM-related entities, such as Customer, State Code or Security, in the same underlying database used to record individual transactions.
Figure 9: Monolithic / Client-Server MDM Data Hub Architecture
While the physical proximity of MDM and transactional data allows for little to no latency, this coupling of MDM and business-specific functionality introduced architectural rigidity, preventing each functional area from evolving independently. In addition, at the enterprise level, this architecture paradigm often generated functional and data duplications, as shown in Figure 10.
Figure 10
The adoption of distributed architectures, and SOA in particular, allowed the functional and persistence decoupling of MDM-related and domain-specific implementations (the Separation of Concerns principle), as illustrated in Figure 11. Irrespective of the MDM architecture style (Repository, Registry or other), this separation offered increased architectural flexibility, allowing each domain solution to grow independently to suit its respective functional needs. However, up until the introduction of NoSQL databases, the persistence of MDM data remained in relational repositories, since no appropriate alternative was available.
Figure 11: Traditional SOA MDM Data Hub Architecture – Registry Style
From a data architecture perspective, many MDM Data Hub implementations leveraged a Master-Slave construct or one of its variations, such as sharding. In the Master-Slave approach, many servers handle read requests, but only one server (or a few, in the case of sharding) can actually perform write operations, so as to maintain master data in a consistent state, as depicted in Figure 12.
Figure 12: Master-Slave Architecture
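The routing logic implied by Figure 12 can be sketched in a few lines: reads fan out across replicas, while all writes are funneled to the single master. The `execute` interface below is a hypothetical stand-in for a real database driver.

```python
# Sketch of Master-Slave request routing: one writer, many readers.
# The Node/execute interface is a hypothetical stand-in for a real driver.
import itertools

class Node:
    def __init__(self, name):
        self.name = name
    def execute(self, sql):
        return f"{self.name} ran: {sql}"

class MasterSlaveRouter:
    def __init__(self, master, replicas):
        self.master = master                       # single write endpoint
        self._readers = itertools.cycle(replicas)  # round-robin over replicas

    def read(self, query):
        # Any replica can serve reads.
        return next(self._readers).execute(query)

    def write(self, statement):
        # All writes are funneled to the master to keep master data consistent.
        return self.master.execute(statement)

router = MasterSlaveRouter(Node("master"),
                           [Node("replica-1"), Node("replica-2")])
print(router.read("SELECT * FROM customer"))
print(router.write("UPDATE customer SET ..."))
```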
This architecture often introduced a single point of failure, possibly constraining the ability to achieve high availability and seamless horizontal scaling. Data architecture is therefore an important aspect to consider when implementing an MDM Data Hub solution. This is especially true since the emergence of NoSQL databases, as they offer credible alternatives for storing Master data, especially when coupled with microservices.
Traditional SOA already enabled both the functional and persistence segregation between MDM and business-centric OLTP applications, thus facilitating greater reuse of Master data across the enterprise. Microservices build on top of these benefits by providing better support for distributed architectures, especially when multiple zones are involved (a topic for a different article). Indeed, their focused functional boundaries and deployment model allow microservices to be deployed near their consumers, possibly enhancing availability, scalability and throughput, as depicted in Figure 13. Also, the fact that each microservice is responsible for its own data implies that the structure and storage of Master data can be tailored to better suit the needs of consuming services or systems. For instance, a State Code microservice could be supported by a key-value repository, as opposed to a table in an RDBMS. Multiple instances of this microservice could be deployed in different geographic zones to complement either a client onboarding application running on the East Coast, or a Billing service running on the West Coast.
Figure 13: Microservices-based MDM Data Hub Architecture
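As a minimal sketch of the State Code example, the following microservice serves state codes from an in-memory dictionary standing in for a key-value repository (e.g., Redis in a real deployment); the route shape and payload are assumptions made for illustration.

```python
# Minimal sketch of a State Code microservice backed by a key-value store.
# An in-memory dict stands in for a real key-value repository; the
# /states/<code> route shape and JSON payload are assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

STATE_CODES = {"CA": "California", "NY": "New York", "TX": "Texas"}

class StateCodeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = self.path.rstrip("/").split("/")[-1].upper()
        name = STATE_CODES.get(code)
        status = 200 if name else 404
        body = json.dumps({"code": code, "name": name}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Each geographic zone could run its own instance of this service.
    HTTPServer(("0.0.0.0", 8080), StateCodeHandler).serve_forever()
```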
However, it is important to assess this new architectural paradigm against a few possible drawbacks. For instance, the rationalization (or entity resolution), cleansing, and enrichment functionality may be duplicated in each Master data-related microservice. Also, the uniqueness of Master entities across an organization may no longer be guaranteed, as the essence of MDM appears to be shifting away from the enterprise and moving closer to bounded contexts. As a result, we may even need to reconsider the use of conformed dimensions in dimensional design.
Microservices and Compute-Intensive Tasks

Next, we explore whether microservices can be leveraged to solve business domains that require intensive computation. To this end, we will consider the calculation of the total loss distribution for a portfolio of retail credit cards with 50 million records as an example. Figure 14 depicts, at a high level, the steps to generate a total loss distribution based on individual loss events.
Figure 14: Total Loss Distribution in a Nutshell
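To give a feel for the calculation, here is a rough Monte Carlo sketch that aggregates individual loss events into an annual aggregate (total) loss distribution; the Poisson frequency and lognormal severity models, and every parameter value, are illustrative assumptions rather than calibrated inputs.

```python
# Sketch: simulate a total (annual aggregate) loss distribution from
# individual loss events. The frequency/severity models and parameters
# are illustrative assumptions, not calibrated to any real portfolio.
import numpy as np

rng = np.random.default_rng(42)

N_SIMULATIONS = 10_000
LAMBDA = 250          # assumed mean number of loss events per year
MU, SIGMA = 7.0, 1.2  # assumed lognormal severity parameters

total_losses = np.empty(N_SIMULATIONS)
for i in range(N_SIMULATIONS):
    n_events = rng.poisson(LAMBDA)                   # how many losses occur
    severities = rng.lognormal(MU, SIGMA, n_events)  # size of each loss
    total_losses[i] = severities.sum()               # annual aggregate loss

expected_loss = total_losses.mean()
var_99 = np.percentile(total_losses, 99)  # 99% Value-at-Risk
print(f"Expected loss: {expected_loss:,.0f}")
print(f"99% VaR:       {var_99:,.0f}")
```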
The retail credit market is unique from an annual aggregate loss perspective because, unlike large wholesale loans, it cannot be analyzed by simply downsizing the models. According to one analysis “The retail credit market supplies funds to small, typically unrated borrowers. The relatively small size of each loan implies that the absolute size of the credit risk on any individual loan is minimal. Losses on any single retail loan will not cause a bank to become insolvent. Thus, the cost per loan of determining the credit risk of retail loans is often greater than the benefit in terms of loss avoidance, and ascertaining the credit risk on an individual retail loan basis may not be worthwhile.”
Consequently, developing a technology solution to assess the credit risk of a few accounts may not make economic sense; any technology solution that addresses the credit risk associated with unsecured loans must operate on a large number of accounts in order to be economically sensible. The volume of data associated with this business domain is one of the key aspects that may render a microservices-based architecture unfit for compute-intensive tasks. This article specifically offers three reasons why a microservices-based architecture may present a challenge when generating a total loss distribution for a portfolio of unsecured credit cards. They are:
- Heterogeneity of data & data storage
- Computation capacity/strength
- Unsuitability of HTTP semantics
Heterogeneity of data & data storage
Conceptually, microservices are encouraged to own their data and storage, based on the bounded context they support. The format, content and storage of the data associated with each microservice can be adapted to best serve the functionality of each business function, as depicted in the figure below. As a result, a variety of data repositories (RDBMS & NoSQL) and schemas can be found in a microservices-based architecture.
Figure 15: Microservices Architecture – Conceptual View
However, both Expected Loss and Value-at-Risk calculations rely on time series data. This implies that the source data must not only be complete, but also extremely consistent and standardized, leaving little room for any heterogeneity of data schemas. Consequently, the conceptual architecture depicted in Figure 15 would have to evolve as shown in Figure 16 to accommodate the time series requirement. By doing so, two instances of the repository would likely become redundant, leaving the microservices to share the same data storage, a violation of one of the core tenets of microservices.
Figure 16
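As a hedged illustration of this standardization requirement, the sketch below defines a single loss-event record that every producing service would have to emit; the field names and validation rules are assumptions.

```python
# Sketch: a single, strictly standardized loss-event record that every
# producing microservice must emit. Field names/types are assumptions.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class LossEvent:
    account_id: str
    event_date: date              # time series need complete, uniform dating
    exposure_at_default: float
    loss_amount: float
    currency: str = "USD"         # no per-service variations allowed

    def __post_init__(self):
        if self.loss_amount < 0 or self.exposure_at_default < 0:
            raise ValueError("amounts must be non-negative")
```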
From a functional perspective, and based on Figure 16, the diagram below describes a possible context map for the generation of a total loss distribution (annual aggregate loss). Conceptually, breaking down the steps into microservices appears to be an acceptable approach.
Figure 17: Possible Context Map for Total Loss Distribution
However, this is where the data impedance may again go against one of the core tenets of microservices. The calculation of the total loss distribution leverages individual loss events as its starting point. As established previously, all loss event records must comply with the same schema.
The first step consists of creating a risk matrix for all individual loss events. The result of this first step, though intermediary, is likely to be stored (ACID requirements are purposefully excluded to simplify the discussion). To respect one of the microservices tenets, this microservice is responsible for managing its own data and underlying storage, as shown in Figure 18.
Figure 18
The second step involves the generation of a loss distribution. The microservice for this bounded context needs the results of the first microservice to perform its responsibility. However, the Loss Distribution Microservice should not directly access the data in the Risk Matrix Loss Data context, or a tight coupling would be introduced, invalidating another tenet of a microservices-based architecture, as illustrated in Figure 19.
Figure 19
Consequently, there are two options for the Loss Distribution Microservice to obtain the data from the Risk Matrix Loss Data context. The first one involves a READ call to the Risk Matrix Loss Data Microservice, as shown in Figure 20. However, this option is unrealistic, as millions of records would need to be moved across the network.
Figure 20
The second option would be to replicate the data from the Risk Matrix Loss Data context to the Loss Distribution context. This is technically more acceptable than the previous choice; however, the need for data motion will impact the performance of the whole process, especially when millions of records need to be moved.
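Either option ultimately amounts to something like the following sketch: paging millions of records out of one context and into another over HTTP. The endpoint, page size and client code are hypothetical; the point is that this data motion dominates the end-to-end runtime.

```python
# Sketch: replicating the Risk Matrix context's data into the Loss
# Distribution context page by page over HTTP. The endpoint and page
# size are hypothetical; with millions of records, this data motion
# dominates the end-to-end runtime.
import urllib.request
import json

SOURCE = "http://risk-matrix-service/loss-events"  # hypothetical endpoint
PAGE_SIZE = 10_000

def replicate(sink: list) -> None:
    page = 0
    while True:
        url = f"{SOURCE}?page={page}&size={PAGE_SIZE}"
        with urllib.request.urlopen(url) as resp:
            records = json.load(resp)
        if not records:
            return            # source exhausted
        sink.extend(records)  # stand-in for writes to the local store
        page += 1
```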
Both the VaR Calculation and Total Loss Distribution Microservices will experience issues similar to the ones just described, so we will omit them for the sake of brevity.
The data motion aspect may be resolved by introducing a data cache, as illustrated in Figure 21. This design will in essence have the microservices share the same data store, albeit a virtual one, again circumventing a major tenet of microservices-based architecture. As a side note, data stores could also be added within each bounded context to persist intermediary results and allow calculation re-runs in any of the three steps.
Figure 21
The architecture in Figure 21 leads to the second reason why microservices may be unfit for computation-intensive calculations, depending on the infrastructure architecture selected to support them.
Computation capacity/strength
Figure 22 represents two typical infrastructure settings for hosting microservices. The first option shows Microservice #1 deployed on a 3-node cluster, where each node contains 2 CPUs (the load balancer has been purposefully omitted). The second configuration shows Microservice #2 deployed in a virtual cluster composed of 2 physical servers, each also with 2 CPUs.
Figure 22: Typical Deployment for Microservice
These infrastructure settings are usually appropriate when dealing with OLTP systems (fault tolerance, load balancing, scalability), but may not be powerful enough to accommodate computation-intensive tasks, particularly when compared to task farms and grid computing, as illustrated in Figure 23.
Figure 23: Grid Architecture & Task Farm Parallelism
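To contrast with the clustered deployments of Figure 22, below is a minimal sketch of task-farm parallelism using only the Python standard library: a coordinator farms independent chunks of loss events out to a pool of worker processes, the pattern that grid frameworks scale across many machines. The chunk size and worker logic are assumptions.

```python
# Sketch of task-farm parallelism: a coordinator distributes independent
# chunks of work to a pool of workers, the pattern grid frameworks scale
# out across many machines. Chunk size and worker logic are assumptions.
from concurrent.futures import ProcessPoolExecutor
import random

def process_chunk(chunk):
    """Worker task: compute a partial result over one chunk of loss events."""
    return sum(chunk)  # stand-in for a real risk-matrix computation

if __name__ == "__main__":
    random.seed(1)
    loss_events = [random.lognormvariate(7.0, 1.2) for _ in range(1_000_000)]
    chunks = [loss_events[i:i + 50_000]
              for i in range(0, len(loss_events), 50_000)]

    with ProcessPoolExecutor() as farm:  # one worker per CPU core by default
        partials = list(farm.map(process_chunk, chunks))

    print(f"Total loss across all chunks: {sum(partials):,.0f}")
```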
Unsuitability of HTTP semantics
Another dimension that may render RESTful microservices challenging to use for compute-intensive tasks is the semantics associated with the HTTP protocol. From a functional perspective, HTTP requests map closely to transactional actions associated with the processing of one or a limited number of records, perhaps a few hundred: POST maps to create, GET to read, PUT to update, and DELETE to delete.
Software design techniques suggest exposing a public interface that reflects the actual functionality of the underlying program, so that consumers can use it without ambiguity. Consequently, in the context of the total loss distribution calculation, the microservice that supports the Loss Distribution context should expose one or more methods that convey the generation of a loss distribution, as opposed to CRUD operations, as illustrated in the diagram below. Semantically speaking, the HTTP protocol appears to lack the necessary vocabulary to deal with operations that are not transactional in nature.
Figure 24
To further illustrate the point, let’s assume that the microservice associated with the Risk Matrix for Loss Data context can generate a matrix of thousands, if not millions, of records. As stated before, the next functional step involves the creation of a loss distribution. However, there is a functional mismatch between the operations offered by the microservice that supports the Loss Distribution context and the task that needs to be performed on the risk matrix. Indeed, based on their names, none of the HTTP methods exposed by the second service should contain code that generates a loss distribution. Figure 25 illustrates the public interface that should be exposed by the Loss Distribution microservice.
Figure 25
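A sketch of the contrast Figures 24 and 25 are driving at: a CRUD-shaped interface whose verbs describe records, versus an interface that names the domain operation the bounded context actually performs. All class and method names are illustrative assumptions.

```python
# Sketch: HTTP/CRUD-shaped interface vs. a domain-oriented interface for
# the Loss Distribution context. Names are illustrative assumptions.
from typing import Iterable, List

class LossDistributionResourceCRUD:
    """What REST/HTTP semantics naturally suggest: verbs about records."""
    def get(self, distribution_id: str): ...
    def post(self, payload: dict): ...
    def put(self, distribution_id: str, payload: dict): ...
    def delete(self, distribution_id: str): ...

class LossDistributionService:
    """What the bounded context actually does: a named domain operation."""
    def generate_loss_distribution(self,
                                   risk_matrix: Iterable[float],
                                   n_buckets: int = 100) -> List[int]:
        # Stand-in: bucket individual losses into a histogram of counts.
        losses = sorted(risk_matrix)
        if not losses:
            return [0] * n_buckets
        top = losses[-1] or 1.0
        histogram = [0] * n_buckets
        for loss in losses:
            index = min(int(loss / top * n_buckets), n_buckets - 1)
            histogram[index] += 1
        return histogram
```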
However, changing the public interface of the second microservice to better reflect its functional responsibility does not fix the fundamental challenge of passing millions of records from one microservice to the next. From a technology perspective, a proper solution would seek to minimize the data motion, and so the Hadoop ecosystem, along with Apache Cassandra, may be pertinent technologies to explore.
To conclude, microservices-based architectures may work best when both the data and functionality of each microservice can be well delineated and the interdependencies kept to a minimum, or at least controlled. OLTP or transactional systems could be a good fit for a microservices-based architecture, as long as ACID characteristics are not strictly required (per the CAP theorem, eventual consistency). However, the benefits associated with this architecture paradigm may be difficult to reap when solving problem domains linked to the analytics space.
About the Author
Philippe Assouline is a seasoned expert in enterprise architecture, specializing in engineering economic performance, operational efficiency, and enterprise competitiveness by conjoining risk management, cybersecurity and enterprise architecture disciplines.