Designing Microservice Architectures the Right Way: Michael Bryzek's Lessons Learned at QCon NY

At QCon New York 2018, Michael Bryzek discussed how to design microservice architectures "the right way". Key takeaways included: engineers should design schema first for all APIs and events, as this allows the automated code generation of boilerplate code; event streams should be subscribed to in preference to directly calling APIs; investment should be made in automation, such as deployment and dependency management; and the engineering teams must focus on writing simple and effective tests, as this drives quality, streamlines maintenance, and enables continuous delivery.

Bryzek, co-founder, chairman and CTO at Flow.io (and previously co-founder of Gilt), began the talk with a personal anecdote: a request to change a URL within a system from the pattern 'foo.com/latest/bar.js' to 'foo.com/1.5.3/bar.js' received the response that it would "take weeks to complete", and that the team did not have the resources to do it. Puzzled as to why a simple URL format change would require this much work, he found out that the URL was implemented within a library, so the change would require hundreds of dependent services to be updated. Some of these had not been touched in years, and would need additional dependency updates in order to build and redeploy.

The anecdote was a clear example of a "not so great architecture", where near-term velocity had been traded against future paralysis. In contrast, a great architecture should actively help to scale development teams, delivering quality and enabling high performance and low cost, while supporting future features naturally. The microservice architecture is a popular modern style of implementing systems, but it needs to be designed the correct way; engineers should strive not to design "spaghetti" systems, and instead aim for a "layered" approach.

Bryzek - Great Architecture

Next, Bryzek presented a series of common misconceptions about microservices. The first misconception was "microservices enable teams to choose the best programming languages and frameworks for their tasks". The reality is that this can be expensive, and team size and investment are critical inputs into the decision to support another language within a technology stack.

If we look at Google as a generally great engineering company, they have about 20,000-30,000 engineers, and at last count 8 programming languages. So I like to say [adopt] one programming language for every 4000 engineers [within your organisation].

The second misconception discussed was "code generation is evil". The reality is that "creating a defined schema that is 100% trusted" is important for resources and events, and code generated from that schema can be very useful for scaling development efforts and maintenance. The third misconception, "the event log must be the source of truth", was compared against the reality that events are a critical part of an interface, but it is "okay for services to be the system of record for their resources". The final misconception, "developers can maintain no more than three services each", was countered with the argument that this is the wrong metric. Bryzek stated that, when done correctly, this is where "automation shines"; developers at Flow.io maintain approximately five services each, and weekly maintenance takes less than 5% of their time.

Next, a tour of the Flow.io architecture was presented. This included a look at the resource-oriented API definition schema, written in JSON, which allows entities and their corresponding properties to be defined alongside appropriate metadata. The schema definition files are stored in Git, and continuous integration is practiced. A series of tests enforces consistency across the entire set of APIs, effectively acting as an advanced linter. The goal of these tests is for engineers to believe that "one person wrote the entire API".
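
A consistency test of this kind can be sketched as a small linter that runs over the schema in continuous integration. The schema shape (`models`, `fields`, `name`, `type`), the snake_case rule and the function `lint_schema` are assumptions made for illustration; they are not the apibuilder.io format or Flow.io's actual rules.

```python
import re

# Hypothetical naming rule: lower-case words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_schema(schema: dict) -> list:
    """Return human-readable violations; an empty list means the schema is consistent."""
    problems = []
    for model_name, model in schema.get("models", {}).items():
        if not SNAKE_CASE.match(model_name):
            problems.append(f"model '{model_name}' is not snake_case")
        for field in model.get("fields", []):
            name = field.get("name", "")
            if not SNAKE_CASE.match(name):
                problems.append(f"{model_name}.{name} is not snake_case")
            if "type" not in field:
                problems.append(f"{model_name}.{name} has no type")
    return problems


# Illustrative schema with one deliberately inconsistent field name.
EXAMPLE_SCHEMA = {
    "models": {
        "user": {
            "fields": [
                {"name": "id", "type": "string"},
                {"name": "emailAddress", "type": "string"},
            ]
        }
    }
}

if __name__ == "__main__":
    for problem in lint_schema(EXAMPLE_SCHEMA):
        print(problem)
```

Failing the build whenever `lint_schema` returns a non-empty list is one way such rules could be enforced uniformly across every API definition.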

The engineering team use the open source apibuilder.io tooling in combination with their tests to help prevent errors and verify potentially breaking changes during the API design phase. The apibuilder CLI utilities allow the generation and update of code associated with resource APIs and routes (primarily written with the Play 2 framework at Flow.io). An API client's code can also be auto-generated, which includes the generation of mock clients that enable high-fidelity and fast testing. Bryzek stated that within the Flow.io architecture the system of record is the API specification; code generation ensures that the team "actually adheres to the spec".

Moving on to database architecture, Bryzek suggested that each microservice application should own its database. No other service is allowed to connect to this database directly; other services should use only the service interface, which consists of the provided API and an event stream. The Flow.io engineering team has invested heavily in creating a single CLI "dev" tool that is used for managing all infrastructure and common development tasks. A database can be created (using AWS RDS in the example shown) with a single command. All storage requirements are defined using a JSON schema which, although independent of the associated API schema, uses the same toolchain. Database DDL operations and configuration, such as table creation, are auto-generated, as is the associated Data Access Object (DAO) code used by the application service. This normalises access to the database, and ensures proper indexes exist from the start.
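
The idea of generating DDL from a JSON storage definition can be illustrated with a minimal sketch. The definition format (`table`, `columns`, `primary_key`, `indexes`), the type mapping and `generate_ddl` are all hypothetical; the talk did not show Flow.io's storage schema in this form.

```python
import json

# Hypothetical mapping from schema types to PostgreSQL column types.
SQL_TYPES = {
    "string": "text",
    "long": "bigint",
    "boolean": "boolean",
    "timestamp": "timestamptz",
}


def generate_ddl(storage_json: str) -> str:
    """Build 'create table' and 'create index' statements from a storage definition."""
    spec = json.loads(storage_json)
    table = spec["table"]
    lines = []
    for col in spec["columns"]:
        not_null = "" if col.get("nullable", False) else " not null"
        lines.append(f"  {col['name']} {SQL_TYPES[col['type']]}{not_null}")
    lines.append(f"  primary key ({', '.join(spec['primary_key'])})")
    statements = [f"create table {table} (\n" + ",\n".join(lines) + "\n);"]
    # Indexes are declared next to the table, so they exist from the first deploy.
    for index_cols in spec.get("indexes", []):
        index_name = f"{table}_{'_'.join(index_cols)}_idx"
        statements.append(
            f"create index {index_name} on {table} ({', '.join(index_cols)});"
        )
    return "\n".join(statements)


# Illustrative storage definition for a 'users' table.
EXAMPLE_STORAGE = json.dumps({
    "table": "users",
    "columns": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "deleted_at", "type": "timestamp", "nullable": True},
    ],
    "primary_key": ["id"],
    "indexes": [["email"]],
})

if __name__ == "__main__":
    print(generate_ddl(EXAMPLE_STORAGE))
```

The sketch interpolates identifiers directly into SQL, which is only reasonable because the definition is a trusted, version-controlled file rather than user input.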

The next section of the presentation focused on deploying code, the trigger for which is the creation of an associated Git tag. These tags are created automatically by a change on master (e.g. a pull request merge), and are "100% automated, and 100% reliable". Bryzek demonstrated Flow.io's continuous delivery management system, "delta" (the code is available on GitHub), and described how important he believes this practice is.

Continuous Delivery is a prerequisite to managing microservice architectures.

The definition of infrastructure required for each microservice is kept simple, and a single YAML file is created that defines metadata such as the compute instance type, ports open, and version of the OS configuration. All services within the system expose a standard 'healthcheck' endpoint, which enables all of the deployment, observability and alerting tooling to determine health.
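
A standard healthcheck endpoint, and the kind of probe that deployment tooling might run against it, could look roughly like the sketch below. It uses only the Python standard library; the exact path `/healthcheck`, the JSON response body and the helper names are assumptions, not Flow.io's implementation (which is built on the Play framework).

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthcheckHandler(BaseHTTPRequestHandler):
    """Serves GET /healthcheck with 200; every other path gets 404."""

    def do_GET(self):
        if self.path == "/healthcheck":
            status, payload = 200, {"status": "healthy"}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this example


def start_server(port: int = 0) -> HTTPServer:
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthcheckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def is_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Probe used by (hypothetical) deployment or alerting tooling."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/healthcheck")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

Because every service answers the same probe in the same way, one `is_healthy`-style check can serve deployment, observability and alerting tools alike.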

In the next section of the talk, Bryzek discussed the importance of events and event streaming. Many third parties ask for access to the Flow.io API, to which he responds, "we have an amazing API, but please subscribe to our event streams instead". The key principles for an event interface include: the provision of a first-class schema for all events; producers guarantee at-least-once delivery; and consumers implement idempotency. Within the Flow.io architecture the end-to-end latency of a single event is ~500ms, and the implementation is based on PostgreSQL, which is scaled to ~1B events per day per service.

Bryzek - principles of an event interface
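
At-least-once delivery means a consumer may receive the same event more than once, which is why idempotency matters. The following is a minimal in-memory sketch of an idempotent consumer; the `Event` fields and the `BalanceConsumer` class are hypothetical and exist only to illustrate the principle.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # unique per event; used to detect redelivery
    user_id: str
    amount: int


@dataclass
class BalanceConsumer:
    balances: dict = field(default_factory=dict)
    processed_ids: set = field(default_factory=set)

    def handle(self, event: Event) -> bool:
        """Apply the event once; return False if it was already processed."""
        if event.event_id in self.processed_ids:
            return False
        current = self.balances.get(event.user_id, 0)
        self.balances[event.user_id] = current + event.amount
        self.processed_ids.add(event.event_id)
        return True


if __name__ == "__main__":
    consumer = BalanceConsumer()
    credit = Event(event_id="evt-1", user_id="u1", amount=10)
    consumer.handle(credit)
    consumer.handle(credit)  # redelivered by the producer; ignored
    print(consumer.balances)
```

In a persistent consumer, the record of processed event ids and the state change would normally be committed in the same database transaction, so that a crash cannot leave one applied without the other.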

With this approach, producers create a journal of all operations on their corresponding database table, recording inserts, updates, deletes and so on. When a journal record is created, it is queued to be published. Publishing happens asynchronously and in real time, with a single event published per journal record. Replaying events from a service is achieved simply by requeueing journal records. Consumers store new events in a local database that is partitioned for fast removal. When an event arrives, its record is queued to be consumed. Events are processed in micro-batches, with a default period of 250ms between each consumption attempt. Any failures are recorded locally. More information and code examples on this approach can be found in the Gilt Technology db-journalling repository. Interested readers may also wish to explore the similar concept of change data capture (CDC); example open source technology in this space includes Debezium.
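
The producer side of this journalling approach can be sketched with an in-memory SQLite database standing in for PostgreSQL. The table layout, the `queued` flag and all function names are assumptions for illustration and do not reproduce the db-journalling repository.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        create table users (id text primary key, email text not null);
        create table users_journal (
            journal_id integer primary key autoincrement,
            operation  text not null,
            id         text not null,
            email      text,
            queued     integer not null default 1
        );
    """)
    return db


def upsert_user(db, user_id, email):
    exists = db.execute("select 1 from users where id = ?", (user_id,)).fetchone()
    operation = "update" if exists else "insert"
    with db:  # one transaction: the table change and its journal record together
        db.execute("insert or replace into users (id, email) values (?, ?)",
                   (user_id, email))
        db.execute("insert into users_journal (operation, id, email) values (?, ?, ?)",
                   (operation, user_id, email))


def delete_user(db, user_id):
    with db:
        db.execute("delete from users where id = ?", (user_id,))
        db.execute("insert into users_journal (operation, id) values ('delete', ?)",
                   (user_id,))


def publish_queued(db, publish, batch_size=100):
    """Publish one event per queued journal record, oldest first."""
    rows = db.execute(
        "select journal_id, operation, id, email from users_journal "
        "where queued = 1 order by journal_id limit ?", (batch_size,)).fetchall()
    for journal_id, operation, user_id, email in rows:
        publish({"event_id": journal_id, "type": f"user_{operation}",
                 "id": user_id, "email": email})
        # Dequeue only after publishing: a crash in between causes a re-publish,
        # which is what makes delivery at-least-once rather than exactly-once.
        with db:
            db.execute("update users_journal set queued = 0 where journal_id = ?",
                       (journal_id,))
    return len(rows)


def requeue_all(db):
    """Replay: mark every journal record as queued again."""
    with db:
        db.execute("update users_journal set queued = 1")
```

A background loop would call `publish_queued` repeatedly with a real publisher in place of the `publish` callback; replaying the full history is then just `requeue_all` followed by the normal publishing loop.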

The final section of the talk focused upon dependency management and "keeping things up to date". The goal should be to regularly and automatically update all services to use the latest version of their dependencies; this is critical for security patches and bug fixes in core libraries. Referring back to his opening anecdote, Bryzek stated that "this should take hours, not weeks or months", and that the same process should be used for internally developed libraries as for external open source libraries. The Flow.io engineering team upgrade all services every week to the latest dependencies, and the team use their "flowcommerce/dependency" tooling (released as open source) to track dependencies. The tooling initiates automated upgrading within a service's code base, along with the creation of an associated pull request. Each service is deployed as soon as the build passes.
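
The core of such tracking is comparing the versions a service currently uses against the latest available ones. The sketch below assumes plain dotted numeric versions and hypothetical library names; it is not how the "flowcommerce/dependency" tooling is implemented.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def plan_upgrades(current: dict, latest: dict) -> dict:
    """Map each outdated dependency to its (current, latest) version pair."""
    upgrades = {}
    for name, version in current.items():
        newest = latest.get(name)
        if newest and parse_version(newest) > parse_version(version):
            upgrades[name] = (version, newest)
    return upgrades


if __name__ == "__main__":
    # Hypothetical dependency names and versions.
    current = {"lib-http": "1.5.3", "lib-json": "2.0.0"}
    latest = {"lib-http": "1.10.0", "lib-json": "2.0.0"}
    print(plan_upgrades(current, latest))
```

Each entry in the resulting plan could then drive one automated version bump and pull request per service.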

Summarising the talk, Bryzek stated that engineers should design schema first for all APIs and events, and events should be consumed in preference to directly calling APIs. Engineers should invest in appropriate and effective automation, focusing on tasks such as deployment, code generation and dependency management. Teams should also be encouraged and enabled to write "amazing and simple" tests, as this drives quality, streamlines maintenance, and enables continuous delivery.

The slides for Michael Bryzek's QCon New York talk, "Designing Microservice Architectures the Right Way", can be found on SlideShare. The video of the presentation, alongside all of the talks from QCon New York, will be released on InfoQ.com over the coming months.
