Wesley Reisz talks to Daniel Bryant on moving from monoliths to micro-services, covering bounded contexts, when to break up micro-services, event storming, practices like observability and tracing, and more.
Key Takeaways
- Migrating a monolith to micro-services is best done by breaking off a valuable but not critical part first.
- Designing a greenfield application as micro-services requires a strong understanding of the domain.
- When a request enters the system, it needs to be tagged with a correlation id that flows down to all fan-out service requests.
- Observability and metrics are essential parts to include when moving micro-services to production.
- A service mesh allows you to scale services and permit binary transports without losing observability.
You were chief scientist and CTO at OpenCredo. What are your plans for the future?
- 01m:50s I’m doing a consulting CTO gig with *SpectoLabs*, a company based in the UK, focussed around micro-service testing, products and training, along with some freelance consulting.
- 02m:00s I also write for InfoQ and some other websites as well, and I’m looking at some of the newer technologies like IoT and catching up with the latest JavaScript trends.
You’ve been working with micro-services for over five years. What were the first projects like?
- 02m:50s I used to be CTO for Instant Access Technologies, a startup based out of Old Street in London.
- 03m:00s We didn’t want to do classic SOA - we knew we had a well defined set of contexts for the way the application was going to process data and deliver value to the customer - it was a price comparison website.
- 03m:25s We didn’t want to go heavy SOA - a few of us had been on projects that used WSDL, UDDI, that kind of pain - so we didn’t want to go that route.
- 03m:30s We didn’t want to go monolith either - we knew that there were going to be shifting scaling requirements, there would be quite a few developers around the codebase.
- 05m:45s A few of us got together and architected the design where we would have loosely coupled services, sending JSON across the wire and using HTTP status codes, and it grew from there.
What do you see as the differences between the micro-services of today and the SOA architectures of the past?
- 04m:30s Companies spent months or years trying to create a universal model - but it’s very hard to model the real world one way. Every model is wrong, but some are useful.
- 04m:50s Trying to fit the business into one model is very hard, and even though the classic SOA [Service Oriented Architecture] led towards micro-services, it was still very much monolithic thinking in terms of the model names.
- 05m:05s Micro-services are a move away from that, towards Adrian Cockcroft’s definition: loosely coupled services with bounded contexts.
- 05m:20s There’s a lot of good things that can be taken from SOA, but we’re taking concepts from Domain Driven Design, like how we break up our domain, and applying that to services.
- 05m:30s The key difference is the loose coupling between contexts and between services; not relying on WSDLs and heavy-weight contracts.
- 05m:50s It keeps coming back to Postel’s law [RFC 1122’s “1.2.2 Robustness Principle”]: be conservative in what you send and liberal in what you accept.
- 05m:55s Classic WSDL approaches were almost a legal contract; there’s benefits in that, but micro-services lean more towards Postel’s approach.
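A minimal sketch of Postel’s law applied to a JSON consumer and producer. The payload fields (`order_id`, `currency`), the default currency and the function names are hypothetical and are not taken from the interview; the point is only that the reader ignores fields it does not know, while the writer emits a small, fixed shape.

```python
import json


def parse_order(payload: str) -> dict:
    """Liberal in what we accept: ignore unknown fields, tolerate type variations."""
    data = json.loads(payload)
    return {
        # Required field; accept either a number or a string and normalise it.
        "order_id": str(data["order_id"]),
        # Optional field with an assumed default for this example.
        "currency": data.get("currency", "GBP"),
    }


def render_order(order_id: str, currency: str) -> str:
    """Conservative in what we send: only the documented fields, in a stable order."""
    return json.dumps({"order_id": order_id, "currency": currency}, sort_keys=True)
```

A consumer written this way keeps working when the producer adds new fields, which is the loose coupling described above; the trade-off is that nothing forces either side to write the contract down.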
What does loose coupling mean?
- 06m:30s For me, loose coupling is about being able to interchange components within the architecture.
- 06m:45s With a monolith, it’s very easy to end up with spaghetti code. Everything gets tied together, because it’s possible to cheat across boundaries.
- 07m:00s So even in Java, where we have interfaces, you can often sneak around the interface and access something you shouldn’t be able to.
- 01m:50s From there, it becomes a broken windows thing, as often other developers will start cheating in the same way. After a few years, it becomes a big ball of mud or spaghetti code.
- 07m:25s In my mind, loose coupling is about breaking things up, defining clear interfaces, and enforcing those boundaries.
What is a bounded context?
- 07m:45s The classic definition comes from the Domain Driven Design world: Eric Evans, Vaughn Vernon - I’d recommend listeners read up on their Domain Driven Design books.
- 08m:00s In terms of scoping them, it’s really tricky. It’s one of the hardest parts - it’s the same whether you’re doing micro-services or monoliths.
- 08m:10s Scoping the boundaries of domain components in your application - for example, there’s very often a User domain, a Checkout domain and a Stock domain - and how you define what goes into each of those is genuinely really tricky.
- 08m:30s The domain driven design books were written before the advent of micro-services, but they give a lot of hints about how to scope those bounded contexts - things like looking for a ubiquitous language, which is a way of asking whether you’re talking the same language within the context.
- 08m:50s When the business uses subtly different words in different parts of the codebase - a User here, an Account there - you can split them, because they use different words in each context.
- 09m:10s Perhaps one is more focussed on security, while the other is focussed on the representation of the user within the application.
- 09m:20s Getting the Goldilocks model of the just-right service is something that I struggled with for many years - it’s challenging, but don’t be afraid of experimenting.
- 09m:40s Some up front design can be helpful to prototype, but don’t do full up front design - iterate as you learn more about the domain.
How do you recommend your clients define the domain for each of the areas?
- 10m:00s It definitely depends on whether you’re building a new application or whether you’re working with a legacy (also known as money-making!) application.
- 10m:15s I don’t recommend micro-services if you’re a completely green field application, where you don’t know what your domain is yet.
- 10m:25s If you’ve got an existing application (or a suite of applications) - how do you break those things up?
- First off, talk to the business. Depending on how the business is organised, you might be able to see clear parts, like the billing team, the accounts team, the stocks team. That can be a good way of splitting up the domains.
- 10m:50s Sometimes you don’t see that at all - there’s just layers like sales and marketing - and there, you have to dive into the code, look for certain nouns, verbs and so on. That will give hints of where to start looking.
- 11m:20s You can end up with some services - like a user service - which are more cross-cutting concerns, since the user touches every part of the application.
- 11m:30s Word clouds can show up where these occur in the codebase - Adam Tornhill talks about a lot of these things, and he did a great presentation at QCon London on tools to analyse codebases.
- 11m:55s They can be a good place to start doing investigative work.
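As a rough sketch of the "look for nouns and verbs" idea, the snippet below counts the words that make up identifiers in a piece of source code, which is the raw data a word cloud would be drawn from. It is not one of the tools mentioned above; the function name `domain_terms` and the splitting rules are assumptions for illustration, and language keywords are deliberately left in rather than filtered.

```python
import re
from collections import Counter

# Splits camelCase / PascalCase parts: "UserAccount" -> "User", "Account".
WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def domain_terms(source: str, top: int = 5):
    """Return the most frequent lower-cased words found inside identifiers."""
    words = []
    for identifier in IDENTIFIER.findall(source):
        # Handle snake_case first, then camelCase within each part.
        for part in identifier.split("_"):
            words.extend(w.lower() for w in WORD.findall(part))
    return Counter(words).most_common(top)
```

Run over a whole module or package, terms such as `user`, `billing` or `stock` that dominate particular directories give a hint about where a boundary might sit.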
At what point does the architecture dictate the design, or do you change the architecture to dictate the design?
- 12m:15s The organisation structure influences the architectural design and vice versa [Conway’s law], but James Lewis talked about an inverse Conway manoeuvre, where you structure the organisation to force the code to be a certain way.
- 12m:25s There’s different ways to do this - so which should you do? Often it depends on whether it’s a technology-led business or a business-led business.
- 12m:35s If they’re a technology company, it’s often easier to change the organisational structure - and if not, the code will evolve and people will gravitate to certain parts of the code base, which gives the cross-functional teams.
- 13m:00s With a classic business-focussed company, it can be quite difficult to do that. They’ll look at IT as a cost centre, and assume they are just providing a service - so won’t want to reorganise because of that.
- 13m:15s In that case, it’s often better to change the technology first, demonstrate some provable wins, so that you can split the technology out - and try at the same time to prove the business case for cross-functional teams, so that you can develop features to customers quicker.
Can you expand on your previous comment that you don’t recommend micro-services for greenfield applications?
- 13m:50s This is a controversial topic. Martin Fowler and Stefan Tilkov had a great discussion - "Monolith First" and "Don’t start with a monolith".
- 14m:50s I’ve seen a few cases in the wild where customers tried to start with micro-services but didn’t know their domain.
- 14m:20s We are also seeing successful companies - such as Monzo, who gave a talk at QCon London on building a bank with Go micro-services.
- 14m:35s Another bank, Starling, had a hackathon a few weeks ago and they were using micro-services, in Java.
- 14m:45s They knew their domain - finance is quite a constrained domain to some degree - but they went all in to micro-services.
- 14m:50s So it is possible for people to write micro-service first systems - but they all knew their domain. I’ve seen plenty of examples where micro-services first didn’t work because the team didn’t know their domain.
- 15m:05s Understanding the domains is a key point - so a new business that is entering a well understood market has a better chance of success than a startup carving its own way and learning the domain on the fly.
- 05m:25s It’s a classic lean startup - create an MVP [Minimum Viable Product] to find out if the demand is there, is anyone going to pay money for this - and as you’re learning, the domain forms itself.
- 15m:45s You either go very granular with the services - which means there are a lot of them, and that is hard to operate - or you are constantly rewriting boundaries, in which case you’re better off with a monolith because it’s easier to rewrite boundaries in a monolith than in micro-services.
- 16m:00s So we’re seeing both approaches in the wild - your mileage may vary.
What else might be a smell that the monolith might need to be broken into micro-services?
- 16m:30s It’s generally one of two things. Either the technology doesn’t scale - you have to load-balance across lots of servers because different parts of your architecture should be scaling independently - or you can’t scale your dev team.
- 17m:00s If you imagine a monolithic rock with developers chipping away at it, you can only get so many people around the rock before you get developers hitting each other or stepping on each other’s toes.
How do you start to migrate off a monolith onto micro-services?
- 18m:05s It is difficult - every business is unique in things such as the resources that are available or the tolerance or appetite for change.
- 18m:15s I recommend starting small and working your way up. The classic method is to look for something that is small but adds value.
- 18m:35s The classic is a user registration page or newsletter sign-up page; something that is useful and adds value to the business but isn’t mission critical.
- 18m:45s Break it out - often you have to insert seams; Michael Feathers talks about these in his book “Working Effectively with Legacy Code” - so some kind of API between the existing codebase and the new service that you’re putting in place.
- 19m:10s The most important thing is getting the whole delivery process working from soup to nuts.
- 19m:20s Once you’ve got the service and identified the seam, you need to build a supporting continuous delivery pipeline. Being able to rapidly deploy changes is an important part of that.
- 19m:50s Without the ability to rapidly get things out into production and experiment, it’s going to be hard.
- 20m:00s Engage with QA and InfoSec people to get it running in production alongside the existing application.
- 20m:15s You’ll then be in a position to present findings from the experience, about not only the coding changes but also the deployment operation, the developer experience, and that’s where the real value lies in traditional enterprises.
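The seam described above can be sketched as an interface inside the monolith with two interchangeable implementations: the existing in-process code and a thin client for the new service. Everything named here (`NewsletterSignup`, `LegacySignup`, `RemoteSignup`, the `/subscriptions` path and the injected `post` callable) is hypothetical; the HTTP transport is passed in so that the example stays self-contained.

```python
from typing import Callable, Protocol


class NewsletterSignup(Protocol):
    """The seam: callers in the monolith depend only on this interface."""

    def subscribe(self, email: str) -> bool: ...


class LegacySignup:
    """Existing in-process behaviour, kept while the new service is proven."""

    def __init__(self) -> None:
        self.emails: set = set()

    def subscribe(self, email: str) -> bool:
        self.emails.add(email)
        return True


class RemoteSignup:
    """Delegates to the extracted service through an injected HTTP POST function."""

    def __init__(self, post: Callable[[str, dict], int]) -> None:
        self._post = post  # takes (path, json_body) and returns an HTTP status code

    def subscribe(self, email: str) -> bool:
        status = self._post("/subscriptions", {"email": email})
        return status == 201  # treat only "Created" as success in this sketch


def register(signup: NewsletterSignup, email: str) -> str:
    """Monolith code path that no longer cares which implementation it gets."""
    return "subscribed" if signup.subscribe(email) else "try again later"
```

Switching `register` from `LegacySignup` to `RemoteSignup` is then a wiring change, which makes it practical to run the new service alongside the existing application and fall back if needed.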
What are the next steps when you scale services up?
- 21m:15s Whatever you’re doing, you are deploying onto some sort of platform - even if it’s a bare metal server.
- 21m:30s When you’re deploying a few services, it’s no big deal doing it manually - but as you scale up more than that, it’s operationally quite challenging.
- 21m:45s How you orchestrate and schedule the services becomes a learning experience - you begin to appreciate platform features like service discovery, routing and fault tolerance.
- 22m:00s Micro-services are effectively a distributed system, even if they’re all running in the same container.
- 22m:20s There’s an inherent flexibility in that - you can split and scale services independently - but there are operational concerns that need to be addressed, like monitoring.
- 22m:40s So once you pass that initial three services, you have to make sure the platform supports the future services you’re going to work on.
- 22m:50s By the time you hit your third service, you have a feel for the language, the way the services are going to work, the framework you need.
- 23m:05s For example, at one company I worked with who used Java - by the third service, they had created a Maven archetype to facilitate this.
- 23m:15s There’s a number of other examples out in the wild - Twitter has Finagle, Netflix has Karyon - but once you have frameworks for the micro-services, it’s all about the platform underneath them.
What do you need to consider as you bring them into production?
- 24m:00s Observability is important in production. When you have multiple things in play, the individual things can be working but you need to check both the glue between the services and the overall system health. You need to ensure that value is being delivered to the customer.
- 24m:25s Adding metrics into the services is important - not just technical items like queue depth or latency - but business metrics as well can be a game changer.
How do you identify business metrics?
- 24m:55s There’s a great book by Alistair Croll, Lean Analytics, which was very useful for me.
- 25m:10s If it’s business metrics, it’s always about the value you’re trying to deliver.
- 25m:15s Often it’s the happy-path journeys; can you search for a product, can you look at them, can you add them to the basket and can you checkout?
- 25m:35s If you look at those, the metrics spring out - orders per second, are people dropping off midway through the process?
- 25m:55s Reading around the theory and discussing with the business as well as going through the journeys is critical.
- 26m:10s There’s a lot of work going around at the moment talking about event storming - another technique, like context mapping in the DDD world - that aims to identify events in the business; not technical events, but actual events, such as orders dispatched.
- 26m:30s Often once you identify those events, the metrics fall out, such as orders per second.
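To illustrate how metrics can fall out of business events, here is a small sketch that takes events from the happy-path journey (search, view, add to basket, checkout) and reports how many sessions reached each step and what fraction dropped off since the previous one. The step names, the `(session_id, step)` event shape and the `funnel_report` function are assumptions made for the example.

```python
# Hypothetical happy-path steps, in the order a customer goes through them.
FUNNEL = ["search", "view", "add_to_basket", "checkout"]


def funnel_report(events):
    """events: iterable of (session_id, step) pairs.

    Returns a list of (step, sessions_reached, drop_off_from_previous_step).
    """
    reached = {step: set() for step in FUNNEL}
    for session_id, step in events:
        if step in reached:  # ignore events that are not part of the journey
            reached[step].add(session_id)

    counts = [len(reached[step]) for step in FUNNEL]
    report = []
    for i, step in enumerate(FUNNEL):
        previous = counts[i - 1] if i else counts[0]
        drop_off = 0.0 if previous == 0 else 1 - counts[i] / previous
        report.append((step, counts[i], round(drop_off, 2)))
    return report
```

A sudden change in the drop-off at one step is the kind of business signal that technical metrics such as latency or queue depth would not show on their own.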
What about tracing?
- 26m:45s Distributed tracing is super important.
- 27m:00s I’ve used Zipkin a lot - the key enabler is correlation identifiers.
- 27m:20s As soon as a request comes in the door - whether it’s an API gateway or some other point of ingress - it gets assigned a correlation identifier.
- 27m:35s The identifier needs to be passed in every service call, either as an HTTP header or as a property in a message.
- 27m:55s This allows log entries from distributed services to be aggregated under the same identifier for a specific request.
- 28m:05s Zipkin can sample requests live as they are going through the application, and the correlation identifier is key to making that work.
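The correlation identifier flow can be sketched in a few lines: assign an identifier at ingress if the request does not already carry one, copy it onto every fan-out call, and include it in every log line. The header name `X-Correlation-ID` is a common convention assumed here (it is not Zipkin’s own propagation format), and the helper names are hypothetical.

```python
import uuid
from typing import Optional

HEADER = "X-Correlation-ID"  # assumed header name, not mandated by any tool


def ensure_correlation_id(headers: dict) -> dict:
    """At the point of ingress: keep an incoming identifier or mint a new one."""
    tagged = dict(headers)
    tagged.setdefault(HEADER, str(uuid.uuid4()))
    return tagged


def outbound_headers(inbound: dict, extra: Optional[dict] = None) -> dict:
    """For each downstream call: carry the same identifier forward."""
    outbound = dict(extra or {})
    outbound[HEADER] = inbound[HEADER]
    return outbound


def log_line(headers: dict, service: str, message: str) -> str:
    """Prefix log entries so they can be aggregated per request."""
    return f"correlation_id={headers[HEADER]} service={service} msg={message}"
```

For message-based communication the same value would travel as a message property instead of an HTTP header; searching aggregated logs for one identifier then returns the entries from every service that handled that request.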
What’s your view on JSON versus binary protocols for inter-service communication?
- 28m:45s I’ve seen JSON being used in a lot of clients - it’s very interoperable.
- 29m:00s If you’ve got REST services with Python on one side and Java on another, it works well.
- 29m:05s I’ve also seen in classic SOA days where SOAP didn’t work well across platforms.
- 29m:10s So I like the idea of REST and JSON being easily consumable, but I totally agree that we are wasting countless energy cycles serialising and deserialising objects to JSON.
- 29m:30s So it has a financial impact, in paying for the processing and energy costs of the serialisation.
- 29m:35s Binary protocols also have advantages; there’s less cost in serialisation and deserialisation, as well as less data going over the wire.
- 29m:50s But the cool thing with Thrift, Avro and gRPC is that they come with an interface definition language - so you can specify the contract.
- 30m:00s You need a very clear contract (or specification) for data that has to go over the wire.
- 30m:05s That can actually add a lot of value in and of itself.
- 30m:10s As much as JSON and REST take advantage of Postel’s law, where you can send twenty megabytes of data over the wire and only pick out ten kilobytes worth of JSON, you still have to deserialise the whole thing, which can lead to casual programming.
- 30m:35s It’s very flexible - but when you have to create an IDL, you have to think of the types and interaction between services.
- 30m:45s For example, streaming data is very popular in the reactive space at the moment - the ability to keep the channel open, apply back-pressure and so on, can be advantageous.
- 31m:00s JSON and REST, on the other hand, are very much anchored around the request-response lifecycle.
- 31m:10s I see the value of binary protocols, gRPC in particular, which has been picked up by the Cloud Native Computing Foundation.
- 31m:20s Having gRPC in the stack gives you a lot of advantages over REST and JSON.
How does tracing work with binary protocols?
- 31m:45s Now you’re looking at a binary protocol going over the wire, instead of text. So Zipkin doesn’t directly work in that example. You need to have some sort of proxy or service mesh.
- 31m:50s I’ve been using linkerd, which is a JVM based service mesh, which is also now a sponsored project in the Cloud Native Computing Foundation. IBM wrote one, and there’s Lyft’s Envoy.
- 32m:10s A service mesh addresses a lot of these issues like inter-service communication - and observability is a key part of that.
- 32m:20s Linkerd has support for Zipkin - so all of your gRPC communication is encrypted with TLS but the mesh lets you pull out the components to send to Zipkin.
What about using the latest and greatest?
- 32m:00s I see people wanting to move to micro-services; at the same time, there’s a lot of cool technologies like Docker and rkt, things like Kubernetes, Mesos, Amazon ECS - so it’s tempting when you’re adopting one thing (like micro-services) to adopt a lot of other things at the same time.
- 33m:30s In my mind they are separate things - you can implement architectures to use many different technologies, and you can use different technologies to implement the same architecture.
- 33m:45s What I was getting at with this - technologists love to play with technology and experiment with it.
- 34m:15s One thing I’ve seen people tempted by is getting lost in technology details, like getting lost in Java frameworks.
- 34m:30s I’ve now pulled back and focus on core principles and core values that can be delivered by technologies.
- 34m:40s When you make a choice to bring something into the stack, be very careful about the operational overhead and the skills gap, and evaluate the technology correctly - not only at a technical level; at the CTO level, it’s more important to evaluate the architecture.
- 35m:10s As the CTO you’ve got a lot of business responsibility, and you may not be in a place to comment on the technology or coding hands on - but you’ve got to have something in your tool-belt that says: this is how I know it’s not just fancy technology coming in, it will deliver value as well.
- 35m:40s There’s a presentation on YouTube, which I put together quickly - but I realised that it has got a lot of value, because as people move up the technical ladder to CTO they need to stay dialled in to making the correct choices at the appropriate level of the technologies.