Transcript
Cummins: I'm Holly Cummins. I'm an IBMer. I currently sit in corporate strategy, thinking strategic thoughts. I've spent many years in the IBM Garage as a consultant. The pattern as a consultant is that you have an opinion, and then you go out and you meet people, and you go, "Why are you doing that?" Then, "That's interesting. It never occurred to me, someone would do it that way." Sometimes you revise your opinion. Sometimes you just get depressed. It's very educational, so then I try and share what I learn.
Farley: My name is Dave Farley. I'm an independent software consultant. I'm probably best known for writing the "Continuous Delivery" book. Actually, I think I've got a claim to inventing microservices in the 1990s. I worked on an innovative project where we built loosely coupled, independently deployable things that we called, at the time, cooperative business objects, using semantic messaging. We invented our own version of XML. It was a very exciting, fascinating project. I'm quite opinionated about microservices as a strategy for building bigger, complex systems.
Engstrand: I'm Glenn Engstrand. I'm a software architect. I've been a software architect for over a decade now, I suppose. Currently, I'm at Optum Digital, which you may know as Rally Health, which is in the healthcare space. Before that I was at Adobe. Then before that, I was at a dating site called Zoosk. I think it was during that time that I first saw one of Chris's meetups and/or workshops. Chris has been around my neck of the woods presenting and educating for a long time.
Richardson: I've learned from everyone, either today or in the past. It's cool. "Continuous Delivery" is one of my favorite books.
Doing Decomposition beyond Two or Three Services When You Have Core Business Objects
This is in the context of microservice architecture and defining services and their boundaries. It's like, what do you do with these objects that are pervasive throughout the overall domain? In my simplistic view of banking, it's customers and accounts. In eCommerce, it's orders and customers. How do you do decomposition beyond two or three services, when you've got these core business objects?
Cummins: It's one of those things that I ask myself all the time, because I've seen a lot of instances of it going wrong and causing problems. I've seen fewer instances where I'd say, yes, that's exactly the pattern. That's the way to do it. Of the strategies that you can come up with, one is that sometimes we think objects are the same, and they're actually different, because they're used in really different ways. Although everybody has an order, in one context what we care about is the customer address; in another context, we care about the stock levels. Sometimes we want to just actually realize that these things aren't the same. Other times, I think we have to accept that they are the same, and then you just have to write a contract for them in blood and encase the API in cast iron. We just recognize that this is so common that it's not going to be very flexible.
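As a rough sketch of that first strategy, here's what "realizing the things aren't the same" can look like in Java. All the names here are hypothetical, not from any system discussed; the point is that the two contexts share only an identifier, never a class.

```java
// Hypothetical illustration: the "same" order modelled twice, once per context.
// The two types share only the order ID; each context keeps the fields it cares about.

// Fulfilment context: cares about where the order is going.
record FulfilmentOrder(String orderId, String customerAddress) { }

// Inventory context: cares about what stock the order consumes.
record InventoryOrder(String orderId, String sku, int quantity) { }

public class OrderViewsDemo {
    public static void main(String[] args) {
        var shipping = new FulfilmentOrder("o-123", "221B Baker Street");
        var stock = new InventoryOrder("o-123", "BOOK-42", 2);
        System.out.println(shipping + " / " + stock);
    }
}
```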
Domain-Driven Design
Richardson: You touched on concepts from domain-driven design, bounded context, or the same word having different meanings in different contexts. Famously, the shipping department has a very different view of what a book is from the marketing department, for instance.
Farley: I think the domain-driven design idea is important when thinking about microservices. Because the whole point of a microservices architecture, it seems to me, is to try and decouple the pieces. Ultimately, the advantage of microservices is that it decouples development, it reduces developmental coupling so that teams can make progress more independently of one another. Otherwise, it's just a service oriented architecture. It's not microservices. That decoupling is important. One of the things that I like in most definitions of microservices is that people say they should be aligned with a bounded context. That makes sense to me. I was chatting with Eric Evans about this a couple of weeks ago, and he came up with an idea that resonated with me, which is that the messaging layer is a separate bounded context. I think there are multiple separate bounded contexts. You have the bounds of the service, and then the messaging is something else. The protocol of exchanging information between the services is another abstraction. One of the things that resonates with me, another thing from Eric's book, is that you always translate when you're crossing bounded contexts. We should be translating the messages as they go across. Then that makes the example that Holly came up with an easier problem to deal with, where we have these ideas that are sometimes the same and sometimes different and sometimes related. I think in a good microservices architecture, you've got three layers of abstraction going on. You've got the services themselves. You've got the context of the protocol of exchange of information. Then floating above that there's this top level thing, which is the correlation of ideas. The book ID that ties this version of bookness to that other version of bookness, those kinds of things.
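A minimal sketch of that "always translate when crossing a bounded context" idea, in Java, with hypothetical names: the wire-level message is its own model, and a small translator maps it into this context's model at the edge, correlating on the shared book ID.

```java
// Hypothetical sketch: translating an incoming message at the bounded-context boundary.
// The wire format (what the other service publishes) never leaks into our domain model.

// The message as it arrives from the messaging layer (its own bounded context).
record BookMessage(String isbn, String titleText, int priceInPence) { }

// Our context's internal notion of a book: only what this service cares about.
record CatalogBook(String bookId, String title) { }

// The translation step that runs every time we cross the boundary.
class BookMessageTranslator {
    CatalogBook toCatalogBook(BookMessage msg) {
        // Correlate via the shared identifier; drop fields this context ignores.
        return new CatalogBook(msg.isbn(), msg.titleText());
    }
}

public class BoundaryDemo {
    public static void main(String[] args) {
        var incoming = new BookMessage("978-0321601919", "Continuous Delivery", 3500);
        var book = new BookMessageTranslator().toCatalogBook(incoming);
        System.out.println(book);
    }
}
```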
Richardson: I saw that in one of the very recent YouTube videos you put out on your most excellent channel. I thought that was fascinating, because it aligned with an idea I had a while ago, where, within a service, you obviously have a domain model or bounded context; I can never really tell them apart, to be honest. Then there's an API, which supposedly should imply encapsulation. An API is a model as well, but because of the encapsulation, it's not the same model. It's hiding things. Somehow there's a mapping. Then at the application level, or, you could say, tangibly with an API gateway, you're exposing a model to the outside world. The API gateway is translating that model to the respective service API models. Each service API is translating to its internals.
Farley: I think it is turtles all the way down. If we're trying to use this as a model to manage the complexity of the systems, then we should be working to try and isolate those systems to some degree. Allow ourselves the freedom to have these little layers of insulation between one part of the system and another.
Service Cluster
Richardson: That actually reminds me: in Selina's presentation, at Airbnb, they have a concept of a cluster of services, and that had an API layer, too. It is multiple layers.
Engstrand: Chris, you originally talked about this in your scale cube. It is very relevant to this discussion about the present and future of microservice architecture: how do you split up what was a microservice? People keep jamming more endpoints into it; eventually it becomes a monolith, and it's time to go through another split. Like Holly said, using bounded contexts is a good way. Like Dave said, stratifying it, orchestration versus data. Backends for frontends, I think I recently blogged about that. There are all kinds of strategies. On the one hand, there's mitigating release anxiety. That's the problem when too much stuff is in one service. On the other hand, Chris, you actually talked about this in an interview with Thomas recently, about the Conway's Law costs. If I need to call another service that is owned by a different team, but I need it changed, there's going to be some resistance to that, because now I have to open a ticket for that team. It's on their roadmap, and I don't know if they'll get to it. Maybe we can just think of something to do on our side. There are both sides of that. Ideally, you should come up with some way of partitioning what you need to do in a way that honors both sides of that equation.
Farley: I think that the idea of developmental decoupling is core to microservices. Otherwise, to me, they don't really add any value. One of the really common anti-patterns that I see is that people don't observe that. In the majority of microservice implementations that I see in the wild, where people claim to be implementing microservices, what they really mean is that they've got a bunch of services, each living in a separate repo. They're not going to release those things independently of one another, because that's too scary. Then they're going to test them all together afterwards. That means you've just got a big monolith in multiple repos, and that's the worst of all worlds. That gives you no advantage that I can see.
Cummins: Your IDE isn't going to help you.
Engstrand: There's no doubt about it: if you see two or more services, and you're using traditional semantic versioning rather than timestamp versioning, and they're on the same version, the same major and minor number, you're like, ok. They're always released in sync, so look what we have here.
Richardson: That's definitely a smell. It's an indication that the boundaries are incorrect.
Farley: I think the idea of just accepting that there's a higher cost to the decoupling. If you want to work with pieces that are decoupled from one another, there's a tax to pay for that. You're going to do some translation. You're going to support multiple versions of the API in parallel, or whatever your strategy is to allow these things to change more independently, but there's a value in them being able to change more independently. You invest in that value by paying the tax of the decoupling.
Richardson: You hope that the value exceeds the cost.
Farley: Yes. Sometimes it doesn't.
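One hedged sketch of paying that decoupling tax, with hypothetical message fields: a service accepts two versions of a message in parallel and maps both onto one internal command, so callers can migrate at their own pace.

```java
// Hypothetical sketch of one way to pay the "decoupling tax": a service keeps
// accepting the old message version while consumers migrate to the new one.

import java.util.Map;

record PlaceOrder(String customerId, String sku, int quantity) { }

class OrderMessageParser {
    // Both versions map onto the same internal command, so the rest of the
    // service never knows which version the caller sent.
    PlaceOrder parse(Map<String, String> fields) {
        String version = fields.getOrDefault("version", "1");
        return switch (version) {
            // v1 sent a single combined "item" field and an implicit quantity of 1.
            case "1" -> new PlaceOrder(
                    fields.get("customer"),
                    fields.get("item"),
                    1);
            // v2 split the item into an explicit SKU plus quantity.
            case "2" -> new PlaceOrder(
                    fields.get("customerId"),
                    fields.get("sku"),
                    Integer.parseInt(fields.get("quantity")));
            default -> throw new IllegalArgumentException("Unknown version " + version);
        };
    }
}

public class VersioningDemo {
    public static void main(String[] args) {
        var parser = new OrderMessageParser();
        System.out.println(parser.parse(
                Map.of("version", "1", "customer", "c-1", "item", "BOOK-42")));
        System.out.println(parser.parse(
                Map.of("version", "2", "customerId", "c-1", "sku", "BOOK-42", "quantity", "3")));
    }
}
```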
Cummins: I like the idea of the decoupling tax, though, because we often talk about the overhead of microservices themselves, but we don't always talk about the overhead just of being decoupled. I remember way back when I first learned EJB. Of course, there's a lot to hate about EJB. One of the things that it was solving was that decoupling as well. I just looked at it, and my head wanted to explode, because I was just like, "I don't understand what problem we're solving. This is so complicated. Why do I have to change? Is everything traced through?" You lost a lot. You, in principle, gained a lot as well. I think there's a similar thing that having things spread across in that way has a cost.
Richardson: It's like if you've got two modules, one invoking the other with a method call, that's really simple. As soon as you package them up and have some form of inter-service communication, it's costly and more complicated. In the case of Airbnb, I think when they started refactoring the Ruby monolith, maybe they had 500 developers all committing to the same single monolithic code base. They had a bunch of issues with that.
Farley: Mind you, just to put that into context, Google have 25,000 developers all committing into the same monolithic code base.
Richardson: I know. They have PhDs working on their build system. I don't want to disrespect them, but I feel like you should be applying your brain power to solving business problems as opposed to your build system.
Farley: I disagree with that. Absolutely, we should be applying our brainpower to solving problems. The way that you do that is by establishing fast feedback. Investing in getting the fast feedback is, I think, nearly always a good investment.
Richardson: I agree.
Utilizing Enterprise Business Capabilities to Understand Bounded Context
There was a question about using enterprise business capabilities as a starting point to understand bounded context/domains. Anyone have any opinions on that?
Farley: It depends what you mean by enterprise business capabilities. Business capabilities, sure. The enterprise thing makes me nervous, because that starts sounding like EJB again. My preferred starting point these days is to do event storming. That gives you a high level, abstract picture of the flow of information through the system, and you start to see little clusters that seem to indicate that there might be something interesting going on in this flow of events, that indicate bounded contexts in your system. Certainly part of my personal strategy for designing these kinds of systems is that I always want the conversation between the services to be understandable by somebody who's not technical. I want it to operate at that level of abstraction. At the level of the business problem. That's the level of exchange of information between the services. I think that gives you a good tool for this abstraction. If that's what the questioner means, I think that's a good starting point, although the enterprise bit makes me nervous.
Richardson: To me, that's part of business architecture, which is a whole other school of thought. Where you're modeling your business as a hierarchy of business capabilities, and also as a series of value streams. Each stage of a value stream invokes a business capability that acts upon a business object. It's some other way of looking at a business as opposed to the more domain-driven design influenced way of doing things.
Service Communication via REST, and Complete Decoupling
If services communicate via REST, does that mean that they're not completely decoupled from one another as opposed to using asynchronous communication?
Farley: Yes. By using synchronous communications in any form, they're more coupled.
Engstrand: Even in an asynchronous situation, you still need to know what the format of the schema of the message is. Isn't there coupling to that? Whether I read it off of Kafka or I respond to a thing, if I'm looking for a certain attribute and what it means, there's some coupling there. I'm not sure it's possible to completely decouple. It's loosely coupled, not completely uncoupled.
Farley: I agree with that, certainly. I think that when you're talking about synchronous calls, you're coupled in a different way. You're coupled temporally. You're coupled in time. That's a significant complexity when you're talking about distributed systems, in my view. I've done quite a lot of asynchronous systems now. It's a much nicer way of working as a distributed computing model. I don't know whether we're going to get there, but I think that one of the potential future directions for microservices is probably the adoption of the actor approach. Little stateful bubbles that just take asynchronous messages. I think that gives you an opportunity to separate accidental and essential complexity better than any other approach that I've seen. I'm a big fan of asynchronous systems for distributed systems these days.
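To make those "little stateful bubbles" concrete, here is a toy actor in plain Java (deliberately not Akka), with hypothetical names: the state is only ever touched by one thread, and the only way in is an asynchronous message.

```java
// A minimal, hypothetical actor in plain Java: a stateful "bubble" whose
// code runs single-threaded, fed asynchronously through a mailbox.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CounterActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0; // state touched by exactly one thread: no locks needed

    CounterActor() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String msg = mailbox.take(); // take an event in...
                    count++;                     // ...do something stateful...
                    System.out.println("handled '" + msg + "', count=" + count);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // The only way in: an asynchronous, non-blocking message send.
    void tell(String msg) {
        mailbox.offer(msg);
    }
}

public class ActorDemo {
    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        actor.tell("first");
        actor.tell("second");
        Thread.sleep(100); // crude wait so the demo prints before exiting
    }
}
```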
Cummins: I've got two thoughts. One is about the coupling, because I think you're spot on that there's [inaudible 00:20:19] coupling and temporal coupling. Chris was talking about design coupling in an article recently. Kent Beck wrote an article, which I really liked, and he was talking about the fact that we have this dream of decoupling. If they're interacting in any way they're going to be coupled. It's about managing the coupling. Managing the coupling, I like it because it implies that there's a number of different ways that we may want to manage it. We probably want to reduce it almost certainly. We probably want to think about what our objectives are as well. Being really explicit about it is another thing that we want to do. Understanding the costs and the tradeoffs is another thing that we want to do. There's a spectrum. I'm intrigued by the idea of the actors as well, because that feels like one of those ideas, that's a really good idea. It's come around a lot of times and everybody has gone, "That's a very good idea. Once I understand it really well, I'm going to implement it." Then it fades back out. Because if you look at something like Akka or Scala, they've always stayed niche. I wonder whether it's just too hard for mere mortals, even if it's elegant.
Richardson: Interestingly, people are pretty comfortable with the term event-driven architecture. Then you get into, what's the difference between an event-driven service and an actor style service? They seem close at some level.
Farley: I think there are some interesting moves. I think Holly's right, that it's a bit grungy. You've got to be a bit nerdy to go there at the moment, I think, with the tools and stuff. There's a lot of conversation around it. I think in the future, it's one of the directions that distributed computing will take, because I think it's a simpler model for concurrency. That's one of these deep problems that you can't avoid. If you've got concurrency going on, that's going to make your system hard. Whatever else is happening, it means it's going to be hard. An actor is one way of simplifying that, and of allowing us to move more of the smarts out of the actual bit of code that's doing the work. Because the bit of code that's doing the work is single threaded: it just takes an event in, does something, maybe even something stateful, and then puts an event out. That's lovely. That's a lovely way of programming. I think there's some room for it to take hold. I wonder whether that's one of those old ideas whose time might come with moves into the cloud and serverless, stateful serverless, and all those kinds of things, potentially.
Cummins: Maybe it just needs the right marketing, and just something that captures that mental model that captures the imagination like microservices have.
Farley: It does. There's going to be a bit of [inaudible 00:23:19]. Yes.
Richardson: Micro active.
Farley: Micro.
Conway's Law in a Company that's Frequently Reorganizing
Richardson: What about Conway's Law in a company that frequently reorganizes?
Engstrand: That's a bigger problem. If the company is constantly reorganizing, in other words, if you don't know who your boss is, on a month to month basis, for example, or if you get moved around a lot to different teams, there's cultural implications to that. You've got bigger problems, you're not even worried about Conway's Law anymore. Constant reorganization does not do well in terms of team performance, frankly. Yes, it happens sometimes. It's unavoidable. If it's just a never-ending churn, you've got bigger cultural problems than how long it's going to take to get some microservice owned by another team to add the feature that you need.
Farley: I'm going to disagree.
Cummins: Conway's Law probably still will hold, won't it? Because you can look through the architecture and you can see the stratifications, and you can do the archaeology. You can say, this here was where it was owned by this team. Then you see this disgusting mess here where it got split into two, but it didn't get refactored properly. That was during the transition phase, and now we have the next layer, and it's like tree rings.
Farley: I think I do disagree slightly. I'll give the consultant's answer, which is, it depends. It depends on the context. I worked in an organization where we morphed a lot; we changed shape dynamically to fit the situation. In that organization, in the context of the work that we were doing, that was a good thing. Certainly, where you have the top-down, management-consultant-driven culture of reorgs on a regular basis, that's not usually a good thing. I think that if the team is responding to dynamic changes in the business and the technology and your understanding of the problem, I don't think that's a bad thing, necessarily. Where Conway's Law drives you, for that kind of team, is toward flexible, managed coupling between the pieces, so that you can compose them in different ways and use them in different ways, and work on them in different ways, I think.
Richardson: Maybe implicit in that is that service ownership might change, or maybe service boundaries might change. To me, you've got to do what you've got to do, but it feels somewhat expensive. I suppose the other part is, if you're a startup and you're in a domain where you're still trying to figure things out, and you're undergoing this evolution, then maybe you should be using a monolith still, until things settle down.
Farley: That was one of the things that I talked about in the video that you mentioned on my channel: when you're in that fast discovery phase, when you don't know the answers yet, you want to be riffing fast, and you don't want lots of repository boundaries and everything else in the way. You just want to be able to move quickly so you can try different ideas and change things cheaply. Then, as the structures and the architecture of the system, your service boundaries and the protocols of exchange between them and so on, start to stabilize, that's the time when you start pulling things out, and they just tend to be more durable. Even then, you're never finished with a service, but one might not be in active development for a while, because it's doing its job, and you might move on and have a team of people working on other things. I think that the ownership of those things certainly morphs.
The Actor Model
Engstrand: What I've found is the issue isn't so much whether the actor model is hard or not; the issue is, are engineers paying attention to the threading model that they're currently coding in? Sometimes I've seen issues with the actor model. Sometimes I've seen issues with a Netty-based system versus a Jetty-based system. In a Jetty-based system, a thread is committed to the request. In a Netty-based system, that's not true. You can run into bugs not caught by the compiler, and not easily caught in code reviews, especially if people aren't paying that much attention, where you realize that you're blocking more than what you think you're blocking, and you run into some problems there. I wouldn't say that the actor model is too hard, per se. Where I'm at now, ours is a predominantly Scala shop. We use Akka a lot. It's not hard, per se, but you do have to understand how threading happens in your environment in order to avoid the more subtle bugs.
Farley: Not within the actor itself, though. In the infrastructure.
Engstrand: Yes. It's not like actors are hard. If you're going to do a lot of concurrency across threads, then all of a sudden what thread this code is executing in is important. Are you being called from deep inside of an actor, or are you just being called from the router or the controller, whatever, however you frame that. All of a sudden that becomes important.
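A contrived plain-Java illustration of the blocking pitfall Engstrand describes. A single-threaded executor stands in for a shared event loop, which is how Netty's event loops behave; the names and timings are hypothetical.

```java
// Hypothetical illustration: one innocent-looking blocking call on a shared
// "event loop" thread stalls every request queued behind it.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventLoopBlockingDemo {
    public static void main(String[] args) throws InterruptedException {
        // One thread standing in for the shared event loop.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // A separate pool where blocking work is safe to run.
        ExecutorService blockingPool = Executors.newCachedThreadPool();

        // BAD: this blocks the event loop; everything queued behind it waits.
        eventLoop.submit(() -> {
            sleep(2000); // stands in for a blocking DB or HTTP call
            System.out.println("slow handler done (blocked the loop for 2s)");
        });

        // BETTER: hand the blocking work off, keep the event loop free.
        eventLoop.submit(() ->
            blockingPool.submit(() -> {
                sleep(2000);
                System.out.println("slow work done off the event loop");
            }));

        // This "fast" handler still waits behind the BAD task above.
        eventLoop.submit(() -> System.out.println("fast handler ran"));

        eventLoop.shutdown();
        eventLoop.awaitTermination(5, TimeUnit.SECONDS);
        blockingPool.shutdown();
        blockingPool.awaitTermination(5, TimeUnit.SECONDS);
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```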
Farley: I think you've hit on an important point, because we're talking about some stuff that is genuinely world class hard now. When you're talking about concurrent systems of almost any kind, that's our version of quantum mechanics. A distributed system is orders of magnitude more difficult than a local system, because of concurrency. We buy into that world-class difficulty the moment we take a branch in a version control system or spawn a thread. As soon as we take that step, we've made our world more complicated. Understanding that and managing that needs to be part of our jobs. At the same time, I think that we should be much more scared of it, much more nervous of it, than often we are.
I've had the pleasure of working with a few people who are considered to be world experts in concurrent systems. One of my good friends, Martin Thompson, he's famous for high performance computing, and concurrency, and all that stuff. He's brilliant at it. His advice to everybody is that, don't do concurrency. Unless you're really forced into it, don't, because it's expensive. It's slow. It's inefficient in nearly all of the models. I think, deeply, you've got to worry about those things. You got to worry about coupling and you got to worry about concurrency. Everything else is easy. Those are the things that make our job hard, but also interesting. I'd agree with you. We need to think about that stuff and how to manage it.
Richardson: Maybe I view it in a slightly different way. I feel like there's various models, and the simpler one is request level parallelism using a single ACID database. That's a simple world. Then there's deviations from that. One of them, I'd say, is a microservice architecture with each service having its own database; then you're giving up on ACID. Then there's the BASE acronym, basically available, soft state, eventual consistency, and that's quite complex, or has the potential to be very complex.
Farley: I think one of the big strengths of the relational database model, which I never liked very much, was that it made the programming simpler, because it constrained you. The infrastructure looked after some of these problems. The systems weren't very scalable. They weren't very efficient as a result, but it was a simpler programming model. As soon as you do NoSQL or the more distributed approaches, you buy into this more complex world, I think.
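As a sketch of that simpler programming model, here is the classic ACID transfer in JDBC, assuming an in-memory H2 database on the classpath; the schema and account names are hypothetical.

```java
// A sketch of the simpler ACID world being contrasted with BASE: one local
// transaction, and the database infrastructure looks after atomicity.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class AcidTransferDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:bank", "sa", "")) {
            try (var st = conn.createStatement()) {
                st.execute("CREATE TABLE accounts(id VARCHAR PRIMARY KEY, balance INT)");
                st.execute("INSERT INTO accounts VALUES ('alice', 500), ('bob', 0)");
            }
            conn.setAutoCommit(false); // begin one atomic unit of work
            try (var debit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 var credit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, 100);
                debit.setString(2, "alice");
                debit.executeUpdate();
                credit.setInt(1, 100);
                credit.setString(2, "bob");
                credit.executeUpdate();
                conn.commit(); // both updates happen, or neither does
            } catch (SQLException e) {
                conn.rollback(); // the infrastructure undoes the half-done work
                throw e;
            }
        }
    }
}
```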
Richardson: As Holly just demonstrated, the network is not reliable.
Cummins: If I were a microservice, that's what you'd have to factor in: the microservice on the other side might go away.
Richardson: Yes.
Farley: Always.
Engstrand: Very true.
Richardson: Fortunately, the circuit breaker had a fallback strategy implemented.
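To make the joke concrete, here is a toy circuit breaker with a fallback in plain Java. Real libraries (Resilience4j, Hystrix) add half-open states, time windows, and metrics; this is just the minimal shape of the pattern, with hypothetical names throughout.

```java
// A toy circuit breaker with a fallback strategy: after enough consecutive
// failures, stop calling the remote service and serve the fallback instead.

import java.util.function.Supplier;

class CircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    CircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (consecutiveFailures >= failureThreshold) {
            // Circuit is open: don't even try the remote service.
            return fallback.get();
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0; // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback.get(); // degrade gracefully instead of failing
        }
    }
}

public class CircuitBreakerDemo {
    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3);
        for (int i = 0; i < 5; i++) {
            String answer = breaker.call(
                () -> { throw new RuntimeException("microservice went away"); },
                () -> "cached answer");
            System.out.println(answer);
        }
    }
}
```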
Service Mesh, Legitimate or Anti-pattern
Is service mesh a legitimate thing, or is it just an embodiment of an anti-pattern?
Farley: No, I think I'm probably a believer. The best big system that I ever saw or worked on, we built a financial exchange. We didn't call it a service mesh then; I think it was before that term gained currency. We absolutely built a service mesh. We built a bunch of stuff that made it really easy to just create a service, and that took care of loads of the accidental-complexity stuff, like clustering, Pub/Sub messaging, close-to-guaranteed delivery, failover, all of those kinds of things, in the mesh. All we had to do was write a little Java object.
Richardson: That may be a subtly different definition. To me, service mesh is something like Istio. It's basically managing synchronous communication between services.
Cummins: I think that contradiction actually is super interesting. Because I think a lot of people look at Istio and they say, I don't want Istio. I don't want a service mesh, that's much too complicated. I'm just going to write a few little things to allow me to find my services and allow me to deal with some of the security. Then all of a sudden, you still have a service mesh, like the one Dave described. It's just that you've written it yourself, and you have to maintain it yourself, which may or may not be what you wanted to do. We think, this isn't a service mesh, like Chris said, but it is, it's doing stuff.
Engstrand: For me, what you want is resilience, security, discoverability. Whether you embed that or you make it another server round trip is the question of service mesh. I don't think anyone's saying, no resilience, no discoverability, no security. They're just saying, do you want a control plane and a data plane and factor it off to another service? For backend service orchestration, it doesn't matter. If you're starting to do BFFs, Backend for Frontend, and you're having frontend developers support the BFF because it's so tightly coupled to the frontend, they're not used to resilience issues. If they can just call a service mesh, and in their code, it just looks naive. It just looks like, "Yes, I called it. I got the answer. Great, let's put it together." Then service mesh has value. If you don't really have that use case, then, yes, maybe it's not worth the complexity, the Ops expense. The fact that the sidecar proxy may not spin up in time for your pod to pass the readiness probe in a Kubernetes world, there's all kinds of little details about that that you have to deal with in Istio and stuff like that. It just depends on how much orchestration you're doing, and what team is owning the orchestration code.
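A hedged sketch of what "it just looks naive" can mean in practice: the application code below calls a plain local endpoint, and an out-of-process sidecar proxy (the localhost port and path here are hypothetical) is assumed to handle discovery, retries, and security on its behalf. Nothing will answer unless such a proxy is actually running.

```java
// Hypothetical sketch: a BFF developer's view of a call through a service mesh.
// No resilience code in sight; the sidecar does retries, discovery, and mTLS.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MeshClientDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // The application just names the service; the mesh routes the call
        // out of process via the local sidecar proxy.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:15001/recommendations/v1/forUser/42"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```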
Farley: I certainly would react badly to the service mesh that you just described. I think my model of a service mesh, as the substrate, the space-time continuum on which our little services rest, is different, and in large part driven by this modeling idea of trying to move the accidental complexity out of my services, and allowing the service mesh to look after that. You've got to architect for that. You've got to understand the principles involved. Actually, as an individual service developer, you don't need to, if you get that right. One of the things that frustrates me is that we got quite close to what I saw as an ideal development world on this project that I was talking about before. At the time, I was thinking, this is fantastic; we should try and find a way of generalizing this and getting it out for more people to use. I haven't seen anything do that. Akka comes close, maybe, with some of the things, but still, everybody has to worry about too much detail, it seems. I think there's a future there. Somebody at some point needs to grasp it, because I think there's a way of raising the level of abstraction to allow these services to work more independently. Because there's a lot of stuff you can generalize in that service mesh and just provide out of the box. I'm loath to say this, particularly given Holly's earlier comments, but it's a super-EJB thing, where it just does more in the infrastructure. It will be better than EJB.