Manuel's full question: I'm Manuel Pais and I am here at QCon London 2015 with Rebecca Parsons. Thanks for accepting our invitation, Rebecca. You obviously don't need introductions. You've been ThoughtWorks' CTO for many years now, besides an illustrious past in academia as well. So my first question is whether you can give us a brief rundown of your latest projects at ThoughtWorks. What kind of things are on your radar right now?
Well, there are a few things. Obviously, hardware is becoming more ubiquitous with wearables, with increasing interest in sensors, with home automation becoming a reality, more devices for medical purposes. And so there's a lot of interest in how do you look at creating applications that can take advantage of these sensors. What's a good software development process for systems when you have things like hardware devices? What are some testing strategies to simplify the development of systems that now have these very different hardware platforms involved? It's a much more heterogeneous compute platform, not just distributed systems that are effectively the same. So that's one big area. And then, of course, there are all kinds of data implications moving on from that now that you have all of these sensors and all of these devices in the world what do you do with that information.
A lot of clients are very concerned about what this big data thing is and what it means to them. But we also have a lot of clients who are becoming increasingly concerned about security and data privacy, starting to think about how to more properly protect their systems. Are there ways that we can think about data such that it isn't as privacy-invading as some of the other approaches are?
So we have a spectrum of clients who are all in on full personalization, and there are lots of benefits to that. Many people are willing to be very open with their data because of the benefits that they get from that. And yet there is a spectrum of people who are perhaps a bit more concerned about the level of data, the volume of data, and the conclusions that can be drawn about a person's life simply by looking at data. So there is this combination of security, privacy, and data analytics: how can I make use of that data in a way that maintains the trust relationship between the client and their ultimate consumer?
I think the benefits really focus on the extent to which we can build systems that are easier to change. When you have a single large application, everything that you do goes through that application. With every code change you make, you run the risk of unintended consequences: changing something within one part of the system without realizing it's going to impact something somewhere else.
So the real benefit comes from the ability to more readily develop and change applications to meet the changing business and environment. But there is a cost that comes with that. There's increased operational complexity. There are more things to deploy, more things to monitor. And so you want to consider microservices in those aspects of the business where there's a lot of change happening, where there's a lot of uncertainty in how you might want to address the market or in the kinds of expectations that a new business client or a new class of customers is going to place on your application.
Manuel's full question: Well, you just mentioned the extra burden, let's say, on the operational side when you have a microservices architecture, in terms of deployment, automation, provisioning, and service discovery. In your experience, how does that balance out against the benefits: the extra maintainability of your code, and the fact that at run time you have better control of failure rather than a monolith that is either up or down? When you weigh the extra operational complexity against the benefits for the business and for development, how do you see that equilibrium?
Well, you definitely need to look at what the pressures are that would cause you to need to change your application or to scale different parts of the application differently. As those benefits increase, you are more likely to be in a position where you would want to pay that increased cost. There are other considerations as well, and continuous delivery is a major enabler for microservices.
The approach that we're talking about with microservices really is questionable, I believe, unless you have the level of infrastructure automation and deployment automation that comes when you adopt a continuous delivery approach. Once you have that, you've mitigated a lot of the deployment complexity. So the additional complexity really comes in what the application profile looks like when it's running in production. And that's where you start looking at more sophisticated monitoring and management for different kinds of failures.
One of the interesting things though is that when you think about it from the perspective of the consumer, if I've done a good job in a microservices setting of managing how I deal with intermittent service failure, I am, as a customer, more likely to get a smooth experience than with the monolith that might not go down very often but when it goes down it goes down hard. And so I might have partial service in a microservices architectural approach where I have no service in the monolith approach. And so there are also these reliability and service delivery perspectives that have to be taken into account on the benefit side of the microservices equation.
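The partial-service idea she describes can be sketched as a consumer-side fallback: if a non-critical service call fails, the application degrades gracefully instead of failing the whole request. A minimal sketch; the service and function names are hypothetical, not from the interview:

```python
# Sketch of graceful degradation: a failing recommendations service
# degrades the page instead of taking it down (names are hypothetical).

def fetch_recommendations():
    # Stand-in for a remote call that fails intermittently.
    raise ConnectionError("recommendation service unavailable")

def render_product_page(product):
    try:
        recs = fetch_recommendations()
    except ConnectionError:
        recs = []  # degrade: show the page without recommendations
    return {"product": product, "recommendations": recs}

page = render_product_page("espresso machine")
print(page)  # the core product data still renders
```

The consumer sees partial service (a page without recommendations) rather than the all-or-nothing outage of a monolith.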
I think so, yes. Well, first obviously, or perhaps not so obviously, I do think a strong DevOps culture is a prerequisite. If you don't have the right level of communication between the people who are going to be responsible for these services in production and the people who are developing those services, that simply is not going to work. It's one thing to throw a single monolithic application over a wall between development and operations and have it work. But if all of a sudden you are throwing 25 services over that wall and saying "go forth and make this work," the operations people quite rightly are going to revolt. So I think at a minimum, even if you don't have the level of automation, you need to have the right kinds of communication going, and that's what the DevOps culture is all about.
Past that, you could probably do it without the full continuous delivery pipeline. But I think you would still want a lot of tools in there to simplify infrastructure, simplify deployment, and make those individual deployments easier to replicate so you have confidence. You're still going to want to do some level of smoke testing, which is another part of the continuous delivery pipeline, to make sure that all those things are actually wired up properly. And so while it's not essential to have full end-to-end continuous delivery, you're going to have to have some aspects of that. I do believe it's essential that you have a DevOps culture.
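The smoke-testing step she mentions can be as simple as probing each deployed service's health check after a deployment and refusing to promote the release if anything is miswired. A hedged sketch; the service names are invented, and the lambdas stand in for real HTTP health-endpoint probes:

```python
# Minimal post-deployment smoke test: verify every service responds
# healthy before the pipeline promotes the release.

def smoke_test(health_checks):
    """Return the names of services whose health check fails."""
    failures = []
    for name, check in health_checks.items():
        try:
            if not check():
                failures.append(name)
        except Exception:
            failures.append(name)  # an unreachable service also fails
    return failures  # an empty list means everything is wired up

checks = {
    "orders": lambda: True,
    "billing": lambda: True,
    "recommendations": lambda: False,  # simulated bad deployment
}
print(smoke_test(checks))  # → ['recommendations']
```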
Manuel's full question: In terms of continuous delivery, can microservices also have a bit of an adverse effect, in the sense that if you have continuous delivery in place with a monolith, you have one pipeline essentially, whereas with microservices, to have the ability to deploy independently, you will need a pipeline for each service? That makes it potentially more difficult to test, to do integration testing, and to have an overall idea of whether your system globally is working or not.
If in an enterprise all you really had was one single monolith, that argument might hold. But in any organization of any size and complexity, all application projects are integration projects at some level. And so you're going to have to sort out those issues around integration testing anyway. It does increase the burden. But if you don't have a mechanism for supporting it, you can't really operate in an enterprise environment at all, since they're all integration projects.
You will have more complicated pipelines, and that's where I think some of the developments happening around tooling in the continuous delivery space come in: to help you manage these pipelines, visualize these pipelines, monitor the progress through the pipeline. Those things are necessary, but the tools exist, and there's actually quite a bit of innovation that's still helping to improve that process. So I would not consider that a major cost associated with using a microservices architecture.
Manuel's full question: And now talking a bit more from the development perspective, let's say, imagining an enterprise with monolithic applications, or even traditional SOA services that might be rather large and not at the level of microservices: which factors would you consider important in analyzing whether the enterprise should move towards microservices or not? What kinds of things do you need to take into consideration for that decision?
I would use a similar kind of analysis to the one we use in general when faced with a legacy system, trying to look at how we make this application or this suite of applications ready for the level of change that is being introduced by the business environment and by customer expectations. You take a look at where the quality issues are, whether it be software quality, or where the ops people or the testers panic when you say, "Oh, we're going to change this module of the system," and they all get terrified. You start to identify parts of the system. And if those parts of the system correspond with things that are changing a lot, or that the business feels it is going to want to make different use of, you're going to focus your attention in those areas.
Those areas are also going to tend to be the ones that change more rapidly and more frequently. And that's where you start to see the benefits of a microservices architecture: it gives you a level of flexibility that is much more difficult to achieve in a monolith. So the factors that point you towards a part of the system that needs remediation from a quality perspective, in order for that application to address ongoing business needs, probably point to the same areas where you should start and rationalize a more appropriate service architecture for that component.
Manuel's full question: Right. That leads us also to the issue of how do you define the granularity of microservices, right? Not only in that particular case that you need to migrate from a larger code base but even if you are starting out or if you have a small application, it's always a question, the granularity of the services that you want to design, how do you recommend going about designing them?
Well, there is a real art to service granularity, just like good object design. There are aspects of taste and there are principles that you look at. In the case of a microservices architecture, the two things that are really in tension are the cohesiveness of the service itself and the level of coupling you introduce between the service and the amount of communication that might go to other services. Drawing those boundaries really is the most critical design decision. You want to look at things like the communication patterns: if there's a lot of chat that has to happen between two aspects of the system, you might be more likely to bring those closer together, and possibly either combine them in a service or look at some other way to make that communication more effective.
You also want to look at things that tend to be correlated from a change perspective: if I change this, I am also changing that at the same time, so maybe you want to bring those things together. You're balancing the communication cost against the coupling that's introduced when you've got things combined within a service. You'll also want to look at whether there are opportunities for independent scaling. Maybe I have one service that takes a long time for its part of the process, and some other activities that need more throughput. I might want to be able to scale those things independently, so they should probably be in different services.
One aspect of microservices development over time is you may combine services. You may split them out. You may do the same thing with your data stores. There might be times when you combine data together and then you decide actually I need to separate out these pieces of data into separate data stores within your overall data architecture. And that's part of the normal process of developing a microservices application.
I think an important part is to look at the different pieces of data, the way they're used, what their lifetime is, what their level of protection is, and for each of those aspects of the data and each of those pieces of the data decide what are the characteristics that I need. There are some things that you are going to want a hard transactional boundary around. I don't want eventual consistency on my checking account. I want to know that when they withdraw something from my checking account, that is only done once. And so there are some things that need that level of consistency.
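The "only done once" requirement on something like a checking-account withdrawal is commonly enforced with an idempotency key: a retried request carries the same key, so a duplicate is recognized and never applied twice. A simplified sketch of that idea, not actual banking code; the class and key names are invented:

```python
# Exactly-once semantics via idempotency keys (illustrative sketch).

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.applied = set()  # idempotency keys already processed

    def withdraw(self, amount, key):
        if key in self.applied:   # duplicate / retried request
            return self.balance   # no second debit
        self.applied.add(key)
        self.balance -= amount
        return self.balance

acct = Account(100)
acct.withdraw(30, key="txn-1")
acct.withdraw(30, key="txn-1")  # retry of the same transaction
print(acct.balance)  # → 70, not 40
```

In a real system the key set would live in a durable store inside the same transactional boundary as the balance, which is exactly the hard-consistency boundary being discussed.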
Once you allow for the possibility of eventual consistency, then you have to start thinking about the consequences. What are the ways that eventual consistency might manifest itself, and are there ways that I can mitigate against it? You might build models that tell you the probable length of time you might be in an inconsistent state before you reach eventual consistency. Are there activities in normal operations that would be vulnerable to inconsistent data for that period of time? You'll probably want to put in code to mitigate whatever those consequences are.
There might be a way that you could scan to ensure: yes, I have now reached a state of eventual consistency. That might be something that is important for certain kinds of reporting applications, for example. You might have to have a barrier between your operations and your reporting store so that you can ensure that okay, now I am reporting against something that is consistent. So it's looking at what are the potential points of inconsistency and what the consequences within the normal operation of that system are if it uncovers one of those inconsistencies.
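The scan she describes, checking that a replica has converged before reporting against it, can be sketched very simply: compare the operational store with the reporting store and hold the report until they agree. In-memory dicts stand in for the two stores here; the keys and values are invented for illustration:

```python
# Barrier between operations and reporting: only report against the
# replica once it has converged with the operational store.

def converged(operational, reporting):
    """The replica is safe to report against only when it matches."""
    return operational == reporting

operational = {"order-1": "shipped", "order-2": "pending"}
reporting   = {"order-1": "shipped"}       # replication still lagging

print(converged(operational, reporting))   # → False: hold the report
reporting["order-2"] = "pending"           # replica catches up
print(converged(operational, reporting))   # → True: safe to report
```

Real systems usually compare checksums or replication watermarks rather than full contents, but the barrier logic is the same.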
Manuel's full question: We spoke before about the benefit of better handling service failure with microservices, so that's clearly a benefit. But there's also a need to explicitly design for that failure, right? Each service has to be aware that other services might fail, and it has to know how to react to that. So again, do you have any recommendations on designing for failure?
There's a sense in which it's easier to work in that monolithic environment, because you can just assume things are going to work: either everything fails or nothing does. In a microservices environment, where those service boundaries are is going to be pretty obvious: when I am developing one service, I know I am relying on a different service. It's not hidden under the covers, so I know that it's happening. And at that point the design choices are: how do I detect that there is a failure, what are my options from a flow perspective if that service does fail, how do I recover my own processing, and are there potentially ways that I can restart that service?
If you are looking at idempotent services, for example, I can fire up another one and potentially retry. Because you know when and where those service boundaries occur, you can think about what the possible error scenarios are and what your approaches to them are. We've been doing distributed systems for decades, and we understand what kinds of recovery approaches are available to us. There's really nothing new from the services perspective, although we do have some things where it's perhaps easier to think about how I might recover from something like this.
If you have an easy ability to fire up, say, a new instance of a service, that's another recovery path. Obviously, you don't want to get into a retry loop on that one either. But again, we know how to do these things. What it means is that people who never had to worry about some of these distributed-systems failures are going to have to learn about them. But the good news is it's going to be obvious when you get into one of those situations. I think we're going to start to see patterns developing in how people approach both detecting the failures and also the monitoring, the semantic monitoring that might be going on in that services environment, where you might actually be able to spot these things before the operational flow runs into them.
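The retry path against an idempotent service she describes can be sketched as a bounded retry loop, bounded precisely so it doesn't become the runaway retry loop she warns about. The flaky service below is simulated; the names are invented for illustration:

```python
# Bounded retry against an idempotent service (illustrative sketch).

def call_with_retry(operation, attempts=3):
    """Retry an idempotent operation a bounded number of times."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as err:
            last_error = err  # safe to retry: the call is idempotent
    raise last_error  # bound reached: surface the failure upstream

# Simulated flaky service: fails twice, then succeeds.
calls = {"n": 0}
def flaky_lookup():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service instance down")
    return "result"

print(call_with_retry(flaky_lookup))  # succeeds on the third attempt
```

Retrying is only safe here because the operation is idempotent; for non-idempotent calls you would need the idempotency-key approach discussed earlier.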
Manuel: I guess one difference might be that with distributed systems we're talking about two, three, four, maybe up to ten systems talking to each other and handling each other's failures, while with microservices the scale might be much larger, dozens or up to a hundred services.
Yes. So the number of possible failure permutations is certainly much larger. But at any given point in time, when I am developing service A and I am relying on service B, that's a point-to-point connection, not necessarily a point-to-point communication but it's a one-to-one correspondence. I am talking to this service. I need to know how to respond to the various kinds of failures that could occur with that system.
So you don't get the same kind of combinatorial explosion as you are addressing the individual failure. Where you do start to see it is in what kinds of tests you want to run to understand how you might get cascading failures, and what you might want to do to put in some kind of circuit breaker to stop cascading failures. And that's why you start to see things like the Simian Army, which helps you understand what happens if different parts of the system fail in particular ways.
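The circuit breaker she mentions, in its simplest form: after N consecutive failures it "opens" and fails fast instead of hammering the struggling downstream service, which is what stops a cascade. A minimal sketch; real implementations (such as the one popularized alongside the Simian Army tooling) add timeouts and a half-open probing state:

```python
# Minimal circuit breaker: open after `threshold` consecutive failures.

class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, operation):
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = operation()
            self.failures = 0   # a success closes the circuit again
            return result
        except ConnectionError:
            self.failures += 1  # count the failure, pass it upstream
            raise

breaker = CircuitBreaker(threshold=3)
def down():
    raise ConnectionError("downstream service down")

for _ in range(3):
    try:
        breaker.call(down)
    except ConnectionError:
        pass

try:
    breaker.call(down)   # fourth call never reaches the service
except RuntimeError as e:
    print(e)             # the breaker fails fast instead
```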
So we do have to think about the space of possible failures from more of a testing and overall-systems perspective. When you are at the code level, you are addressing one failure of one service at a time.
It is nowhere near a commodity. And I think part of that is that when you look at what you have to do to go from where many enterprises are to continuous delivery, there's a rationalization of their hardware architectures and environments. There's new tooling that has to be introduced. There are still a lot of packages out there whose deployment is difficult to automate because of the way those packages are designed. And you are starting to see organizations, when they are looking at COTS implementations, use as one of the selection criteria how easy or difficult it is to put the deployment of that application into a continuous delivery pipeline.
So there's still a lot out there where we're still learning the best way to deploy some of these things automatically. How do we bring these different technologies, these different packages, these different architectures into a continuous delivery pipeline? Continuous delivery is, though, being used not just in startups; we've seen it in banks, we've seen it in travel organizations. So it is starting to be accepted outside of that early-adopter category. But there's still a long way to go before it's a commodity.
Well, one of the things I am quite interested in, we have recently in the last few years opened offices on the continent of Africa. We have an office in Johannesburg, South Africa and an office in Kampala, Uganda. And I am really quite interested in thinking about the culture of software development and how that culture might change across different parts of the globe. We've got offices scattered literally all over the globe. And so as in my job I get to see what's the day-to-day life of a software developer in Johannesburg versus Beijing versus Perth, Australia, versus New York in the United States or London or Manchester.
And I am starting to think about what the characteristics of a software development culture are, because as we start to understand more about what that culture is, we can see where it is going and what the opportunities are for improving the day-to-day life of a software developer. What might be the consequences for things like agile software development if you've got a completely distributed team? What are the cultural and tooling prerequisites to be able to do pair programming across 300 miles? Some of those tools already exist, but normally they're used within a given team, within a given culture. And how might we look at that as we start to see how the software development world is going to change over the next several years?