Service-Oriented Development

Summary

Rafael Schloming talks about how the real goal of microservices is to break up a monolithic development workflow. He shows how to build software in a way that enables teams to move fast and make things.

Bio

Rafael Schloming is Co-founder and Chief Architect of Datawire. He is a globally recognized expert on messaging and distributed systems and a spec author of the AMQP specification. Previously, Rafael was a principal software engineer at Red Hat.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

So, today, I'm going to talk not just about microservices, but I'm going to tell the story of how we started to think about microservices as service-oriented development. And I'm going to talk about why we think this is one of the most effective perspectives to approach microservices from. And I'm also going to cover how to start building your own platform for doing service-oriented development in an incremental way.

But, first, a quick poll. A show of hands, how many of you have asked these questions about microservices, like how to break up my monolith, how to architect my app with microservices, and what infrastructure do I need to put in place before I can benefit from microservices? A good number. These are the same questions we had when we started out, and we learned the hard way that these are not the right questions to ask to start with. What is the right question? We will get to that. But first, I will start with a little bit of the early history and the painful lessons that helped us figure out what the right question is, because this will give you the context to appreciate what I'm talking about.

Microservices at Datawire

So we were founded in 2013. My co-founder and I were both tool builders by nature, so we wanted to help people build things. People were building distributed systems, I had a strong distributed systems background, and microservices is how people were building things, so it turned naturally into a company that wanted to help people with microservices.

Of course, we had a lot to learn at the start. So we dove in and started with a technology-first perspective, and we talked to people using microservices, like folks at Netflix, Twitter, Google, Yelp, etc. We talked to engineers on the platform teams and tried to reverse-engineer the technology stack and how they were using it, and a picture of microservices emerged: a control plane and smart endpoints. The control plane was the same set of backbone services, like discovery, logging, tracing, and metrics, and the smart endpoints were about providing the resilience semantics you need, like rate limiting and circuit breakers.
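To make the "smart endpoint" idea concrete, here is a minimal sketch of one of the resilience behaviors mentioned above, a circuit breaker. It is an illustration only, not the implementation used by any of the companies named; the class name, thresholds, and injectable clock are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; names and thresholds are illustrative."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        # While the circuit is open, fail fast until the reset window has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Window elapsed: allow one trial call (the "half-open" state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count.
        self.failures = 0
        return result
```

A caller would wrap each remote call in `breaker.call(...)`, so that a dependency that keeps failing is skipped quickly instead of tying up the caller.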

And so we looked at this and thought, that's a lot of tech to build just to get started with microservices. We can help people by building this for them. So we built a control plane as a service. We used microservices ourselves, and it seemed like a natural fit: the control plane already broke down into a separate suite of services, so why not? And things went great at first. It was fast to start out with, and then we launched and things started slowing down.

Debugging Velocity

And so, we took a step back and scratched our heads for a little bit, trying to figure out why things were moving so slowly. And, as is always the case with this sort of thing, when you are in the middle of things, it is not always obvious; there are a couple of reasons you think of, and we looked at a couple of things. We looked at the technology we were using, and we went through five or six different deployment systems. We looked at our architecture and we said, maybe we did not decompose this control plane into the right set of services, so we tried a couple of things there. We tweaked that a bit.

And neither of those things seemed to make that much difference. And so, finally, after going through a bunch of releases and looking back and carefully asking the right questions, we figured out what was slowing us down. It turned out that every feature we released required carefully coordinating the efforts of multiple different people, and these people weren't naturally inclined to communicate on their own. Some of this was for bad reasons; we fixed those, and that made a big improvement in our velocity.

But some of this was actually for very good reasons. We didn't want to break our users, so the process we had for releasing things was deliberately more careful. So, in the end, it wasn't our technology or our architecture that was slowing us down, it was our process.

Debugging Our Pipeline

Now, at the same time, something else was going slow. And that was something that was actually a lot more concerning: our pipeline. It was a big pipeline, we were talking to about 30 or 40 different companies, and they really liked us a lot. They were all migrating to microservices, at one stage or another, and they kept coming back to talk to us, because we had a lot of useful information for them.

But a lot of them were moving slowly and, because we talked to them frequently enough, we were able to track their progress and put together a picture of what was happening. We noticed a pattern. The slow movers fell into two camps. There were a bunch of companies that were very deliberate about baking off all of the different technology alternatives, and it seems that they thought the benefit of microservices somehow came from picking exactly the right technology stack.

And then there was this other category of companies that were approaching it from a much more architectural perspective, and they were trying to figure out exactly the right set of services to decompose their monolith into, and this meant that they were taking a while. And then we also talked to all the fast movers, and we had a brilliant idea: have them tell us the story of their first service, tell us how you actually did this. And with the fast movers, we got a host of different answers. We talked to a company with a Rails monolith and a file upload functionality, and occasionally some user would upload a multi-GB file, it would take down the monolith, and the whole application would go down. So they had to fix this pretty urgently. They decided it would take too long to fix within the context of their monolith, and since a file upload endpoint is a simple service to write, they wrote it outside of the monolith and were able to deliver it quickly.

There was another company that wanted to adopt modern CI/CD tooling, and they decided that it would take way, way, way too long to retrofit their monolith, and they just decided to start implementing new functionality outside of the monolith and adopt new tooling as they went.

And there was another company that had a bunch of personal private information, and they wanted to minimize the scope of the audit process that is required for code that has access to that sort of thing. And so they isolated and contained that code in services that were as small as possible. So the common theme seemed to be that all of these companies had an urgent need that could not be addressed quickly enough in the context of their existing process.

Velocity Comes from Process, Not Architecture

And so, we kind of put two and two together with our own experiences, and decided that, okay, velocity actually comes from process, not architecture. And this kind of makes sense if you sort of take a step back and think about it, right?

We are all familiar with the classic death star architecture diagrams of thousands of services that come out of successful microservice companies. And, if you think about it, you don't stand in front of a whiteboard and design that kind of death star. That's not how you get there. You get it by enabling a different way of working that, as a by-product, makes it faster and easier to churn out the hundreds of different services to get to that point. This is when we ditched service-oriented architecture, fine-grained or otherwise, and thought about service-oriented development. And to understand this, it helps to recognize two things that dramatically impact the way we work.

So, first of all, in software, stability versus velocity is a fundamental trade-off. The faster you go, the more things break. And this is why things slowed down for us after we launched. When we were prototyping, we could break whatever we wanted, whenever we wanted. And the second we had users, we adopted a whole bunch of practices that slowed us down, and for good reason. The more users that rely on you, or the more important the thing that even a few users rely on you for, the more careful you need to be and the slower you can move. And what that means is that, if you are trying to add features while maintaining stability, there is no Goldilocks point on this curve for you.

A Single Process is Inefficient

A single process is inefficient because it forces a single stability versus velocity trade-off, and there is another important factor to recognize. The development process involves a bunch of distinct activities that I'm sure you are all familiar with. And something that can be harder to notice when you are small is that you can really only do one of the activities in this process at a time. And yet, when organizations try to scale, they seem to do this by building specialized teams associated with some or all of these activities.

And, as an organization like this grows, this turns into some combination of dramatic underutilization and departments fighting each other. If velocity is your top priority, then operations will get frustrated with developers for breaking things all the time. And if stability is a priority, then operations will ultimately end up putting processes in place that slow down development. And, when development slows, product management is frustrated. And if your leadership doesn't understand this, then they can end up setting up a combination of goals and an org structure that literally pits an organization against itself.

Microservices Lets You Run Multiple Processes

And what this means is that a single process does not scale with your organization. This is why, instead of asking how to break up a monolith, we like to ask, how do I break up my process? And this is the question to ask if you want to go faster.

And this is the perspective to approach microservices from, because microservices lets you have as many different processes as you would like. With microservices, you can set up a distributed development workflow, and that lets you customize the processes that each service uses. And, when you do this, that's how you can move fast and keep things stable, and get benefits almost immediately. Multiple simultaneous workflows, including the existing monolith, all tuned for that ideal stability versus velocity tradeoff.

So this sounds great, right? So how do you actually get started? Well, starting from the right principles makes this a lot easier. But it is still a huge shift in how people operate, and this requires organizational and technical changes, and so I'm going to cover some of the ways you can make this easier on both fronts, starting first with the organizational side of things.

Organizational Implementation

First of all, you have to give in order to get on this front. You need a big emphasis on education, communication, and delegation. And if you look at the picture on the right, you can see why. On each of our small teams, all of our former specialists are exposed to every aspect of the development cycle. And so, there's a big learning curve. And also, because they are all specialists, nobody speaks the same language. So communication can be a challenge.

And, with this model, you end up delegating much bigger parts of your business to much smaller teams. And this is, sort of, the point. But it can also be very scary. But when you do this right, you can get a lot. With education, your specialists become generalists and this leads to holistic systems. With learning comes personal growth and job satisfaction, with communication, the conflict that was there before turns into collaboration, and with delegation, you can achieve massive organizational scale.

And the benefits, when this is done well, are hard to overstate. I once spoke to an engineer at an ecommerce company, and he told me remarkable things about the impact of the latency of recommendations on customer conversion. They live and die by conversion rates, and these product recommendations increase conversions, but if they increase the page latency beyond a threshold, then the user gets frustrated and the conversion rate drops off. And if you think about this, it demonstrates a remarkable breadth of knowledge across many aspects of the business and the technology behind it. And so, what's the best way to go about doing this?

Create Self-Sufficient, Autonomous Software Teams

Well, we want to start by creating self-sufficient, autonomous software teams. And why self-sufficiency and autonomy? If you have to work directly with other teams to get stuff done, their process is injected into yours. If you have autonomy, you can choose the best process to meet your goals.

So to get there, there are two things to be aware of. First, you need to be aware of centralized specialist functions and try to eliminate them. No centralized architecture team, and no centralized operations team. And don't confuse a platform team that provides a platform as a service with a centralized ops team that is responsible for keeping the services running. These are two very different things even though, on an org chart, they can look very similar.

Think Spinoff

And the second thing is to think of your teams like their own little spin-off companies. You probably already consume external services, like Stripe and Twilio. Think of your new microservices team like another one of these: pick a real, urgent business problem that you wish you could buy instead of build, and then form an internal spin-off to build it. This sets up the right mindset for a lot of things. It helps you think from a process perspective. It helps you define the service, because you are thinking mission statement: who is the user and what are you trying to help them do? This is a more effective way of designing services than big, up-front design, and it helps establish communication because you are establishing a customer relationship rather than a co-worker relationship.

And it helps you form the right team, because with a spin-off, you need a mix of skills. You cannot put a bunch of developers on a team alone and expect them to succeed. You need people with product-oriented skills that understand the user and can translate that into engineering tasks, and you need people that can code and keep the service running. And you can do this without fancy tooling or technology, but it is more efficient if you give them an awesome suite of service-oriented development tools.

Technical Implementation

And that brings us to the technical implementation. And so, while it is a spectrum of stability versus velocity tradeoffs, I will break it up into three different stages. Early on is what I call the prototyping stage, and in this stage, the goal of the tooling is to give you really fast feedback from both prospective users and the tools that we all rely on, things like compilers.

And stage two is really about balancing production concerns, users, and growth. The goal of this stage is adding features without disrupting users. And then there's a third stage, and that's the mission-critical stage, where the primary goal is stability.

And so, we have all worked on stuff at all of these stages, and there are sets of tools that you can use to do all of this stuff. This is easy: for rapid prototyping, you can build something on Rails on the laptop; for the growing user base, we can put it in Heroku; and for mission-critical stuff, we can go and move it over into a real cloud. So what's the problem with this?

Well, we are re-platforming at every single step, and that's a whole lot of work. And so, what we actually need is one platform that can support parallel development at different stages with seamless transitions between stages. And this is a whole lot harder. The good news is that Kubernetes, Docker, and Envoy give you the technology to do a lot of this stuff and, just in case not everyone is a Kubernetes expert or has heard of Envoy, I will talk about them briefly. Kubernetes is like a thermostat: you tell it the desired state of the infrastructure, it looks at the actual state of the world, and it tries its best to make you happy. And it provides a powerful apply primitive to update the desired state of the infrastructure to something new. When you do this, it will safely transition from the current state to the new state. And it is fast at doing all of this, and this combination is powerful. It lets me treat my infrastructure like source code: I can check it into Git, diff it, patch it, and this fits the way that I work in a lot of ways.
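The thermostat analogy can be sketched as a reconcile step that compares desired state with observed state and computes the actions needed to close the gap. This is a toy model for illustration, not how Kubernetes is implemented; the function names, the replica-count state model, and the action tuples are all assumptions.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map a service name to a replica count (a toy state model).
    """
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    # Anything running that is no longer declared gets removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("delete", name, actual[name]))
    return actions


def apply_actions(actual, actions):
    """Apply actions to a copy of `actual` and return the resulting state."""
    state = dict(actual)
    for verb, name, count in actions:
        if verb == "scale_up":
            state[name] = state.get(name, 0) + count
        elif verb == "scale_down":
            state[name] = state[name] - count
        elif verb == "delete":
            del state[name]
    return state
```

Updating the desired state and running the loop again is the "apply" step in this picture: the declaration changes, and the system works out the transition.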

And Envoy, you can think of it as a powerful L7 network proxy. It is really L4/L7, but we care about it for the L7 capabilities: it provides a huge toolkit of L7 routing and resilience behaviors, great observability, and things like dynamic reconfigurability and zero-connection-loss hot restart. And there's a good reason to pay attention to both of these projects, because a lot of smart people collaborate on these to solve some really hard problems.

And the bad news is that you actually need to wire all of these together into your dev tooling the right way in order to get any benefit. And this can be hard. These technologies are all super general, and they solve all of the hard problems you are going to run into, and a whole lot more that you aren't. And they are definitely worth learning, but learning them all up front and then playing around with all the different options for wiring them all together, that can actually take a while. So the rest of what I'm going to talk about is the key strategies that can help you get started with this incrementally. You might see a shameless plug here and there, we might have a tool to fit into this picture, but I will try to keep that to a minimum.

How Do I Actually Use These Technologies to Build My Workflows?

So how do we use these technologies to build tooling for my workflows? Well, if you remember, we want fast feedback from our tools as well as our users. And, to reach users, we need to run on production infrastructure. Operations does not always like letting developers mess with production, so you need to get buy-in from all parties for an accessible way to do this.

And second, microservices very quickly grow too many remote dependencies to run locally, and not being able to run code locally really slows down the feedback from all the tools you are used to depending on. And so, excuse me, we have two strategies for dealing with this.

One is that you need to build some kind of self-service provisioning, and two is development containers. And so, the combination of declarative infrastructure from Kubernetes and Envoy as a dynamically reconfigurable edge proxy is a really powerful basis for self-service provisioning. And, when you wire these things together to provide that, you need to focus on how quickly you can spin up a service from scratch. You might not think that's a common case, but if you have too much friction, you might get lazy and put an independent new feature into an existing service, and this leads to accidentally coupling the process for the new feature to an existing process.

And, with Kubernetes and Envoy, you can, with the aid of some relatively straightforward tooling, spin up and publish a new service in less than a minute if you can code fast enough. This helps a lot because you can play around with ideas: you can mock up an API, you can call into other services, and you can use your mocked-up API to see if it is worth taking to the next level.
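As a rough sketch of what the low-friction end of self-service provisioning could look like, the function below generates a minimal Kubernetes Deployment manifest from a service name and an image. It is not the tooling described in the talk; the function name, the default port, and the placeholder service name and image are assumptions for illustration.

```python
import json


def deployment_manifest(name, image, port=8080, replicas=1):
    """Build a minimal Kubernetes Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # The service name and image below are placeholders.
    print(json.dumps(deployment_manifest("hello", "example.com/hello:dev"), indent=2))
```

Because `kubectl apply -f -` accepts JSON on standard input, the printed manifest could be piped straight into a cluster. A real setup would also need a Service and a route at the edge proxy before the new service is reachable.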

And this gets to our next problem. Coding on remote infrastructure is really slow, right? And just for some perspective: a VM-based pipeline has a deploy time measured in large fractions of an hour. A Docker-based pipeline has a deploy time that is maybe a few minutes. And, by contrast, hacking React on my laptop with live reload, it is maybe one to two seconds, and most of this is because somehow JavaScript managed to turn itself into a compiled language when nobody was looking. Hacking Flask on my laptop with live reload, it is pretty much instantaneous.

So, how can we do better than this? Well, developing inside a container actually helps with this in two ways. If you put the build inside the container, you have a single source of truth for your service's build toolchain and all of its dependencies. This makes your dev environment consistent and portable to other developers, which makes onboarding easier, and makes it easier for devs to jump between services.

And this helps you build a faster feedback loop. As soon as you do this, there's a bunch of strategies available to you. One, you can run your dev container on remote infrastructure, sync your local edits to it, and rerun the build right there. Two, you can run the dev container locally and snapshot the live container to create an image in seconds. This even works for compiled languages that take a long time to build from scratch, and using this technique, you can deploy just about anything in seconds. Three, with a bit fancier tooling, you can use a two-way proxy to wire that live container running locally into the remote cluster. This allows you to use live reload and a debugger while the container receives traffic from remote services and sends traffic to the remote services it depends on.

And option two and three can actually work pretty nicely together. Option three lets you use all of your local tooling while you are in the zone, kind of hacking away and then when you are happy you can use option two to push your change into a shared deployment so other devs can see them. We implement options two and three into our awesome, free, open source tooling. If you like the sound of this, check it out, if you hate it, you should play with it in-depth just to make sure it is as awful as you really think it is. And so all of this fast deploy stuff is really motivated by prototyping workflow, but it actually goes beyond that.

Fast Deploy = Resilience

Being able to quickly fix things when they are broken is the most fundamental resilience strategy of all, and it often gets underused. I have a theory that this is a holdover from that org-level dev versus ops tension. The only way, typically, that developers have to keep things from breaking is to anticipate and eliminate bugs up front. And the only way operations can keep things running is to go back to a known good version. And this is limiting: developers do not think of strategies to test things in production, even though increasing numbers of things these days can only be tested in production, and operations does not tend to fix things quickly by rolling forwards instead of backwards. That's a shame. In modern systems, things often break because of unanticipated environmental changes, and rolling back does not really help with those.

And this brings us to stage two: trying to add features without disrupting users. And so, the key organizational challenge here is recognizing the reality of the stability versus velocity tradeoff, and making a deliberate choice for each service. Measuring user impact is actually a business-level problem, and you need to consider this as well. It is not just about 500 errors. Going back to ecommerce, if conversion rate is fundamental to the business, you need to measure that and make it available to developers. As we know, back-end latency can impact that a lot in unintuitive ways.

So the key technical challenge here is that you are changing code in production. And you have got lots of protection from hardware failure these days, but how do you protect yourself from your own bugs, right? If you roll out the exact same buggy code on every node of a cluster, you are going to end up with some catastrophic failures. And as we said before, more up-front testing does not help. We need to minimize the impact of unanticipated bugs.

Strategy: Genetic Diversity

And the basic strategy here is genetic diversity. We want to keep multiple versions deployed so that any single bug is less likely to be catastrophic. The most basic form of this is canary testing, but you don't need to stop there; you can go to as many versions as you like, and Kubernetes can do the heavy lifting for you. We do it by mapping Git branches to deployments. Instead of maintaining each deployment directly, we instantiate as many versions as we like, for a variety of profiles, from a branch. And that means you can roll the service back, roll it forward, or run multiple parallel versions, all in terms of whatever Git workflow you want to adopt.
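One way to picture the branch-to-deployment idea and canary-style traffic splitting is the sketch below. It is an assumption-laden illustration rather than the speaker's tooling: the naming scheme, both function names, and the weights are hypothetical, and in practice the traffic split would be done by the proxy rather than in application code.

```python
import random
import re


def deployment_name(service, branch):
    """Derive a DNS-safe deployment name from a service and a Git branch."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # 63 characters is the DNS label limit, used here as a conservative cap.
    return f"{service}-{slug}"[:63].rstrip("-")


def pick_version(weights, rng=random):
    """Pick a deployed version at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

With weights such as `{"search-master": 95, "search-canary": 5}`, most requests go to the primary version while a small share exercises the canary, so a bug in the canary affects only that share of traffic.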

And so this one diagram you are looking at can be both an infrastructure diagram and a Git branching diagram. This makes transitioning between workflows super seamless. You can deploy straight to a single primary version from the command line while you are early, just like you would commit to master if you are playing with stuff. And if you want to emphasize stability, you can run the same deploy command off of a branch or tag in the CI/CD server for a more regimented workflow.

You need a bit more integration so you can trigger Envoy reconfiguration when these things spin up and down. We do this in an open source project called Ambassador, which provides that integration: it dynamically reconfigures Envoy based on annotations in the Kubernetes manifests. And just as fast deploy helps with resilience, this versioned deployment model helps with prototyping. With this basic model, every developer can deploy a private version of any service to hack on. You can configure it with a host or URL prefix so that they can access it just like they would access the primary version. And that speeds you up even more, and before you know it, you will be ready to start worrying about stage three services.
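The idea of giving each private version its own URL prefix can be illustrated with a longest-prefix match. This is only a sketch of the concept; it is not Ambassador's or Envoy's actual matching logic, and the function name, prefixes, and target names are hypothetical.

```python
def route(path, routes):
    """Return the target for the longest prefix in `routes` that matches `path`.

    `routes` maps a URL prefix to a deployment name. Returns None when
    no prefix matches.
    """
    best_prefix = None
    for prefix in routes:
        if path.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return routes[best_prefix] if best_prefix is not None else None
```

For example, with `{"/search/": "search-master", "/search/dev-alice/": "search-alice"}`, a request to `/search/dev-alice/q` reaches the private version, while `/search/q` still reaches the primary one.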

And so, I'm going to try to keep this short, because this stuff is definitely not part of getting started with microservices. But I'm happy to answer more questions about this later on. And what I will say here is that focusing on the organizational fundamentals will probably help you more than technology here. And that's because there is actually a bit of a failure mode here, where organizations can regress back to a centralized process when they get to this stage. And this usually happens because, at some point, you have a cascade failure. It is a big deal, and somebody is put in charge of making sure it doesn't happen again. And depending on what they decide to do, they can end up re-introducing a single centralized process.

Strategy: Service Level Objectives & L7 Observability

And now, the basic strategy for dealing with this is to define explicit service-level objectives, things like latency and availability ranges, and to provide good layer 7 observability. And to understand how these things work together, it helps to understand the nature of cascade failures. They come up when you have a long chain of synchronous dependencies, and you can think of them like Christmas tree lights: when one service goes down, the whole chain goes down. To prevent this, you need to either consider service D mission critical, thereby slowing it to a crawl, or change the dependency that C has on D. There are many ways to do this: you can make it asynchronous, cache it, denormalize it, or eliminate it entirely.
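The Christmas tree lights effect can be put into numbers with a back-of-the-envelope calculation. The sketch below assumes that each service in a synchronous chain fails independently, which is a simplification, and the availability figures are made up for illustration.

```python
from math import prod


def chain_availability(availabilities):
    """Availability of a synchronous chain, assuming independent failures.

    A request succeeds only if every service in the chain is up, so the
    individual availabilities multiply.
    """
    return prod(availabilities)


def downtime_minutes_per_30_days(availability):
    """Expected minutes of unavailability over a 30-day period."""
    return (1.0 - availability) * 30 * 24 * 60
```

Under these assumptions, four services at 99.9% each give a chain availability of roughly 99.6%, which is why removing or weakening one synchronous link (for example by caching) can matter more than hardening every service in the chain.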

And so how do you decide what to do when something like this becomes a problem? Well, first you need good network-level visibility so you can root-cause something like this. And second, service D needs clearly defined service level objectives as part of its contract. If an outage in D causes C to miss its own service level objectives (which, if C is a prototype, you might not care about), then it is clear whether C should take action to change the nature of its dependency, or whether D is violating its contract and should take action to improve its stability. And this can happen without draconian levels of centralized oversight. And this is one of those things that I find really, really interesting about microservices.
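The decision described above can be sketched as a small check of observed behavior against each service's objectives. This is a simplified reading of the talk, not a prescribed procedure; the SLO fields, the class and function names, and the thresholds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    max_p99_latency_ms: float
    min_availability: float


@dataclass
class Observed:
    p99_latency_ms: float
    availability: float


def meets(slo, obs):
    """True if the observed behavior is within the service's objectives."""
    return (obs.p99_latency_ms <= slo.max_p99_latency_ms
            and obs.availability >= slo.min_availability)


def who_should_act(d_slo, d_obs, c_slo, c_obs):
    """Decide which team acts when C depends synchronously on D.

    Returns "D" if D is violating its own contract, "C" if D is within
    contract but C still misses its objectives, and None otherwise.
    """
    if not meets(d_slo, d_obs):
        return "D"  # D should improve its stability
    if not meets(c_slo, c_obs):
        return "C"  # C should change the nature of its dependency on D
    return None
```

The point of the sketch is that the answer falls out of the published objectives and the observed numbers, so the two teams can settle it between themselves.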

Summary

I came to this early on as a purely technical problem, but now I view it as something that is a whole lot more than that. And for me, learning about microservices has really been about learning how people can work together better. And the part I like best is that it has this almost paradoxical element, where enabling people to be more self-sufficient actually turns out to be key to making them cooperate better. And I really like that. Despite the fact that I have been on both sides of every argument a group of crotchety engineers can blunder into, I think the only reason it works is because, deep down, people want to cooperate with each other. When you figure out how to unleash that, the results can be really phenomenal.

And so, start with thinking about how to break up your monolithic process, spin-off self-sufficient teams so people have the freedom to figure out how best to help each other, and do this by building awesome tools for service-oriented development.

Thank you. If you are interested in reading more about any of this stuff, we have some hands-on tutorials on the website, and you can also check out the tools up there, which fill in just a small part of what you need for a service-oriented development platform. I'm happy to answer any questions.

Hi. You made a distinction between an ops team and a platform team- would you elaborate on that, please?

Yes. So this is something that I've seen companies go through when they thought they were doing microservices and then actually had to do microservices at scale. And so, the ops team is really the one that is on the front line and is responsible for keeping each service running. If you have thousands of services, you can't centralize that ops team. The platform team is really just responsible for providing what you can think of as almost a domain-specific platform as a service to enable the rest of the organization to function. And so a platform team is effectively the team that would be building the tooling to support the rest of the teams. But the relationship between the platform team and the service teams is really more of a customer relationship. I think Netflix is actually famous for formalizing that. So they produce a lot of internal tooling, but every team is free to choose whether they want to use the tooling, build their own, or use something else. And so, hopefully that makes it clear.

Hi. Right here.

Yeah.

So you mentioned -- so you would be glad to learn that, in our company, we do almost all of these steps. We have all of this, and it goes really well. However, one of the bottlenecks we face is deploying to production; that pipeline that takes what you have developed internally across the company, by the time it makes it to production, that entire process could be really slow. It is not like it will go out in two days, because there's a lot of things we need to provision for, the redundancies, the data store requirements, the SLAs, etc.

Yep.

If you break all of these teams into autonomous teams, this entire end-to-end process is really hard to deliver, because of that particular common requirement across all the teams that are deploying to production. So, my question: what challenges have you faced with respect to that? Have you ever experienced it as a bottleneck, and how have you solved this problem? Or what are your suggestions for speeding up deployment to production so that these things do not become a bottleneck?

Yeah, I mean, that's definitely a factor. I think that is why you see that the first-generation, or earlier, microservices companies have much larger platform teams, because they had to build a lot of that stuff for the rest of the teams to depend on. These days, there have been multiple generations of microservices companies, and the way that I look at high-growth SaaS companies is that they take resources in at one end and dump open source technologies out of the other end. They have a lot of money and people to build these services and lots of tooling to help them. And more recently, there has been a convergence around the tooling. The choices are more obvious, and it is easier to assemble this stuff. So Kubernetes, like I said, is doing most of the heavy lifting around that deployment, and using Kubernetes, you can make that deploy process as fast as you want. Right? So with the right combination of Kubernetes, containers, and some kind of dynamically reconfigured API gateway, the limiting factor just comes down to the process you want to choose, as opposed to the limitations of the technology.

I have more of a comment than a question. I think that -- you seem to make the statement that there's a trade-off between velocity and stability.

Yep.

I think that's wrong. I think it is more like a triangle, or a 3D space, where complexity and simplicity come in. At Amazon, we launched a service in three days that was in every page load; it was simple. And you get to that architecture through simple services.

Yeah, that's an interesting question. A lot of people ask, how do you design a good microservice? And I never used to know what to say, because I have no familiarity with the domain of whatever company they're from, and so I'm really not in a position to answer that question. But I eventually came up with a good answer, or a clever answer. The way to design a good microservice interface is by throwing away a whole lot of bad ones first, and that's part of the difference between the service-oriented architecture idea and the service-oriented development idea.

Right? It is the difference between the slow, deliberate architecture process of trying to predict in advance, and the faster, more iterative process where you say: we will try a bunch of services; this is the problem that we are trying to solve, so we will try a bunch of different APIs and see what sticks. Right? So one of the things that is shifting at a really rapid pace, that the cloud has enabled and that a lot of the newer technologies for using the cloud have enabled, is that you can apply that much more rapid iteration process to distributed systems. That was not the case before. Distributed systems used to be very slow and expensive to set up. And now, they are cheap. You can try stuff, and if it doesn't work, you can throw it away. If you look at the last example, the architecture topology with the Christmas tree lights, you have architecture happening. The shape of the services is defined, but not in a centralized way, and that's how you get architectures that are too complex to fit in one person's head. You are moving that choice of how to structure the boundaries between your services from a technical decision to a much higher-level decision, right?

The job of this service is to help this user accomplish this thing and, you know, the team can try as many different APIs and have as many different parallel APIs as necessary to accomplish that goal. And that's why search is a great example; it is one of the most intuitive, easiest microservices. And when you are trying to work in a service-oriented development style, a lot of the natural architectural intuition that you might have gets in the way.

And so, you don't think to just try things; you rely on your instincts and whatnot, which are probably based on data that is maybe five years out of date. And so that is something that you kind of need to constantly fight against with this sort of thing. So I don't know if that answers your question, but that is one of the ways that this tooling, which just speeds this stuff up, can actually help you with design, because you can try so many more designs.

I have a question. So you are talking about spinning off the teams.

Yep.

These small teams. I think one of the things that makes it easy to have a monolith is that it is easier to attribute cash, so cash flow, to a particular monolith. Can you talk about how to create that feedback loop when you have the small teams, so you know which teams are being effective? And can you spin up multiple small teams providing a similar service, like you would see in the marketplace?

I think that is an interesting question. When you are talking about cash, are you talking about revenue, or cost?

Both.

Both? Okay. Because we are trying to help people get started with microservices, we tend to focus on the smaller- to medium-scale issues, but that is something that definitely comes up when you are at a larger scale, and there are great examples on both sides of how large organizations start to actually account for some of this stuff. I know that on the cost side of things, when you build this kind of thing, it is really common to start using way, way more cloud resources than you realize. And so, at some point, you usually need to give your teams visibility into that. Right? And so the first step you usually start with is actually trying to build some kind of traceability and having, like, a leaderboard. Okay, who spent the most money this week? Right?

And, you know, taking steps like that can actually improve things a lot. And if you, you know, if you get to the scale where you need to do it, you can start building effectively internal accounting systems and stuff like that to accommodate that. I haven't heard as much on sort of the revenue side of things, but, yeah, I'm sure there are similar stories if you look. If you are interested, I can probably dig some case studies up that might involve some of that stuff. So, grab me afterwards.
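
As a minimal sketch of the leaderboard idea described above, assuming billing data has already been tagged with the team that owns each resource. The record format and the name `cost_leaderboard` are hypothetical and not something from the talk:

```python
from collections import defaultdict


def cost_leaderboard(records):
    """Sum cost per team and return (team, total) pairs, highest spender first.

    `records` is an iterable of (team_name, cost) pairs. This shape is an
    assumption made for illustration; real billing exports look different.
    """
    totals = defaultdict(float)
    for team, cost in records:
        totals[team] += cost
    # Sort by total spend, descending, so the biggest spender leads the board.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weekly spend, one entry per tagged resource.
    week = [("search", 120.0), ("checkout", 300.0), ("search", 250.5)]
    for rank, (team, total) in enumerate(cost_leaderboard(week), start=1):
        print(f"{rank}. {team}: {total:.2f}")
```

The hard part in practice is the traceability step, that is, getting every resource attributed to a team; once that exists, the leaderboard itself is a simple aggregation like this one.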

Hey, on your left. Right here.

Thanks for the great talk. One interesting inspiration that I got is letting the customer, the end user, have access to new versions of the product. That is interesting; it feels to me like we are dogfooding our product on our end users. How do you think that process conflicts with business decisions in general, especially at feature-heavy companies?

That's a great question, and that is why it is so important to build your teams with that cross-functional spectrum of talents. And that needs to include the business side of things as well. I was at a company once where they thought they were doing microservices. They had multiple teams, but it was more the separate dev teams picture, and there wasn't a whole lot of trust there. They had been trying to refactor their monolith, I think, for two years, and they tried to do a big cutover at one point. They were an ecommerce company, and they had a big, scary experience: the conversion dropped, and they didn't like that. They got gun-shy, and they pretty much spent a number of years putting in a lot of effort and being almost stagnant. One of the things that happens when you get the business aligned is that new features don't have to be scary and negative. Canary testing does not necessarily mean bugs; canary testing can be domain specific as well. It can be a positive thing to have users try new versions of the product, because they get new features earlier. You can incentivize it and give them credits to try new versions. There are a lot of steps you can take to make that a much more positive thing as opposed to a negative thing. In the end, when you do that, you are ultimately trying to help the end user, and so they will appreciate that if you make that clear.
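
One way to picture the canary idea in the answer above is a small routing rule that sends users who opted in, plus a fixed percentage of everyone else, to the new version. This is only a sketch under assumptions: the function name `routes_to_canary`, the opt-in flag, and the hash-bucket scheme are illustrative and not the speaker's tooling.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int, opted_in: bool = False) -> bool:
    """Decide whether a user should be served the canary version.

    Users who opted in (for example, in exchange for credits) always get the
    canary. Everyone else is placed in one of 100 stable buckets derived from
    a hash of the user ID, so the same user gets the same answer every time.
    """
    if opted_in:
        return True
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < canary_percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    canary_users = [u for u in users if routes_to_canary(u, canary_percent=10)]
    print(f"{len(canary_users)} of {len(users)} users routed to the canary")
```

Hashing the user ID rather than picking randomly per request keeps each user's experience consistent, which matters when the new version is presented as an early-access feature rather than as a risk.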

Live captioning by Lindsay @stoker_lindsay at White Coat Captioning @whitecoatcapx.


Recorded at:

Jan 13, 2018
