Everything is a Plugin: How the Backstage Architecture Helps Platform Teams at Spotify and beyond Spread Ownership and Deliver Value

Summary

Pia Nilsson and Mike Lewis explain how the Backstage plugin system brings disparate pieces of functionality together, and walk through examples of how Backstage can be extended and interconnected.

Bio

Pia Nilsson is Director of Engineering @Spotify. Mike Lewis is Staff Engineer @Spotify.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Nilsson: I am Pia. I work at Spotify. I lead developer experience at Spotify, as well as leading Backstage since its inception in 2017. I joined Spotify in 2016, having been an engineer for 14 years. I joined as an engineering manager for one of the teams there in the platform organization. I was so excited to join Spotify. I was very impressed with the autonomous engineering culture, thrilled to work in that exciting, world-leading audio space that Spotify is in.

This is the reason for being on this stage, that I have never struggled so much in my entire life to add value quickly. Leading the Backstage team back then, which of course didn't exist when I joined, is a healing journey for me, as well, because Backstage is trying to solve many of the challenges that I personally struggled with in the beginning.

Lewis: My name is Mike Lewis. I am the tech lead for Backstage at Spotify, which means I get to work with the amazing team that we have at Spotify working on Backstage. I get to think about Backstage all day, which is so fun. I've been at Spotify about 5-and-a-half years now. When I joined Spotify, I was working in the premium mission, working on things related to spotify.com, and checkout, things like that. I was using Backstage every day and seeing the value that we get from Backstage at Spotify. When the opportunity came up to join the Backstage team and start working on it myself, I jumped at the chance.

Nilsson: We are here to speak to you about how we use the plugin architecture of Backstage, our developer portal, to change the ways of working for our 3000 engineers. I think just that sentence is important, at least to me. It's not all about technology, although that's the heart of it. It's technology in order to change the ways of working in a meaningful way.

Backstage Journey

Before we get into the plugin architecture, and why it matters so much to us, I think it's important for you to know some little thing about what kind of challenges we were facing at Spotify, back in 2017, when we were really starting to talk about these problems. These are accurate clouds that I took from our ecosystem. Imagine them a little smaller back in 2017, but you can understand it was a similar challenge.

This is what was facing any engineer joining Spotify in terms of scale. Every single dot here is a component. You see the backend services. Every single dot is an actual component. Every single line is a component using another component. This is the scale that you are meeting today at Spotify. Back in 2017, it was starting to hurt our productivity quite a bit. All of our productivity metrics were trending in the wrong direction. One of the metrics we were measuring is number of days it takes for your 10th pull request, if you are a newly onboarded person.

That's just one of the crude metrics we were using. We were up to 60 days, which absolutely wasn't our target number. As I said, those are trending in the wrong direction. Adding to this complexity in scale, I'm sure many of you have heard about Conway's Law. Conway's Law is that the systems tend to look like the org chart they were created in. At Spotify, we have this autonomous engineering culture, which is just beautiful. It's just wonderful to work in. People are very excited. They are passionate. They own what they build. It's lovely.

The backside of that is that people were also, back in 2017, expected to deploy their code, for example, and do all kinds of infra for their code all by themselves. Many of these data endpoints and backend services, they were all created in some way that that particular team felt was the right thing to do. Entering into that scene as I was doing, that's what made me struggle so much. I could never extrapolate a learning from one place to the next because it was entirely different. Documentation was in different places. Services were written in different languages, and of course, different libraries didn't work together. It was like the Wild West, but in a very loving way.

Mike and I, we worked for the platform organization. Of course, many of you know, platform organizations, our job is to fix this ecosystem for the engineers that are building the actual business value, so that they can do it much faster and with more happiness, so that it becomes greater and faster for the company. We were thinking hard about this particular problem, what are we going to do to help our engineers through this Wild West place they are in? I happened to lead, at that time, this little team that owned the backend component catalog. It was a tiny system.

Only the backend engineers cared about it, nobody else knew about it. They owned this backend component catalog. Having that reality, and being faced with this problem, I believe very much was the embryo of why we realized, ok, we're seeing this problem, of course, of scale and complexity for all engineering disciplines. However, the backend engineers, they are just a little bit happier, because they have this component catalog, at least. What if we actually create the catalog for everyone? That's the little embryo of why we started the Backstage idea.

I think, over the quarters, we were doing all of these engineering surveys. The top two problems were being surfaced to us. Engineers were calling out two top productivity blockers: difficulty finding things, and context switching. People were very aware of these two problems, and they are connected. As many of you probably understand, if you have problems finding things, you've got to pull people into a meeting, you've got to Slack them, you've got to tap them on the shoulder. Then I will do that to you, and you will do that to me, and all the other seven teams I'm integrating with. You can imagine the amount of meetings that our engineers were running around in, trying to be helpful all the time. That we bundled into the world of context switching, that's what it means in our context. These were the top two problems, according to the engineering surveys we were running.

Then the platform organization, we thought, there is actually a third problem as well. If you remember these clouds, we were so unable to help engineers. We were seeing it. We were seeing our metrics trending in the wrong way. We were seeing people couldn't be productive within 60 days of onboarding. We couldn't do that much, because of all these dots, where there were no standards. Naively, Spotify had the opinion that standards are boring, and it's the opposite of freedom. In our little team, since we were feeling this pain, we created this little slogan saying, standards set you free. Then we rode around on horses here at Spotify, and like, "Standards set you free." Which is like a little bit of a joyous way of saying like, no, actually, if you don't have standards for your software ecosystem, you're totally tied down to all the boring stuff.

Because you are going to have to set up your CI/CD deployment systems. You are going to build your build pipelines. There are going to be no templates. You're going to invent it over again, over 500 teams. Of course, we were seeing a lot of duplication everywhere. That's the third productivity blocker that we were discussing. Simply put, this is what Spotify was looking for a solution for. Also, for all of you who are contemplating using a developer portal, this is what we used with our upper management to help them understand what are you trying to solve. We're trying to solve speed, scale, and chaos control. People can relate to that, because they see that it is a little bit of chaos.

Backstage - A Single Pane of Glass

Here it is. This is what Backstage looks like internally for us. What it is, is a single pane of glass for your infrastructure. Simply put, it's a homepage. Even simpler than that, it's a catalog for your infrastructure, all of it. All of you I'm sure know, any system that some kind of platform organization, like what I'm representing, puts together and offers to our engineers internally, is only as useful as its adoption. It can be as beautiful as can be, but if three teams use it, it's just very expensive.

How do we make sure that Backstage was used? That's where infrastructure as code comes into play. The metadata on each component needs to be in the repositories. Hence, the ownership of that metadata needs to be transferred to the owning team of that component. I think that's a little small, but important engineering best practice that I'm sure all of you know about. That is the basis for why I believe our catalog happened to actually stay relevant, stay useful to our engineering population. Then, of course, we want Backstage to be more than only components. We want to add more kinds of functionality, such as measuring standards across the fleet. There are so many, monitoring and CI/CD. In order to do that, that's where the plugin ecosystem comes in.

Key Backstage Engineering Best Practices

I want to call out two engineering best practices that I really think matter, if one tries to figure out why Backstage was successful, really, for Spotify. Of course, we have been asking ourselves that question for very many years. I think it's as simple as this. The first is infrastructure as code. Then, for today's talk, we're going to focus on the second one, which is the distributed code ownership of the plugin architecture. Extensibility, in the form of the plugin architecture, is what enables this distributed code ownership.

That is key to distribute it, to decentralize the decision making to the team actually owning the expertise. Instead of having, in our example, my little team building the Backstage developer portal, trying to figure out how to build all of these new functionalities into the Backstage portal so that it would become meaningful for all of the 3000 engineers. It goes without saying that that can never happen, that will never scale. It's like it's a must to have a plugin architecture for us.

Backstage Extensibility Architecture

Now we're going to deep dive into the plugin architecture structure, to give you a broad view of what it is.

Lewis: Extensibility is just so important for Backstage, and it's important at Spotify, and it's important elsewhere, too, because Backstage is open source. It's used by thousands of companies around the world. It's important in both of those contexts. I want to tell you a little bit about how extensibility works in Backstage and how it's changed over the years, and what we've learned along the way. First, I think it's important that we cover the high-level architecture of Backstage. How many folks are familiar with the full stack web architecture of building Node.js, React web apps, just JavaScript on the frontend? I'll try and keep it fairly generic. At a high level, Backstage is a web based React frontend. There is a horizontally scalable backend written in Node.js. That backend is talking to some infrastructure, a database, potentially a cache.

Then the backend is also talking to some third-party APIs as well. Those lines of communication between those things are usually HTTP, although other things can be supported at a plugin level, too. There are some logos there to represent this tech stack. In that context, what's a plugin? A plugin is just a bundle of TypeScript code that's been compiled into JavaScript, published to npm for public packages, or even private for adopters that don't want to share that package, and it's just used for private internal use, there's no need to publish it. It's generally for use in either the frontend or the backend, although sometimes there's isomorphic packages in the mix too. The standard is frontend or backend.

For those who are familiar with full stack web development, there's maybe an argument to be made that we've done it at that point. React in the frontend and Express in the backend are already pretty extensible, or at least composable. You can render components from other places. You can bring middlewares into your Express app, and you've done it. You've extended your app with new backend functionality via the middlewares, and new frontend functionality with your React components.

Extensibility by default is not enough for a couple of reasons. The first is, if you're building an extension in a model where all you have is React on the frontend and Express in the backend, that's hard work. There are no shared solutions for the things that you need to get started quickly and get working building on your plugin. There are also many more decisions to make. Every time you need to decide how you're going to manage your database connection, or logging, or any of those things, you've got to start from scratch and make a decision. It's more cognitive load. It slows you down, so you're less efficient.

Also, the results, the plugins that get built in that ecosystem are less consistent, which is bad for people using them. That's on the plugin builder side. What about adopters, people using Backstage? In that world, Backstage adopters have a lot of fiddly wiring to do. They have to wire everything up themselves in the backend to run the middleware in the Express app and provide it with its dependencies. By the same token in the frontend, they need to render the React components at the right time and provide those dependencies too. It's a lot of work.

I think of it like, what we want in a plugin system is we want like a power plug that you just plug in, and it's done, you're finished. What that's like is like having to wire a plug. It's a lot more work, and it just is not efficient to work in that way. The last thing to mention here, actually, is that we want to encourage contributions. Backstage is an open platform. We want people to contribute plugins and plugin modules to that platform. If it's hard to build plugins and hard to build plugin modules, then that's discouraging people from doing that.

How do we start encouraging them? The first way is, I think of it like a tool library. If you haven't encountered one of these in the wild, outside the world of software, a tool library is a place where you can go and borrow the tools that you need to do a task in the real world. I'm not talking about software here, I'm talking about actual saws and drills and things. This means you don't have to own every single tool that you need to do DIY things at home, you can just go and borrow them. You don't have to figure out which one's the best one, you just take the one that they've got, use it. When you're done with it, you bring it back.

This analogy that I'm trying to draw here is between a tool library in the physical world, and the Backstage tool library, which is something we've built, which is a collection of core services in the backend and the frontend that provides you with a bunch of capabilities that you can leverage to do things more efficiently. To get stuff done. To bake in some sensible decision so you can just get productive quickly building your plugin. I'm not going to go through this whole list. Just to give an example, I think database management is a really cool system in Backstage, or as cool as a database management system can be. The way that it works is a plugin owner gets a connection to a database. They don't know anything about where that database has come from, it's just been configured by the adopter of Backstage.

Actually, behind the scenes, adopters can configure a single database and share that connection with all different plugins, and they all get access to a database within that single database instance. Or, each plugin can have its own database. That system of configuration and separation of databases is entirely abstracted away from plugin owners. All they have to do is just get the database connection and use it, and job done. That's the tool library. That's the first way that we are making life easier for people building extensions or people using them.
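To make the database example concrete, here is a minimal TypeScript sketch of the idea. It is a toy model and not the Backstage API: the `DatabaseConfig`, `PluginDatabase`, and `createDatabaseFor` names, and the connection strings, are all assumptions made up for illustration. The point it tries to show is that the plugin code stays the same whether the adopter configures one shared database or one database per plugin.

```typescript
// Toy model only: these types and names are hypothetical, not the Backstage API.
type DatabaseConfig =
  | { mode: 'shared'; connection: string }
  | { mode: 'per-plugin'; connections: Record<string, string> };

interface PluginDatabase {
  // Describes where this plugin's data lives; a real service would return a client.
  describe(): string;
}

// Framework side: the only place that knows how the adopter configured databases.
function createDatabaseFor(pluginId: string, config: DatabaseConfig): PluginDatabase {
  if (config.mode === 'shared') {
    // One database instance, with a separate logical database per plugin inside it.
    const base = config.connection;
    return { describe: () => `${base}/plugin_${pluginId}` };
  }
  const connection = config.connections[pluginId];
  if (connection === undefined) {
    throw new Error(`No database configured for plugin '${pluginId}'`);
  }
  return { describe: () => connection };
}

// Plugin side: identical under both configurations.
function catalogPluginInit(database: PluginDatabase): string {
  return `catalog is using ${database.describe()}`;
}

const shared: DatabaseConfig = { mode: 'shared', connection: 'postgres://db.internal' };
const separate: DatabaseConfig = {
  mode: 'per-plugin',
  connections: { catalog: 'postgres://catalog-db.internal/catalog' },
};

const sharedMessage = catalogPluginInit(createDatabaseFor('catalog', shared));
const separateMessage = catalogPluginInit(createDatabaseFor('catalog', separate));
```

Only `createDatabaseFor` inspects the configuration, so switching between the two modes would not require any change to `catalogPluginInit`.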

The second way is more focused on adopters. This screenshot is showing to an extent the old world of Backstage, the way things used to be when you were adopting plugins as an adopter. You'll see a lot of documentation pages like this if you look on the Backstage docs today. What you've got here is like that plug wiring analogy that I talked about. You have to pull in specific lines of code to your backend instance, and put them in in the right place to add the middleware and make everything work.

When plugins do different things, you have to put subtly different lines of code in. When things change, you have to adapt your code to address those changes. That's a lot of work. It's like wiring a plug every time. Over the last year, we've been working towards a solution, in particular, the maintainers have been working towards a solution called declarative integration, which takes away the need for this manual wiring, and instead makes it possible to install a plugin or a plugin module just by adding the package via yarn, which is the standard package manager for JavaScript code bases.
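The difference between manual wiring and declarative integration can be sketched roughly as follows. This is a toy model under stated assumptions, not Backstage's actual discovery code: the `InstalledPackage` shape, the `role` field, the `register` callback, and `startBackend` are hypothetical names chosen for illustration. The idea is that the backend looks at what is installed and registers anything that declares itself a backend feature, so the adopter writes no wiring code per plugin.

```typescript
// Toy model only: hypothetical shapes, not Backstage's real discovery mechanism.
interface InstalledPackage {
  name: string;
  // Assumed metadata that marks a package as a backend feature.
  role?: 'backend-plugin' | 'backend-plugin-module' | 'frontend-plugin';
  // Assumed hook that the package exposes to register itself.
  register?: (registry: string[]) => void;
}

// The backend walks the installed packages and wires up backend features itself.
function startBackend(installed: InstalledPackage[]): string[] {
  const registry: string[] = [];
  for (const pkg of installed) {
    const isBackendFeature =
      pkg.role === 'backend-plugin' || pkg.role === 'backend-plugin-module';
    if (isBackendFeature && pkg.register) {
      pkg.register(registry);
    }
  }
  return registry;
}

// Installing a package is the only step the adopter performs.
const installedPackages: InstalledPackage[] = [
  {
    name: 'plugin-catalog-backend',
    role: 'backend-plugin',
    register: registry => {
      registry.push('catalog');
    },
  },
  {
    name: 'catalog-backend-module-planets',
    role: 'backend-plugin-module',
    register: registry => {
      registry.push('catalog/planets');
    },
  },
  // An ordinary dependency with no role is ignored.
  { name: 'some-utility-library' },
];

const discoveredFeatures = startBackend(installedPackages);
```

In this sketch, adding one more entry to `installedPackages` plays the part of adding a package with yarn: nothing in `startBackend` needs to change.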

This is a solution that's in alpha right now, particularly immature in the frontend, but also in the backend, it's still pretty new. We're not recommending folks migrate to this yet. It's under active development, and it's really adding a lot of value in the ecosystem. We're going to show you a little bit more about that. I'll be doing a bit more of that during the demo.

What's next? I want to talk a bit about the mental model for Backstage, because it's not as simple as core frameworks, and then some plugins that sit on top of the core frameworks. I think what's so powerful about this extensibility model, and what I've thought was so cool, since I started working on Backstage is this idea of nested extensibility. We don't just have a core framework that plugins sit on top of and can extend. Instead, we've built a system that allows the plugin owners individually to allow other people to extend their plugins with plugin modules.

An example of the way this is really powerful is if you have a generic plugin, which is providing some shared functionality in Backstage, you can have plugin modules, which offer the direct connections to specific upstream APIs. For example, you might have a system that pulls in org data to the Backstage catalog that Pia mentioned earlier, and then a plugin module that knows how to fetch that org data specifically from an LDAP server. Adopters can write their own ones of those to pull org data from their own custom org data providers. We can provide additional ones in open source to support whatever integrations people need to use.

A key concept with all of this is the notion of importable points of extension. I'll show some code for this on the next slide, particularly to cover the importable bit. I want to just cover this example to talk about how extensibility works in this nested model. You'll see on the left-hand side, we've got the core framework, that foundational bit from the previous diagram, and just an example core service, the HTTP router in the backend, which is the thing that routes requests coming from the frontend to different parts of the backend, different middlewares. We've got some arrows here, that's pointing out the fact that the core framework is providing an HTTP router as a point of extensibility. The catalog plugin in this case, is extending that point of extensibility with a specific middleware. All of this is happening between the plugin and the framework without any interaction from an adopter.

That's the bottom layer, going to the middle layer of plugins. That same thing is replicated between plugins and plugin modules. An individual plugin like the catalog can export an extension point. In this case, the example is the catalog processing extension point which controls how entities are processed, as they come in from sources of data. Plugin modules can add additional ones of those via that extension point. This extensibility is nested. That's where I think the power really comes from. I said I'd show some code that corresponds to this. Same topic, importable points of extension.

This is heavily abbreviated and simplified code. What's happening is we're importing the core services extension points from the core framework in the top section of code. We're grabbing the HTTP router, and we're adding our plugins middleware to that router. By the same token, in the bottom slice of code, we're importing the catalog processing extension point from the catalog plugin. Then we're using that to add an entity provider which provides us with the entity data that we need.
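The slide code is not reproduced in this transcript, so what follows is a self-contained TypeScript sketch of the nesting it describes. It is a toy model: `HttpRouter`, `CatalogProcessingExtensionPoint`, `registerCatalogPlugin`, and `registerOrgDataModule` are hypothetical stand-ins, not the real Backstage packages or signatures. It mirrors the two halves of the description: a plugin extends a framework service, and a module extends that plugin.

```typescript
// Toy model only: hypothetical names, not the real Backstage packages.
type Handler = (path: string) => string;

// Framework level: a core service that plugins can extend.
class HttpRouter {
  private routes: { prefix: string; handler: Handler }[] = [];

  use(prefix: string, handler: Handler): void {
    this.routes.push({ prefix, handler });
  }

  handle(path: string): string {
    for (const route of this.routes) {
      if (path.indexOf(route.prefix) === 0) {
        return route.handler(path.slice(route.prefix.length));
      }
    }
    return 'not found';
  }
}

// Plugin level: the catalog exposes its own extension point for modules.
interface EntityProvider {
  providerName: string;
  getEntities(): string[];
}

class CatalogProcessingExtensionPoint {
  readonly providers: EntityProvider[] = [];

  addEntityProvider(provider: EntityProvider): void {
    this.providers.push(provider);
  }
}

// Top half: the plugin adds its middleware to the framework's router.
function registerCatalogPlugin(router: HttpRouter): CatalogProcessingExtensionPoint {
  const extensionPoint = new CatalogProcessingExtensionPoint();
  router.use('/api/catalog', () => {
    let entities: string[] = [];
    for (const provider of extensionPoint.providers) {
      entities = entities.concat(provider.getEntities());
    }
    return entities.join(',');
  });
  return extensionPoint;
}

// Bottom half: a module adds an entity provider to the plugin's extension point.
function registerOrgDataModule(catalog: CatalogProcessingExtensionPoint): void {
  catalog.addEntityProvider({
    providerName: 'org-data',
    getEntities: () => ['user:alice', 'user:bob'],
  });
}

const router = new HttpRouter();
const catalogExtensionPoint = registerCatalogPlugin(router);
registerOrgDataModule(catalogExtensionPoint);
const catalogResponse = router.handle('/api/catalog/entities');
```

Neither function needs the adopter to pass anything between them beyond the extension point itself, which is the part a real framework would resolve on their behalf.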

Extensibility Examples in Backstage (Access Control)

I'm going to cover just one example of extensibility in Backstage today. This is a concrete example of how extensibility works, in one particular case. The case I want to talk about is access control, which is a relatively new system in Backstage. Access control or authorization. You might have heard or seen this notion before, that access control as a concept is the product of decision making and enforcement. Decision making is whether a given user can perform a given operation or access a given resource.

Enforcement is ensuring that the system operates within that constraint, the constraint represented by the decision. If you have those two things, you're done with access control. That's the whole access control job done. When we think about that model, how can we find the right point to introduce extensibility? I think the first thing to think about is, who's responsible for each of those bits? In this case, I'm asserting that the individual plugins are responsible for enforcement. Because Backstage is so extensible and so generic, and we're not in any control of how plugins manage their resources or operations, that enforcement part has to rest with plugin owners. They have to decide how to enforce the access control restrictions that exist. Conversely, the decision rests with Backstage adopters who own Backstage instances, because they may have very different requirements, from instance to instance, about how access control is managed.

Some adopters might have strict regulatory requirements that limit which users can see which entities. Conversely, other adopters might have much more transparent cultures where they want everyone to see everything. Then the framework there is just making the point that the Backstage framework is stitching that all together. Given that model, and I think you probably see what's coming where the extensibility lies, is with the decision. We want the enforcement to be the same every time, every plugin. We want plugin owners to implement that enforcement consistently and in a single way inside their plugin. Decision making, we want to defer entirely to adopters. In the case of access control, we've introduced one point of extensibility, where you can replace what we call the policy with arbitrary code that decides what decision to make in any given circumstance, or with a configurable solution that lets you manage the decisions that you want to make, case by case, in the UI.
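The split between decision and enforcement can be sketched in a few lines of TypeScript. This is a toy model and not the Backstage permission framework API: `PermissionPolicy`, `readEntity`, `allowAll`, and `allowOnly` are hypothetical names, and the permission string is only illustrative. Enforcement is written once in the plugin, while the policy is the single piece an adopter swaps out.

```typescript
// Toy model only: hypothetical types, not the Backstage permission framework API.
type Decision = 'ALLOW' | 'DENY';

interface PermissionRequest {
  user: string;
  permission: string;
}

// The single point of extensibility: adopters supply the decision logic.
interface PermissionPolicy {
  decide(request: PermissionRequest): Decision;
}

// Enforcement lives in the plugin and is the same whatever policy is installed.
function readEntity(policy: PermissionPolicy, user: string, entity: string): string {
  const decision = policy.decide({ user, permission: 'catalog.entity.read' });
  if (decision === 'DENY') {
    throw new Error(`${user} may not read ${entity}`);
  }
  return `contents of ${entity}`;
}

// A transparent culture: everyone can see everything.
const allowAll: PermissionPolicy = { decide: () => 'ALLOW' };

// A stricter adopter: only the listed users may read.
function allowOnly(users: string[]): PermissionPolicy {
  return {
    decide: request => (users.indexOf(request.user) >= 0 ? 'ALLOW' : 'DENY'),
  };
}

const openResult = readEntity(allowAll, 'guest', 'resource:earth');
const restrictedPolicy = allowOnly(['mike']);
const memberResult = readEntity(restrictedPolicy, 'mike', 'resource:earth');
```

Calling `readEntity(restrictedPolicy, 'guest', 'resource:earth')` throws in this sketch, because the same enforcement code now receives a `DENY` decision.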

Demo (Extensibility)

Let's do a demo. What I want to show you is what the process looks like for adding and configuring an extension in both the Backstage backend and frontend. Here, we've got the Backstage backend running. Doesn't matter what's on the screen right now. The Node.js backend is running in this terminal. In this terminal, the Backstage frontend is running. I've got the frontend and backend running locally. You can see here, this is the Backstage instance running with that frontend and backend.

This is a generic instance of Backstage. It's pretty close to what you'd get if you scaffolded a brand-new Backstage app from scratch. The only caveat is I have replaced the standard backend and frontend systems which require that manual wiring still with the new backend and frontend systems which provide that declarative integration solution. How do I add features and capabilities to this Backstage instance, in that model? You'll see, we have a bunch of different kinds of resource available here. I can browse around and see the different ones.

There's a guest user, and there's me. We can see some groups and all that, and so on. We haven't got any resources in the system. Let's say I want to add some of the resources that I have in my software ecosystem. The resources that I want to add are planet resources, because my company is somehow in charge of planets, or partly because I was sitting in a room with a planets poster when I was working on this and got inspired. We've got an API running here, which provides me information about the planets in the solar system. You can see we've got planet names and we've got a planet image as well. We have this running as an API too, so that's available to query over here in this terminal window. If I want to add those resources to my Backstage catalog, there's actually only one step that I have to do because I already have a module written, which provides an entity provider in the right extension point to load those entities into the catalog.

If I switch to the backend package, in my Backstage instance, and I add the package name, which I believe is catalog-backend-module-planets. There we go. Let's stay in the terminal for now. You'll see that the backend is actually already restarted. You'll see that since it restarted, it's now discovered a new backend module, and you'll see that it's now got this log line refreshing planet resources. I'm leveraging the scheduling system built into Backstage in the core services to run this every 5 seconds. Now I'm refreshing the planet resources from that API every 5 seconds, and persisting them into the catalog.
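The module itself is not shown in the talk, so here is a rough TypeScript sketch of what one refresh cycle of such an entity provider might do. Everything in it is an assumption made for illustration: the `Planet` fields, the `InMemoryCatalog` and `PlanetEntityProvider` classes, and the fake API are hypothetical and are not the Backstage catalog or scheduler APIs. The scheduling itself is left out; a scheduler would simply call `refresh()` every 5 seconds.

```typescript
// Toy model only: the planet fields and class names are assumptions.
interface Planet {
  name: string;
  imageUrl: string;
}

interface ResourceEntity {
  kind: 'Resource';
  name: string;
  imageUrl: string;
}

class InMemoryCatalog {
  private entries: { provider: string; entity: ResourceEntity }[] = [];

  // Replace everything a provider owns, so repeated refreshes never duplicate.
  replaceAll(provider: string, entities: ResourceEntity[]): void {
    this.entries = this.entries
      .filter(entry => entry.provider !== provider)
      .concat(entities.map(entity => ({ provider, entity })));
  }

  list(): ResourceEntity[] {
    return this.entries.map(entry => entry.entity);
  }
}

class PlanetEntityProvider {
  constructor(
    private readonly fetchPlanets: () => Planet[],
    private readonly catalog: InMemoryCatalog,
  ) {}

  // One refresh cycle: fetch from the API, map to entities, persist.
  refresh(): number {
    const entities = this.fetchPlanets().map(
      (planet): ResourceEntity => ({
        kind: 'Resource',
        name: planet.name.toLowerCase(),
        imageUrl: planet.imageUrl,
      }),
    );
    this.catalog.replaceAll('planets', entities);
    return entities.length;
  }
}

// Stand-in for the planets API from the demo.
const fakePlanetsApi = (): Planet[] => [
  { name: 'Mercury', imageUrl: 'https://example.com/mercury.png' },
  { name: 'Venus', imageUrl: 'https://example.com/venus.png' },
];

const planetCatalog = new InMemoryCatalog();
const planetProvider = new PlanetEntityProvider(fakePlanetsApi, planetCatalog);
planetProvider.refresh();
planetProvider.refresh(); // a second scheduled run replaces entities, it does not append
const planetNames = planetCatalog.list().map(entity => entity.name);
```

Replacing a provider's whole set on each run is one simple way to keep a frequently repeated refresh idempotent; it is a design choice of this sketch rather than a claim about how the demo module works.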

Let me go back to the browser, and have a look at this tab. If we refresh and look at the kind, we'll now see there's resource kind, and we have all of the planets showing up here. I can click into one of them and see the details. Now let's say I want to add some frontend functionalities to this too. I'm going to do the same thing. If I go over to my terminal again, and switch to the app package, and add the frontend planets module. This time, we're going to have to restart the frontend to catch those changes. You see, it's still refreshing those plugins for us in the backend.

Once I restart this, and add one piece of config, this piece of config here to control the order of the entity cards that appears on the screen. Once that's done, we should be able to browse to one of these planets, maybe refresh to catch that config change. You see the planet image appears as a new card. The thing that I want to highlight here is that there's no code changes necessary apart from that configuration change to my Backstage instance at all. All I've done is add modules and they've stitched in automatically, provided their functionality to Backstage with none of that wiring required. I just plugged in the plug to the socket. We're hoping that's going to be stable in the next year or so, and people are going to be able to benefit from that consistently when they adopt Backstage.

Extensibility Value Proposition

What's the effect of all this extensibility? What benefits do we get from extensibility? Firstly, focusing on Spotify. Because all the user facing functionality in Backstage, and that's including core things like the catalog, are built as plugins, it's really easy to parallelize work. Teams can work on features independently without having to coordinate or collaborate to get things done, they can just work on their features without having to talk to each other. They can also have distributed ownership of those features. The cataloging team can work independently from the scaffolder team, and folks building new functionality on top of it can also work totally independently. That distributed ownership model is really powerful for allowing us to match our investments in different areas to the level of importance of that part of the system.

The other thing that we get from this is also consistency. Because everything is built on this Backstage foundation, expertise is transferable between plugins. If a person moves from team to team, they can easily contribute more quickly because it's a Backstage plugin still. What about outside Spotify? Firstly, all that same stuff. These things are true both inside and outside of Spotify. It's still easy to parallelize both in open source and in other adopters' instances of Backstage, those benefits for minimizing coordination and transferable expertise, distributed ownership, that's all still true. The bonus that we get in the world outside of Spotify, is that the tech stack, the standards, the choices that we've made about how Backstage fits together at Spotify don't have to be mirrored at other organizations in order for Backstage to be valuable.

They can pick the plugins that they want to use, or even build their own, to compose the perfect developer platform for their own needs. That's very different from the one where we built a fixed Backstage at Spotify, and then tried to get everyone to use it, because in that situation, you have to convince everyone that they should work exactly like you work.

Key Takeaways

I want to pick out some key takeaways from this. These are technical takeaways that seem important to me when we're thinking about how to build extensibility models into other software. The first is, when we're reducing repetition in systems like these, we should be reducing it by persona, so thinking about who is writing what code. When I think about Backstage specifically, I think about moving code from adopter instances into plugins, and moving code from plugins into the framework. Because the framework you write once, plugins get written plugin by plugin, and adopter instances are the most numerous in that group. The more we can push things up into plugins, and then into the framework, the more the overall repetition is reduced. The other thing is, use the framework that you're building.

Consider an extensibility solution where you have some core systems that aren't built with the extensibility model, and then you're trying to extend those things with a set of extensions that have different capabilities. It's much harder to get that extensibility model right because you're not leveraging it, especially in your core team. It's just getting used separately by this separate group of people. Conversely, if you build your core systems with that extensibility model, you guarantee that extensibility model is powerful and fit for purpose. Nested extensibility is just such a powerful concept for us. I really think it can apply elsewhere too. Making sure that you can have extensions that are themselves extensible is so powerful for making sure that you're enabling the maximum amount of flexibility in your system.

The ROI of Backstage

Nilsson: Just some finishing words on the ROI of Backstage. We got this question a lot since 2020, when we decided to open source. It's a very important and good question. As I mentioned, at the beginning, we were measuring our own developer productivity through number of days it took until the 10th pull request, which is not a fantastic metric. It lacks a lot of nuance. We stole it from Meta, and figured it will do. Now we have evolved a bit. These are some of the metrics we are measuring developer productivity with at Spotify.

One should read these numbers like this. At Spotify, we have 100% adoption of Backstage, so we divided this into the 50% who are more frequent users, and the other 50% who are less frequent. These numbers are for the more frequent users, the 50%. Some of them, of course, aren't using Backstage all that much. This cohort is still 2.3 times more active in our GitHub repo. They have a 17% shorter cycle time. I think they create twice as many code changes. The list goes on. You can read more about it on our blog. We published all of our metrics there. They're also, we want to believe, a little bit happier. The last one here, they are 5% more likely to stay with us. Going back to the beginning here, I was pretty unhappy when I joined Spotify, and realized I had such a difficult time adding value.

That makes me also feel that we're doing something good for the engineering community at Spotify, and for the world, that they actually want to stick around a little longer if they use Backstage. This is open source. We're not just standing here talking about something that we happen to have, and nobody else. This is very accessible to all of you. You can go and use it today. One thing I think that I want to leave you with is, if you do recognize these challenges that we were having with scale and speed, and scale slowing you down, these are universal problems. Let's solve them globally. Why not join the open community where you have thousands of other adopters that have similar challenges that your organization may have? I think that's going to speed all of us up.

Resources

If you want to know more, we have these webinars, biyearly, where we release a bunch of stuff, new products, as we have a commercial leg to Backstage as well, next to the open source, of course. Check it out if you're interested.

Recorded at:

Sep 05, 2024

Pia Nilsson
Director of Engineering and Platform Developer Experience
Mike Lewis
