Pros and Cons of a Distributed System
Hausenblas: Welcome to Profiles, the Missing Pillar: Continuous Profiling In Practice. What are we talking about? If you have a distributed system, you might have derived it from a monolith, breaking up the monolith into a series of microservices that communicate with each other and the external world. Then you have a couple of advantages. For example, feature velocity, so you can iterate faster, because the teams can work independently on different microservices and are not blocked by each other. You can also use the programming language or the datastore that's best suited for each microservice, so you have a polyglot system in general. You also have partial high availability for the overall app: just because one part, for example the shopping basket microservice, is not available, your customers can still browse and search.
However, there are also a number of limitations or downsides to microservices, be they containerized or not. That is, you're now dealing with a distributed system, meaning that more often than not you have network calls between the different microservices, and complexity increases. Most importantly for our conversation, observability is not an optional thing anymore. If you are not paying attention to observability, you are very likely to fly blind. We want to avoid that.
What Is Observability?
What is observability? In this context, I will define observability as the capability to continuously generate and discover actionable insights based on certain signals from the system under observation. It might be, for example, a Kubernetes cluster or a bunch of Lambda functions. This system under observation emits certain signals. The goal is that either a human or a piece of software consumes those signals to influence the system under observation. On the right-hand side, you can see the typical setup: the telemetry part collects signals from the different sources, typically through agents. Signals land in various destinations, where they are consumed by humans and/or software.
Signal Types
Talking about different signal types, we all know about logs: textual payloads, usually consumed by humans, properly indexed. Then there are metrics, which are numerical signals and can capture things like system health; also already widely used. Probably a little bit less widely used than logs and/or metrics are distributed traces, which are all about propagating an execution context along a request path in a distributed system.
Profiles
That's it? No, it turns out there are indeed more than three signal types. Let's get to profiles and continuous profiling. We'll work our way up from the very basics, to how certain challenges in continuous profiling are solved, to concrete systems, specifically with the open source aspect of it. Profiles really represent certain aspects of code execution. It's always in the context of code: you have some source code and you have some running process, and you want to map and explore these two different representations. The source code has all the context in there: what is the function call, what are the parameters, what are the return parameters? The process has all the runtime aspects, so resource usage: how much time does a process spend on CPU, or how much memory does it consume? What about the I/O? There are instructions on the application level, and then there are also operating-system-level ones, usually so-called syscalls. For the function calls of interest, what we're interested in here are typically the frequency, so how often a certain function is called, and the duration, so how much time is spent in a certain function.
Representing Profiles
That was a little bit abstract, maybe, so let's have a look at a concrete example. I use Go here. It doesn't have a full implementation, but you get the idea. You have the main function that calls two functions, shortTask and longTask. One way to represent that in a very compact manner would be what is shown on the right-hand side, main;shortTask 10 and main;longTask 90, meaning that 10 units are spent in shortTask and 90 units are spent in longTask. If you've ever heard anything about profiles or continuous profiling, you've probably come across the term flame graphs, something Brendan Gregg has coined. That's essentially the idea to show the execution as stacked bars, where the width represents something like, in this case, the time spent in a function. It gives you a very quick way to get an idea of how your program is behaving, and you can also drill down. There is, in our context, another version of that called icicle graphs. The only difference is that flame graphs grow, like flames do, from the bottom up, whereas icicle graphs grow from the top down, but otherwise they are interchangeable.
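The slide isn't reproduced here, but a minimal sketch of what such a program and its folded-stack representation might look like is shown below. The function names match the example above; the sleep durations merely stand in for the 10 and 90 units of work (a real CPU profiler would attribute time to functions that actually burn CPU, not ones that sleep).

```go
package main

import "time"

// Stand-ins for real work: shortTask "costs" roughly 10 units, longTask 90.
func shortTask() { time.Sleep(10 * time.Millisecond) }
func longTask()  { time.Sleep(90 * time.Millisecond) }

func main() {
	shortTask()
	longTask()
}

// Folded-stack representation (the compact form mentioned above), which is
// also the input format flame/icicle graph tools typically consume:
//   main;shortTask 10
//   main;longTask  90
```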
How to Acquire Profiles
The first question really is, how do we get those profiles? Because remember, it's about a running process on a certain machine. One set of approaches is built around programming-language-specific profilers. There are profilers for various languages; here are a few examples. In Go, for example, you have pprof. That is something people have already been utilizing, typically in rather static, one-off sessions: using pprof to capture the profiles as a one-off. A different approach to the acquisition of profiles is based on eBPF. eBPF is effectively a Linux feature. It's exposed through a Linux syscall and essentially implements an in-kernel virtual machine that allows you to write programs in user space and execute them in the kernel space, extending the kernel's functionality. This is something that can, for example, be used for observability use cases, in our case to capture certain profiles.
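To make the one-off, language-specific approach concrete, here is a minimal sketch using Go's built-in runtime/pprof: wrap the workload with a CPU profile and write it to a file, which can then be inspected with `go tool pprof cpu.out`. The file name and the workload are placeholders.

```go
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("cpu.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Start a CPU profile for the duration of the workload.
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	doWork() // the code you actually want to profile
}

func doWork() {
	sum := 0
	for i := 0; i < 1e8; i++ {
		sum += i
	}
	_ = sum
}
```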
Sampling Profiling
What we will focus on for the rest of the presentation is what is called sampling profiling. This is a way to continuously get profiles from a set of processes running on a machine by periodically capturing the call stack, which could be tens to hundreds of times per second, with very low overhead. You might be talking about one, two, three percent in terms of overall CPU usage, and maybe some megabytes of memory. In general, the idea is you can do that all the time. You can do it in production. It's not something that you can only do in development, although it has its use cases there. It's something that is always on, because it has such low overhead.
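As a toy sketch of the core idea: if you capture the call stack at a fixed rate and count how often each folded stack shows up, the on-CPU time attributed to a stack is roughly samples divided by the sampling rate. In a real profiler the stack capture itself would come from SIGPROF, eBPF, or a runtime API; here the sample counts are hard-coded to keep the sketch self-contained, reusing the earlier example.

```go
package main

import "fmt"

func main() {
	const hz = 100.0 // samples per second
	// folded stack -> number of samples in which it was observed
	samples := map[string]int{
		"main;longTask":  90,
		"main;shortTask": 10,
	}
	for stack, n := range samples {
		// time estimate: observed samples / sampling rate
		fmt.Printf("%-16s ~%.1fs on-CPU\n", stack, float64(n)/hz)
	}
}
```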
Motivation and Use Cases
You might be asking yourself, why are we doing that? What's there that the logs and the metrics and the traces cannot already answer? It turns out that with metrics, for example, you can very easily capture what the overall latency for a service is and how it evolves over time, under load, for example. But if you ask yourself, where do I need to go in the code to tweak something to make it even faster? Or there might be a memory leak, or you might have changed something in your code, and you're asking yourself, did that change make my code faster or not? Then, ultimately, what you need is to be able to go to a specific line of code. Metrics are great. They give you a global view, where global in this context might be a microservice, but they don't really tell you that it's on line 23 in main.go. There are a couple of things that continuous profiling enables us to do in combination with the other signal types. That's something I really like to highlight: it's not about replacing all the other signal types. It's one useful tool in the toolbox.
Evolution of Continuous Profiling
Really, if we step back a bit, the continuous profiling field is not something that's just been around for a couple of weeks or months, but has really been around for more than a decade. In fact, there is a very nice seminal paper in IEEE Micro, from 2010, by a bunch of Googlers, with the title, "Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers," where they really make the case that yes, you can continuously capture these profiles in production and make them available. As you can see from the figure from the paper, some of the components, such as the MapReduce-style processing, give away what period of time that is. Overall, they made that point in an academic context already in 2010. You can imagine that it had presumably been in use internally for some time already.
In the past 10 years or so, we have seen that a number of cloud providers and observability providers have these typically managed offerings. Not very surprisingly, there is Google Cloud, where nowadays it's the Google Cloud operations suite, which has the Cloud Profiler. Datadog has the Continuous Profiler. Dynatrace has the Code Profiler. Splunk has the AlwaysOn Profiler. Amazon has the CodeGuru Profiler. The few screenshots on the right-hand side give you an idea of what you will find there: over and over again, these icicle graphs, or flame graphs in some cases. That has been around for quite some time. The APM vendors, the application performance monitoring vendors, have been using that, and the name gives it away a bit, in a very specific context and a very specific domain, and that is performance engineering. You want to know how your application is doing. You want to improve latency. You want to improve resource usage. Those kinds of tools allow you to do that.
Conceptual CP Architecture
One thing has changed in the last couple of years, and that is that a number of things came together in this cloud native environment. Now we have a whole new set of open source based solutions in that space. I'm going to walk you first through a conceptual, very abstracted architecture, and then through a couple of concrete examples. Obviously, given that this is a conceptual architecture, it's very broad and vague. Starting on the left-hand side, you can instrument your app directly and expose certain profiles, or you have an agent that, by whatever method, gets the profiles for you. Then you have some ingestion, which typically takes the profiles, cleans them up, normalizes them, and so on. Profiles are really at the core of the solution, but you always need symbolization, so effectively mapping machine addresses to human readable labels, function names. You typically have one part that's concerned with the frontend; it allows you to visualize things in a browser. And there's some storage, where at the current stage you oftentimes have mostly in-memory storage, not necessarily long-term or permanent storage. The projects are working on that. Keep that in mind when we review the concrete examples.
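One way to picture that conceptual architecture is as a handful of interfaces. The sketch below is hedged and illustrative only; none of the names come from any particular project. An agent acquires profiles, a symbolizer maps addresses to function names, and a store persists and indexes profiles so a frontend can query them by labels and time range.

```go
package conceptual

// Profile is a folded-stack profile plus the metadata needed to find it again.
type Profile struct {
	Labels   map[string]string // e.g. node, namespace, pod, container
	TimeUnix int64             // when the samples were taken
	Stacks   map[string]int64  // folded stack -> sample count or value
}

// Agent collects profiles from processes, e.g. via eBPF or a language profiler.
type Agent interface {
	Collect() ([]Profile, error)
}

// Symbolizer turns address-based stacks into human-readable function names.
type Symbolizer interface {
	Symbolize(Profile) (Profile, error)
}

// Store ingests profiles and answers label/time-range queries for the frontend.
type Store interface {
	Ingest(Profile) error
	Query(selector map[string]string, fromUnix, toUnix int64) ([]Profile, error)
}
```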
CP Challenges
There are a couple of challenges in the context of continuous profiling. All the projects that I'm going to walk you through in a moment go about them slightly differently, though there are some overlaps. More or less, if you want to get into the business of a continuous profiling project, either contributing or writing your own, you should be aware of them. One of the areas is symbolization. You have to have a strategy in place that allows you to get and map the symbols from your runtime environment, your production environment, where very often you don't have the debug information in a running process, for security and other reasons. You have different environments. You might have Linux and Windows. You might have Java virtual machines. Coming up with a generic way to represent and map all these symbols is a very big task. Different projects go about it differently.
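To make "symbolization" tangible, here is a minimal Go illustration: turning raw program counters into function names and source locations. The Go runtime can do this for its own binary; the hard part in a fleet-wide profiler is doing the same for stripped binaries, other languages, and JIT-compiled runtimes, which is exactly the challenge described above.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	pcs := make([]uintptr, 16)
	n := runtime.Callers(0, pcs) // raw program counters of the current stack
	frames := runtime.CallersFrames(pcs[:n])
	for {
		f, more := frames.Next()
		// Each raw address resolves to a function name, file, and line.
		fmt.Printf("%#x -> %s (%s:%d)\n", f.PC, f.Function, f.File, f.Line)
		if !more {
			break
		}
	}
}
```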
Then, because it is a continuous process, you're capturing, and obviously have to store and index, a huge volume of those profiles, which means you want to be very effective in how you store them. There are a number of ways to go about that. One thing that you find very often is the XOR compression scheme that was initially published and suggested by Facebook engineers. It's widely used and, for this kind of time-series data, it essentially exploits the fact that between different samples not much is changing. The XOR in there makes it so that the footprint is really small. Columnar storage is another thing that you find very often in these open source solutions. Another challenge is that, once indexed, you want to offer your users, the humans sitting in front of the CP tool, a very powerful way to query, to say, "I'm interested in this process, on this machine. I'm interested in CPU, give me this time range." That means you need to support very expressive queries. Last but not least, and for the first three there's already enough evidence out in the open, more could be done in the correlation space.
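A minimal sketch of the idea behind that XOR compression scheme follows: encode each sample as the XOR of its bits with the previous sample's bits. When consecutive values are identical, which is common for frequently sampled series, the XOR is zero and can be stored in a single bit. The full scheme also encodes leading/trailing zero counts and timestamp deltas, which is omitted here.

```go
package main

import (
	"fmt"
	"math"
)

// xorDeltas returns, for each value, its bits XORed with the previous value's
// bits; unchanged values yield zero, which compresses extremely well.
func xorDeltas(values []float64) []uint64 {
	deltas := make([]uint64, len(values))
	var prev uint64
	for i, v := range values {
		bits := math.Float64bits(v)
		deltas[i] = bits ^ prev
		prev = bits
	}
	return deltas
}

func main() {
	fmt.Println(xorDeltas([]float64{12.0, 12.0, 12.0, 24.0, 12.0}))
}
```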
Open-Source CP Solutions - Parca
Let's have a look at three open-source continuous profiling solutions that you can go and download, install, run, and try out yourself right away. The first one up is Parca, which is a simple but powerful CP tool. You can use it either in a single node Linux environment or in a Kubernetes cluster. It's sponsored by PolarSignals. It's a very inclusive community. They are on Discord. They have biweekly community meetings, are very transparent, and I can encourage you to contribute there. I consider myself a casual contributor. I really like how they go about things. [inaudible 00:18:42] really is eBPF there, so the agent uses CO-RE and BTF, which means you need a certain kernel version. In general, going forward, as we all move to newer kernels, this will be less of a challenge.
The overall design was heavily inspired by Prometheus, which doesn't come as a big surprise, knowing where the founder and the main senior engineers come from: Red Hat. They've been working in the Prometheus ecosystem for quite some time. If you look at things like service discovery, targets, labels, and the query language, they are very much inspired by Prometheus, which also means that if you already know Prometheus, you will probably have a very quick on-ramp for Parca. It uses pprof, the profile format known from Go, to ingest the profiles. It has some really nice dogfooding: it exposes all four signal types, including profiles, so it can monitor and profile Parca itself. It also has very nice security related considerations in terms of how the builds are done, so that you know exactly what's in there, because the agents running in production need quite some capabilities to do their work.
On a high level, the Parca agent more or less uses eBPF to capture the profiles per cgroup and exposes them in pprof format to the Parca server. You can obviously also instrument your code yourself, which usually means Go, with pprof; I believe Rust supports it now as well. As you can see, the service discovery looks very familiar from Prometheus, and you have the UI, so you can let it run for a few moments and immediately have a look at the results. This is just a screenshot of the demo instance. You can go to demo.parca.dev and have a look at that yourself, slice and dice, and see what you can get out of it.
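For the self-instrumentation route, a minimal sketch in Go is to expose the standard pprof HTTP endpoints, which any server that understands the pprof format can then pull from. The port is arbitrary, and the exact scrape configuration on the server side (Parca or otherwise) is out of scope here.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the pprof endpoints so a profiling server can scrape them.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```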
Pyroscope
Pyroscope is a very rich CP tool that comes with broad platform support. It's sponsored by Pyroscope Incorporated. On the client side, it depends in general on programming-language-specific profilers, but it also supports eBPF. The table here summarizes these two aspects, the broad platform support and the various languages or environments, quite nicely. It comes with a dedicated query language called FlameQL, which is different from what you would use in Parca, but easy enough to learn. It has a very interesting and efficient storage design. It also comes with a Grafana plugin. As you can see from the overall setup, it's roughly comparable with what we saw earlier with Parca, and also very straightforward to set up. From the UI perspective, there's a screenshot from their demo site, demo.pyroscope.io. Again, I encourage you to go there and try it out for yourself. You see the actual resource usage over time, and when you click on a certain bar, a slice, you can see the rest of that profile. On the left-hand side you see a tabular version of that. Again, very easy to use, very powerful.
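As a hedged sketch of what the language-specific client-side instrumentation looks like with Pyroscope's Go client: the import path and the exact Config fields have changed across versions, so treat this as illustrative rather than copy-paste ready; the application name and server address are placeholders.

```go
package main

import "github.com/pyroscope-io/client/pyroscope"

func main() {
	// Start continuous profiling and ship profiles to a Pyroscope server.
	pyroscope.Start(pyroscope.Config{
		ApplicationName: "checkout.service",
		ServerAddress:   "http://pyroscope-server:4040",
	})

	select {} // keep the process alive so profiles keep flowing
}
```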
CNCF Pixie
Last but not least in our open-source examples is CNCF Pixie. CNCF is the Cloud Native Computing Foundation. What happened was that New Relic, the well-known observability provider, donated or contributed this project to the CNCF, and it is now a sandbox project. It also uses eBPF for data collection, but it goes beyond profiles: it captures a number of things, full-body requests, resource and network metrics, profiles, even database queries. On the right-hand side, you see in this column the protocols that are supported. So you can really, beyond compute, get a nice idea of how the requests flow. What is going on? Where is time spent? It's not directly comparable to Parca and/or Pyroscope in the sense that it positions itself really as a Kubernetes observability tool, not purely for CP, not purely for Linux, but really for Kubernetes. All the data is stored in-cluster, in memory, meaning that, essentially, you're looking at the last hour or so, not weeks or days. It comes with PxL, a Pythonic DSL that is used both for scripting and as the query language. There are a number of examples that come along with that. They're pretty easy to read, less easy to write, at least for me; it comes with a little bit of complexity there. But at least you should be able to very easily understand what these scripts are doing. It can also integrate with Slack and Grafana. Again, I encourage you to have a closer look at that. As you can see from the architecture as well, it's definitely wider in scope compared to the previous two examples we had a look at. Profiles in Pixie are really just one signal and one resource that it focuses on, but you can also use it for that with zero effort.
The Road Ahead - eBPF as the Unified Collection Method
Let's switch gears a little bit and look at the road ahead. What can we expect in this space? What are some areas that you might want to pay attention to? One prediction is that eBPF seems to be becoming the standard, unified collection method for all kinds of profiles. It really has wide support: Parca, Pyroscope, and Pixie already support eBPF. More and more compute is eBPF enabled, from plain Linux machines to Kubernetes. It essentially enables zero-effort profile acquisition. You don't need to instrument, you don't need to do anything, you just let the agent run and use eBPF to capture all these profiles. It's also getting easier every day to write eBPF programs.
Emergence of Profiling Standard
In terms of a profiling standard, this is something where I'm less sure. Currently, how profiles are represented on the wire and in memory varies between the different projects. I suppose a standard in this space would be very beneficial and would enable interoperability: you could consume the profiles from different agents. Potentially, pprof, which is already supported beyond the one programming language, Go, where it came from, could be that standard. In any case, I'm definitely keeping my eye on that.
OpenTelemetry Support
Furthermore, there is the telemetry part: how to get the profiles from where they are produced, where the sources are, the machines, the containers, whatever, to the servers where they are stored, indexed, and processed. Obviously, OpenTelemetry, being this up and coming standard for all kinds of signals, would need to support profiles. Currently, OpenTelemetry only supports traces, metrics, and logs, in that order, because traces went GA first, metrics are on the way to becoming GA now, and logs later this year. There is an OTEP, an OpenTelemetry enhancement proposal, number 139, and it definitely needs more activity, more eyes on it, more suggestions. My hunch would be that once the community is done with logs, which is during this year, 2022, it then, also with your support, focuses its attention on profiles in this space.
Correlation
Last but not least, correlation. Profiles are not useful in a vacuum; they don't work in a vacuum. They work in combination with other signal types. Imagine the canonical example: you get paged. You look at a dashboard. You get an overall idea of what is going on. You might use distributed tracing to figure out which of the services are impacted, or which you need to have a closer look at. Then you might use the same labels that you used earlier to select certain services in your CP solution to have a closer look at what's going on, and maybe identify some error in your code, or some leak, or whatever. Another way to go about that is to focus our attention on frontends rather than purely on the labels, which usually come from the Prometheus ecosystem. With frontends, it's a similar story. Think of Grafana as the unified interface, the unified observability frontend. You will see that, again, you can use these environments to very quickly and easily jump between different relevant signals. I wonder if there is maybe a desire for a specification: if you think of exemplars, which allow you to embed trace IDs in metrics, in the metrics format, which can then be used to correlate metrics with traces, maybe something like that is necessary or desired in this space.
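To make the exemplar idea concrete, here is a hedged sketch using the Prometheus Go client: attach a trace ID to a counter increment so a frontend can jump from the metric to the corresponding trace. The trace ID is a placeholder; in practice it would come from the active span context.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var requests = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "http_requests_total",
	Help: "Total HTTP requests handled.",
})

func handleRequest(traceID string) {
	// If the counter supports exemplars, record the trace ID alongside the increment.
	if adder, ok := requests.(prometheus.ExemplarAdder); ok {
		adder.AddWithExemplar(1, prometheus.Labels{"trace_id": traceID})
		return
	}
	requests.Inc()
}

func main() {
	prometheus.MustRegister(requests)
	handleRequest("4bf92f3577b34da6a3ce929d0e0e4736") // placeholder trace ID
}
```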
Summary
I hope I was successful in convincing you that there are more than three signal types; that profiles are this fourth important signal type that you should pay attention to; and that continuous profiling, CP, is a useful tool in the toolbox. It's uniquely positioned to answer certain questions around code execution that can't really be answered with other signal types. In the last two or three years, a number of open source tools have become available, in addition to the commercial offerings, which obviously are still here and improving. There is a wonderful website called Profilerpedia by Mark Hansen. I really encourage you to check that out. He's keeping track of all the good tooling and formats, and what-not, what's going on there. It's a great site to look up things. The necessary components to build your own solution, or to use them together, are now in place. You have eBPF. Symbolization has become more accessible. The profile formats are there, although a standard would be nice. All the storage groundwork has been done. This is the time. 2022 is the year continuous profiling goes mainstream.
Q&A
Abby Bangser [Track Host]: You say that profiling has been around for quite a while, but yet it's still not maybe widely adopted. There are still a lot of people who have not yet experienced bringing this into their tech stack. What would you say is a good place for people to get started?
Hausenblas: Just to make sure we are on the same page, the actual idea of using profiles for certain specific tasks, if you think about performance engineering, that practice has been around. If you follow people like Brendan Gregg, for example, you will see a huge body of work and tooling. So far, it was relatively hard for the mainstream: it was the performance engineers figuring out how to use these tools, maybe not the easiest. You could have used Linux namespaces and cgroups and bundled something up yourself, but then Docker came along, and everyone can type docker run, docker pull, and so on. That's the same with continuous profiling now. Setting up Parca, or Pyroscope, or Pixie, getting that UI, and immediately being able to see those profiles and drill into them has become so straightforward. The barrier is so low that you really should have a look at that, and then figure it out. Identify some maybe low-hanging fruit in your environment, if you're a developer. Have a look at your ecosystem, your programming language, whether that's supported, which is very likely the case. Give it a try. I think it's easier to really try it out and see the benefits than any theoretical elaboration like this talk, or whatever. It's really an experience worth giving 10 or 15 minutes of your time, and you will see the benefits immediately.
Bangser: Absolutely. It's always about being hands-on and actually getting the benefit in your own experience, in your own company and tech. You mentioned a few different open-source tools here, but we've got a question that just came in around the Datadog profiler. Is that something you've worked with or have any experience to share?
Hausenblas: I personally have not. I've seen that with our customers. I know Datadog is an awesome platform, end-to-end, very deeply and tightly integrated from the agent collection side to the UI. I'm a huge fan, but I personally don't have hands-on experience with it, no, unfortunately.
Bangser: Sometimes there are just too many tools in the industry to experience them all. What I've found with many other telemetry tools is that they've affected not just how we deal with production issues, but all the way down to local development. In your experience, when teams do bring in continuous profiling, how does that impact the development lifecycle?
Hausenblas: I think, especially for developers, if you look at the various open-source projects in terms of use cases, you will see they lead with that. Think of, you're developing something in Python or Go, and you have a pretty good handle on the memory and CPU usage, so you know roughly what your program is doing. You create something new. You fix a bug, or add a feature. With continuous profiling, you can now see whether the three lines of code that you added actually have an impact, in terms of, it's now using twice as much CPU or tripling the memory footprint. In a very simple manner, you can just look at the previous version and the current version, you do a diff, and you see immediately: it's those new three lines of code that I inserted there. That makes so much of a difference. Definitely, in development, it's absolutely beneficial.
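As one concrete way to do that diff in the Go ecosystem, pprof can compare two captured profiles directly; the file names below are placeholders and this is just a sketch of the workflow, not the only way the CP tools above implement it.

```
# Compare a new CPU profile against a baseline; the report shows per-function
# deltas, so a regression introduced by a code change stands out immediately.
go tool pprof -diff_base=old.cpu.out new.cpu.out
```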
Bangser: It seems like decreasing that cost of experimentation is something which allows you to make the business case for even bigger changes. Can you point towards any tooling for automated alerts when performance decreases, based on what comes through that massive amount of profiling data? Or is that something that always needs to be manually defined, or is out of scope?
Hausenblas: That's an area of active development. At least the tooling that I know of in the open source space is not fully there yet, not like what you would probably expect if you say, I'm going to put together a Grafana dashboard and have an alert on that. Where you have these integrations, for example in Pyroscope, you can of course benefit from the ecosystem. That's definitely a huge area of future work.
Bangser: One of the things you were speaking about with the different tools on the market is that there are these different profiles, and we don't yet have a standard for profiling across them. Are there, and what are, the significant differences between the profiles created when using eBPF versus the language-specific tooling?
Hausenblas: My main argument would be that eBPF is agnostic. It doesn't really care if you have something interpreted, like Python, or a Java virtual machine with some Java dialect or another language on top of it, or something like Go, or Rust, or C. You get, in addition to the programming-language-specific things, the operating-system-level things as well. You get the syscalls in the kernel, you get your function calls, all in one view. That's why I'm so bullish on eBPF, taking into account that we still have some way to go to enable eBPF, in terms of the requirements the compute environment needs to support to benefit from it.
Bangser: With the language-specific ones, you're just not getting that connection to the kernel calls alongside your software at the same time.
Hausenblas: Right. The idea really here is that you want, in the same way that if you're using a SaaS, or a cloud provider, or whatever, you want to immediately be able to say, is that problem or whatever in my part in the code that I own, or is it in the operating system, the server's part, or whatever? If it's not in your part, then the best thing you can do is keep an eye on what the provider is doing. If it's in your part, you can immediately start trying to fix that.
Bangser: Honing in with those quick feedback loops. One of the things you mentioned was that as we move to newer versions, there will be fewer of these problems of trying to integrate this into our systems. But there are still lots of very important systems on older software. Here's an example I'm being asked about: a piece of software that's not containerized and written in Java 5. Are there tools on the market that support that architecture, instead of the more containerized, Java 8 plus style?
Hausenblas: Maybe that wasn't very clear when I mentioned it, or maybe I gave the impression that all the things I've presented here only work in the context of a containerized setup, if you have Kubernetes or something on Docker. That's not the case. You can download, for example, the Parca server and run it as a binary directly in your environment. It's just that in the context of containers, in the context of Kubernetes, the projects make it easy to install and you get a lot of these integration points, a lot of things for free. That doesn't mean that you can't use it for non-containerized environments, or that you can't use it for monoliths, for example. It's a perfect use case: if you have something written in Java as a monolith, it's the same idea, the same approach. It's really just that in a distributed setup, in the context of, for example, containerized microservices, there is very often the need to correlate, the need to use other signal types. For example, you probably need something like tracing initially to figure out which of the microservices it is, and then you can use profiles to drill down into one specific one. That's the main difference there.
Bangser: You mentioned you might use this in correlation with something like tracing: you trace to get to the right service, then you profile within that service. You were quite optimistic about the connection of continuous profiling data with other types of telemetry data. What do you think is the key missing piece at this point? There are lots of opportunities there, but is there one thing you feel would really change the landscape and bring a lot more people on board and into continuous profiling?
Hausenblas: I do believe we already see the beginning there. For example, the OTEP in OpenTelemetry, once logs are done toward the end of this year or the beginning of next year. If you have that end-to-end: from instrumentation, where you actually emit certain signal types such as profiles; to your telemetry agent, the OpenTelemetry Collector, for example, supporting them; to if and when the backends are enabled and provide proper storing and querying of profiles. Again, it is early days. Yes, it's going mainstream. 2022 is definitely a year where you can perfectly well start with it. In terms of interoperability, in terms of correlation, I think more feedback from the community, from practitioners in general, is also required: what are the important bits? What should be prioritized?
This is really more up to you out there. You are using it, or maybe you want to use it. As you dig into it, you will probably figure out limitations. That will then inform any standardization, which doesn't have to come from a formal body; it could be something like a CNCF working group that looks at what exactly we should be doing there. Or it might be one of the open source projects that has an initiative around that and says, let's establish a standard for this, such as we have with exemplars in the case of metrics and spans or traces.