Logz.io has released its annual survey and analysis of the DevOps industry, with the spotlight this year on observability. The key findings include that DevOps and observability tool sprawl is becoming an issue, and that complex architectures present a major challenge when implementing an observability solution. For the coming year, Logz.io predicts greater investment in observability, with a focus on distributed tracing.
Within the survey, participants were asked to define observability. The majority of respondents view observability as using logs, metrics, and traces together, while around 30% believe that observability is a measure of how well a system's state can be determined from its output data. This corresponds with how Tyler Treat, managing partner at Real Kinetic, describes observability: "the ability to interrogate our systems after the fact in a data-rich, high-fidelity way."
The survey found that 63% of respondents are using more than one observability tool, with 14% using five or more. Log management and analysis tooling is the most popular category, used by 73% of respondents, while 40% indicated that they use infrastructure monitoring and alerting tooling. However, 66% of respondents are not yet using any form of tracing tool to augment their observability data.
The use of these different tool sets aligns with the "three pillars of observability": metrics, logging, and distributed tracing. However, Ben Sigelman, CEO at LightStep, warns that a "common mistake that engineers can make is to look at each of the pillars in isolation, and then simply evaluating and implementing individual solutions to address each pillar."
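In practice, Sigelman's warning about treating the pillars in isolation is often addressed by correlating the different signals through a shared identifier. The following sketch is purely illustrative and is not taken from the survey or from any specific vendor's API; the service name, field names, and trace ID scheme are assumptions. It shows, in plain Python, how a log line and a metric sample might be stamped with the same trace ID so they can later be queried together alongside the corresponding distributed trace:

```python
# Illustrative sketch only: correlating the "three pillars" by stamping
# log lines and metric samples with a shared trace ID, so signals can be
# queried together instead of being siloed per tool.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")  # hypothetical service name

def handle_request():
    trace_id = uuid.uuid4().hex          # hypothetical request-scoped trace ID
    start = time.monotonic()

    # Pillar 1: structured log, carrying the trace ID for cross-referencing
    log.info(json.dumps({"event": "order_received", "trace_id": trace_id}))

    # ... business logic would run here ...

    # Pillar 2: metric sample, tagged with the same trace ID
    latency_ms = (time.monotonic() - start) * 1000
    log.info(json.dumps({"metric": "request_latency_ms",
                         "value": round(latency_ms, 2),
                         "trace_id": trace_id}))

    # Pillar 3: the same trace_id would identify the distributed trace
    # in a tracing backend such as Jaeger.
    return trace_id

if __name__ == "__main__":
    handle_request()
```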
Within these tool sets, the majority of respondents preferred open-source observability stacks. The ELK stack was the most popular logging tool, Grafana was the most popular metrics tool, and for respondents who are using tracing, Jaeger was the tool of choice. The survey also found that machine learning solutions are gaining momentum, with 40% of respondents currently using ML solutions or considering them in the upcoming year.
While 41% of respondents indicated they are using some form of serverless architecture, 48% claimed that serverless was the main technology obstacle to implementing their observability strategy. This aligns with comments from Treat on how microservice architectures can shift where the complexity of the system arises:
While monoliths can have internal complexity, microservices bring that complexity to the surface. It shifts from being code complexity to operations and architecture complexity, and that’s a whole different type of challenge.
With the maturation of DevOps practices and methodologies, more and more teams have reported increased adoption of microservices, serverless, and container technologies. As the Logz.io team notes:
These numbers indicate that while engineering teams continue to adopt new technologies, these technologies also obstruct visibility into system performance, contributing to a lack of observability. 58% of our respondents confirm this hypothesis, stating that the number one barrier to observability is complex architectures. This is due to the increasing complexity that technologies such as serverless and Kubernetes bring to modern architectures.
With this increase in the adoption of DevOps practices, 64% of respondents reported that DevOps engineers are primarily responsible for achieving observability. Close to half now report that developers are starting to share this responsibility, with 39% indicating that operations teams are involved as well. However, it is difficult to conclude from these results whether this approach aligns with what Treat calls a "culture of observability". Treat explains that a common mistake organizations make as they move towards a culture of observability is to "simply rename an Operations team to an Observability team. This is akin to renaming your operations engineers to DevOps engineers thinking it will flip some switch". This thought lines up with what Charity Majors, CEO of Honeycomb, shared in an interview with InfoQ:
Developers will be owning and operating their own services, and this is a good thing! Our roles as operational experts are to empower and educate and be force amplifiers. And to build the massive world class platforms they can use to build composable infrastructure stacks and pipelines.
From these results, the Logz.io team drew some predictions for the upcoming year. The team expects that companies, especially enterprise organizations, will begin to invest more in observability, particularly in the lagging field of distributed tracing. The other main conclusion is an "increased focus on reducing tool sprawl as vendors consolidate tools for monitoring, troubleshooting, and security." This meshes well with predictions made by Treat that the major monitoring players will look to implement specific capabilities such as "arbitrarily-wide structured events, high-cardinality dimensions without the need for indexes or schemas, and shared context propagated between services in the request path."
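Treat's mention of "shared context propagated between services in the request path" refers to passing trace context along with every request so that downstream services can continue the same trace. The sketch below is an illustration only, loosely based on the W3C Trace Context traceparent header format; the helper functions and services are hypothetical and do not come from the survey or from Treat's comments:

```python
# Illustrative sketch only: propagating shared trace context between services
# in the request path, using the W3C Trace Context "traceparent" header layout
# (version-traceid-parentid-flags). Helper names here are hypothetical.
import os

def make_traceparent(trace_id=None, span_id=None):
    """Build a traceparent header value: 00-<trace-id>-<parent-id>-<flags>."""
    trace_id = trace_id or os.urandom(16).hex()   # 32 hex characters
    span_id = span_id or os.urandom(8).hex()      # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"

def inject(headers, traceparent):
    """Upstream service: attach the context to the outgoing request headers."""
    headers["traceparent"] = traceparent
    return headers

def extract(headers):
    """Downstream service: recover trace ID and parent span ID from headers."""
    _, trace_id, parent_span_id, _ = headers["traceparent"].split("-")
    return trace_id, parent_span_id

if __name__ == "__main__":
    outgoing = inject({}, make_traceparent())
    trace_id, parent = extract(outgoing)
    print(f"downstream continues trace {trace_id} under parent span {parent}")
```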
These conclusions are shared by the InfoQ editorial team in their 2019 Retrospective article. The team wrote that "next year will hopefully be the year of 'managing complexity.'" The introduction of patterns such as microservices and serverless has allowed for more scalable and better-isolated solutions. As they note, however, "our ability to comprehend the complex distributed systems we are now building -- along with the availability of related tooling -- has not kept pace with these developments".