The Apache Software Foundation has released version 10 of Apache SkyWalking, an open-source observability platform designed to provide comprehensive monitoring, tracing, and analytics for distributed systems.
New features and enhancements include:
- Service hierarchy, which defines relationships between the same logical service as it appears in different layers. Users can see inter-layer service connections and metric summaries, which improves visibility into complex service architectures.
- Kubernetes network traffic monitoring based on eBPF, providing insights into service traffic, topology, and TCP/HTTP-level metrics.
- BanyanDB, a native observability database designed as SkyWalking's "next-generation" storage solution for medium-scale deployments. BanyanDB has shown promising performance gains over Elasticsearch running at the same scale.
- Support for storing multiple labels in metrics data, allowing Metrics Query Expression (MQE) to query and calculate metrics across multiple label names.
- New monitoring dashboards for Apache RocketMQ, ActiveMQ, and ClickHouse.
- An enhanced gRPC metrics exporter.
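The following Python sketch illustrates the multi-label change. It stores metric values keyed by several labels at once and then filters or sums them by any subset of those labels. This is a hypothetical in-memory model of the idea, not MQE syntax and not SkyWalking's storage format. The metric name, the label names (`status_code`, `method`), and the `record`/`select`/`total` helpers are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledMetric:
    """Hypothetical metric whose values are keyed by several labels at once."""
    name: str
    # Each key is a frozenset of (label, value) pairs; each value is a number.
    samples: dict = field(default_factory=dict)

    def record(self, value: float, **labels: str) -> None:
        key = frozenset(labels.items())
        self.samples[key] = self.samples.get(key, 0.0) + value

    def select(self, **match: str) -> dict:
        """Return samples whose labels include every requested label/value pair."""
        wanted = set(match.items())
        return {k: v for k, v in self.samples.items() if wanted <= k}

    def total(self, **match: str) -> float:
        """Sum all samples that match the label filter (no filter sums everything)."""
        return sum(self.select(**match).values())


requests = LabeledMetric("endpoint_request_count")  # hypothetical metric name
requests.record(5, status_code="200", method="GET")
requests.record(2, status_code="500", method="GET")
requests.record(3, status_code="200", method="POST")

print(requests.total(status_code="200"))                # 8.0
print(requests.total(status_code="200", method="GET"))  # 5.0
print(requests.total())                                 # 10.0
```

Keying each sample by the full set of labels is what lets a single query filter on one label, several labels, or none.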
SkyWalking aims to improve visibility into microservices architectures by collecting logs, metrics, and traces from the components of a system. It is intended to help users understand the topology relationships between services and endpoints, detect API dependencies in a distributed environment, and see service hierarchy relationships. It consists of four core components: the agent, the Observability Analysis Platform (OAP) server, the storage backend, and a web-based user interface (UI).
The agent is installed on each service instance and collects trace and metric data. This data includes details of each request, as well as metrics such as service response times, throughput, CPU, and memory usage. Optionally, the agent can also collect log data to provide additional context for trace and metric data. The collected data is then sent to the OAP server over HTTP or gRPC. Agents are available for Java, Python, Go, Node.js, and more.
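As a rough illustration of what "collecting details of each request" means, the Python sketch below wraps a function so every call produces one record with timing and error status, buffers the records, and serializes a batch for reporting. It is a hypothetical stand-in and does not use the SkyWalking agent API. The `traced` decorator, `flush`, `BUFFER`, the record fields, and the service name `orders-service` are all invented for the example, and the actual network send to the OAP server is left out.

```python
import functools
import json
import time
import uuid

# Hypothetical in-process buffer standing in for an agent's report queue.
BUFFER: list = []
SERVICE_NAME = "orders-service"  # hypothetical service name


def traced(endpoint: str):
    """Decorator that records one span-like entry per call: timing plus error flag."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start_wall = time.time()
            start = time.perf_counter()
            error = False
            try:
                return func(*args, **kwargs)
            except Exception:
                error = True
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                BUFFER.append({
                    "trace_id": uuid.uuid4().hex,
                    "service": SERVICE_NAME,
                    "endpoint": endpoint,
                    "start_ms": int(start_wall * 1000),
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "is_error": error,
                })
        return wrapper
    return decorator


def flush() -> str:
    """Serialize and clear the buffer; this sketch only returns the JSON batch."""
    payload = json.dumps(BUFFER)
    BUFFER.clear()
    return payload


@traced("/orders/{id}")
def get_order(order_id: int) -> dict:
    return {"id": order_id}


@traced("/orders/fail")
def failing() -> None:
    raise ValueError("boom")


get_order(42)
try:
    failing()
except ValueError:
    pass

batch = json.loads(flush())
print(len(batch), [s["is_error"] for s in batch])  # 2 [False, True]
```

Recording in a `finally` block means failed requests are captured too, which is what makes error rates computable later.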
Upon receiving the data, the OAP server processes it to aggregate and correlate traces, metrics, and logs. This process involves identifying the flow of requests and dependencies between services, summarizing metrics over specified intervals, and constructing a service topology map that visualizes service interactions. The processed data is then stored in the configured backend storage, such as BanyanDB, Elasticsearch, MySQL, or H2, where it can be queried and retrieved for visualization and analysis.
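The two aggregation steps described above, building a service topology and summarizing metrics over fixed intervals, can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how the OAP server is implemented. The `Call` record and its fields, the one-minute bucket size, and the service names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One hypothetical inter-service call as reported by agents."""
    caller: str
    callee: str
    timestamp_ms: int
    latency_ms: float
    is_error: bool


def build_topology(calls):
    """Count calls per (caller, callee) edge to form a service map."""
    edges = defaultdict(int)
    for c in calls:
        edges[(c.caller, c.callee)] += 1
    return dict(edges)


def aggregate_per_minute(calls):
    """Summarize each callee's calls into one-minute buckets."""
    buckets = defaultdict(lambda: {"calls": 0, "latency_sum": 0.0, "errors": 0})
    for c in calls:
        minute = c.timestamp_ms // 60_000
        b = buckets[(c.callee, minute)]
        b["calls"] += 1
        b["latency_sum"] += c.latency_ms
        b["errors"] += int(c.is_error)
    return {
        key: {
            "calls": b["calls"],
            "avg_latency_ms": b["latency_sum"] / b["calls"],
            "error_rate": b["errors"] / b["calls"],
        }
        for key, b in buckets.items()
    }


calls = [
    Call("gateway", "orders", 0, 20.0, False),
    Call("gateway", "orders", 30_000, 40.0, True),
    Call("orders", "payments", 45_000, 15.0, False),
    Call("gateway", "orders", 61_000, 10.0, False),
]

print(build_topology(calls))
# {('gateway', 'orders'): 3, ('orders', 'payments'): 1}
print(aggregate_per_minute(calls)[("orders", 0)])
# {'calls': 2, 'avg_latency_ms': 30.0, 'error_rate': 0.5}
```

The topology comes from counting caller-to-callee pairs, while the per-minute buckets turn raw calls into the throughput, latency, and error-rate series a dashboard plots.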
The UI provides various views and dashboards for users to interact with the collected data. These include the service topology, which displays the relationships and interactions between different services; the trace view, which shows detailed traces of individual requests, including latency and error information; and the metrics dashboard, which visualizes various service and system metrics over time.
SkyWalking's distributed tracing capabilities capture the entire lifecycle of requests across multiple services, while its performance monitoring features track key performance indicators such as latency, throughput, and error rates. The platform also visualizes service dependencies to help identify potential bottlenecks and failure points.
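Following a request across services requires each hop to pass trace context to the next. SkyWalking's documented Cross Process Propagation Headers Protocol (v3) uses a header named `sw8` for this. As we understand the spec, its value has eight dash-separated fields: a sample flag, then the Base64-encoded trace ID and parent segment ID, the parent span ID as an integer, and the Base64-encoded parent service, parent service instance, parent endpoint, and target address. The Python sketch below encodes and decodes such a value. It is written from our reading of the protocol documentation and is not the agents' own code, so check the spec before relying on the field order. The `SW8Context` class, its helper functions, and all example values are hypothetical.

```python
import base64
from dataclasses import dataclass


def _b64(s: str) -> str:
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


def _unb64(s: str) -> str:
    return base64.b64decode(s).decode("utf-8")


@dataclass
class SW8Context:
    """Hypothetical model of the context carried in an sw8-style header."""
    sampled: bool
    trace_id: str
    parent_segment_id: str
    parent_span_id: int
    parent_service: str
    parent_instance: str
    parent_endpoint: str
    target_address: str

    def encode(self) -> str:
        """Build the header value: eight fields joined by '-'."""
        return "-".join([
            "1" if self.sampled else "0",
            _b64(self.trace_id),
            _b64(self.parent_segment_id),
            str(self.parent_span_id),
            _b64(self.parent_service),
            _b64(self.parent_instance),
            _b64(self.parent_endpoint),
            _b64(self.target_address),
        ])

    @classmethod
    def decode(cls, value: str) -> "SW8Context":
        """Parse a header value; standard Base64 never contains '-', so splitting is safe."""
        parts = value.split("-")
        if len(parts) != 8:
            raise ValueError("expected 8 dash-separated fields")
        return cls(
            sampled=parts[0] == "1",
            trace_id=_unb64(parts[1]),
            parent_segment_id=_unb64(parts[2]),
            parent_span_id=int(parts[3]),
            parent_service=_unb64(parts[4]),
            parent_instance=_unb64(parts[5]),
            parent_endpoint=_unb64(parts[6]),
            target_address=_unb64(parts[7]),
        )


# Upstream service attaches the context to an outgoing request.
upstream = SW8Context(True, "trace-123", "seg-1", 2, "gateway",
                      "gateway-pod-1", "GET:/orders", "orders:8080")
headers = {"sw8": upstream.encode()}

# Downstream service reads it back and continues the same trace.
downstream = SW8Context.decode(headers["sw8"])
print(downstream.trace_id)  # trace-123
```

Because the downstream side recovers the same trace ID, spans recorded in both services can be stitched into one end-to-end trace.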