Logz.io recently announced the addition of Prometheus-as-a-Service to their infrastructure monitoring product. The service combines Prometheus metrics collection with the Logz.io platform, which includes Grafana, ELK, and the recently added Jaeger. Logz.io's data correlation features connect metrics, traces, and logs within a single platform.
Prometheus is an open-source monitoring solution that records real-time metrics in a time series database. In 2016 it became the second project hosted by the Cloud Native Computing Foundation (CNCF), after Kubernetes. Prometheus's data model stores time series data, with each series identified by a metric name and a set of key/value pairs. It includes PromQL as a query language for accessing and working with this data.
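The data model can be illustrated with a short sketch. The Python below is not how Prometheus stores data internally; it only shows the idea that a metric name together with its key/value pairs identifies one series. The class name SeriesKey, the helper record, and the sample metric and label values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: a metric name plus key/value pairs."""
    name: str
    labels: Tuple[Tuple[str, str], ...]

    @classmethod
    def of(cls, name: str, **labels: str) -> "SeriesKey":
        # Sort the pairs so identity does not depend on argument order.
        return cls(name, tuple(sorted(labels.items())))

    def selector(self) -> str:
        # Render in the familiar name{key="value"} notation.
        body = ",".join(f'{k}="{v}"' for k, v in self.labels)
        return f"{self.name}{{{body}}}"


# Each distinct key owns its own list of (timestamp, value) samples.
store: Dict[SeriesKey, List[Tuple[float, float]]] = {}


def record(key: SeriesKey, timestamp: float, value: float) -> None:
    store.setdefault(key, []).append((timestamp, value))


get_200 = SeriesKey.of("http_requests_total", method="GET", status="200")
get_500 = SeriesKey.of("http_requests_total", method="GET", status="500")
record(get_200, 1.0, 10.0)
record(get_200, 2.0, 12.0)
record(get_500, 2.0, 1.0)

print(get_200.selector())
print(len(store))
```

The same metric name with different label values yields two separate series, which is why the store above ends up with two entries.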
Prometheus-as-a-Service provides a remote location for storing metrics collected by Prometheus. According to Logz.io, they are using M3 as their long-term storage solution. Existing Prometheus users can switch to Logz.io's service by adding a remote_write section to the configuration file of each Prometheus instance:
remote_write:
  - url: http://54.209.186.182:8050
    bearer_token: ENfcuKmmnXPtHUPVYhYL
This uses Prometheus's remote write capability to send metrics to Logz.io's managed service, where they are stored for 18 months by default. The url and bearer_token values are available from the Logz.io account. There is no need to adjust Prometheus autodiscovery or scraping implementations.
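Because the same section has to be added to every Prometheus configuration file, it may be convenient to generate it. The following is a minimal sketch that renders the section as text; the function name render_remote_write and the placeholder URL and token are hypothetical, and only the remote_write, url, and bearer_token keys come from the configuration shown above.

```python
def render_remote_write(url: str, token: str) -> str:
    """Return a remote_write section as YAML text for one Prometheus instance."""
    return (
        "remote_write:\n"
        f"  - url: {url}\n"
        f"    bearer_token: {token}\n"
    )


# Placeholder values; the real ones are taken from the Logz.io account.
fragment = render_remote_write("http://listener.example.invalid:8050", "YOUR_TOKEN")
print(fragment)
```

The rendered text can then be appended to each configuration file, keeping the token out of version control if it is supplied at deploy time.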
Dotan Horovits, product evangelist at Logz.io, noted in a recent InfoQ article that "scalability is Prometheus’s achilles heel. Current monitoring needs a clustered solution that can hold historical data long term, without sacrificing data granularity with aggressive downsampling." Horovits goes on to say that a time series database (TSDB) can address this challenge by providing long-term storage. As with the Logz.io managed service, Prometheus's remote write feature can also send metrics to a self-hosted TSDB for long-term storage. Many open-source TSDBs are available, including the CNCF incubating projects Cortex and Thanos and the previously mentioned M3.
Logz.io has also announced the addition of distributed tracing to their observability platform. This feature is based on Jaeger and, in the same vein as the Prometheus announcement, provides a hosted service for tracing data. Jaeger is an open-source distributed tracing tool and a graduated CNCF project.
As with Prometheus-as-a-Service, applications must first be instrumented to expose and ship the relevant trace data. Logz.io supports most common tracing protocols, including Jaeger, Zipkin, OpenTracing, OpenCensus, and OpenTelemetry.
Once the data is in Logz.io, it can be viewed in Grafana or the Jaeger UI. Logz.io has extended both products with additional features. Alerts can be configured with multiple trigger conditions that draw on both metrics and log data. The data correlation feature allows for quick investigation of the logs and traces associated with a metric. It is accessed via the 'Explore in Kibana' button within the Grafana and Jaeger dashboards.
The button opens the traces as logs in Kibana, along with additional logs not displayed in Jaeger. Behind the scenes, Logz.io searches its log management service for logs carrying the relevant trace ID. With the traces ported into log management, Kibana's visualizations can be used to explore the data further.
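The trace ID lookup can be pictured with a small sketch. This is not Logz.io's implementation; it assumes, purely for illustration, that each log record is a dictionary with hypothetical trace_id, timestamp, and message fields, and the sample records are made up.

```python
from typing import Dict, Iterable, List

LogRecord = Dict[str, str]


def logs_for_trace(logs: Iterable[LogRecord], trace_id: str) -> List[LogRecord]:
    """Return the records tagged with trace_id, ordered by timestamp."""
    matches = [rec for rec in logs if rec.get("trace_id") == trace_id]
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    return sorted(matches, key=lambda rec: rec["timestamp"])


# Made-up sample data; the last record has no trace ID at all.
sample_logs: List[LogRecord] = [
    {"timestamp": "2020-11-02T10:00:02Z", "trace_id": "abc123", "message": "payment declined"},
    {"timestamp": "2020-11-02T10:00:01Z", "trace_id": "abc123", "message": "checkout started"},
    {"timestamp": "2020-11-02T10:00:01Z", "trace_id": "def456", "message": "cart viewed"},
    {"timestamp": "2020-11-02T10:00:03Z", "message": "health check ok"},
]

related = logs_for_trace(sample_logs, "abc123")
for rec in related:
    print(rec["timestamp"], rec["message"])
```

Records without a trace ID are simply skipped, so the correlation only works for logs that were emitted with the trace ID attached.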
Role-based access can also be achieved through Logz.io's sub-accounts feature. The main account is able to create and define additional sub-accounts. Each sub-account can be assigned its own access controls, data volumes, and retention periods.
Both distributed tracing and infrastructure monitoring are available as part of Logz.io's platform. More information can be found on the Logz.io website.