Monitoring Tools Content on InfoQ
-
BasisAI Open Source Boxkite Machine Learning Monitoring Tool
Boxkite is an open source instrumentation library designed to track concept drift in highly available model servers. It integrates with DevOps tools such as Grafana, Prometheus, Fluentd and Kubeflow, and scales horizontally to multiple replicas without requiring changes to code or infrastructure. The project claims to be fast, correct and simple.
-
Artificial Intelligence for IT Operations: an Overview
Artificial intelligence for IT operations (AIOps) combines sophisticated methods from deep learning, data stream processing, and domain knowledge to analyse infrastructure data from internal and external sources, with the aim of automating operations and detecting anomalies (unusual system behavior) before they impact the quality of service.
-
Grafana Labs Changes Licenses to AGPLv3 for Grafana, Loki, and Tempo
Grafana Labs has recently announced the plan to change the licenses for their core products. They will relicense Grafana, Grafana Loki, and Grafana Tempo from the Apache License 2.0 to the Affero General Public License (AGPL) v3. Plugins, agents, and certain libraries will remain Apache-licensed.
-
AWS Releases Health Aware Providing Automated Health Alerts for Accounts
AWS recently announced the release of AWS Health Aware (AHA), an incident management and communications framework. AHA is an automated notification tool that sends AWS Health Alerts to a variety of endpoints. AHA is able to integrate with AWS Organizations to provide aggregated alerts across all accounts within the organization.
-
PagerDuty Adds AWS DevOps Guru and Microsoft Teams Integrations
PagerDuty has released a number of updates and enhancements to their incident response platform. These include new integrations with Amazon DevOps Guru, AWS Control Tower, and Microsoft Teams. Other enhancements include better mapping of failures back to changes, automatic triggers, and content-based alert grouping.
-
OpenTelemetry Announces Roadmap for Metrics Specification
The OpenTelemetry project announced its roadmap for its metrics specification. The roadmap includes a stable metrics API/SDK, metrics data model and protocol, and compatibility with Prometheus.
-
Lightstep Connects Tracing and Metrics with New Change Intelligence Feature
Lightstep has released a number of improvements to their observability platform. These include native support for OpenTelemetry metrics, a new underlying time series database, and Change Intelligence, a new feature that looks to connect unusual patterns with impacting changes by bringing together system metrics and trace data.
-
Grafana Labs Announces Updates to Its Grafana Cloud with a Free Tier, New Pricing and Features
Grafana Cloud is a fully-managed observability platform from Grafana Labs for applications and infrastructure. The company recently announced a new version of Grafana Cloud, including a free tier, a different pricing structure, and several significant new features such as enhanced alerting and synthetic monitoring.
-
AWS Introduces Amazon Managed Service for Grafana and Amazon Managed Service for Prometheus
In one of the latest announcements of re:Invent 2020, AWS introduced the preview of Amazon Managed Service for Grafana, a managed Grafana that automatically scales compute and database infrastructure, with automated version updates and security patching. AWS also introduced a preview for Amazon Managed Service for Prometheus.
-
AWS Announces Gateway Load Balancer
AWS Gateway Load Balancer is a new fully-managed network gateway and load balancer. The service is tailored to deploy, scale and manage third-party virtual appliances in the cloud, such as firewalls, intrusion detection and prevention systems, and deep packet inspection systems.
-
Amazon CloudWatch Dashboards Support Sharing
AWS recently introduced the ability to share Amazon CloudWatch Dashboards with users who do not have access to the AWS account. This feature opens up new use cases for dashboards, including sharing metrics and information on big screens, or embedding real-time information in public pages.
-
Netflix Presents Telltale, an Application Health Monitoring Tool
The Netflix Engineering team recently blogged about Telltale, a monitoring and alerting tool that uses a variety of data sources to learn the typical health of an application. Telltale shows only the relevant data from the application, along with information about important events, such as nearby deployments and regional traffic evacuations.
-
Brenda - an Artificial Intelligence Team Member
Brenda uses artificial intelligence with machine learning to monitor the infrastructure, do quality assurance checks, support troubleshooting, handle alerts, communicate critical issues, and apply auto-healing. At Swiss Testing Day 2020, Sree Rama Murthy Pakkala and Collin Mendons from Swisscom will talk about Brenda, an AI/ML framework that helps their teams increase quality.
-
Periskop: SoundCloud's Exception Monitoring Service
SoundCloud's engineering team wrote about their exception monitoring software called Periskop, which collects and aggregates exceptions across servers and reports to a central server for analysis.
-
Logz.io Survey Finds Tool Sprawl and Complex Architecture Key Challenges for Observability
Logz.io released their annual survey of the DevOps industry, with the spotlight this year on observability. According to the key findings, DevOps and observability tool sprawl is becoming an issue, and complex architectures present the main challenge in implementing an observability solution. For the next year, they predict greater investment in observability with a focus on distributed tracing.