OpsRamp, a SaaS platform for datacenter operations management, announced its Fall 2019 release, which includes a number of enhancements to its machine learning models for intelligent event management and alert correlation. The release also adds multi-cloud infrastructure monitoring capabilities, synthetic monitoring, and a custom integration framework.
OpsRamp bills its platform, OpsQ, as a "service-centric artificial intelligence" platform with event management, alert correlation, and remediation functionality. This release extends that functionality with improved alert correlation features. Alerts can now be correlated not only by time but also by specific alert attributes. This pattern recognition can link alerts on description, subject, resource hostname, resource type, resource group, and service group.
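To make the idea of attribute-based correlation concrete, here is a minimal Python sketch. It is not OpsRamp's implementation: the `Alert` class, the `correlate` function, and the five-minute window are all assumptions made for illustration. It groups alerts that share a chosen set of attributes and that arrive close together in time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; field names mirror attributes named in the article."""
    timestamp: float  # seconds since some reference point
    subject: str
    resource_hostname: str
    resource_type: str
    service_group: str


def correlate(alerts, keys, window_seconds=300.0):
    """Group alerts that share the attributes in `keys` and arrive close in time.

    The window is an assumed parameter: a gap larger than `window_seconds`
    between consecutive matching alerts starts a new group.
    """
    # Bucket alerts by the values of the selected attributes, in time order.
    buckets = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        buckets[tuple(getattr(alert, k) for k in keys)].append(alert)

    groups = []
    for bucket in buckets.values():
        current = [bucket[0]]
        for alert in bucket[1:]:
            if alert.timestamp - current[-1].timestamp > window_seconds:
                groups.append(current)
                current = [alert]
            else:
                current.append(alert)
        groups.append(current)
    return groups


sample = [
    Alert(0, "disk full", "db-1", "server", "payments"),
    Alert(60, "disk full", "db-2", "server", "payments"),
    Alert(90, "cpu high", "web-1", "server", "storefront"),
    Alert(1000, "disk full", "db-1", "server", "payments"),
]
groups = correlate(sample, keys=("subject", "service_group"))
```

In this sample, the two "disk full" alerts for the payments service group that arrive a minute apart form one group, while the later "disk full" alert and the unrelated "cpu high" alert each stand alone. Changing `keys` changes which attributes must match, which is the general idea behind correlating on attributes rather than on time alone.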
The addition of the Observed Mode widget builds on the summer release of Observed Mode. In most SaaS platforms leveraging machine learning models, the models themselves are presented as a black box. Observed Mode allows for previewing the inferences made by the alert models without correlating or deduplicating live production alerts. As OpsRamp's State of AIOps report found, "67% of respondents have concerns about the relevance and reliability of the insights delivered by AIOps tools." Observed Mode, coupled with the Inference Stats widget, provides greater insight into the effectiveness of the machine learning predictions for production environments.
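The general pattern of previewing inferences without acting on them can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how OpsRamp builds Observed Mode: `observe`, `by_hostname`, and the dictionary-based alerts are hypothetical names and shapes chosen for the example.

```python
def by_hostname(alerts):
    """Hypothetical correlator: one group per hostname."""
    groups = {}
    for alert in alerts:
        groups.setdefault(alert["hostname"], []).append(alert)
    return list(groups.values())


def observe(alerts, correlator):
    """Run a correlator on a copy of the alerts and report what it would have done.

    The live alert list is never modified; only summary statistics are returned.
    """
    snapshot = list(alerts)  # work on a copy so live alerts stay untouched
    groups = correlator(snapshot)
    total = len(snapshot)
    reduction = 0.0 if total == 0 else 1 - len(groups) / total
    return {
        "alerts_seen": total,
        "inferences": len(groups),
        "potential_reduction": reduction,
    }


live = [
    {"hostname": "db-1", "subject": "disk full"},
    {"hostname": "db-1", "subject": "io wait"},
    {"hostname": "web-1", "subject": "cpu high"},
    {"hostname": "db-1", "subject": "swap usage"},
]
stats = observe(live, by_hostname)
```

Here four alerts would collapse into two inferences, a potential reduction of 50 percent, while the `live` list still holds all four original alerts. A statistic of this kind lets a team judge a model's grouping before allowing it to correlate or deduplicate production alerts.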
InfoQ asked Deepak Jannu, director of product marketing at OpsRamp, to share more details about the release.
InfoQ: How does the fall release help with common operations problems such as alert fatigue?
Deepak Jannu: The Fall 2019 Release delivers superior alert pattern recognition by grouping incoming alerts that share similar properties so that IT teams can take prompt action on critical incidents occurring in their hybrid IT infrastructure. Alert pattern recognition helps reduce overall event noise and operator fatigue by ensuring that correlated alerts are more meaningful and impactful for incident management teams.
InfoQ: The Fall release includes improvements to your OpsQ Inference Models which are geared towards discovering related alerts. How does this deal with false negatives (missing a relation) and false positives (implying a relation when one doesn't exist)?
Jannu: OpsRamp's Alert Inference Models group similar alerts to reduce unnecessary noise, so that IT teams can focus on a single inference rather than analyze and troubleshoot individual alerts. Instead of just relying on machine learning algorithms to learn existing alert sequences and discover hidden connections, the OpsQ event management engine combines native instrumentation, resource topology context, event attributes (subject, alert metric, alert source, host name, IP address and device type) and data science techniques to cut down on large volumes of false alerts and quickly take action with the right event prioritization.
InfoQ: This release includes an Observed Mode widget. Can you walk us through a use-case for this feature?
Jannu: OpsQ Observed Mode helps DevOps teams understand the scope for event noise reduction by simulating alert inferencing in shadow mode. The newly introduced Inference Stats widget quantifies the impact of these shadow inferences so that IT practitioners can understand the overall potential for alert volume optimization.
InfoQ: In this release you indicate the ingestion capabilities have been improved. What changes were made here?
Jannu: OpsRamp's Create Alert API can now ingest greater resource context (such as alert source, resource type, service group, and location) for incoming alerts from external tools so that the OpsQ event management engine can correlate third-party events with more meaningful data.
InfoQ: What's next on the roadmap for OpsRamp?
Jannu: The OpsRamp product roadmap for upcoming releases will feature new innovations for hybrid cloud and cloud native monitoring (cloud topology maps for Microsoft Azure and Google Cloud Platform, integrations for AWS AppMesh, Azure Container Services, and Red Hat OpenShift) and service-centric AIOps (automation workflows for closed-loop alert remediation, auto-incident assignment using on-call schedules, and accelerated learning via common cross-client sequences).
More information about the Fall 2019 release is available from OpsRamp.