Amazon OpenSearch recently introduced support for anomaly detection on historical data. The machine learning based feature helps identify trends, patterns, and seasonality in OpenSearch data.
Tyler Ohlsen, software engineer at AWS, explains how customers can derive insights and take actions to improve applications:
You can leverage anomaly detection to analyse vast amounts of logs in many different ways. Some analytics approaches require real-time detection, such as application monitoring, event detection, and fraud detection. Others involve analysing past data to identify the trends and patterns, isolate the root cause and prevent them from happening again in the future.
Anomaly detection automatically detects anomalies in near real time using version 2.0 of the Random Cut Forest (RCF) algorithm. The new feature introduces a unified flow in OpenSearch Dashboards for using anomaly detection in both real-time and historical analysis. The RCF algorithm maintains a sketch of the incoming data stream and computes an anomaly grade and a confidence score for each incoming data point; these values are then used to distinguish an anomaly from normal variation.
Source: https://opensearch.org/docs/latest/monitoring-plugins/ad/index/
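To make the RCF description above concrete, here is an added, simplified random cut forest in Python. It is not code from OpenSearch or from the Random Cut Forest library; it only illustrates the core idea: each tree is built with the RCF cut rule (the cut dimension is chosen with probability proportional to its range and the cut value uniformly within that range), and a point is scored by how few cuts it takes to isolate it, so an outlier that sits far from the rest of the data is separated almost immediately. The production algorithm works on a streaming sketch and uses a displacement-based score with a learned threshold; the isolation-depth score, the fixed 0.7 threshold and the `anomaly_grade` mapping here are this example's own simplifications, no confidence score is modelled, and the sample data is constructed.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def _expected_depth(n: int) -> float:
    """Expected number of random cuts needed to isolate one of n points.

    Used both to normalise scores and to estimate the depth below a leaf
    that holds several points but was never expanded (the same correction
    isolation forests use)."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points: List[Point], rng: random.Random,
               depth: int = 0, max_depth: int = 12) -> Dict:
    """Recursively build one random cut tree over a static batch of points."""
    if len(points) <= 1 or depth >= max_depth:
        return {"leaf": True, "size": len(points)}
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    ranges = [hi[d] - lo[d] for d in range(dims)]
    total = sum(ranges)
    if total == 0.0:  # all remaining points are identical; nothing to cut
        return {"leaf": True, "size": len(points)}
    # RCF cut rule: pick the dimension with probability proportional to its range,
    # so a dimension in which some point sits far away is cut first.
    r = rng.uniform(0.0, total)
    acc, dim = 0.0, dims - 1
    for d in range(dims):
        acc += ranges[d]
        if r < acc:
            dim = d
            break
    cut = rng.uniform(lo[dim], hi[dim])
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    if not left or not right:  # defensive: a cut that separates nothing
        return {"leaf": True, "size": len(points)}
    return {"leaf": False, "dim": dim, "cut": cut,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}


def path_length(node: Dict, point: Point) -> float:
    """Number of cuts from the root to the leaf a point falls into."""
    depth = 0
    while not node["leaf"]:
        node = node["left"] if point[node["dim"]] <= node["cut"] else node["right"]
        depth += 1
    return depth + _expected_depth(node["size"])


class RandomCutForest:
    """A batch forest of random cut trees (hypothetical simplification of RCF)."""

    def __init__(self, points: List[Point], num_trees: int = 100, seed: int = 0):
        rng = random.Random(seed)
        self.n = len(points)
        self.trees = [build_tree(list(points), rng) for _ in range(num_trees)]

    def anomaly_score(self, point: Point) -> float:
        """Score in (0, 1): close to 1 when few cuts isolate the point."""
        avg = sum(path_length(t, point) for t in self.trees) / len(self.trees)
        return 2.0 ** (-avg / _expected_depth(self.n))

    def anomaly_grade(self, point: Point, threshold: float = 0.7) -> float:
        """Grade in [0, 1]: 0 below the (assumed) threshold, rising towards 1 above it."""
        s = self.anomaly_score(point)
        return 0.0 if s <= threshold else (s - threshold) / (1.0 - threshold)


def make_sample_data(seed: int = 1) -> Tuple[List[Point], Point]:
    """Constructed data: 64 points in the unit square plus one far-away outlier."""
    rng = random.Random(seed)
    inliers = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(64)]
    return inliers, (10.0, 10.0)


if __name__ == "__main__":
    inliers, outlier = make_sample_data()
    forest = RandomCutForest(inliers + [outlier])
    print("outlier score/grade:", forest.anomaly_score(outlier), forest.anomaly_grade(outlier))
    print("typical inlier score/grade:", forest.anomaly_score(inliers[0]), forest.anomaly_grade(inliers[0]))
```

Because the outlier dominates the range in both dimensions, the range-proportional cut rule makes the first cut very likely to separate it, which is exactly what gives it a short path and a high score; the inliers need many more cuts and stay below the threshold.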
The RCF algorithm was originally created at Amazon for streaming data; new implementations are now being developed for density estimation, imputation, and forecasting. Matthew Wilson, VP and distinguished engineer at AWS, adds:
This Random Cut Forest algorithm is open source software that is separate from OpenSearch. Check out the implementations in both Java and Rust. You can track progress toward the enhancements coming in the 3.0 implementation as well.
Explaining streaming analytics in OpenSearch, Sudipto Guha, principal scientist at AWS, and Joshua Tokle, senior software engineer at Amazon, write:
Anomaly detection is a quintessential search problem. A typical use case corresponds to a high cardinality dataset, where some attributes split the data into a large number of individual, and potentially incomparable, time series and one simultaneously monitors each of those time series. Anomalies are often only explainable in the context of past data specific to each time series.
The new feature is one of the improvements in OpenSearch version 1.1, the Apache 2.0-licensed distribution forked from Elasticsearch by AWS a year ago. The new version also introduces Bucket Level Alerting, policies that evaluate against aggregations, and Cross-Cluster Replication (CCR) to deploy OpenSearch clusters across different servers, data centres, and regions.
The Anomaly Detection OpenSearch Plugin and the Anomaly Detection OpenSearch Dashboards Plugin are available on GitHub.