Amazon OpenSearch Zero ETL with S3 and New OR1 Instances

Amazon has announced a preview of a zero-ETL (extract, transform, and load) integration between Amazon OpenSearch Service and Amazon S3. The integration lets users analyze operational logs stored in Amazon S3 and S3-based data lakes without switching between services. Users can query infrequently accessed data in place in cloud object storage while still using the operational analytics and visualization capabilities of OpenSearch Service.

Amazon also announced OR1, a new instance family for Amazon OpenSearch Service that lets users create clusters that store their data durably in Amazon Simple Storage Service (Amazon S3). With these instances, users can ingest, store, index, and access large volumes of data, and AWS claims a 30% price/performance improvement over existing instance types.

The direct query feature for Amazon S3 in OpenSearch Service removes the need for ETL pipelines, and with it the operational overhead of duplicating data or managing multiple analytics tools. Customers can query their operational data in place, which AWS says saves both cost and time. The zero-ETL integration is configured within OpenSearch Service, where users can apply log type templates, which include predefined dashboards, and tailor data accelerations to specific log types. Templates are available for VPC Flow Logs, Elastic Load Balancing logs, and NGINX logs; the available accelerations are skipping indexes, materialized views, and covering indexes.

Direct queries against Amazon S3 support the complex queries needed for security forensics and threat analysis, including correlating data across multiple sources to investigate service downtime and security events. Once the integration is set up, users can run queries from OpenSearch Dashboards or through the OpenSearch API, and can audit connections to keep them scalable, cost-efficient, and secure.
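As a rough illustration of querying through the API, the sketch below builds a request for OpenSearch's asynchronous query endpoint (`POST /_plugins/_async_query`). It assumes this is the API used for direct queries; check the OpenSearch documentation for the exact request and response fields. The data source name `my_s3`, the Glue database `default`, the table `vpc_flow_logs`, and its column names are hypothetical, and the SigV4 signing required by a managed domain is omitted.

```python
import json

# Asynchronous query API path (assumed; verify for your engine version).
ASYNC_QUERY_PATH = "/_plugins/_async_query"


def build_async_query(datasource: str, sql: str) -> dict:
    """Return the JSON body that submits a SQL query to an S3 data source."""
    return {"datasource": datasource, "lang": "sql", "query": sql}


def result_path(query_id: str) -> str:
    """Return the path used to poll a submitted query for status and results."""
    return f"{ASYNC_QUERY_PATH}/{query_id}"


if __name__ == "__main__":
    # Hypothetical: data source "my_s3", Glue database "default",
    # table "vpc_flow_logs" with VPC Flow Logs-style columns.
    sql = (
        "SELECT srcaddr, dstaddr, COUNT(*) AS rejected_flows "
        "FROM my_s3.default.vpc_flow_logs "
        "WHERE action = 'REJECT' "
        "GROUP BY srcaddr, dstaddr "
        "ORDER BY rejected_flows DESC "
        "LIMIT 10"
    )
    # POST this body to ASYNC_QUERY_PATH on the domain endpoint (SigV4-signed),
    # then poll result_path(query_id) with the query ID returned by the POST.
    print("POST", ASYNC_QUERY_PATH)
    print(json.dumps(build_async_query("my_s3", sql), indent=2))
```

Because the query runs asynchronously against S3, the client submits it once and then polls for results instead of holding a connection open.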

To start using direct queries with Amazon S3, users create a new OpenSearch Service data source through the AWS Management Console or the API. Each data source uses the AWS Glue Data Catalog to manage tables that represent data in S3 buckets. Once the data source exists, users configure Amazon S3 tables, set up data indexing, and query data in OpenSearch Dashboards.

Configuration involves creating a data source in the OpenSearch Service console, selecting Amazon S3 with AWS Glue Data Catalog as the data source type, and choosing the IAM role the data source will use. Users then open OpenSearch Dashboards to configure access control, define tables, set up log type-based dashboards, and query data.
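A minimal sketch of the API route, assuming a recent version of the AWS SDK for Python (boto3) whose `opensearch` client includes `add_data_source`. The domain name, data source name, and IAM role ARN are placeholders, and the role is assumed to already grant access to S3 and the Glue Data Catalog. The request is built by a separate function so it can be inspected without calling AWS.

```python
def data_source_request(domain: str, name: str, role_arn: str) -> dict:
    """Build keyword arguments for an S3 data source backed by the Glue Data Catalog."""
    return {
        "DomainName": domain,
        "Name": name,
        "DataSourceType": {"S3GlueDataCatalog": {"RoleArn": role_arn}},
        "Description": "Direct queries on S3 data via the AWS Glue Data Catalog",
    }


def create_data_source(domain: str, name: str, role_arn: str) -> dict:
    """Send the request with boto3 (requires AWS credentials and permissions)."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("opensearch")
    return client.add_data_source(**data_source_request(domain, name, role_arn))


if __name__ == "__main__":
    # Placeholder values; replace with your own domain, name, and role ARN.
    request = data_source_request(
        "my-domain",
        "my_s3",
        "arn:aws:iam::123456789012:role/OpenSearchDirectQueryRole",
    )
    print(request)
```

Calling `create_data_source` (not executed above) sends the request; tables are still defined afterwards through the Glue Data Catalog and OpenSearch Dashboards.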

To improve query performance on data in Amazon S3, users can set up three kinds of acceleration: skipping indexes, materialized views, and covering indexes. Skipping indexes store only metadata about the data in Amazon S3, which lets queries quickly identify where the relevant data is stored. Materialized views precompute the results of complex queries such as aggregations, which can then be queried directly or used to power dashboard visualizations. Covering indexes are the most performant of the three because they ingest all data from the specified table columns.
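To make the idea behind skipping indexes concrete, here is a toy model of data skipping using per-file min/max statistics. It is not OpenSearch's implementation, and the S3 keys and values are hypothetical. The index keeps a few values of metadata per object, and a query consults that metadata to decide which objects it must actually read.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Metadata kept per S3 object: its key and the min/max of one column."""
    key: str
    min_value: int
    max_value: int


def build_skipping_index(files: dict[str, list[int]]) -> list[FileStats]:
    """Record only min/max metadata for each non-empty file, not every row."""
    return [FileStats(k, min(v), max(v)) for k, v in files.items() if v]


def files_to_scan(index: list[FileStats], lo: int, hi: int) -> list[str]:
    """Return keys of files whose [min, max] range overlaps the range [lo, hi]."""
    return [s.key for s in index if s.max_value >= lo and s.min_value <= hi]


if __name__ == "__main__":
    # Hypothetical objects holding HTTP status codes, for illustration only.
    files = {
        "logs/part-0.parquet": [200, 200, 204],
        "logs/part-1.parquet": [200, 404, 500],
        "logs/part-2.parquet": [301, 302],
    }
    index = build_skipping_index(files)
    # A query for server errors (500-599) only needs to read one object.
    print(files_to_scan(index, 500, 599))  # ['logs/part-1.parquet']
```

Covering indexes take the opposite trade-off: they copy column data into OpenSearch, which costs more storage but spares the query from reading S3.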

Once the tables are set up, users can query their data with the Discover feature in OpenSearch Dashboards, for example by running sample SQL queries against the tables defined in the AWS Glue Data Catalog. Overall, the direct query integration aims to make analyzing data across Amazon OpenSearch Service and Amazon S3 simpler and faster by giving users direct access to operational data without an ETL step.

OpenSearch dashboard example

The new OR1 instances for Amazon OpenSearch Service offer eleven nines (99.999999999%) of data durability and a Recovery Point Objective (RPO) of zero, meaning no acknowledged data is lost when a failure occurs. They are suited to workloads such as interactive log analytics and real-time application monitoring.

OR1 instances are available in eight sizes and serve as the cluster's data nodes. Each size provides a different amount of compute, so users can choose based on their workload. The available sizes are:

Instance name        vCPUs   Memory  Max EBS storage (gp3)
or1.medium.search        1    8 GiB  400 GiB
or1.large.search         2   16 GiB  800 GiB
or1.xlarge.search        4   32 GiB  1.5 TiB
or1.2xlarge.search       8   64 GiB  3 TiB
or1.4xlarge.search      16  128 GiB  6 TiB
or1.8xlarge.search      32  256 GiB  12 TiB
or1.12xlarge.search     48  384 GiB  18 TiB
or1.16xlarge.search     64  512 GiB  24 TiB

OR1 instances use Amazon Elastic Block Store (Amazon EBS) volumes for primary storage, and data is copied synchronously to Amazon S3 as it arrives. This design builds on the remote-backed storage and segment replication features recently introduced in OpenSearch. The data in S3 is used to create replicas and to rehydrate EBS volumes after shards move because of node failures or routine rebalancing. OR1 instances can be selected in the Data Nodes panel of the cluster configuration:
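The toy model below, written for illustration and not taken from OpenSearch, mirrors the write and recovery path just described: a write is acknowledged only after it lands on both the local volume and the remote store, and a replacement node rebuilds its local copy from the remote store rather than from a peer. Plain dictionaries stand in for EBS and S3, and the segment names are hypothetical.

```python
from __future__ import annotations


class ToyNode:
    """Toy data node: a local volume (stand-in for EBS) plus a shared remote
    store (stand-in for S3) that outlives any single node."""

    def __init__(self, remote: dict[str, bytes]):
        self.local: dict[str, bytes] = {}
        self.remote = remote

    def write(self, segment_id: str, data: bytes) -> bool:
        """Write locally and copy to the remote store before acknowledging."""
        self.local[segment_id] = data
        self.remote[segment_id] = data  # synchronous copy happens before the ack
        return True  # acknowledged

    def rehydrate(self) -> None:
        """Rebuild local segments from the remote store, not from a peer node."""
        self.local = dict(self.remote)


if __name__ == "__main__":
    bucket: dict[str, bytes] = {}  # hypothetical S3 location for one shard
    primary = ToyNode(bucket)
    primary.write("segment-1", b"docs 1-1000")
    primary.write("segment-2", b"docs 1001-2000")

    primary.local.clear()  # simulate losing the node and its local volume

    replacement = ToyNode(bucket)
    replacement.rehydrate()
    print(sorted(replacement.local))  # ['segment-1', 'segment-2']
```

Because every acknowledged write already exists in the remote store, losing a node's local volume loses no acknowledged data, which is the property behind the zero RPO claim.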

OpenSearch OR1 node configuration 

To determine the most suitable instance size, refer to the guidelines in the "Sizing Amazon OpenSearch Service domains" documentation.
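As a first-pass estimate, the sketch below applies the general rule of thumb from AWS's sizing guidance (source data × (1 + number of replicas) × 1.45, which covers indexing overhead and reserved space) and picks the smallest OR1 size whose maximum gp3 EBS storage holds each node's share. Treating that formula as applying unchanged to OR1 is an assumption; real sizing should also account for CPU, memory, and shard counts.

```python
from typing import Optional

# Maximum gp3 EBS storage per OR1 size in GiB (1 TiB = 1024 GiB), smallest first.
OR1_MAX_EBS_GIB = {
    "or1.medium.search": 400,
    "or1.large.search": 800,
    "or1.xlarge.search": 1536,
    "or1.2xlarge.search": 3072,
    "or1.4xlarge.search": 6144,
    "or1.8xlarge.search": 12288,
    "or1.12xlarge.search": 18432,
    "or1.16xlarge.search": 24576,
}


def min_storage_gib(source_gib: float, replicas: int) -> float:
    """Rule of thumb: source data * (1 + replicas) * 1.45."""
    return source_gib * (1 + replicas) * 1.45


def smallest_fit(source_gib: float, replicas: int, nodes: int) -> Optional[str]:
    """Smallest OR1 size whose max EBS storage holds one node's share, or None."""
    per_node_gib = min_storage_gib(source_gib, replicas) / nodes
    for name, max_gib in OR1_MAX_EBS_GIB.items():  # dicts keep insertion order
        if per_node_gib <= max_gib:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical workload: 2,000 GiB of source data, 1 replica, 3 data nodes.
    # 2000 * 2 * 1.45 = 5800 GiB total, about 1933 GiB per node.
    print(smallest_fit(2000, 1, 3))  # or1.2xlarge.search
```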

A few points are worth considering with this new storage option: engine version, Region availability, and pricing. OR1 instances require Amazon OpenSearch Service engine version 2.11 or later. The OR1 instance family is available in multiple AWS Regions, including US East, US West, Asia Pacific, and Europe (refer to the AWS documentation for the complete list). Data nodes are billed at On-Demand or Reserved Instance prices, and EBS storage is billed separately.
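Because data nodes and EBS storage are billed separately, a rough monthly estimate is the sum of two terms. In the sketch below the hourly and per-GiB rates are parameters with placeholder values, not actual AWS prices; current rates for each Region are listed on the Amazon OpenSearch Service pricing page. The 730 hours-per-month figure is a common convention for monthly estimates.

```python
HOURS_PER_MONTH = 730  # common convention: 8,760 hours / 12 months, rounded


def monthly_cost(nodes: int, hourly_rate: float, ebs_gib_per_node: float,
                 ebs_rate_per_gib_month: float) -> float:
    """Estimate monthly cost as data-node hours plus provisioned EBS storage."""
    compute = nodes * hourly_rate * HOURS_PER_MONTH
    storage = nodes * ebs_gib_per_node * ebs_rate_per_gib_month
    return round(compute + storage, 2)


if __name__ == "__main__":
    # Placeholder rates for illustration only; they are NOT real AWS prices.
    print(monthly_cost(nodes=3, hourly_rate=0.50, ebs_gib_per_node=2000,
                       ebs_rate_per_gib_month=0.10))  # 1695.0
```

For Reserved Instances, passing the effective hourly rate of the reservation gives a comparable estimate.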

In conclusion, OR1 instances are a significant addition to Amazon OpenSearch Service, combining S3-backed durability with the price/performance gains AWS claims, and providing a solid foundation for data-intensive workloads.