Interactive Query Service Amazon Athena Introduces New Engine

AWS recently announced version 3 of the engine for Amazon Athena, the serverless interactive service to query S3 data using standard SQL. The cloud provider claims that the new engine improves performance and supports new use cases thanks to over 50 new SQL functions and 30 new analytics features.

Most of the improvements in Athena engine version 3 come from the open-source Trino and PrestoDB projects, with AWS speeding up the integration of enhancements and bug fixes from the community. Blayze Stefaniak, senior solutions architect at AWS, and colleagues write:

One of the most exciting aspects of engine version 3 is its new continuous integration approach to open source software management that will improve currency with the Trino and PrestoDB projects. This approach enables Athena to deliver increased performance and new features at an even faster pace.

Among other new features, Athena now supports T-Digest functions for rank-based statistics, new geospatial functions, and MATCH_RECOGNIZE for row pattern matching, which helps identify data patterns in applications such as fraud detection and sensor data analysis.

SELECT m.id AS row_id, m.match, m.val, m.label
FROM (VALUES (1, 90), (2, 80), (3, 70), (4, 70)) t(id, value)
MATCH_RECOGNIZE (
    ORDER BY id
    MEASURES
        match_number() AS match,
        RUNNING LAST(value) AS val,
        classifier() AS label
    ALL ROWS FOR EACH MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (() | A)
    DEFINE A AS true
) AS m;

Source: https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html
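
For readers new to row pattern matching, the idea behind MATCH_RECOGNIZE can be sketched outside SQL: each row is given a label based on a condition (the role of DEFINE), and a regular expression over those labels (the role of PATTERN) picks out runs of rows. The Python snippet below is only a conceptual illustration under that assumption, not how Athena or Trino implement the feature. The function names, the D/U/S labels, and the sample sensor readings are hypothetical.

```python
import re
from typing import List, Tuple


def classify(values: List[float]) -> str:
    """Label each step between consecutive rows: D (down), U (up), S (same)."""
    labels = []
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            labels.append("D")
        elif cur > prev:
            labels.append("U")
        else:
            labels.append("S")
    return "".join(labels)


def find_v_shapes(values: List[float]) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) index pairs, 0-based and inclusive,
    for every run that falls at least once and then rises at least once."""
    labels = classify(values)
    matches = []
    for m in re.finditer(r"D+U+", labels):
        # labels[i] describes the step from row i to row i + 1, so a match
        # over labels[start:end] spans rows start through end.
        matches.append((m.start(), m.end()))
    return matches


# Hypothetical sensor readings: flat, a dip, a recovery, then flat again.
readings = [70, 70, 65, 60, 68, 72, 72]
print(find_v_shapes(readings))  # [(1, 5)]
```

In this sketch the single match covers rows 1 through 5 (70, 65, 60, 68, 72), the kind of dip-and-recovery shape that a fraud or sensor-monitoring query might look for.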

The cloud provider released a guide on how to upgrade the query engine and a document highlighting key differences between version 2 and version 3.

According to AWS, the new engine improves query execution, reduces the amount of data scanned, and speeds up joins that compare values with the <, <=, >, and >= operators, queries that contain JOIN, UNION, UNNEST, or GROUP BY clauses, and queries that use the IN predicate. Stefaniak and colleagues add:

We performed benchmark testing on engine version 3 using TPC-DS benchmark queries at 3 TB scale, and observed 20% query performance improvement when compared to the latest release of engine version 2.

Source: https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/

Not everyone agrees: Michael Wittig, founder at cloudonaut.io, reports a 10% decrease in performance. AWS acknowledges that a subset of use cases might be negatively affected, writing:

Many queries run faster on Athena engine version 3, but some query plans can differ from Athena engine version 2. As a result, some queries can differ in latency or cost.

Among the limitations, Trino and Presto connectors are not supported, nor is Trino's fault-tolerant execution (Tardigrade). The new query engine is available in all regions that support Athena, except the China regions.
