AWS recently announced version 3 of the engine for Amazon Athena, the serverless interactive service for querying data in S3 using standard SQL. The cloud provider claims that the new engine improves performance and supports new use cases thanks to over 50 new SQL functions and 30 new analytics features.
Most of the improvements in Athena engine version 3 come from the open-source Trino and PrestoDB projects, with AWS speeding up the integration of enhancements and bug fixes from the community. Blayze Stefaniak, senior solutions architect at AWS, and colleagues write:
One of the most exciting aspects of engine version 3 is its new continuous integration approach to open source software management that will improve currency with the Trino and PrestoDB projects. This approach enables Athena to deliver increased performance and new features at an even faster pace.
Among other new features, Athena now supports T-Digest functions for rank-based statistics and new geospatial functions. It also adds MATCH_RECOGNIZE for row pattern matching, which helps identify data patterns in applications such as fraud detection and sensor data analysis.
SELECT m.id AS row_id, m.match, m.val, m.label
FROM (VALUES(1, 90),(2, 80),(3, 70),(4, 70)) t(id, value)
MATCH_RECOGNIZE (
    ORDER BY id
    MEASURES match_number() AS match,
        RUNNING LAST(value) AS val,
        classifier() AS label
    ALL ROWS PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (() | A)
    DEFINE A AS true
) AS m;
Source: https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html
The cloud provider released a guide on how to upgrade the query engine and a document highlighting key differences between version 2 and version 3.
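As a rough illustration of what the upgrade can look like when scripted, the sketch below builds the request for Athena's UpdateWorkGroup API and sends it through a boto3-style client. The helper names build_engine_update and upgrade_workgroup are hypothetical, and the engine version string is an assumption that should be checked against the values offered in your own account; the upgrade guide remains the authoritative reference.

```python
from typing import Any, Dict

# Assumed engine version label; confirm the exact value for your account.
ENGINE_V3 = "Athena engine version 3"


def build_engine_update(workgroup: str, engine: str = ENGINE_V3) -> Dict[str, Any]:
    """Build keyword arguments for Athena's UpdateWorkGroup call."""
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {
            "EngineVersion": {"SelectedEngineVersion": engine}
        },
    }


def upgrade_workgroup(client: Any, workgroup: str) -> Dict[str, Any]:
    """Send the update through a boto3-style Athena client and return the request.

    `client` is expected to expose update_work_group(**kwargs), as the
    boto3 Athena client does.
    """
    request = build_engine_update(workgroup)
    client.update_work_group(**request)
    return request
```

With boto3 installed and credentials configured, a call such as upgrade_workgroup(boto3.client("athena"), "my-test-workgroup") would switch a single workgroup, where the workgroup name is a placeholder. Trying this on a non-production workgroup first keeps the change easy to revert.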
According to AWS, the new engine improves query execution and reduces the amount of data scanned. It also improves the performance of joins that compare values with the <, <=, >, and >= operators, of queries that contain JOIN, UNION, UNNEST, or GROUP BY clauses, and of queries that use the IN predicate. Stefaniak and colleagues add:
We performed benchmark testing on engine version 3 using TPC-DS benchmark queries at 3 TB scale, and observed 20% query performance improvement when compared to the latest release of engine version 2.
Source: https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/
Not everyone agrees: Michael Wittig, founder at cloudonaut.io, reported a 10% decrease in performance. AWS acknowledges that a subset of use cases might be negatively affected, writing:
Many queries run faster on Athena engine version 3, but some query plans can differ from Athena engine version 2. As a result, some queries can differ in latency or cost.
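Because results can differ per workload, one way to check a specific query is to run it on both engine versions and compare the statistics Athena reports for each run. The sketch below assumes two dictionaries shaped like the Statistics section returned by GetQueryExecution; the helper names percent_change and compare_runs are hypothetical, and a single run per engine is only a rough signal, not a benchmark.

```python
from typing import Dict

# Statistics fields reported per query execution by Athena.
METRICS = ("EngineExecutionTimeInMillis", "DataScannedInBytes")


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new in percent; negative means lower."""
    if old == 0:
        raise ValueError("baseline value must be non-zero")
    return (new - old) * 100.0 / old


def compare_runs(v2_stats: Dict[str, float], v3_stats: Dict[str, float]) -> Dict[str, float]:
    """Compare the same query's statistics on engine version 2 and version 3."""
    return {name: percent_change(v2_stats[name], v3_stats[name]) for name in METRICS}
```

A negative value for EngineExecutionTimeInMillis indicates the query ran faster on version 3, while a change in DataScannedInBytes points to a difference in cost, since Athena bills by data scanned.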
Among the limitations, the Trino and Presto connectors are not supported, nor is Trino's fault-tolerant execution (Trino Tardigrade). The new query engine is available in all regions that support Athena, except those in China.