AWS recently announced Provisioned Capacity for Athena, a new feature that allows users to run SQL queries on fully managed compute capacity for a fixed price with no long-term commitment.
Athena is a serverless interactive query service that allows users to analyze data in Amazon Simple Storage Service (Amazon S3) data lakes and 30 different data sources, including on-premises data sources or other cloud systems, using standard SQL queries. Provisioned Capacity is an optional add-on feature of Athena.
With Provisioned Capacity, users can now purchase query processing capacity in advance for a set duration and choose the number of concurrent queries they want to run. This lets them manage query performance and costs more effectively, especially for critical workloads that require consistent and predictable query performance.
Sébastien Stormacq, a principal developer advocate at AWS, explains Provisioned Capacity in an AWS News Blog post:
Behind the scenes, Athena maintains a large pool of compute in each AWS Region that it operates in. You can think of this as one large pool of compute, divided logically across customers. When you reserve capacity in Athena, the capacity is held for your exclusive use. You can choose which queries run on the capacity you provisioned and which run on Athena’s multi-tenant, on-demand capacity. Multiple queries can share the capacity you provisioned.
Users can increase their capacity units anytime to meet their needs or reduce their provisioned capacity after at least eight hours.
Capacity is measured in Data Processing Units (DPUs), with a single DPU representing four vCPUs and 16 GB of RAM. The minimum capacity users may provision is 24 DPUs for eight hours. According to Stormacq, Provisioned Capacity is ideal when spend on Athena is $100 or more per month.
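To make the sizing concrete, the figures above can be turned into a small calculation. This is a minimal sketch in Python that uses only the numbers quoted in this article (four vCPUs and 16 GB of RAM per DPU, a 24 DPU minimum held for at least eight hours). The names `Reservation` and `minimum_dpu_hours` are hypothetical helpers for illustration and are not part of any AWS SDK.

```python
from dataclasses import dataclass

# Figures quoted in the article; the names below are illustrative only.
VCPUS_PER_DPU = 4
RAM_GB_PER_DPU = 16
MIN_DPUS = 24
MIN_HOURS = 8


@dataclass(frozen=True)
class Reservation:
    """Hypothetical helper describing a provisioned capacity size."""
    dpus: int

    def __post_init__(self) -> None:
        # Reject sizes below the minimum stated in the article.
        if self.dpus < MIN_DPUS:
            raise ValueError(f"at least {MIN_DPUS} DPUs are required")

    @property
    def vcpus(self) -> int:
        return self.dpus * VCPUS_PER_DPU

    @property
    def ram_gb(self) -> int:
        return self.dpus * RAM_GB_PER_DPU


def minimum_dpu_hours(reservation: Reservation) -> int:
    """DPU-hours consumed if the capacity is held for the minimum period."""
    return reservation.dpus * MIN_HOURS


if __name__ == "__main__":
    smallest = Reservation(MIN_DPUS)
    print(smallest.vcpus, smallest.ram_gb, minimum_dpu_hours(smallest))
```

Under these assumptions, the smallest reservation corresponds to 96 vCPUs and 384 GB of RAM, and holding it for the minimum period consumes 192 DPU-hours. Multiplying the DPU-hours by the rate on the pricing page gives the minimum commitment.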
By reserving capacity in advance, users can avoid queuing delays, prioritize queries, and gain more predictable query performance. The company provides guidelines to determine how much capacity users might require.
Through the Athena console, AWS SDK, or CLI, users can set the capacity for their account and select the workgroups whose queries should run on that capacity. A workgroup is an Athena mechanism for separating users, teams, applications, or workloads, for setting limits on the amount of data that each query or the entire workgroup can process, and for tracking costs.
Queries associated with the designated workgroup will execute using the provisioned capacity. In addition, the capacity can be shared among several workgroups, provided they all utilize the same Athena engine version.
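As a rough illustration of the SDK route, the following Python sketch builds the two requests involved: one that reserves capacity and one that assigns workgroups to it. It assumes the boto3 Athena client operations `create_capacity_reservation` and `put_capacity_assignment_configuration`. The reservation name, the workgroup names, and the helper functions are hypothetical, and the sketch does not wait for the reservation to become active or handle API errors.

```python
from typing import Any

MIN_DPUS = 24  # minimum size quoted in the article


def reservation_request(name: str, target_dpus: int) -> dict[str, Any]:
    """Build the parameters for creating a capacity reservation."""
    if target_dpus < MIN_DPUS:
        raise ValueError(f"at least {MIN_DPUS} DPUs are required")
    return {"Name": name, "TargetDpus": target_dpus}


def assignment_request(reservation_name: str,
                       workgroups: list[str]) -> dict[str, Any]:
    """Build the parameters that attach workgroups to a reservation."""
    if not workgroups:
        raise ValueError("at least one workgroup is required")
    return {
        "CapacityReservationName": reservation_name,
        "CapacityAssignments": [{"WorkGroupNames": list(workgroups)}],
    }


def provision(name: str, target_dpus: int, workgroups: list[str],
              client: Any = None) -> None:
    """Reserve capacity, then assign workgroups to it (sketch only)."""
    if client is None:
        import boto3  # imported lazily so the builders work without AWS access
        client = boto3.client("athena")
    client.create_capacity_reservation(
        **reservation_request(name, target_dpus))
    client.put_capacity_assignment_configuration(
        **assignment_request(name, workgroups))


if __name__ == "__main__":
    # Hypothetical names; prints the payloads without calling AWS.
    print(reservation_request("reporting-capacity", 24))
    print(assignment_request("reporting-capacity", ["finance", "marketing"]))
```

Keeping the request builders separate from the client call makes it possible to inspect or test the payloads without AWS credentials. Several workgroups can be listed in one assignment, which matches the sharing behavior described above.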
Source: https://aws.amazon.com/blogs/aws/introducing-athena-provisioned-capacity/
Services comparable to Athena include Google BigQuery, Microsoft Azure Synapse Analytics, Snowflake, and Apache Spark. Mustafa Akın, co-founder at Resmo and AWS Community Builder, stated in a tweet about Provisioned Capacity in Athena:
Just use Snowflake if you need this
In contrast, Roni Burd, head of engineering (director) - EMR/Athena query engines, wrote in a LinkedIn post:
The new provisioned capacity model is great for customers who want larger scale and/or no-queue latencies and/or full control of the capacity while enjoying the same "just-works" serverless nature of Athena. It also makes it easier to reason about budget allocations, which are important for customer offering data lake queries to their own customers.
Currently, Athena Provisioned Capacity is available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Singapore, Sydney, Tokyo), and Europe (Ireland, Stockholm) AWS Regions. Pricing details are available on the Athena pricing page.