AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. With AWS Glue, customers don't have to provision or manage any infrastructure, and they pay only for the resources used while their jobs are running.
Since the service became generally available, Amazon has added several features, such as support for streaming ETL jobs earlier this year, as reported by InfoQ. A few months later, Amazon released AWS Glue version 2.0, a significant new version of the service that is now generally available. Its updates include faster Spark ETL job start times, a 1-minute minimum billing duration, and a new capability to install Python modules from a wheel file or a repository.
Through the AWS Management Console, users can create new Glue Spark ETL jobs on Glue version 2.0, or move their existing Glue Spark ETL jobs to it, and then run them. Jobs start faster on version 2.0 than on version 1.0.
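Outside the console, the same choice can be expressed through the AWS Glue `CreateJob` API. The sketch below only builds the request parameters (as they would be passed to boto3's `glue_client.create_job(**request)`), setting `GlueVersion` to `"2.0"` and listing extra Python modules in the `--additional-python-modules` job argument. The job name, IAM role ARN, S3 paths, and module list are hypothetical placeholders, and the function does not contact AWS.

```python
from typing import Dict, List


def build_glue2_job_request(
    name: str,
    role_arn: str,
    script_location: str,
    extra_modules: List[str],
) -> Dict[str, object]:
    """Build keyword arguments for a Glue CreateJob call targeting version 2.0.

    Nothing here talks to AWS; the returned dict is meant to be passed to
    boto3's glue_client.create_job(**request) to actually create the job.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Selects the Glue 2.0 runtime instead of 1.0.
        "GlueVersion": "2.0",
        "DefaultArguments": {
            # Comma-separated list of package specs and/or S3 paths to wheel files.
            "--additional-python-modules": ",".join(extra_modules),
        },
    }


if __name__ == "__main__":
    # All identifiers below are hypothetical examples.
    request = build_glue2_job_request(
        name="example-etl-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        script_location="s3://example-bucket/scripts/etl.py",
        extra_modules=[
            "scikit-learn==0.23.1",
            "s3://example-bucket/wheels/mylib-0.1-py3-none-any.whl",
        ],
    )
    print(request["GlueVersion"], request["DefaultArguments"])
```

Keeping request construction separate from the AWS call makes the job definition easy to review or test before anything is created in an account.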
Harunobu Kameda, a product marketing evangelist at AWS, wrote in a blog post about AWS Glue 2.0:
With Glue version 2.0, job startup delay is more predictable and has less overhead. In addition, AWS Glue version 2.0 Spark jobs will be billed in 1-second increments with a 10x lower minimum billing duration—from a 10-minute minimum to a 1-minute minimum. As a result, customers can now run micro-batch, deadline sensitive, interactive workloads more cost-effectively.
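The effect of the lower minimum can be illustrated with simple arithmetic. The sketch below assumes per-second billing for both versions (the quote states it explicitly only for 2.0) and treats a job's DPU count as fixed for its whole run. It rounds the runtime up to a whole second, then applies the version's minimum.

```python
import math

V1_MINIMUM_SECONDS = 10 * 60  # Glue 1.0 Spark jobs: 10-minute minimum
V2_MINIMUM_SECONDS = 1 * 60   # Glue 2.0 Spark jobs: 1-minute minimum


def billed_seconds(runtime_seconds: float, minimum_seconds: int) -> int:
    """Round the runtime up to the next whole second, then apply the minimum."""
    return max(math.ceil(runtime_seconds), minimum_seconds)


def billed_dpu_hours(runtime_seconds: float, minimum_seconds: int, dpus: int) -> float:
    """Convert billed seconds into DPU-hours, assuming a fixed number of DPUs."""
    return billed_seconds(runtime_seconds, minimum_seconds) * dpus / 3600


if __name__ == "__main__":
    for runtime in (30, 90, 900):
        v1 = billed_seconds(runtime, V1_MINIMUM_SECONDS)
        v2 = billed_seconds(runtime, V2_MINIMUM_SECONDS)
        print(f"{runtime:>4}s run: Glue 1.0 bills {v1}s, Glue 2.0 bills {v2}s")
```

For a 90-second job, the billed duration falls from 600 seconds to 90 seconds, while a 900-second job is billed 900 seconds under either minimum. This is why the savings are concentrated in jobs that run for less than 10 minutes.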
However, Markus Wissing, practice leader enterprise architecture at TecAlliance, said in a tweet:
With version 2.0 @aws improved AWS Glue (faster, cheaper). You should consider V2 for new ETL jobs running less than 10min to benefit from the new pricing model. But be aware: V2 still needs to reach feature parity with older versions.
A respondent in a Reddit thread points out that the new version does not support:
- Development endpoints
- FindMatches machine learning transforms
- AWS Glue streaming ETL jobs
The respondent also notes that:
- AWS Glue version 2.0 does not run on Apache YARN, so YARN settings do not apply
- AWS Glue version 2.0 does not have a Hadoop Distributed File System (HDFS)
- AWS Glue version 2.0 does not use dynamic allocation; hence the ExecutorAllocationManager metrics are not available
AWS Glue 2.0 is currently available in various AWS regions in North America, South America, Europe, and Asia Pacific. The latest documentation is available on the website, and pricing details for the service are on the pricing page.