Recently, AWS introduced Amazon Managed Workflows for Apache Airflow (MWAA), a fully managed service that simplifies running open-source versions of Apache Airflow on AWS and building workflows to execute extract-transform-load (ETL) jobs and data pipelines.
Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as "workflows." Developers and data engineers use Apache Airflow to manage workflows as scripts, monitor them via the user interface (UI), and extend their functionality through a set of powerful plugins. Until now, however, they had to install, maintain, and scale Apache Airflow themselves. AWS addresses this with MWAA, which lets developers and data engineers build and manage their workflows in the cloud without worrying about managing and scaling the underlying Airflow infrastructure.
In the AWS press release on MWAA, Jesse Dougherty, vice president of Application Integration at AWS, said:
Customers have told us they really like Apache Airflow because it speeds the development of their data processing and machine learning workflows, but they want it without the burden of scaling, operating, and securing servers. With Amazon MWAA, customers can use the same Apache Airflow platform as they do today with the scalability, availability, and security of AWS.
Amazon MWAA can retrieve input from sources like Amazon Simple Storage Service (S3) using Amazon Athena queries, perform transformations on Amazon EMR clusters, and use the resulting data to train machine learning models on Amazon SageMaker. Developers and data engineers author workflows in Amazon MWAA as Directed Acyclic Graphs (DAGs) written in Python.
Source: https://aws.amazon.com/managed-workflows-for-apache-airflow/
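To give a sense of what that looks like in practice, a DAG is just a Python file that declares the workflow's tasks and their ordering. The minimal sketch below is illustrative only: the DAG ID, task IDs, and schedule are placeholders, and it assumes the Airflow 1.10.x import path that MWAA supported at launch.

```python
# Minimal, illustrative Airflow DAG. IDs and schedule are placeholder values.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x import path


def extract():
    # Placeholder extract step; a real task might pull data from S3 or Athena.
    print("extracting data")


def transform():
    # Placeholder transform step; a real task might submit work to an EMR cluster.
    print("transforming data")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2020, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # The >> operator declares the dependency: extract runs before transform.
    extract_task >> transform_task
```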
Danilo Poccia, chief evangelist (EMEA) at Amazon Web Services, wrote in an MWAA introduction blog post:
You can use Amazon MWAA with these three steps:
- Create an environment – Each environment contains your Airflow cluster, including your scheduler, workers, and a web server. Developers and data engineers can create a new Amazon MWAA environment from the console, AWS Command Line Interface (CLI), or AWS SDKs.
- Upload your DAGs and plugins to S3 – Amazon MWAA loads the code into Airflow automatically.
- Run your DAGs in Airflow – Run your DAGs from the Airflow UI or command-line interface (CLI) and monitor your environment with CloudWatch.
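As a rough sketch of the first two steps using the AWS SDK for Python (boto3), the snippet below uploads a DAG file to S3 and requests a new environment. The bucket name, role ARN, subnet and security-group IDs are placeholders, and the exact parameter set should be checked against the current boto3 MWAA documentation.

```python
# Sketch of creating an MWAA environment and publishing DAG code with boto3.
# Bucket names, ARNs, subnet and security-group IDs are placeholders.
import boto3

s3 = boto3.client("s3")
mwaa = boto3.client("mwaa")

# Step 2 in practice: DAG files simply live under a prefix in an S3 bucket.
s3.upload_file("dags/example_etl.py", "my-airflow-bucket", "dags/example_etl.py")

# Step 1: create the managed environment that points at that bucket.
response = mwaa.create_environment(
    Name="my-mwaa-environment",
    AirflowVersion="1.10.12",
    SourceBucketArn="arn:aws:s3:::my-airflow-bucket",
    DagS3Path="dags",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/my-mwaa-execution-role",
    NetworkConfiguration={
        "SubnetIds": ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)
print(response)
```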
Also, with MWAA, developers and data engineers benefit from open extensibility through plugins, which allow them to create tasks that interact with AWS or on-premises resources required for their workflows, including AWS Batch, Amazon CloudWatch, Amazon DynamoDB, AWS Lambda, Amazon Redshift, Amazon Simple Queue Service (SQS), and Amazon Simple Notification Service (SNS).
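As an illustration of such an integration, the hypothetical DAG below uses Airflow's PythonOperator together with boto3 to publish a message to an SNS topic when it runs; the topic ARN, IDs, and schedule are placeholders, and dedicated AWS operators from Airflow's provider or contrib packages can express the same integrations more declaratively.

```python
# Hypothetical DAG task that publishes a completion message to Amazon SNS
# via boto3. The topic ARN, IDs, and schedule are placeholder values.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x path


def notify_success():
    # Any AWS API reachable from the environment's execution role can be called here.
    boto3.client("sns").publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:pipeline-events",
        Subject="ETL pipeline finished",
        Message="The daily ETL DAG completed successfully.",
    )


with DAG(
    dag_id="etl_with_notification",
    start_date=datetime(2020, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="notify_success", python_callable=notify_success)
```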
Note that AWS also offers other workflow management services, such as AWS Step Functions and AWS Glue. A respondent in a Hacker News thread explains:
This was developed internally by the Orchestration organization - that builds Step Functions and maintains AWS Simple Workflow. I don't think of Glue as a generic workflow system the way the others are - it's definitely much more optimized for ETL use cases. With time, I'm sure there'll be more detailed guidance on Step Functions vs. Apache Airflow, but the simple guidance might be that Step Functions is a fully AWS-native (and serverless) orchestration engine. Whereas, of course, Apache Airflow is an open-source project with a diverse ecosystem of other plugins.
MWAA is currently available in the following AWS regions: US East (Ohio and N. Virginia), US West (Oregon), EU (Stockholm, Ireland, and Frankfurt), and Asia Pacific (Tokyo, Singapore, and Sydney), with more regions to follow. Service details are available on the documentation page and pricing details on the pricing page.