Having released Microsoft Azure Data Factory v2 (ADF) in public preview in September, Microsoft has now announced a public preview of new visual tooling for the fully managed, cloud-based data integration and ETL service.
With the release of v2 of the ADF service, support was added for:
- New scheduling capabilities, covering both run-once and complex scheduling scenarios
- Greater control over complex workflows, including conditional execution and looping
- Web endpoints for data ingress or egress
- Execution of SSIS packages in a cloud-based integration runtime
However, the September release of the service shipped without visual tooling, so ADF v2 components and pipelines had to be created manually. The release of the visual tooling brings the service back in line with the previous version, which already offered visual authoring.
The tooling is web-based and is launched from the Azure portal, from within a deployed Azure Data Factory instance.
Once it is launched, users can:
- Create a new ADF pipeline: build a processing pipeline through a drag-and-drop visual interface, with support for complex branching, compute services such as HDInsight and Azure Data Lake Analytics, and both the new web-based data sources and more traditional ones such as Azure SQL Database and files
- Create a new ADF copy pipeline: pick a source data set and a target data set to quickly create a copy pipeline, using the same wizard as ADF v1; currently 33 data stores are supported as sources, such as Amazon Redshift, Oracle and SAP HANA, and 13 as targets, including several Azure services as well as Oracle and Salesforce
- Configure a new SSIS Integration Runtime: create an Azure-SSIS Integration Runtime, with its SSIS catalog hosted in Azure SQL Database, to run SSIS packages in lift-and-shift scenarios; Microsoft claims that on-premises SSIS packages should execute as normal within Azure, subject to data source connectivity and availability
- Configure a Git repository: connect the ADF instance to a Git repository in a Visual Studio Team Services (VSTS) account; GitHub-hosted repositories are not currently supported
To support copy activities and the dispatch of compute tasks, ADF v2 provides further Integration Runtime types beyond the SSIS one, either Azure-based or Self-Hosted. Which to use depends on where the data sources and compute resources are located: data stores in an on-premises or private network, for example, require a Self-Hosted Integration Runtime.
Image source: https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
Integration with on-premises resources is supported through the Self-Hosted Integration Runtime, which replaces the Data Management Gateway of the previous version. It can be downloaded and installed on Windows-based systems only; no Linux version is currently available. For high availability of the on-premises component, at least two nodes are required, and up to four nodes can be associated with a single Self-Hosted Integration Runtime registered in the ADF v2 service.
With the visual tooling now supplying the usability that was previously missing, Microsoft has a solution that competes with similar big data management products such as Software AG webMethods, Talend Big Data Platform and Hitachi Pentaho, whilst offering an alternative to Integration Platform as a Service (iPaaS) tools such as Azure Logic Apps, MuleSoft or Dell Boomi, which are less well suited to large-scale and batch data movement.
By providing a migration path for SSIS packages through the Azure-hosted SSIS Integration Runtime, Microsoft allows organisations to keep leveraging their existing on-premises investments while adopting cloud platform services.
While the service remains in public preview, Azure Data Factory v2 workloads can be deployed only to the East US, East US 2 and West Europe regions.
Further information on using the service, including a comparison between v1 and v2, can be found in the Microsoft documentation, and Microsoft has published a video on its media channel demonstrating basic use of the tools.