In a recent blog post, Google announced the beta of Cloud AI Platform Pipelines, which provides users with a way to deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility.
With Cloud AI Platform Pipelines, Google aims to help organizations adopt Machine Learning Operations (MLOps), a term for applying DevOps practices to automate, manage, and audit ML workflows. Typically, these workflows involve data preparation and analysis, training, evaluation, deployment, and more.
Google product manager Anusha Ramesh and staff developer advocate Amy Unruh wrote in the blog post:
When you're just prototyping a machine learning (ML) model in a notebook, it can seem fairly straightforward. But when you need to start paying attention to the other pieces required to make an ML workflow sustainable and scalable, things become more complex.
Moreover, when complexity grows, building a repeatable and auditable process becomes more laborious.
Cloud AI Platform Pipelines, which runs on a Google Kubernetes Engine (GKE) cluster and is accessible via the Cloud AI Platform dashboard, has two major parts:
- The infrastructure for deploying and running structured AI workflows integrated with GCP services such as BigQuery, Dataflow, AI Platform Training and Serving, and Cloud Functions
- The pipeline tools for building, debugging, and sharing pipelines and components
With Cloud AI Platform Pipelines, users can specify a pipeline either with the Kubeflow Pipelines (KFP) software development kit (SDK) or by customizing the TensorFlow Extended (TFX) pipeline template with the TFX SDK. The TFX SDK currently consists of libraries, components, and some binaries, and it is up to the developer to pick the right level of abstraction for the task at hand. It also includes ML Metadata (MLMD), a library for recording and retrieving metadata associated with the workflows; this library can also run independently.
Google recommends the KFP SDK for fully custom pipelines or pipelines that use prebuilt KFP components, and the TFX SDK and its templates for end-to-end (E2E) ML pipelines based on TensorFlow. Google stated in the blog post that these two SDK experiences will merge over time. In either case, the SDK compiles the pipeline and submits it to the Pipelines REST API; the REST API server then stores and schedules the pipeline for execution.
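To make the compile-and-submit flow more concrete, here is a minimal sketch in plain Python. It does not use the KFP or TFX SDK: the `Step` class, the `compile_pipeline` function, the image names, and the JSON layout are all hypothetical stand-ins. They only illustrate the general idea of declaring steps with dependencies and compiling them into a serializable specification that a client could then send to a pipelines server.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Step:
    """One pipeline step (hypothetical): a container image plus its dependencies."""
    name: str
    image: str
    depends_on: list = field(default_factory=list)


def compile_pipeline(name, steps):
    """Validate the step graph and return a JSON pipeline specification."""
    known = {s.name for s in steps}
    if len(known) != len(steps):
        raise ValueError("duplicate step name")
    for s in steps:
        missing = [d for d in s.depends_on if d not in known]
        if missing:
            raise ValueError(f"step {s.name!r} depends on unknown steps: {missing}")

    # Order steps so that each one appears after all of its dependencies.
    ordered, done = [], set()
    remaining = list(steps)
    while remaining:
        ready = [s for s in remaining if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("cycle detected in pipeline")
        for s in ready:
            ordered.append(s)
            done.add(s.name)
        remaining = [s for s in remaining if s.name not in done]

    spec = {
        "pipeline": name,
        "steps": [
            {"name": s.name, "image": s.image, "dependsOn": s.depends_on}
            for s in ordered
        ],
    }
    return json.dumps(spec, indent=2)


# Steps may be declared in any order; the compiler sorts them.
steps = [
    Step("train", "example/train:latest", depends_on=["prepare"]),
    Step("prepare", "example/prepare:latest"),
    Step("evaluate", "example/evaluate:latest", depends_on=["train"]),
]
spec_json = compile_pipeline("demo-pipeline", steps)
print(spec_json)
```

In the real service, the SDK produces the specification and the REST API server takes over storage and scheduling; this sketch stops at the compiled document.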
Argo, an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes, runs the pipelines alongside additional microservices that record metadata, handle component I/O, and schedule pipeline runs. The Argo workflow engine executes each pipeline on individual isolated pods in a GKE cluster, which allows each pipeline component to leverage Google Cloud services such as Dataflow, AI Platform Training and Prediction, BigQuery, and others. Furthermore, pipelines can contain steps that perform sizeable GPU and TPU computation in the cluster, directly leveraging features like autoscaling and node auto-provisioning.
Source: https://cloud.google.com/blog/products/ai-machine-learning/introducing-cloud-ai-platform-pipelines
AI Platform Pipelines runs include automatic metadata tracking using MLMD, which logs the artifacts used in each pipeline step, the pipeline parameters, and the linkage across input and output artifacts, as well as the pipeline steps that created and consumed them.
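The kind of linkage described above can be pictured with a small in-memory sketch. This is not the MLMD API: the `LineageStore` class, its methods, and the artifact names are hypothetical and only illustrate how recording each step's inputs, outputs, and parameters makes it possible to trace an artifact back to everything that contributed to it.

```python
from collections import defaultdict


class LineageStore:
    """Toy record of which steps produced and consumed which artifacts."""

    def __init__(self):
        self.producer = {}                   # artifact -> step that created it
        self.consumers = defaultdict(list)   # artifact -> steps that read it
        self.inputs = {}                     # step -> artifacts it read
        self.params = {}                     # step -> parameters used in the run

    def record_step(self, step, inputs, outputs, params=None):
        """Log one executed step together with its artifacts and parameters."""
        self.inputs[step] = list(inputs)
        self.params[step] = dict(params or {})
        for artifact in inputs:
            self.consumers[artifact].append(step)
        for artifact in outputs:
            self.producer[artifact] = step

    def upstream(self, artifact):
        """Return every artifact that contributed to `artifact`, nearest first."""
        seen, queue = [], [artifact]
        while queue:
            current = queue.pop(0)
            step = self.producer.get(current)
            if step is None:
                continue  # a raw input with no recorded producer
            for parent in self.inputs[step]:
                if parent not in seen:
                    seen.append(parent)
                    queue.append(parent)
        return seen


store = LineageStore()
store.record_step("prepare", inputs=["raw.csv"], outputs=["clean.csv"])
store.record_step("train", inputs=["clean.csv"], outputs=["model.bin"],
                  params={"epochs": 5})
store.record_step("evaluate", inputs=["model.bin", "clean.csv"],
                  outputs=["metrics.json"])

print(store.producer["model.bin"])     # step that created the model
print(store.consumers["clean.csv"])    # steps that consumed the cleaned data
print(store.upstream("metrics.json"))  # full lineage of the metrics artifact
```

With such records in place, an auditor can answer questions like "which data and parameters produced this model?" without rerunning the pipeline.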
According to the blog post, customers of Cloud AI Platform Pipelines will get:
- Push-button installation via the Google Cloud Console
- Enterprise features for running ML workloads, including pipeline versioning, automatic metadata tracking of artifacts and executions, Cloud Logging, visualization tools, and more
- Seamless integration with Google Cloud managed services like BigQuery, Dataflow, AI Platform Training and Serving, Cloud Functions, and many others
- Many prebuilt pipeline components (pipeline steps) for ML workflows, with easy construction of your own custom components
Support for Kubeflow should allow a straightforward migration to other cloud platforms, as a respondent in a Hacker News thread on Cloud AI Platform Pipelines stated:
Cloud AI Platform Pipelines appear to use Kubeflow Pipelines on the backend, which is open-source and runs on Kubernetes. The Kubeflow team has invested a lot of time on making it simple to deploy across a variety of public clouds, such as AWS, and Azure. If Google were to kill it, you could easily run it on any other hosted Kubernetes service.
The release of Cloud AI Platform Pipelines shows Google's further expansion of its Machine Learning as a Service (MLaaS) portfolio, which includes several other ML-centric services such as Cloud AutoML, Kubeflow, and AI Platform Prediction. The expansion allows Google to further capitalize on the growing demand for ML-based cloud services, a market that analysts expect to reach USD 8.48 billion by 2025, and to compete with other large public cloud vendors offering similar services, such as Amazon with SageMaker and Microsoft with Azure Machine Learning.
Google plans to add more features to Cloud AI Platform Pipelines. These features are:
- Easy cluster upgrades
- More templates for authoring ML workflows
- More straightforward UI-based setup of off-cluster storage of backend data
- Workload identity, to support transparent access to GCP services
- Multi-user isolation, allowing each person accessing the Pipelines cluster to control who can access their pipelines and other resources
Lastly, more information on Cloud AI Platform Pipelines is available in the getting started documentation.