Google Cloud recently announced the preview of Batch, a managed service to run batch jobs at scale. The new service supports the latest T2A Arm-based instances and Spot VMs for large batch jobs that rely on task parallelization.
Batch handles dynamic resource provisioning and autoscaling, executes requests in parallel, supports script and containerized workloads, and can leverage native Google Cloud services and batch tools. Shamel Jacobs, product manager at Google, and Bolian Yin, software engineer at Google, write:
Batch processing is as old as computing itself, with the term 'batch' dating back to the punchcards used by early mainframes (...) Batch jobs are especially prevalent in areas such as research, simulation, genomics, visual effects, fintech, manufacturing and EDA.
The new service supports common job types like array jobs and multi-node MPI applications. Jacobs and Yin highlight that Batch is not the only service on Google Cloud to handle batch processing:
Batch is a general-purpose batch job service and the latest in a long list of products we’ve created over the years that process jobs to help enterprises migrate their workloads to the cloud. These services include Cloud Life Sciences (formerly Google Genomics), Dataflow, and Cloud Run Jobs.
Source: https://cloud.google.com/blog/products/compute/new-batch-service-processes-batch-jobs-on-google-cloud
The key concepts of the new service are the job, the execution of a piece of run-to-completion computation work; tasks, which run on Compute Engine instances; the array job, multiple tasks in a job that simultaneously execute the same executable; and resources, for example Compute Engine instances, Cloud Storage buckets, or NFS mounts, as illustrated in the sketch below. Lewis Carroll, director at AMD, comments:
T2D Tau VMs with batch should be a monster for large scale life sciences, chemistry, derivatives pricing, risk, and other large scale parallel distributed computing jobs.
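The announcement itself does not include code, but the concepts above map directly onto the Batch API. The following is a minimal sketch using the google-cloud-batch Python client library; the project ID, region, job name, and script text are illustrative placeholders, and optional settings such as the allocation policy are left to the service defaults.

```python
from google.cloud import batch_v1

# Illustrative placeholders, not values from the announcement.
PROJECT_ID = "my-project"
REGION = "us-central1"


def submit_array_job() -> batch_v1.Job:
    client = batch_v1.BatchServiceClient()

    # A runnable: here a simple shell script (container images are also supported).
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script()
    runnable.script.text = "echo Hello from task ${BATCH_TASK_INDEX}"

    # A task wraps the runnable and declares the resources it needs.
    task = batch_v1.TaskSpec()
    task.runnables = [runnable]
    task.compute_resource = batch_v1.ComputeResource(
        cpu_milli=2000,   # 2 vCPUs
        memory_mib=2048,  # 2 GiB of memory
    )

    # An array job: several tasks in one group executing the same runnable.
    group = batch_v1.TaskGroup()
    group.task_spec = task
    group.task_count = 4

    job = batch_v1.Job()
    job.task_groups = [group]

    request = batch_v1.CreateJobRequest(
        parent=f"projects/{PROJECT_ID}/locations/{REGION}",
        job_id="hello-batch-array",
        job=job,
    )
    return client.create_job(request)


if __name__ == "__main__":
    print(submit_array_job().name)
```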
The cloud provider released a media transcoding tutorial, which leverages Batch to transcode H.264 video files to VP9. Busybox, a project to run a container as a Batch job; primegen, an end-to-end sample using Workflows and Cloud Build with Batch; and wrf, a sample application running the Weather Research and Forecasting Model in a Batch job with MPI, are other examples available on GitHub.
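For containerized workloads such as the busybox sample, a runnable can reference a container image rather than a script. A short sketch of such a runnable, using the same Python client (the image and command here are illustrative, not taken from the sample itself):

```python
from google.cloud import batch_v1

# Container runnable in the spirit of the busybox sample (values are illustrative).
container_runnable = batch_v1.Runnable()
container_runnable.container = batch_v1.Runnable.Container()
container_runnable.container.image_uri = "busybox"
container_runnable.container.entrypoint = "/bin/sh"
container_runnable.container.commands = ["-c", "echo Hello from a container task"]

# This runnable drops into a TaskSpec exactly as the script runnable does
# in the earlier sketch.
```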
Developers can access Batch through the API, the command-line tool, workflow engines, or the console, defining priorities for jobs and establishing retry strategies, as sketched further below. The service can also be used with the HPC Toolkit, the Google Cloud open-source project to deploy high-performance computing environments, with the cloud provider explaining:
Using Google Cloud Batch with the HPC Toolkit simplifies the setup needed to provision and run more complex scenarios, for example, setting up a shared file system and installing software to be used by Google Cloud Batch jobs. It also makes it possible to share tested infrastructure solutions that work with Google Cloud Batch via HPC Toolkit blueprints.
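The retry strategies and job priorities mentioned above correspond, in the Python client, to fields on the task specification and the job; the values in this short sketch are illustrative assumptions rather than recommendations:

```python
from google.cloud import batch_v1

task = batch_v1.TaskSpec()
# Retry strategy: retry a failed task up to two times and cap its runtime.
task.max_retry_count = 2
task.max_run_duration = "3600s"

job = batch_v1.Job()
# Job priority: jobs with a higher value are scheduled ahead of lower ones.
job.priority = 50
```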
Currently in preview, Batch is available in a subset of Google Cloud regions: Iowa, South Carolina, Oregon, and Finland. There are no additional charges for using Batch; customers pay only for the resources used to run their jobs.