At its recent Ignite conference, Microsoft announced the public preview of serverless GPUs in Azure Container Apps, powered by NVIDIA. The feature lets customers use NVIDIA A100 and T4 GPUs in a serverless environment, offering on-demand scaling and flexibility for real-time custom model inferencing and other machine-learning tasks.
Azure Container Apps is a fully managed serverless container service that lets developers deploy, run, and scale containerized applications without managing infrastructure. With serverless GPUs, developers can run GPU-powered applications the same way and benefit from scale-to-zero: resources scale dynamically with demand, which reduces idle costs. They also get per-second billing for GPU usage, data governance that keeps information within container boundaries, a choice of NVIDIA A100 and T4 GPUs, and a managed serverless platform for deploying their own AI models.
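To make the billing argument concrete, the following minimal Python sketch compares a day of bursty GPU usage billed per second (with scale-to-zero between bursts) against one replica that stays on all day. The rate, the `Burst` type, and both helper functions are hypothetical and exist only for illustration; they are not Azure prices or part of any Azure API.

```python
from dataclasses import dataclass

# Hypothetical rate for illustration only; this is NOT an actual Azure price.
HYPOTHETICAL_RATE_PER_GPU_SECOND = 0.001


@dataclass
class Burst:
    """One period during which GPU replicas are actively serving requests."""
    active_seconds: int
    replicas: int = 1


def per_second_cost(bursts, rate_per_second):
    """Cost when only active GPU-seconds are billed (idle time scales to zero)."""
    gpu_seconds = sum(b.active_seconds * b.replicas for b in bursts)
    return gpu_seconds * rate_per_second


def always_on_cost(window_seconds, replicas, rate_per_second):
    """Cost when a fixed number of replicas runs for the whole window."""
    return window_seconds * replicas * rate_per_second


if __name__ == "__main__":
    day_seconds = 24 * 60 * 60
    # Assumed workload: three inference bursts spread over one day.
    bursts = [Burst(600), Burst(1800, replicas=2), Burst(300)]

    serverless = per_second_cost(bursts, HYPOTHETICAL_RATE_PER_GPU_SECOND)
    dedicated = always_on_cost(day_seconds, 1, HYPOTHETICAL_RATE_PER_GPU_SECOND)

    print(f"per-second billing with scale-to-zero: {serverless:.2f}")
    print(f"one always-on replica for the day:     {dedicated:.2f}")
```

Under these assumed numbers the bursty workload is billed for 4,500 GPU-seconds rather than the 86,400 seconds an always-on replica would accrue. The actual saving depends on real pricing, cold-start behavior, and how idle the workload is.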
According to the company, Azure’s serverless GPUs are well suited to use cases such as real-time AI inferencing, machine learning model deployments, and high-performance computing tasks. The company also says the platform integrates smoothly into existing Azure workflows.
(Source: Azure Blogs on Apps blog post)
During an Ignite session on Azure Functions Flex Consumption and GPUs, Simon Jakesch, principal product manager for Azure Container Apps at Microsoft, said:
"Anyone who has used serverless or in combination with Azure Container Apps has found it to be extremely powerful. This technology brings the same power to GPU use, making GPUs easily accessible."
Microsoft is not the only provider of GPU capabilities for accelerating workloads such as real-time AI inferencing and machine learning model deployments. Others include Modal, RunPod, Replicate, Baseten, Koyeb, and Fal. Google Cloud Run also supports NVIDIA L4 GPUs for real-time AI inferencing.
Lars Wurm, a platform leader in Core Infrastructure at Inter IKEA, posted on LinkedIn:
"With the introduction of serverless GPUs using Azure Container Apps, several new workloads and usage scenarios are enabled, shaping the offering into a one-stop shop for container workloads. This is particularly beneficial when workloads do not rely on committed ACA instances."
And in an NVIDIA corporate blog post, Dave Salvator wrote:
"Serverless GPUs allow development teams to focus more on innovation and less on infrastructure management. With per-second billing and scale-to-zero capabilities, customers pay only for the compute they use, helping ensure resource utilization is both economical and efficient. NVIDIA is also working with Microsoft to bring NVIDIA NIM microservices to serverless NVIDIA GPUs in Azure to optimize AI model performance."
Serverless GPUs are available in a select set of Azure regions during the public preview. More information is available in Azure's documentation, tutorials, and pricing details.