AWS recently announced the general availability (GA) of Amazon EC2 P5 instances, powered by the latest NVIDIA H100 Tensor Core GPUs, for users who require high performance and scalability for AI/ML and HPC workloads. The GA follows the earlier announcement of the infrastructure's development.
The Amazon EC2 P5 instances stem from a long-standing collaboration between AWS and NVIDIA and represent the 11th generation of instances designed for visual computing, AI, and high-performance computing (HPC) clusters. These instances are equipped with eight NVIDIA H100 Tensor Core GPUs with 640 GB of high-bandwidth GPU memory, are powered by 3rd Gen AMD EPYC processors, offer 2 TB of system memory, and feature 30 TB of local NVMe storage. Additionally, P5 instances deliver an aggregate network bandwidth of 3,200 Gbps using second-generation Elastic Fabric Adapter (EFA) technology with support for GPUDirect RDMA, enabling lower latency and efficient scale-out performance by bypassing the CPU during internode communication.
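As a quick sanity check on those aggregate figures, dividing by the eight GPUs yields the per-GPU numbers. This is a back-of-the-envelope sketch; the even per-GPU split is an assumption for illustration, not an AWS-published breakdown:

```python
# Derive per-GPU figures from the aggregate P5 specs quoted above:
# 8 GPUs, 640 GB total GPU memory, 3,200 Gbps aggregate EFA bandwidth.
GPUS_PER_INSTANCE = 8
TOTAL_GPU_MEMORY_GB = 640
AGGREGATE_NETWORK_GBPS = 3200

# Assumes an even split across GPUs (an illustrative assumption).
memory_per_gpu_gb = TOTAL_GPU_MEMORY_GB / GPUS_PER_INSTANCE
bandwidth_per_gpu_gbps = AGGREGATE_NETWORK_GBPS / GPUS_PER_INSTANCE

print(f"{memory_per_gpu_gb:.0f} GB HBM per GPU")        # 80 GB per H100
print(f"{bandwidth_per_gpu_gbps:.0f} Gbps per GPU")     # 400 Gbps per GPU
```

The 80 GB-per-GPU result matches the memory capacity of the H100 Tensor Core GPU.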
The company claims the instances will reduce training time by up to six times (from days to hours) compared with previous-generation GPU-based instances, with up to 40 percent lower training costs.
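To make those headline numbers concrete, the arithmetic below applies the claimed best-case improvements to a hypothetical workload. The six-day baseline is an illustrative assumption, not an AWS benchmark:

```python
# Illustrative arithmetic for AWS's claimed best-case improvements:
# up to 6x faster training and up to 40% lower training cost.
baseline_days = 6.0      # hypothetical previous-generation training run
speedup = 6.0            # claimed up-to-6x reduction in training time
cost_reduction = 0.40    # claimed up-to-40% lower training cost

p5_days = baseline_days / speedup          # 1 day instead of 6
relative_cost = 1.0 - cost_reduction       # 60% of the previous cost

print(f"Training time: {baseline_days:.0f} days -> {p5_days:.0f} day(s)")
print(f"Training cost: {relative_cost:.0%} of the previous-generation cost")
```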
An infographic showing how the P5 instances and NVIDIA H100 Tensor Core GPUs compare to previous instances and processors (Source: AWS News Blog)
Users can leverage Amazon EC2 P5 instances for training and running inference for the increasingly complex large language models (LLMs) and computer vision models behind generative AI applications such as question answering, code generation, video and image generation, and speech recognition. In addition, they can use the instances for high-performance computing workloads like pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling.
Furthermore, P5 instances are deployable in hyperscale clusters called EC2 UltraClusters. These UltraClusters combine high-performance computing, advanced networking, and storage capabilities in the cloud. Each EC2 UltraCluster functions as a robust supercomputer, allowing users to execute complex AI training and distributed HPC workloads across multiple interconnected systems.
Dave Salvator, a director of accelerated computing products at NVIDIA, stated in an NVIDIA blog post:
Customers can run at-scale applications that require high levels of communications between compute nodes; the P5 instance sports petabit-scale non-blocking networks powered by AWS EFA, a 3,200 Gbps network interface for Amazon EC2 instances.
In addition, Satish Bora, an international GM at nOps.io, commented on a LinkedIn post by Jeff Barr:
It appears a small datacenter in an instance; what a power.
AWS competitors Microsoft and Google have similar offerings for AI/ML and HPC workloads. Microsoft recently made Azure Managed Lustre generally available; the EC2 UltraClusters, by comparison, use Amazon FSx for Lustre, a fully managed shared storage service built on the open-source Lustre parallel file system. Microsoft has also released Azure HBv4 and HX Series virtual machines for HPC workloads, and Google released the Compute Engine C3 machine series, optimized for high-performance computing.
Lastly, Amazon EC2 P5 instances are currently available in the US East (N. Virginia) and US West (Oregon) regions; pricing details can be found on the EC2 pricing page.