InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.

Enter your e-mail address

Select your country

We protect your privacy.

InfoQ Homepage News PyTorch 2.5 Release Includes Support for Intel GPUs

AI, ML & Data Engineering

PyTorch 2.5 Release Includes Support for Intel GPUs

This item in japanese

Oct 29, 2024 2 min read

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.Get in touch

The PyTorch Foundation recently released PyTorch version 2.5, which contains support for Intel GPUs. The release also includes several performance enhancements, such as the FlexAttention API, TorchInductor CPU backend optimizations, and a regional compilation feature which reduces compilation time. Overall, the release contains 4095 commits since PyTorch 2.4.

The Intel GPU support was previewed at the recent PyTorch conference. Intel engineers Eikan Wang and Min Jean Cho described the PyTorch changes made to support the hardware. This included generalizing the PyTorch runtime and device layers which makes it easier to integrate new hardware backends. Intel specific backends were also implemented for torch.compile and torch.distributed. According to Kismat Singh, Intel's VP of engineering for AI frameworks:

We have added support for Intel client GPUs in PyTorch 2.5 and that basically means that you'll be able to run PyTorch on the Intel laptops and desktops that are built using the latest Intel processors. We think it's going to unlock 40 million laptops and desktops for PyTorch users this year and we expect the number to go to around 100 million by the end of next year.

The release includes a new FlexAttention API which makes it easier for PyTorch users to experiment with different attention mechanisms in their models. Typically, researchers who want to try a new attention variant need to hand-code it directly from PyTorch operators. However, this could result in "slow runtime and CUDA OOMs." The new API supports writing these instead with "a few lines of idiomatic PyTorch code." The compiler then converts these to an optimized kernel "that doesn’t materialize any extra memory and has performance competitive with handwritten ones."

Several performance improvements have been released in beta status. A new backend Fused Flash Attention provides "up to 75% speed-up over FlashAttentionV2" for NVIDIA H100 GPUs. A regional compilation feature for torch.compile reduces the need for full model compilation; instead, repeated nn.Modules, such as Transformer layers, are compiled. This can reduce compilation latency while incurring only a few percent performance degradation. There are also several optimizations to the TorchInductor CPU backend.

Flight Recorder, a new debugging tool for stuck jobs, was also included in the release. Stuck jobs can occur during distributed training, and could have many root causes, including data starvation, network issues, or software bugs. Flight Recorder uses an in-memory circular buffer to capture diagnostic info. When it detects a stuck job, it dumps the diagnostics to a file; the data can then be analyzed using a script of heuristics to identify the root cause.

In discussions about the release on Reddit, many users were glad to see support for Intel GPUs, calling it a "game changer." Another user wrote:

Excited to see the improvements in torch.compile, especially the ability to reuse repeated modules to speed up compilation. That could be a game-changer for large models with lots of similar components. The FlexAttention API also looks really promising - being able to implement various attention mechanisms with just a few lines of code and get near-handwritten performance is huge. Kudos to the PyTorch team and contributors for another solid release!

The PyTorch 2.5 code and release notes are available on GitHub.

About the Author

Anthony Alford

Anthony is a Senior Director, Development at Genesys where he is working on several AI and ML projects related to customer experience. He has over 20 years experience in designing and building scalable software. Anthony holds a Ph.D. degree in Electrical Engineering with specialization in Intelligent Robotics Software and has worked on various problems in the areas of human-AI interaction and predictive analytics for SaaS business optimization.

Show moreShow less

This content is in the AI, ML & Data Engineering topic

The InfoQ Newsletter

A round-up of last week’s content on InfoQ sent out every Tuesday. Join a community of over 250,000 senior developers. View an example

We protect your privacy.

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?

PyTorch 2.5 Release Includes Support for Intel GPUs

Write for InfoQ

About the Author

Anthony Alford

This content is in the AI, ML & Data Engineering topic

Related Topics:

Popular in AI, ML & Data Engineering

Related Sponsored Content

Popular across InfoQ

The InfoQ Newsletter