The PyTorch team has announced version 1.9 of Facebook's open-source deep-learning framework, which includes improvements for scientific computing, mobile support, and distributed training. Overall, the new release contains more than 3,400 commits since the 1.8 release.
The PyTorch team highlighted the major features of the release in a recent blog post. The new release moves the Complex Autograd feature, a key component for audio processing, and the torch.linalg module, which implements the functions of NumPy's linear algebra module, to stable status. Several new features arrive in beta status, including the torch.special module, which is based on SciPy's special module, as well as improvements to distributed training, including TorchElastic, ZeroRedundancyOptimizer, and CUDA support in RPC. The release also includes several updates for mobile applications, including a beta Mobile Interpreter, a lightweight version of the PyTorch runtime, as well as a new Inference Mode API, which provides "significant speed-up" when using models for inference.
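In Python, the Inference Mode API is exposed as the torch.inference_mode context manager. The snippet below is a minimal sketch of how it might be used; the model and tensor shapes are placeholders for illustration, not taken from the release notes:

```python
import torch
import torch.nn as nn

# Placeholder model purely for illustration.
model = nn.Linear(128, 10)
model.eval()

x = torch.randn(32, 128)

# Inside inference_mode, autograd bookkeeping (gradient tracking and
# tensor version counters) is skipped, which is where the speed-up comes from.
with torch.inference_mode():
    logits = model(x)

print(logits.shape)  # torch.Size([32, 10])
```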
PyTorch's linalg module is designed to be a "drop-in" replacement for NumPy's linear algebra module, but with PyTorch's acceleration and autograd support. The module has 26 operators that replicate the NumPy API, as well as "faster and easier to use" versions of existing PyTorch operators. Because NumPy is the "de-facto standard" for matrix computing in Python, developers working with existing scientific computing code can take advantage of the PyTorch implementation, co-developed by NVIDIA, that leverages CUDA and GPUs for faster execution of matrix algebra. The new special module likewise implements a popular scientific computing API, SciPy's special module, which includes implementations of many advanced mathematical functions, such as Bessel functions and elliptic functions, as well as several common probability distributions.
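As a rough sketch of the "drop-in" idea, the same linear-algebra call can be written against numpy.linalg or torch.linalg, and torch.special mirrors functions from scipy.special; the inputs below are made up for illustration:

```python
import numpy as np
import torch

# Solve the same linear system with NumPy and with torch.linalg.
a_np = np.random.rand(4, 4)
b_np = np.random.rand(4)
x_np = np.linalg.solve(a_np, b_np)

a_t = torch.from_numpy(a_np)
b_t = torch.from_numpy(b_np)
x_t = torch.linalg.solve(a_t, b_t)   # same API shape; move tensors to .cuda() for GPU execution

print(np.allclose(x_np, x_t.numpy()))  # True

# torch.special mirrors scipy.special, e.g. Bessel and error functions.
v = torch.linspace(0, 1, 5)
print(torch.special.i0(v))   # modified Bessel function of the first kind, order 0
print(torch.special.erf(v))  # error function
```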
The new Mobile Interpreter was "one of the top requested features" of the framework. The Interpreter reduces the size of deep learning models that are deployed to resource-limited platforms, such as mobile and edge devices. Model size can be reduced by up to 75%; for example, the PyTorch team compressed a MobileNetV2 model, which contains millions of parameters, down to 2.6 MB. The 1.9 release provides pre-built Interpreter libraries for iOS and Android, simplifying integration into an app. The release also includes a TorchVision library for mobile as well as several demo apps for iOS and Android.
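The sketch below shows, in broad strokes, how a model is prepared for the Mobile Interpreter, following the PyTorch mobile documentation; the exact optimization steps an app needs may vary:

```python
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

# Script a torchvision MobileNetV2 and save it in the format the
# Mobile (lite) Interpreter consumes on iOS and Android.
model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval()

scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted)          # mobile-oriented graph optimizations
optimized._save_for_lite_interpreter("mobilenet_v2.ptl")
```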
The TorchElastic library for managing distributed PyTorch worker processes has been moved from a standalone project into PyTorch core. TorchElastic has been used in several high-scale PyTorch projects, including DeepSpeed and PyTorch Lightning. The new release also introduces beta implementations of ZeroRedundancyOptimizer, based on Microsoft's ZeRO technique, which reduces memory requirements for large models by sharding optimizer state across workers, and CUDA RPC, a more efficient GPU-to-GPU communication channel between workers in a distributed cluster.
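The snippet below is a minimal sketch of using ZeroRedundancyOptimizer inside a distributed training step, assuming the process group has already been initialized (for example via the TorchElastic launcher); the model, sizes, and learning rate are illustrative, not from the release notes:

```python
import torch
import torch.nn as nn
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step(rank: int):
    # Each rank wraps its model replica in DDP as usual...
    model = DDP(nn.Linear(1024, 1024).to(rank), device_ids=[rank])

    # ...but the optimizer state (here Adam's moment buffers) is sharded
    # across ranks, so each worker only stores its own partition.
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-3,
    )

    inputs = torch.randn(64, 1024, device=rank)
    loss = model(inputs).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```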
In a Reddit discussion of the release, one user expressed disappointment with the new Inference Mode, stating,
I was really hoping that that inference mode would be something akin to TensorRT, but it looks like it isn't, and is restricted to C++ for the time being... I'd really appreciate an easier way to get the kinds of acceleration that TensorRT is capable of, maybe moving something like Torch2TRT into Torch itself.
The PyTorch code and version 1.9 release notes are available on GitHub.