As part of the recent macOS Big Sur release, Apple has included the ML Compute framework. ML Compute provides optimized mathematical libraries to improve training on CPU and GPU on both Intel and M1-based Macs, with up to a 7x improvement in training times for the TensorFlow deep-learning library.
Apple's Machine Learning blog gave a high-level overview of the ML Compute framework. ML Compute improves the performance of compute-graph-based deep-learning libraries such as TensorFlow by optimizing the compute graph itself and executing its primitives via accelerated libraries: BNNS for CPU training and Metal Performance Shaders for the GPU. To take full advantage of ML Compute, Apple has provided a version of the TensorFlow binaries targeted for the platform. Tests of the optimized TensorFlow library on several popular neural-network benchmarks show "dramatically faster" training times than the standard code, with up to a 7x improvement on Apple's new M1 hardware.
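The kind of graph-level optimization described above can be illustrated with a toy sketch in plain NumPy (a hypothetical illustration of operator fusion in general, not ML Compute's actual implementation): fusing a matrix multiply, bias add, and ReLU into a single expression produces the same result as executing the three graph primitives separately, while avoiding intermediate tensor materialization.

```python
import numpy as np

def forward_unfused(x, w, b):
    # Three separate graph primitives, each producing an intermediate tensor.
    y = x @ w                   # matmul node
    y = y + b                   # bias-add node
    return np.maximum(y, 0.0)   # ReLU node

def forward_fused(x, w, b):
    # A graph optimizer can fuse the three nodes into one kernel,
    # computing the same function in a single pass.
    return np.maximum(x @ w + b, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of 4 inputs, 8 features
w = rng.standard_normal((8, 3))   # weights for a 3-unit layer
b = rng.standard_normal(3)        # per-unit bias

# Fused and unfused graphs are numerically equivalent.
assert np.allclose(forward_unfused(x, w, b), forward_fused(x, w, b))
```

In a real framework the fusion decision is made by the graph compiler rather than the user; the point is only that the optimizer may rewrite the graph freely as long as the outputs match.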
Most deep-learning systems use a compute-graph-based framework such as TensorFlow or PyTorch. These systems describe a neural network as a series of linear algebra operations on multidimensional numeric arrays, or tensors. These operations can often be sped up markedly by the use of low-level, hardware-specific implementations. Even more performance gains are available by using GPU hardware to perform the operations, as GPUs are designed specifically for large-scale linear algebra.
However, GPU support in the deep-learning frameworks must be coded for a specific hardware platform's acceleration libraries. The two most popular deep-learning frameworks, TensorFlow and PyTorch, support NVIDIA's GPUs for acceleration via the CUDA toolkit. This poses a problem for deep-learning development on Macs. Apple has used Intel hardware for the integrated GPUs in the Mac since 2010, and while macOS does support an external GPU (eGPU), Apple's official documentation only recommends AMD-based hardware. Support for NVIDIA hardware is spotty, with reports of bugs and slow performance after OS upgrades. There are other solutions for deep-learning acceleration on the Mac, including PlaidML, but these have drawbacks such as difficult setup or lack of support for low-level TensorFlow APIs.
Recently Apple released the new M1 "system on a chip," which not only contains a built-in GPU, but also includes a 16-core "Neural Engine" capable of 11 trillion operations per second. Apple claims the Neural Engine enables up to a 15x improvement in ML computation. Around the same time, Apple released Big Sur, the latest version of macOS, which contains the ML Compute framework. ML Compute wraps several low-level API calls for performing neural-network operations: BNNS on CPU and Metal Performance Shaders on GPU.
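Apple's fork of TensorFlow exposes a small API for choosing which ML Compute backend executes the graph. Per the fork's README, device selection looks roughly like the following configuration sketch (this only runs against Apple's tensorflow_macos binaries, not mainline TensorFlow):

```python
import tensorflow as tf
from tensorflow.python.compiler.mlcompute import mlcompute

# Select the ML Compute backend: 'cpu' (BNNS), 'gpu' (Metal
# Performance Shaders), or 'any' to let ML Compute decide.
mlcompute.set_mlc_device(device_name='gpu')

# Models built and trained after this point are executed
# through ML Compute on the selected device.
```

The rest of a training script is unchanged, which is what allows existing TensorFlow code to pick up the acceleration with minimal modification.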
Using a version of the TensorFlow framework compiled to target these APIs, Apple trained several common neural-network models and compared training times to those obtained with a standard version of TensorFlow. Experiments were conducted on an Intel-based 2019 Mac Pro with an AMD GPU, an Intel-based 2020 MacBook Pro with an Intel GPU, and an M1-based 2020 MacBook Pro. Apple did not provide raw numbers, but claims training is "up to 7x faster" on the M1 than on the Intel MacBook Pro, and the accompanying graphs show a speedup of approximately an order of magnitude for some networks on the 2019 Mac Pro.
Image Source: https://blog.tensorflow.org/2020/11/accelerating-tensorflow-performance-on-mac.html
Users on Hacker News welcomed the announcement, noting the difficulty with workarounds such as PlaidML. On Twitter, machine-learning consultant Stephen Purpura reported good performance with the new M1 hardware, comparing it to NVIDIA's GPUs:
On my models currently in test, it’s like using a 1080 or 1080 TI. Given the benchmarks, it should be a little faster, so I will take a look at what might be causing the slowdown.
Apple's binaries for the accelerated TensorFlow library are available on GitHub. Although Apple has not yet released the source code, the TensorFlow blog mentions plans to integrate Apple's fork into the TensorFlow open-source mainline. The PyTorch team has not announced any plans for support of ML Compute or the M1 chip, although users have suggested this via GitHub issues.