Google AI is open-sourcing GPipe, a TensorFlow library for accelerating the training of large deep-learning models.
Deep neural networks (DNNs) are the tool of choice for many AI tasks, such as natural-language processing and image classification. New methods for the latter are often benchmarked against winners of the ImageNet challenge. Each year's winning entry has performed better than the last, but with a corresponding increase in model complexity. The 2014 winner, GoogLeNet, achieved 74.8% top-1 accuracy with 4 million model parameters; the 2017 winner, Squeeze-and-Excitation Networks, reached 82.7% top-1 accuracy with 145.8 million parameters.
The increase in model size poses a problem for training. To train these networks in a reasonable time, much of the computation is delegated to accelerators: special-purpose hardware such as GPUs or TPUs. These devices have limited memory, however, which caps the size of model that can be trained. There are ways to reduce the memory requirements, such as swapping data out of the accelerator's memory, but these can slow down training. Another solution is to partition the model so that multiple accelerators can be used in parallel. The most obvious partitioning scheme for a sequential DNN is to split the model by layers and assign each layer to a different accelerator. But because training must pass through the layers in order, only one accelerator may be doing useful work at a time while the others sit idle, waiting for results from higher or lower in the stack, as the sketch below illustrates.
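To make the idle-accelerator problem concrete, here is a minimal Python sketch (purely illustrative, not GPipe or TensorFlow code; the stage count K is an assumed value) of a naive layer-wise partition, where one full batch flows through K stages one at a time:

```python
# Illustrative sketch (not GPipe code): naive layer-wise model
# parallelism. Each of K accelerators holds one stage of the model,
# and a full batch passes through the stages strictly in sequence.

K = 4  # assumed number of accelerators / pipeline stages

# At clock step t, only stage t is busy; every other stage idles.
for step in range(K):
    row = ["busy" if stage == step else "idle" for stage in range(K)]
    print(f"step {step}: {row}")

# Each device works for 1 of K steps, so utilization is only 1/K.
print(f"per-device utilization: {1 / K:.0%}")  # 25% for K = 4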
GPipe addresses this problem by splitting each training batch into "micro-batches" and pipelining them through the layers: the accelerator for a given layer can begin processing a micro-batch as soon as the previous layer finishes with it, without waiting for the full batch.
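The following sketch (again illustrative Python, not GPipe's actual API; the stage count K and micro-batch count M are assumed values) shows how pipelining M micro-batches keeps the same K stages busy for most of the schedule:

```python
# Illustrative sketch (not GPipe's API): pipelining M micro-batches
# through K stages. Stage k can start micro-batch m as soon as stage
# k-1 has finished it, so at clock step t, stage k works on
# micro-batch t - k (when that micro-batch exists).

K = 4  # assumed pipeline stages (accelerators)
M = 8  # assumed micro-batches per mini-batch

total_steps = M + K - 1  # last micro-batch exits the last stage
for step in range(total_steps):
    row = []
    for stage in range(K):
        m = step - stage
        row.append(f"mb{m}" if 0 <= m < M else "idle")
    print(f"step {step}: {row}")

# Busy slots: M per stage; total slots per stage: total_steps.
utilization = M / total_steps
print(f"per-device utilization: {utilization:.0%}")  # 8/11 ≈ 73%
```

With eight micro-batches and four stages, per-device utilization rises from 25% to roughly 73%, and the residual idle time (the pipeline "bubble" at the start and end of each mini-batch) shrinks as more micro-batches are used.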
Using GPipe and 8 TPUv2s, Google's researchers were able to train image-classification models with 1.8 billion parameters: 5.6 times as many as can be trained on a single TPUv2. Training these larger models yielded 84.7% top-1 accuracy on the ImageNet validation data, beating the 2017 winner's score.
In addition to supporting larger models, GPipe's model partitioning allows faster training of a given model simply by running more accelerators in parallel. The researchers reported that using "four times more accelerators achieved 3.5 times speedup."
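One plausible contributor to the sub-linear scaling is the pipeline bubble described above: adding stages lengthens the fill-and-drain phase of each mini-batch. A back-of-the-envelope estimate in Python (the micro-batch and stage counts below are assumptions for illustration, not figures from Google's report):

```python
# Back-of-the-envelope only: a pipeline-bubble model of sub-linear
# scaling. The numbers here are assumed; the 3.5x figure from the
# researchers is not tied to these exact values.

def utilization(num_stages, num_microbatches):
    """Fraction of clock steps each stage spends doing useful work."""
    return num_microbatches / (num_microbatches + num_stages - 1)

M = 32               # assumed micro-batches per mini-batch
base, scaled = 2, 8  # 4x more accelerators

speedup = (scaled / base) * utilization(scaled, M) / utilization(base, M)
print(f"ideal 4x, bubble-adjusted: {speedup:.2f}x")  # ~3.38x
```

Under these assumptions, the bubble alone turns an ideal 4x speedup into roughly 3.4x, in the same range as the reported figure.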
GPipe is available as part of the Lingvo framework for building sequential neural-network models in TensorFlow.