A team from Microsoft Research and Carnegie Mellon University has open-sourced Project Petridish, a neural architecture search algorithm that automatically builds deep-learning models that are optimized to satisfy a variety of constraints. Using Petridish, the team achieved state-of-the-art results on the CIFAR-10 benchmark with only 2.2M parameters and five GPU-days of search time.
The algorithm and experiments are described in a paper accepted at the 2019 Conference on Neural Information Processing Systems (NeurIPS). Petridish uses a technique called forward search that starts with a human-designed "parent model" and adds layers to it. The new model is partially trained; if it performs better than the parent, it is fully trained and several metrics about it are recorded, including accuracy, computation requirements, and latency. These metrics allow users to choose the model that best fits the constraints of a problem. According to the paper's authors:
"With Petridish, we seek to increase efficiency and speed in finding suitable neural architectures, making the process easier for those in the field, as well as those without expertise interested in machine learning solutions."
Neural architecture search (NAS) is a subfield of automated machine learning (AutoML). Much of the recent NAS research focuses on manually designing a search space and then sampling architectures from it, sometimes randomly and sometimes guided by optimization algorithms. NAS algorithms are usually classified as macro-search or cell-search. In cell-search, the algorithm optimizes small blocks of network layers called cells, and the final network is composed of repeated copies of those cells. Macro-search, by contrast, tries out many different overall network structures. Petridish supports both kinds of search.
Petridish's algorithm is inspired by gradient boosting techniques, where "weak" models are iteratively added to an ensemble to produce a stronger model; the next weak model chosen is the one that best matches the negative gradient of the current model's prediction error. Likewise, Petridish starts with an initial model and iteratively improves it by adding layers chosen from a set of candidate layers; the layers chosen are those that best reduce the training loss. Once the candidate layers are identified, the modified model is trained to convergence and becomes the new "parent." The process then repeats.
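The gradient-boosting idea behind this loop can be illustrated with a toy example. The sketch below is not Petridish's code: it fits a one-dimensional function with squared loss, and the names `boost`, `candidates`, and the sample data are hypothetical. Each round it picks, from a fixed pool of candidate basis functions, the one that most reduces the training loss, which for squared loss means the one best aligned with the residual (the negative gradient).

```python
def boost(xs, ys, candidates, rounds=5, tol=1e-12):
    """Greedy additive fit: each round adds the candidate that most reduces squared loss."""
    preds = [0.0] * len(xs)
    chosen = []  # list of (candidate name, coefficient) in the order they were added
    for _ in range(rounds):
        # For squared loss, the negative gradient w.r.t. the predictions is the residual.
        residual = [y - p for y, p in zip(ys, preds)]
        best = None
        for name, fn in candidates.items():
            feats = [fn(x) for x in xs]
            norm = sum(f * f for f in feats)
            if norm == 0:
                continue
            # Least-squares coefficient for this candidate against the residual.
            coef = sum(f * r for f, r in zip(feats, residual)) / norm
            # Drop in summed squared error if this candidate were added.
            gain = coef * coef * norm
            if best is None or gain > best[0]:
                best = (gain, name, coef, feats)
        if best is None or best[0] <= tol:
            break  # no candidate improves the fit any further
        _, name, coef, feats = best
        preds = [p + coef * f for p, f in zip(preds, feats)]
        chosen.append((name, coef))
    return chosen, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data: y = 3x + 0.5x^2 sampled at five points.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3 * x + 0.5 * x * x for x in xs]
candidates = {
    "constant": lambda x: 1.0,
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
}
chosen, preds = boost(xs, ys, candidates)
```

In Petridish, as described above, the candidates are layers rather than basis functions and the parent is a neural network, but the selection rule has the same greedy shape: add whatever best reduces the current training loss, then repeat from the improved model.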
After Petridish trains a model to completion, it records performance metrics, such as accuracy on a test dataset, alongside cost metrics, such as the number of model parameters or the number of computations. AI model designers usually must make tradeoffs between cost and performance, and Petridish produces a scatterplot of the models showing their location in this space. The plot also shows the Pareto frontier: the set of models that achieve the best performance for a given cost.
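A Pareto frontier over two such metrics is straightforward to compute. The following is a minimal sketch, not taken from the Petridish source; the function name `pareto_frontier`, the dictionary keys, and the numbers are invented for illustration. It assumes lower is better for both parameter count and error rate.

```python
def pareto_frontier(models):
    """Return the models that no other model beats on both cost and error."""
    frontier = []
    best_error = float("inf")
    # Sweep from cheapest to most expensive; ties on cost are ordered by error.
    for m in sorted(models, key=lambda m: (m["params_m"], m["error_pct"])):
        # A model is kept only if it is more accurate than every cheaper one.
        if m["error_pct"] < best_error:
            frontier.append(m)
            best_error = m["error_pct"]
    return frontier


# Hypothetical models: parameter count in millions, test error in percent.
models = [
    {"name": "A", "params_m": 1.0, "error_pct": 4.0},
    {"name": "B", "params_m": 2.0, "error_pct": 3.0},
    {"name": "C", "params_m": 2.5, "error_pct": 3.5},  # worse than B on both axes
    {"name": "D", "params_m": 4.0, "error_pct": 2.8},
]
frontier_names = [m["name"] for m in pareto_frontier(models)]
```

Here model C is dropped because B is both smaller and more accurate; A, B, and D remain, and choosing among them depends on how much cost the user is willing to pay for accuracy.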
To test the effectiveness of their algorithm, the team used Petridish to design networks for the CIFAR-10 image-recognition task. Using macro-search, Petridish found a model that achieved a 2.85% error rate using only 2.2M parameters, which, according to the team, is "significantly better than previous macro search results." Cell-search produced similar results, and the average search time for models was around 10 GPU-days.
Petridish is implemented in TensorFlow and the source code is available on GitHub. The team notes that the code is "under active development," and they plan to release a version based on PyTorch in the near future.