Researchers from Alibaba Group and Peking University have open-sourced Kernel Neural Architecture Search (KNAS), an efficient automated machine learning (AutoML) algorithm that can evaluate proposed architectures without training. KNAS uses a gradient kernel as a proxy for model quality and requires an order of magnitude less compute power than baseline methods.
The algorithm and a set of experiments were described in a paper published in Proceedings of Machine Learning Research. Unlike many AutoML algorithms, which require that each proposed model undergo the full training process, KNAS can predict which model architectures will perform well without actually training them, thereby potentially saving many hours of compute time. When evaluated on the NAS-Bench-201 computer vision benchmark, KNAS achieved a 25x speedup while producing results with "competitive" accuracies compared to other AutoML methods. When evaluated on text classification tasks, KNAS produced models that achieved better accuracy than a pre-trained RoBERTa baseline model.
Neural architecture search (NAS) is a branch of AutoML that attempts to find the best deep-learning model architecture for a task: given a task dataset and a search space of possible architectures, NAS seeks the architecture that achieves the best performance metric on that dataset. However, this typically requires that each proposed model be fully trained on the dataset. Some data scientists have estimated that the compute required for such a search "can emit as much carbon as five cars in their lifetimes." This has led many researchers to investigate techniques for improving the search algorithms so that fewer models must be trained.
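To make the cost concrete, a brute-force search pays for one full training run per candidate architecture. The sketch below is a generic illustration of that baseline loop; search_space, train, and evaluate are hypothetical placeholders rather than any particular framework's API.

```python
# Brute-force NAS for contrast: every candidate must be fully trained before
# it can be scored, which is where the heavy compute (and carbon) cost comes from.
# `search_space`, `train`, and `evaluate` are hypothetical placeholders.
def naive_nas(search_space, train, evaluate, task_dataset):
    best_arch, best_metric = None, float("-inf")
    for arch in search_space:                 # potentially thousands of candidates
        model = train(arch, task_dataset)     # one full training run per candidate
        metric = evaluate(model, task_dataset)
        if metric > best_metric:
            best_arch, best_metric = arch, metric
    return best_arch
```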
The Alibaba researchers instead chose to investigate whether architectures could be evaluated without training. The team proposed the following hypothesis: "Gradients can be used as a coarse-grained proxy of downstream training to evaluate randomly-initialized architectures." In particular, the team identified a gradient kernel, in this case the mean of the Gram matrix (MGM) of gradients, which they showed has a strong correlation with a model's accuracy. The KNAS algorithm then consists of computing the MGM for each proposed architecture at random initialization, keeping only the best few, evaluating the actual accuracy of only those candidates, and selecting the model with the highest accuracy as the final result.
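A minimal PyTorch-style sketch of such a gradient-kernel score and the subsequent selection step might look like the following; the function names, the per-example gradient sampling, and the selection helper are illustrative assumptions rather than the exact procedure from the KNAS paper or repository.

```python
import torch

def mgm_score(model, loss_fn, inputs, targets, num_samples=8):
    # Sketch of a gradient-kernel score in the spirit of KNAS's mean of the
    # Gram matrix (MGM) of gradients; the sampling scheme here is an assumption.
    grads = []
    for i in range(num_samples):
        model.zero_grad()
        loss = loss_fn(model(inputs[i:i + 1]), targets[i:i + 1])
        loss.backward()
        # Flatten all parameter gradients into one vector for this sample.
        grads.append(torch.cat([p.grad.reshape(-1)
                                for p in model.parameters() if p.grad is not None]))
    G = torch.stack(grads)        # shape: (num_samples, num_params)
    gram = G @ G.T                # pairwise gradient inner products
    return gram.mean().item()     # MGM: mean over all Gram-matrix entries

def knas_select(candidates, build_model, score_fn, train_and_eval, top_k=5):
    # Score every randomly-initialized candidate without training, keep the
    # top_k by MGM, then train and evaluate only those finalists.
    ranked = sorted(candidates, key=lambda arch: score_fn(build_model(arch)),
                    reverse=True)
    return max(ranked[:top_k], key=train_and_eval)
```

In this sketch a higher MGM score is taken to predict better downstream accuracy, matching the correlation the authors report.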
The researchers compared the performance of KNAS on NAS-Bench-201 to several other families of NAS algorithms, including random search, reinforcement learning (RL) search, evolutionary search, hyper-parameter search, differentiable algorithms, and two other "training-free" algorithms. The team found that KNAS was faster than all methods except the two training-free algorithms. It also produced the model with the best accuracy on one benchmark dataset and competitive results on the other two, in one case outperforming the human-crafted ResNet baseline model.
Several other research organizations have also investigated NAS algorithms which require little to no model training. A group from Samsung recently open-sourced Zero-Cost-NAS, which uses a single mini-batch of data to evaluate models. Researchers from the University of Texas released Training-freE Neural Architecture Search (TE-NAS) which uses two training-free proxies for scoring models, and a team from the University of Edinburgh published Neural Architecture Search Without Training, which looks at the "overlap of activations between data points in untrained networks" to evaluate models. Late last year, InfoQ reported on an algorithm open-sourced by Facebook that uses NAS to initialize deep-learning model parameters without training.
The KNAS code is available on GitHub.