A team of researchers from the University of Aberdeen, MIT, and several other institutions has released a dataset of historical compute demands for machine learning (ML) models. The dataset contains the compute required to train 123 important models, and an analysis shows that since 2010 the growth in training compute has accelerated significantly.
The analysis was presented in a paper published on arXiv. The dataset contains three times as many models as previous efforts and covers notable ML models that have advanced the state of the art. The goal of the research was to help predict the progress of ML, using training compute demand as a proxy. The team identified three distinct eras in ML, which they termed the Pre-Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. They also calculated doubling times (akin to Moore's Law) for each era: around 21 months, 6 months, and 10 months, respectively. According to the researchers,
We hope our work will help others better understand how much recent progress in ML has been driven by increases in scale and improve our forecasts for the development of advanced ML systems.
Recent research has shown that deep learning model performance improves with increased training compute according to a power law; that is, each doubling of training compute reduces the model's loss by a constant factor. This means that projecting compute demand into the future can give planners and policymakers an idea of how well models may perform.
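As a rough illustration of such a power-law relationship (the constant and exponent below are arbitrary placeholders, not values from the paper or from scaling-law studies), the following Python sketch shows how each doubling of compute multiplies the loss by the same factor:

```python
# Illustrative power-law scaling of loss with training compute:
# L(C) = a * C**(-alpha); the constants here are arbitrary placeholders.
a, alpha = 10.0, 0.05

def loss(compute_flops):
    return a * compute_flops ** (-alpha)

for c in [1e20, 2e20, 4e20, 8e20]:
    print(f"compute={c:.0e}  loss={loss(c):.4f}")

# Each doubling of compute multiplies the loss by 2**(-alpha),
# i.e. reduces it by a constant factor (~3.4% per doubling here).
print("ratio per doubling:", 2 ** (-alpha))
```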
To determine historic trends in ML compute requirements, the team collected data from ML research papers that showed experimental results and advanced the state of the art. The models described were also deemed "milestone models," meaning that they were of historical importance or were widely cited. Since not all papers included data on compute resource requirements, the research team developed a heuristic to estimate compute from the number of model parameters and the training dataset size.
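A commonly used rule of thumb for this kind of estimate (shown here as an illustrative sketch, not necessarily the authors' exact heuristic) puts training compute at roughly 6 FLOPs per parameter per training example, about 2 for the forward pass and 4 for the backward pass:

```python
def estimate_training_flops(n_parameters, n_training_examples,
                            flops_per_param_per_example=6):
    """Rough training-compute estimate: ~6 FLOPs per parameter per training
    example (2 forward, ~4 backward). A common approximation, not
    necessarily the paper's exact heuristic."""
    return flops_per_param_per_example * n_parameters * n_training_examples

# Example: a 175B-parameter model trained on 300B tokens (GPT-3-like scale).
print(f"{estimate_training_flops(175e9, 300e9):.2e} FLOPs")  # ~3.15e+23
```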
In analyzing the data, the team noticed three distinct trends or eras across time. The first, beginning with Claude Shannon's 1952 maze-solving robot Theseus and lasting until around 2010, is the Pre-Deep Learning Era. Compute usage in this era grew slowly, doubling just under every two years, roughly matching Moore's Law. The next era, the Deep Learning Era, exhibits much faster growth, doubling about every 6 months. However, there is no sharp transition between the two eras, and the team noted that their results "barely change" when the start is placed anywhere from 2010 to 2012.
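Doubling times such as these are typically derived from a linear fit to log-scale compute over time. The sketch below uses made-up data points, not the paper's dataset, to show the basic calculation:

```python
import numpy as np

# Hypothetical (year, training FLOPs) points -- not data from the paper.
years = np.array([2012.0, 2014.0, 2016.0, 2018.0, 2020.0])
flops = np.array([1e17, 1e18, 1e19, 1e20, 1e21])

# Fit log2(compute) as a linear function of time; the slope is
# doublings per year, so its reciprocal gives the doubling time.
slope, intercept = np.polyfit(years, np.log2(flops), 1)
doubling_time_months = 12.0 / slope
print(f"doubling time: {doubling_time_months:.1f} months")
```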
While the Deep Learning Era trend continues into 2022, the team noticed an additional Large-Scale Era trend beginning in 2015, which consists of very large models such as AlphaGo and GPT-3. For these models, the compute demand is currently one or two orders of magnitude higher than that of "regular-scale" models; however, the demand appears to be doubling more slowly, about every 9 to 10 months.
The team also investigated whether models in different domains exhibit different trends. They analyzed models in four categories: language, vision, games, and "other" (which includes multimodal systems as well as autonomous vehicles and robots). For language, vision, and other, the trends were "fairly consistent" with the overall dataset. However, the researchers could find no consistent trend for games.
In a discussion on Twitter, a user noted it would be interesting to see compute requirements for model inference. Co-author Lennart Heim replied,
Unfortunately - same with training - we are bottlenecked by the available data. Sharing inference compute is even less of a norm than training compute.
The ML compute trend dataset is available as a shared Google Doc, and the authors' visualization code is available on GitHub. An animated view of the trend data is available on the Our World in Data website.