Researchers at MIT's Computer Science & Artificial Intelligence Lab (CSAIL) have open-sourced Multiply-ADDitioN-lESS (MADDNESS), an algorithm that speeds up machine learning using approximate matrix multiplication (AMM). MADDNESS requires zero multiply-add operations and runs 10x faster than other approximate methods and 100x faster than exact multiplication.
The team described MADDNESS and a set of experiments in a paper presented at the recent International Conference on Machine Learning (ICML). Unlike many other AMM algorithms, MADDNESS uses no multiply-add operations. Instead, it uses a set of efficient learned hash functions that can encode data at rates of 100GB/second using only one CPU thread. The hash functions map input data to an index into a lookup table that contains pre-computed dot-products. Although the algorithm does introduce some output error, the authors show that the error has a theoretical upper bound, which can be traded off against speed. In a set of experiments comparing the performance of MADDNESS against other algorithms in an image classifier, the researchers found that MADDNESS has a better speed-accuracy tradeoff, achieving "virtually the same accuracy as exact multiplication" while being 10x faster.
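The following minimal Python sketch, which is not the authors' implementation, illustrates the overall lookup-table pattern: a fixed matrix is multiplied by a set of prototype rows once, offline, and each incoming row is then reduced to a table lookup. The names precompute_table, approx_matmul, and encode are illustrative, and encode stands in for MADDNESS's learned hash function.

```python
# Sketch of lookup-table-based approximate matrix multiplication (not the
# authors' code). The key idea: dot products with prototypes are computed
# once, and at runtime each input row is hashed to a prototype index.
import numpy as np

def precompute_table(prototypes, B):
    # prototypes: (K, D) candidate rows; B: (D, M) fixed matrix (e.g., weights)
    # table[k, m] = dot(prototypes[k], B[:, m]), computed offline
    return prototypes @ B

def approx_matmul(A, B, prototypes, encode):
    # encode(row) -> index of the most similar prototype (the "hash" step)
    table = precompute_table(prototypes, B)        # (K, M), reused for all rows
    codes = np.array([encode(row) for row in A])   # (N,) prototype indices
    return table[codes]                            # (N, M), approximates A @ B
```

The accuracy of the result depends on how well the prototypes cover the input distribution, and the speed depends almost entirely on how cheap the encoding step is; both are the focus of the techniques described below.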
Matrix multiplication is a fundamental operation in machine learning and one of the most time-consuming, due to its extensive use of multiply-add instructions. Because GPU and TPU chips can execute many multiply-adds in parallel, they perform matrix multiplication faster than CPUs, making them attractive for ML applications; however, they may be too expensive or even unavailable for researchers with a limited budget, or in resource-constrained applications like IoT. Thus, many researchers have investigated AMM algorithms, which trade off accuracy of matrix multiplication for speed.
The MADDNESS algorithm makes several assumptions about the matrices being multiplied: they are "tall, relatively dense, and resident in a single machine’s memory," conditions that occur in a wide variety of machine learning applications. In particular, one matrix contains fixed values; this might represent the weights in an image classifier model. The other matrix represents the input data; for example, the pixels of an image to be classified. MADDNESS is based on a vector quantization algorithm called product quantization (PQ). In PQ, a large set of input vectors is analyzed to create a small number of prototype vectors. The products of each prototype vector with the fixed matrix are pre-computed. Then, any new input vector is mapped to its most similar prototype by a hash function, which yields an index into the table of pre-computed products.
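A rough Python sketch of PQ-style approximate multiplication follows; it assumes k-means prototypes learned per subspace and exact nearest-prototype encoding, whereas MADDNESS replaces that encoding with its much faster learned hash. Parameter names such as n_subspaces and n_prototypes are illustrative.

```python
# Sketch of product-quantization-style approximate matrix multiplication.
# Assumes float inputs; exact nearest-prototype search is used for clarity,
# which is the step MADDNESS accelerates with learned hash functions.
import numpy as np

def fit_pq(A_train, n_subspaces=4, n_prototypes=16, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    N, D = A_train.shape
    subspaces = np.array_split(np.arange(D), n_subspaces)
    prototypes = []
    for idx in subspaces:
        X = A_train[:, idx]
        # plain k-means to learn prototypes for this subspace
        centers = X[rng.choice(N, n_prototypes, replace=False)].astype(float)
        for _ in range(iters):
            dists = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for k in range(n_prototypes):
                if (assign == k).any():
                    centers[k] = X[assign == k].mean(0)
        prototypes.append(centers)
    return subspaces, prototypes

def pq_matmul(A, B, subspaces, prototypes):
    out = np.zeros((A.shape[0], B.shape[1]))
    for idx, centers in zip(subspaces, prototypes):
        table = centers @ B[idx]                                  # pre-computed
        dists = ((A[:, idx][:, None, :] - centers[None]) ** 2).sum(-1)
        codes = dists.argmin(1)                                   # hash step
        out += table[codes]                                       # lookup + sum
    return out
```

In this formulation, the only per-row work at runtime is mapping each subvector to a prototype and summing table entries, which is why making that mapping cheap pays off so heavily.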
The key innovation in MADDNESS is a pre-processing step that produces very fast hash functions which do not require multiply-add operations. The functions are based on binary regression trees; each tree has 16 leaves, which represent hash buckets. Mapping an input vector to a prototype requires only comparisons against the threshold values at the tree's split nodes. MADDNESS includes two additional performance improvements: a prototype optimization that chooses a set of prototypes minimizing the reconstruction error of the original matrix, and a fast 8-bit aggregation that combines multiple products using hardware-specific averaging instructions instead of addition.
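A minimal sketch of such a hash function appears below, with made-up split dimensions and all-zero thresholds standing in for the values MADDNESS learns during pre-processing. It shows why encoding is cheap: reaching one of the 16 leaves of a depth-four tree takes only four scalar comparisons and no multiplications.

```python
# Sketch of a multiply-free tree hash (illustrative parameters, not learned).
# Each level examines one feature and compares it against a per-node threshold;
# four levels of a balanced binary tree yield 16 leaves, i.e., 16 hash buckets.
import numpy as np

def tree_hash(x, split_dims, split_thresholds):
    node = 0
    for level in range(4):                          # 4 levels -> 16 leaves
        go_right = x[split_dims[level]] > split_thresholds[level][node]
        node = 2 * node + int(go_right)             # descend the tree
    return node                                     # bucket index in [0, 16)

# Example with made-up parameters for a 6-dimensional input vector
split_dims = [0, 2, 3, 5]                           # one feature per level
split_thresholds = [np.zeros(2 ** level) for level in range(4)]
bucket = tree_hash(np.array([0.3, -1.2, 0.7, 0.1, 2.0, -0.4]),
                   split_dims, split_thresholds)
```

Each bucket corresponds to a prototype, so the hash output can index directly into the table of pre-computed products.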
The researchers compared MADDNESS to six other AMM algorithms, including principal component analysis (PCA) and Bolt, their own previous AMM algorithm, as well as exact matrix multiplication using BLAS. The AMM algorithms were used as part of an image classifier trained and evaluated on the CIFAR datasets. MADDNESS outperformed all other methods, achieving a "far better speed-accuracy tradeoff." The team also compared MADDNESS to the other algorithms using kernel classifiers trained on the UCR Time Series Archive datasets; in these experiments, MADDNESS was "significantly faster than alternatives at a given level of accuracy."
Lead author Davis Blalock joined a Reddit discussion to answer questions about the MADDNESS paper. When asked about extending the algorithm to non-linear functions, Blalock clarified:
Working on extending it to other linear functions (e.g., convolution) and intelligently swapping out linear ops within an overall neural network. So in the sense that neural nets are nonlinear functions, yes. Not working on approximating the nonlinearities directly since they're cheap to just apply to the output of the linear ops (especially if just writing a fused kernel that does both ops at once).
The MADDNESS source code is available on GitHub.