Facebook AI Research announced the open-source release of a deep-learning recommendation model, DLRM, that achieves state-of-the-art accuracy in generating personalized recommendations. The code is available on GitHub and includes versions for both the PyTorch and Caffe2 frameworks.
A team of over twenty researchers published a paper in late May describing the DLRM model and its performance. The model separates its input features into dense and categorical features. Each categorical feature has an embedding that converts it to a vector of a pre-defined length. The dense input features are collected into a separate vector and fed into a multilayer perceptron (MLP) to produce another vector of the same length as the categorical embeddings. Finally, the dot products of all pairs of these vectors are computed, concatenated with the processed dense-feature vector, and post-processed with another MLP to produce the final output. To evaluate the model's accuracy, the researchers trained it on the Criteo Ad Kaggle dataset and compared it to another recommendation system, Deep and Cross network (DCN). DLRM outperformed DCN slightly; the authors pointed out that "this is without extensive tuning of model hyperparameters," implying that with such tuning the model might achieve even better performance.
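The data flow described above can be sketched in a few lines of PyTorch. The class name, layer widths, and feature cardinalities below are illustrative placeholders rather than the configuration used in the paper; this is a minimal sketch of the architecture, not the released implementation.

```python
import torch
import torch.nn as nn

class DLRMSketch(nn.Module):
    """Minimal sketch of the DLRM data flow: embeddings for categorical
    features, a bottom MLP for dense features, pairwise dot-product
    interactions, and a top MLP that produces the prediction."""
    def __init__(self, cardinalities, num_dense, emb_dim=16):
        super().__init__()
        # One embedding table per categorical feature (sizes are illustrative).
        self.embeddings = nn.ModuleList(
            nn.Embedding(card, emb_dim) for card in cardinalities)
        # Bottom MLP maps the dense features to the same length as the embeddings.
        self.bottom_mlp = nn.Sequential(
            nn.Linear(num_dense, 64), nn.ReLU(), nn.Linear(64, emb_dim), nn.ReLU())
        num_vectors = len(cardinalities) + 1            # embeddings + dense vector
        num_pairs = num_vectors * (num_vectors - 1) // 2
        # Top MLP consumes the pairwise dot products plus the processed dense vector.
        self.top_mlp = nn.Sequential(
            nn.Linear(num_pairs + emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, dense, categorical):
        x = self.bottom_mlp(dense)                               # (batch, emb_dim)
        embs = [emb(categorical[:, i]) for i, emb in enumerate(self.embeddings)]
        vectors = torch.stack([x] + embs, dim=1)                 # (batch, n, emb_dim)
        dots = torch.bmm(vectors, vectors.transpose(1, 2))       # all pairwise dot products
        i, j = torch.triu_indices(vectors.size(1), vectors.size(1), offset=1)
        pairs = dots[:, i, j]                                    # keep each pair once
        return torch.sigmoid(self.top_mlp(torch.cat([x, pairs], dim=1)))

# Toy usage: 13 dense features and 3 categorical features with made-up cardinalities.
model = DLRMSketch(cardinalities=[1000, 500, 200], num_dense=13)
prediction = model(torch.randn(4, 13), torch.randint(0, 200, (4, 3)))
```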
The open-source release of DLRM includes a benchmark script and a set of tools for generating synthetic training datasets. In a blog post, Facebook researchers Maxim Naumov and Dheevatsa Mudigere explain that benchmarking allows researchers to explore the performance of the algorithm on different frameworks and hardware platforms. The synthetic datasets let different users of the algorithm compare performance without having to share the same real-world datasets, since such data sharing raises major privacy concerns.
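The snippet below only illustrates the shape of such synthetic data (random dense values, random category indices, and binary labels); it is not the generator shipped in the repository.

```python
import torch

def make_synthetic_batch(batch_size, num_dense, cardinalities):
    """Produce a random batch shaped like click-through-rate training data.
    Illustrative only -- not the synthetic-data tool released with DLRM."""
    dense = torch.randn(batch_size, num_dense)
    categorical = torch.stack(
        [torch.randint(0, card, (batch_size,)) for card in cardinalities], dim=1)
    labels = torch.randint(0, 2, (batch_size, 1)).float()
    return dense, categorical, labels

dense, categorical, labels = make_synthetic_batch(4, num_dense=13,
                                                  cardinalities=[1000, 500, 200])
```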
The Facebook team is also using DLRM benchmarking as a tool in hardware system co-design. At the OCP Global Summit keynote in March, Facebook's director of technology and strategy, Vijay Rao, unveiled the Zion AI-training hardware platform. The platform, developed in cooperation with Microsoft, Intel, NVIDIA, and other partners, was designed to address the needs of AI algorithms such as DLRM: the embeddings are memory-bandwidth-intensive, for example, while the MLPs are compute-intensive. The DLRM benchmark tools help system designers identify hardware bottlenecks, which can then inform the next generation of hardware design.
DLRM is one of several recent attempts to address the technical challenges of the recommendation problem. The first challenge is that categorical features---non-numeric data, such as a user's country---must be converted to a numeric representation. This is usually done with a "one-hot" encoding, whose main disadvantage is the high dimensionality of the encoded feature.
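To make the dimensionality issue concrete, the snippet below contrasts a one-hot encoding with an embedding lookup for a single categorical value; the vocabulary size and embedding length are made-up numbers.

```python
import torch
import torch.nn.functional as F

vocab_size = 1_000_000                 # illustrative cardinality of one categorical feature
feature_value = torch.tensor([42])     # the feature's integer id

# One-hot encoding: a vector as long as the vocabulary, with a single 1 in it.
one_hot = F.one_hot(feature_value, num_classes=vocab_size)
print(one_hot.shape)                   # torch.Size([1, 1000000])

# Embedding: the same id maps to a short, learned dense vector.
embedding = torch.nn.Embedding(vocab_size, 16)
print(embedding(feature_value).shape)  # torch.Size([1, 16])
```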
Another challenge is that the best model results are achieved by considering not just the features themselves but also the interactions between features. This is typically done by taking the cross-product of each feature with every other feature, so the number of interaction terms grows as the square of the number of features. Further, because feature values are sparse, the training data may contain no examples of some feature pairs, resulting in poor generalization.
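The snippet below illustrates that quadratic growth with a handful of made-up categorical features: crossing each pair yields a derived feature whose cardinality is the product of the pair's cardinalities, and most of those combinations never appear in the training data.

```python
from itertools import combinations

# Made-up cardinalities for a few categorical features.
cardinalities = {"country": 200, "device": 50, "ad_category": 1000, "hour": 24}

# Each crossed pair of features yields one derived feature whose
# cardinality is the product of the two original cardinalities.
pair_sizes = {(a, b): cardinalities[a] * cardinalities[b]
              for a, b in combinations(cardinalities, 2)}

print(len(pair_sizes))                          # 6 pairs from 4 features: n*(n-1)/2
print(pair_sizes[("country", "ad_category")])   # 200000 possible combinations
```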
As noted above, the DLRM team compared their model's performance with the Deep and Cross network (DCN) model developed by Google. Wide and Deep learning (WDL) is another recommendation model created by Google. All these systems use embeddings to reduce the dimensionality of categorical features, and each has a strategy for including feature interactions. For example, WDL uses a generalized linear model that combines the cross-products of binary features. DCN, by contrast, uses a multi-layer cross network that learns weights for higher-order products of the features.
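As a rough illustration of DCN's approach, its cross network stacks layers of the form x_{l+1} = x_0 (x_l · w_l) + b_l + x_l, so each layer adds one more degree of feature interaction using only a vector of weights. The sketch below follows that published formula with made-up dimensions; it is not Google's implementation.

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One cross layer in the style of Deep & Cross Network:
    x_{l+1} = x_0 * (x_l . w) + b + x_l. Each layer raises the degree of
    feature interaction by one while adding only O(d) parameters."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(dim))

    def forward(self, x0, xl):
        weight = (xl @ self.w).unsqueeze(1)   # scalar x_l . w for each example
        return x0 * weight + self.b + xl

# Stack two cross layers over a made-up 32-dimensional input.
x0 = torch.randn(4, 32)
x = x0
for layer in (CrossLayer(32), CrossLayer(32)):
    x = layer(x0, x)
```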
Both of the Google models are implemented in TensorFlow, while DLRM runs on either PyTorch or Caffe2. One drawback of DLRM is that, to run efficiently, it needs both model and data parallelism. This combination of parallelism strategies isn't natively supported in PyTorch or Caffe2, so the Facebook team designed a custom implementation. They intend to "provide its detailed performance study in forthcoming work."
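Broadly speaking, the memory-heavy embedding tables are partitioned across devices (model parallelism) while the comparatively small MLPs are replicated so each copy processes its own slice of the minibatch (data parallelism), with an all-to-all exchange of embedding outputs connecting the two. The placement sketch below is purely illustrative: it uses made-up table sizes, omits the communication step, and falls back to CPU when two GPUs aren't available.

```python
import copy
import torch
import torch.nn as nn

# Use two GPUs if present; otherwise fall back to CPU so the sketch still runs.
devices = ["cuda:0", "cuda:1"] if torch.cuda.device_count() >= 2 else ["cpu", "cpu"]

# Model parallelism: each large embedding table lives on one device, so no
# single device has to hold all of the tables. (Cardinalities are made up.)
cardinalities = [1_000_000, 500_000, 250_000, 100_000]
tables = [nn.Embedding(card, 16).to(devices[i % len(devices)])
          for i, card in enumerate(cardinalities)]

# Data parallelism: the small MLP is replicated on every device, and each
# replica processes its own slice of the minibatch.
bottom_mlp = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, 16))
replicas = [copy.deepcopy(bottom_mlp).to(d) for d in devices]

# In the real system, an all-to-all exchange moves each device's embedding
# lookups to the device that owns the corresponding minibatch slice; that
# communication step is omitted from this sketch.
```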