Intel has open-sourced BigDL, a distributed deep learning library that runs on Apache Spark. It leverages existing Spark clusters to run deep learning computations and simplifies loading data from large datasets stored in Hadoop.
Tests show a significant performance speedup when running on Xeon servers compared to other open-source frameworks such as Caffe, Torch, or TensorFlow. The speed is comparable with that of a mainstream GPU, and BigDL is able to scale to tens of Xeon servers.
The BigDL library supports Spark versions 1.5, 1.6 and 2.0 and allows deep learning to be embedded in existing Spark-based programs. It contains methods to convert Spark RDDs to a BigDL DataSet and can be used directly with Spark ML Pipelines.
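As an illustration, a minimal sketch of wrapping an existing RDD for BigDL training might look like the following (for example in a Spark shell). The Sample, Tensor and DataSet names follow BigDL's documented API, but the exact signatures should be checked against the BigDL and Spark versions in use.

    import com.intel.analytics.bigdl.dataset.{DataSet, Sample}
    import com.intel.analytics.bigdl.numeric.NumericFloat
    import com.intel.analytics.bigdl.tensor.Tensor
    import org.apache.spark.rdd.RDD

    // Turn an RDD of (features, label) pairs into a BigDL DataSet of Samples
    def toBigDLDataSet(rows: RDD[(Array[Float], Float)]) = {
      val samples = rows.map { case (features, label) =>
        // Each Sample holds a feature Tensor and a label Tensor
        Sample(Tensor(features, Array(features.length)), Tensor(Array(label), Array(1)))
      }
      DataSet.rdd(samples)
    }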
For model training, BigDL applies synchronous mini-batch SGD (Stochastic Gradient Descent) executed in a single Spark task across multiple executors. Each executor runs a multi-threaded engine and processes a portion of the mini-batch data. In the current version, all the training and validation data is loaded into memory.
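The following conceptual sketch, which is not BigDL's internal code, shows the general shape of one synchronous, data-parallel SGD step on Spark: the current weights are broadcast, each executor computes gradients over its slice of the mini-batch, and the gradients are aggregated before a single update is applied. All names in it are illustrative assumptions.

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    case class Example(features: Array[Double], label: Double)

    // Gradient of squared error for a linear model: (w . x - y) * x
    def gradient(w: Array[Double], e: Example): Array[Double] = {
      val pred = w.zip(e.features).map { case (wi, xi) => wi * xi }.sum
      e.features.map(xi => (pred - e.label) * xi)
    }

    // One synchronous mini-batch step: broadcast weights, sum gradients, update once
    def sgdStep(sc: SparkContext, batch: RDD[Example],
                weights: Array[Double], lr: Double): Array[Double] = {
      val bcW = sc.broadcast(weights)
      val (gradSum, count) = batch
        .map(e => (gradient(bcW.value, e), 1L))
        .treeReduce { case ((g1, n1), (g2, n2)) =>
          (g1.zip(g2).map { case (a, b) => a + b }, n1 + n2)
        }
      weights.zip(gradSum).map { case (w, g) => w - lr * g / count }
    }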
BigDL is implemented in Scala and is modeled after Torch. Like Torch, it provides a Tensor class that uses the Intel MKL library for computations. Intel MKL, short for Math Kernel Library, is a set of routines optimized for computations ranging from FFT (Fast Fourier Transform) to matrix multiplications, which are heavily used in deep learning model training. Other concepts borrowed from Torch are Module, inspired by Torch's nn package and representing individual neural network layers, as well as Table and Criterion.
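A small sketch of how these pieces compose is shown below; the layer and criterion names mirror Torch's nn package and follow BigDL's documentation, though exact package paths and signatures may differ between versions.

    import com.intel.analytics.bigdl.nn.{Sequential, Linear, ReLU, LogSoftMax, ClassNLLCriterion}
    import com.intel.analytics.bigdl.numeric.NumericFloat

    // A small multi-layer perceptron built from Module layers, as in Torch's nn
    val model = Sequential()
      .add(Linear(784, 128))   // fully connected layer
      .add(ReLU())
      .add(Linear(128, 10))
      .add(LogSoftMax())

    // A Criterion defines the loss, here negative log-likelihood for classification
    val criterion = ClassNLLCriterion()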
BigDL provides an AWS EC2 image and examples for text classification using convolutional neural networks, for image classification, and for loading models pre-trained in Torch or Caffe into Spark for prediction computation. The main community requests are Python support and MKL-DNN, the deep learning extensions for MKL.
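As a rough sketch of the pre-trained model flow, assuming a Caffe loader along the lines of a Module.loadCaffeModel method (the exact entry point varies by BigDL version) and hypothetical file paths:

    import com.intel.analytics.bigdl.nn.Module
    import com.intel.analytics.bigdl.numeric.NumericFloat

    // Hypothetical paths; a loader of this shape is described in BigDL's docs,
    // but its exact name and signature should be verified for the version in use
    val model = Module.loadCaffeModel("deploy.prototxt", "weights.caffemodel")

    // samples is an RDD of BigDL Samples prepared as shown earlier; predictions
    // are computed distributed across the Spark cluster
    val predictions = model.predict(samples)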