At a presentation during Google I/O 2019, Google announced TensorFlow Graphics, a library for building deep neural networks for unsupervised learning tasks in computer vision. The library contains 3D-rendering functions written in TensorFlow, as well as tools for learning with non-rectangular mesh-based input data.
Deep-learning models for computer vision have made great strides in tasks such as object recognition and localization, a key technology for many domains, including autonomous vehicles. However, these models depend on the existence of large datasets of labelled images; that is, images in which a human has identified and located the objects. For datasets containing millions of images, this is a labor-intensive process. In addition, for some vision tasks, detecting the mere presence or absence of an object in an image is not enough; often the goal is to detect the position and orientation, or even the pose, of an object or person.
One solution for this problem is an unsupervised learning technique called "analysis by synthesis." In this technique, similar to an autoencoder, the goal is for the neural network to simultaneously learn an encoder that converts an input into an intermediate representation, and a decoder that converts that intermediate representation into an output that, ideally, is exactly the same as the input. Before the model is trained, the output will not match the input; the difference between the two, called the loss, is used by the training algorithm, backpropagation, to adjust the network parameters. Once the whole system is trained, the encoder can be used by itself as a computer vision system.
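The encoder, decoder, and loss described above can be sketched with a toy linear autoencoder. This is only an illustration in plain Python, not code from TensorFlow Graphics: the dataset, the weight names `a`, `b`, `c`, `d`, and the learning rate are all assumptions, and the gradients are written out by hand using the chain rule, which is the bookkeeping that backpropagation automates.

```python
# Toy linear autoencoder; every name and value here is illustrative.
# Inputs are 2D points on the line y = 2x, so a single-number code suffices.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.5, 1.0)]

# Encoder weights (a, b) and decoder weights (c, d), arbitrary start values.
a, b, c, d = 0.5, 0.5, 0.5, 0.5
learning_rate = 0.02


def mean_loss(a, b, c, d):
    """Mean squared difference between each input and its reconstruction."""
    total = 0.0
    for x0, x1 in data:
        z = a * x0 + b * x1                 # encoder: input -> code
        out0, out1 = c * z, d * z           # decoder: code -> output
        total += (out0 - x0) ** 2 + (out1 - x1) ** 2
    return total / len(data)


initial_loss = mean_loss(a, b, c, d)

for _ in range(2000):
    grad_a = grad_b = grad_c = grad_d = 0.0
    for x0, x1 in data:
        z = a * x0 + b * x1
        e0 = c * z - x0                     # reconstruction error, first coordinate
        e1 = d * z - x1                     # reconstruction error, second coordinate
        # Chain rule, from the loss back through the decoder to the encoder.
        grad_c += 2 * e0 * z
        grad_d += 2 * e1 * z
        grad_z = 2 * (e0 * c + e1 * d)
        grad_a += grad_z * x0
        grad_b += grad_z * x1
    n = len(data)
    a -= learning_rate * grad_a / n
    b -= learning_rate * grad_b / n
    c -= learning_rate * grad_c / n
    d -= learning_rate * grad_d / n

final_loss = mean_loss(a, b, c, d)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.2e}")
```

After training, the encoder half (`a * x0 + b * x1`) can be used on its own, which mirrors how the trained encoder serves as the vision system.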
If the encoder learns a representation of an image that consists of a set of objects and their 3D-locations and orientations, then a 3D-graphics library such as OpenGL can render this representation back into a high-quality image. Chaining the computer-vision encoder with the 3D-graphics rendering decoder provides an opportunity for unsupervised learning for computer vision using unlabelled datasets.
Source: https://github.com/tensorflow/graphics
The problem is that it is not possible to simply drop in just any 3D-graphics library. The key to training a neural network is backpropagation, and this requires that every layer in the network support automatic differentiation.
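To make the differentiability requirement concrete, here is a minimal sketch of automatic differentiation through a one-line "renderer." It uses forward-mode dual numbers rather than the reverse-mode differentiation that backpropagation relies on, and it is written in plain Python; the `Dual` class and the `render_x` and `reconstruction_loss` functions are hypothetical names made up for this example, not part of any graphics library.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A number paired with its derivative with respect to one chosen input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def dsin(x):
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def dcos(x):
    return Dual(math.cos(x.value), -math.sin(x.value) * x.deriv)


def render_x(theta, point):
    """Hypothetical 'renderer': x-coordinate of a 2D point rotated by theta."""
    px, py = point
    return px * dcos(theta) - py * dsin(theta)


def reconstruction_loss(theta, point, target_x):
    diff = render_x(theta, point) - target_x
    return diff * diff


# Seed deriv=1.0 to ask for the derivative with respect to theta.
loss = reconstruction_loss(Dual(0.3, 1.0), (1.0, 2.0), 0.0)
print(loss.value, loss.deriv)
```

Because every operation inside `render_x` carries its own derivative rule, the derivative of the loss with respect to the pose parameter falls out of the computation. A renderer that hides such operations behind an opaque library call offers no equivalent path for the gradient.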
The good news is that 3D-graphics rendering is based on the same linear algebra operations that TensorFlow was written to optimize. This is, after all, the very reason that deep learning gets such a boost from graphics processing units (GPUs), the specialized hardware created to accelerate 3D-graphics rendering. The TensorFlow Graphics library provides several rendering functions implemented using TensorFlow linear algebra code, making them differentiable "for free." These functions include:
- Transformations that define rotation and translation of objects from their original 3D-position
- Material models that define how light interacts with objects to change their appearance
- Camera models that define projections of objects onto the image plane
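The three kinds of function listed above can each be illustrated with a few lines of arithmetic. The sketch below is plain Python and does not use the TensorFlow Graphics API; the function names are hypothetical, and the specific choices (rotation about the z-axis, a pinhole camera, and Lambertian diffuse reflectance) are assumptions picked as simple representatives of each category.

```python
import math


def rotate_z(point, angle):
    """Transformation: rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, offset):
    """Transformation: shift a 3D point by `offset`."""
    return tuple(p + o for p, o in zip(point, offset))


def project_pinhole(point, focal_length):
    """Camera model: project a 3D point (with z > 0) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def lambertian(normal, light_dir, albedo):
    """Material model: diffuse brightness for unit-length normal and light vectors."""
    cos_angle = sum(n * l for n, l in zip(normal, light_dir))
    return albedo * max(0.0, cos_angle)


# Rotate a point a quarter turn, move it in front of the camera, then project it.
moved = translate(rotate_z((1.0, 0.0, 0.0), math.pi / 2), (0.0, 0.0, 4.0))
pixel = project_pinhole(moved, focal_length=2.0)

# A surface facing the camera, lit from 60 degrees off its normal.
tilt = math.radians(60)
brightness = lambertian((0.0, 0.0, 1.0), (math.sin(tilt), 0.0, math.cos(tilt)), albedo=0.8)

print(pixel, brightness)
```

Each function is built only from additions, multiplications, divisions, and trigonometric functions, which is why the same computations become differentiable once they are expressed as tensor operations.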
Besides the graphics rendering functions for the decoder, the library also includes new tools for the encoder section. Not all recognition tasks operate on rectangular grids of images; many 3D-sensors such as LIDARs provide data as a "point cloud" or mesh of connected points. This is challenging for common computer-vision neural network architectures such as convolutional neural networks (CNNs), which expect input data to be laid out on a rectangular grid. TensorFlow Graphics provides new convolutional layers for mesh inputs. To assist developers in debugging their models, there is also a new TensorBoard plugin for visualizing point clouds and meshes.
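One way to picture a convolution on a mesh is to replace the fixed pixel neighborhood of a CNN with each vertex's neighbors along the mesh edges. The sketch below is a simplified illustration in plain Python, not the layer that TensorFlow Graphics implements: the function names are hypothetical, each vertex carries a single scalar feature, and the two weights are fixed by hand rather than learned. It also assumes every vertex belongs to at least one face.

```python
def mesh_neighbors(faces, num_vertices):
    """Build per-vertex neighbor sets from triangles given as vertex-index triples."""
    neighbors = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors


def mesh_conv(features, neighbors, w_self, w_neighbor):
    """Mix each vertex's feature with the mean feature of its mesh neighbors."""
    output = []
    for i, feature in enumerate(features):
        neighbor_mean = sum(features[j] for j in neighbors[i]) / len(neighbors[i])
        output.append(w_self * feature + w_neighbor * neighbor_mean)
    return output


# A quad split into two triangles; vertices 0 and 2 lie on the shared edge.
faces = [(0, 1, 2), (0, 2, 3)]
features = [1.0, 2.0, 3.0, 4.0]

neighbors = mesh_neighbors(faces, num_vertices=4)
smoothed = mesh_conv(features, neighbors, w_self=0.5, w_neighbor=0.5)
print(smoothed)
```

Unlike a grid convolution, the number of neighbors varies from vertex to vertex (three for the vertices on the shared edge, two for the others), which is why averaging over the neighbor set is used here instead of a fixed-size kernel.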
Commenters on Hacker News reacted positively:
Nice! I played with OpenDR (http://files.is.tue.mpg.de/black/papers/OpenDR.pdf) a few years ago, and got really excited about it. Unfortunately it uses a custom autodiff implementation that made it hard it to integrate with other deep learning libraries. Pytorch still seems to be lagging in this area, bit there's some interesting repos on github (e.g. https://github.com/daniilidis-group/neural_renderer).
Dimitri Diakopoulos, a researcher at Oculus, said on Twitter:
This codebase is a perfect compliment [sic] to Tzu-Mao Li's recently published PhD thesis on Differentiable Visual Computing. His work brings the reader through the foundations of differentiable rendering theory through recent state-of-the-art. https://arxiv.org/abs/1904.12228
TensorFlow Graphics is available on GitHub.