Google Announces Tensor2Tensor for TensorFlow

The TensorFlow (TF) community and the Google Brain team announced a significant extension to the TF APIs with Tensor2Tensor.

Tensor2Tensor (T2T) addresses the challenge of modularity and portability for models being trained and executed on TF. It does so by abstracting commonly used deep-learning model pipelines into an extensible object model with standardized APIs for the components needed in TF training. One of the goals for T2T is to reduce the cost of repeatability for model training pipelines and their environments. Another is to avoid repeating the engineering effort that goes into common operations built on TF's existing APIs, where earlier implementations may not be easy to replicate between users or may have worked only on a specific architecture or problem.

T2T operates on the existing TF libraries for model architectures, optimizers, learning rate decay schemes, and hyperparameters. It also comes with a number of pre-trained models and sample data sets representing the modalities used by TF. The T2T abstractions around the core TF Python APIs provide layered object interfaces that give guarantees about components of the TF pipeline, such as data serialization and compression. T2T also has model specification defaults and control methods for things like hyperparameters and modality. This will reportedly let users more easily repeat experiments, compare and exchange results, and focus on a research topic rather than on orchestrating TF pipeline environments.

Data sets are standardized on TFRecord protobuf files. Training data sets can be generated with user-defined subclasses of Problem, or with a registry approach that uses Python decorators and direct function invocation without class instantiation. Problems are composed of training-time hyperparameters, their input and output modalities, and data sets. Problem methods handle encoders, file paths, input and output targets, hyperparameters, and default attribute values. Model metrics such as accuracy are also encapsulated in Problem. Hyperparameter sets are HParams objects registered using the registry decorator.

A training executable allows configurable synchronous and asynchronous training. The TF_CONFIG environment variable configures the master and parameter server hosts, with support for gRPC and GPU groups, as well as the logical clustering of compute resources within each server node in the group, such as GPUs per parameter server.
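As a rough illustration of the TF_CONFIG mechanism, the sketch below builds a JSON value with the `cluster` and `task` keys that TensorFlow conventionally reads from this variable and exports it for the current process. The helper name `build_tf_config`, the host names and the port are hypothetical, and the job names `master` and `ps` simply follow the wording above; the exact keys the T2T trainer expects should be checked against its documentation.

```python
import json
import os


def build_tf_config(masters, ps_hosts, task_type, task_index):
    """Return a TF_CONFIG JSON string for one process in the cluster.

    Hypothetical helper: the job names "master" and "ps" are assumptions.
    """
    cluster = {"master": list(masters), "ps": list(ps_hosts)}
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type!r}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"task index {task_index} out of range for {task_type!r}")
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )


# Hypothetical hosts: one master and two parameter servers.
os.environ["TF_CONFIG"] = build_tf_config(
    masters=["master0.example.com:2222"],
    ps_hosts=["ps0.example.com:2222", "ps1.example.com:2222"],
    task_type="ps",
    task_index=1,
)
```

Each process in the cluster would receive the same `cluster` description and a different `task` entry, which is how a single training executable can take on the master or parameter server role.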
