Facebook AI Research is open-sourcing PyText, a natural-language-processing (NLP) modeling framework that is used in the Portal video-calling device and M Suggestions in Facebook Messenger.
NLP is a technology for parsing and handling human languages and is a key component of chatbot and smart-assistant applications. Engineers developing NLP algorithms often build their solutions on deep-learning systems such as Facebook's PyTorch platform. PyText builds on top of PyTorch by providing a set of interfaces and models specifically tuned for NLP. Internally, Facebook is using PyText to power NLP in its Portal video-calling device and in the M Suggestions feature of its Messenger app.
PyText addresses a common problem for NLP projects: the tradeoff between rapid experimentation and scalability in production. Researchers experiment with new ideas, rapidly tweaking models to achieve performance goals. In this experimentation phase, models are developed on frameworks such as PyTorch or TensorFlow's eager execution. These frameworks have simple APIs and advanced features such as dynamic graphs, where the structure of the network can change at runtime. In contrast, to handle the requirements of production loads, engineers often turn to deep-learning frameworks such as TensorFlow's graph mode or Caffe2, which are optimized for high throughput but support only static computation graphs. PyText allows researchers to develop models in PyTorch, then export them via ONNX to the Caffe2 framework for deployment on diverse production platforms, including mobile devices.
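To make the term "dynamic graph" concrete, here is a toy sketch in plain Python. It uses no deep-learning library, and the function name, the arithmetic, and the operation labels are all hypothetical. It only shows that in a define-by-run style the sequence of operations is decided while the code executes, so it can differ from one input to the next.

```python
# Toy illustration only: this is not PyTorch or PyText code.
def run_dynamic(tokens, trace):
    """Run a made-up 'model' and record each operation as it executes."""
    state = 0.0
    for tok in tokens:                  # number of steps depends on the input
        state = 0.5 * state + len(tok)  # stand-in for a recurrent update
        trace.append("rnn_step")
    if state > 4.0:                     # data-dependent branch
        state = state / 2.0
        trace.append("rescale")
    trace.append("output")
    return state


short_trace, long_trace = [], []
run_dynamic(["hi"], short_trace)
run_dynamic(["call", "my", "mother"], long_trace)

print(short_trace)  # ['rnn_step', 'output']
print(long_trace)   # ['rnn_step', 'rnn_step', 'rnn_step', 'rescale', 'output']
```

A static-graph framework, by contrast, fixes the operation sequence before any input is seen, which makes it easier to optimize for throughput but harder to experiment with.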
PyText can utilize multiple GPUs for distributed training and can train multiple models at once, reducing the overall training time. The PyText code also comes with pre-trained models for several common NLP tasks, including text classification, named-entity recognition, and joint intent-determination and slot-filling, which is a staple of chatbot development. However, this set of models is based on Facebook's use cases and omits models for many NLP tasks, such as machine comprehension and coreference resolution, that are available in other frameworks such as AllenNLP.
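For readers unfamiliar with joint intent-determination and slot-filling, the sketch below shows the kind of output such a model produces: one intent label for the whole utterance plus one tag per token. It assumes the common BIO tagging convention; the utterance, the label names, and the `bio_to_slots` helper are hypothetical and are not part of PyText's API.

```python
from typing import List, NamedTuple


class Slot(NamedTuple):
    label: str
    text: str


def bio_to_slots(tokens: List[str], tags: List[str]) -> List[Slot]:
    """Group per-token BIO tags into labeled spans.

    "B-x" starts a slot of type x, "I-x" continues it, and "O" is outside
    any slot. An "I-x" that does not continue an open x slot is treated
    as outside (a simplifying assumption of this sketch).
    """
    slots: List[Slot] = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                slots.append(Slot(label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            if label is not None:
                slots.append(Slot(label, " ".join(words)))
            label, words = None, []
    if label is not None:  # flush a slot that runs to the end
        slots.append(Slot(label, " ".join(words)))
    return slots


# Hypothetical model output for a single utterance.
tokens = ["play", "yesterday", "by", "the", "beatles"]
intent = "play_music"
tags = ["O", "B-song", "O", "B-artist", "I-artist"]

slots = bio_to_slots(tokens, tags)
print(intent, slots)
```

A chatbot would then act on the intent (`play_music`) using the extracted slots (`song`, `artist`) as its arguments.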
PyText source code is available on GitHub.