Meta AI Research recently announced Project CAIRaoke, an end-to-end deep-learning model for digital assistants. Project CAIRaoke is currently being used in Meta's Portal device and outperforms a previous conversational model when evaluated on a reminder task.
Meta announced the model during their recent Inside the Lab event. Unlike most conversational models that are composed of a pipeline of four distinct components, Project CAIRaoke consists of a single neural network that is trained to perform task-oriented dialog. According to Meta, this allows developers to add new task domains to the model with less work, and requires only a single set of training data. The model is currently used by Portal to manage reminders, and provides a "significant" improvement over the previous approach's completion success rate. According to Meta,
"We believe that the progress made with Project CAIRaoke will enable us to deliver richer communication between people and AI that will be an essential tool as we build for the metaverse."
A typical architecture for a digital assistant, sometimes known as a task-oriented dialog system, consists of a pipeline of four modules: natural language understanding (NLU), dialog state tracking (DST), dialog policy (DP) management, and natural language generation (NLG). This allows developers to take advantage of pre-built components; for example, BERT for NLU or GPT-2 for NLG. However, according to the Meta team, this architecture poses some problems. For example, adding support for a new task domain requires sequentially re-training each component. Also, errors in an upstream component can propagate in unexpected ways.
Fig 1: Conventional pipelined dialog system (image source: https://ai.facebook.com/blog/project-cairaoke)
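To make the four-stage pipeline concrete, the following is a minimal toy sketch of a conventional task-oriented dialog system for a reminder task. It is not Meta's code: every name here (DialogState, nlu, dst, policy, nlg, respond) is hypothetical, and simple rules and templates stand in for the trained models that a real system would use at each stage.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogState:
    """Hypothetical dialog state: the active intent plus the slots filled so far."""
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def nlu(utterance: str) -> dict:
    """Toy NLU: keyword match for the intent, regex for a time slot."""
    result = {"intent": None, "slots": {}}
    if "remind" in utterance.lower():
        result["intent"] = "create_reminder"
    match = re.search(r"\bat (\d{1,2}(?::\d{2})?\s?(?:am|pm))", utterance, re.IGNORECASE)
    if match:
        result["slots"]["time"] = match.group(1)
    return result


def dst(state: DialogState, nlu_output: dict) -> DialogState:
    """Toy dialog state tracking: merge the latest NLU output into the state."""
    if nlu_output["intent"]:
        state.intent = nlu_output["intent"]
    state.slots.update(nlu_output["slots"])
    return state


def policy(state: DialogState) -> str:
    """Toy dialog policy: choose the next system action from the state."""
    if state.intent != "create_reminder":
        return "fallback"
    if "time" not in state.slots:
        return "request_time"
    return "confirm_reminder"


def nlg(action: str, state: DialogState) -> str:
    """Toy NLG: fill a fixed template for the chosen action."""
    templates = {
        "fallback": "Sorry, I did not understand that.",
        "request_time": "What time should I set the reminder for?",
        "confirm_reminder": "OK, I will remind you at {time}.",
    }
    return templates[action].format(**state.slots)


def respond(utterance: str, state: DialogState) -> str:
    """Run one user turn through NLU -> DST -> DP -> NLG."""
    return nlg(policy(dst(state, nlu(utterance))), state)


state = DialogState()
print(respond("Remind me to call mom", state))
print(respond("at 6 pm", state))
# An NLU miss propagates downstream: no intent is detected, so the policy falls back.
print(respond("Set an alarm at 7 am", DialogState()))
```

In this sketch, supporting a new task domain would mean touching all four functions in turn, and the last call shows how a miss in the first stage determines the final response. These are the two problems the Meta team attributes to the pipelined design.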
Meta's approach is to train a single deep-learning model that replaces the four modules. Although Meta has not released many technical details, they claim that they are using technology developed for BlenderBot 2.0 to include knowledge scraped from the internet in conversations, reducing hallucination. Project CAIRaoke also includes BlenderBot's safeguards for preventing offensive generated speech. To improve the robustness of their model, Meta used data augmentation techniques, which can help models perform better in the presence of distribution shift or even adversarial attacks. However, Meta notes that debugging their end-to-end model is a "complicated challenge."
Fig 2: CAIRaoke end-to-end dialog system (image source: https://ai.facebook.com/blog/project-cairaoke)
While Meta says that Project CAIRaoke is currently in use on Portal devices, much of the capability of the system seems aspirational at this time. Meta plans to expand its role to assist in personalized shopping and hopes to eventually deploy it to edge devices such as AR glasses and VR headsets; however, they admit they have "more work to do to fully realize this vision," including increasing the number of languages the model supports.
Chatbots and digital assistants are an active research area. InfoQ previously reported on Meta's open-source BlenderBot 2.0, as well as Baidu's open-source chatbot PLATO-XL. InfoQ also covered Amazon's Alexa Prize SocialBot challenge, where university students develop conversational AI models. Google recently published a paper on their work developing large Transformer-based models for open-domain dialog, a slightly different problem from task-oriented dialog: instead of performing tasks for a user, the AI is expected to converse on arbitrary topics.
The source code for Meta's BlenderBot is available on GitHub as part of Meta's ParlAI open-source chatbot framework.