Amazon Alexa AI's Natural Language Understanding group released Multilingual Amazon SLURP (SLU resource package) for Slot Filling, Intent Classification, and Virtual-Assistant Evaluation (MASSIVE), a dataset for training natural language understanding (NLU) AI models that contains one million annotated utterances spanning 51 languages. The release also includes code and tools for using the data.
The release was announced on the Amazon Science blog. MASSIVE was compiled by translating phrases from the English-only SLURP dataset, using translators hired through Amazon Mechanical Turk. The researchers used the resulting dataset to fine-tune two pre-trained baseline models, XLM-R and mT5, and evaluated them on a test-set portion of the data. The fine-tuned models showed a 20-point improvement in exact-match accuracy over "zero-shot" results. In addition to releasing the dataset and model code, Amazon also announced a competition, Massively Multilingual NLU 2022 (MMNLU-22), in which competitors will test their models on a held-out portion of the MASSIVE dataset. The winners will be announced in August and will be invited to present at a workshop at the upcoming Conference on Empirical Methods in Natural Language Processing (EMNLP). According to Prem Natarajan, vice president of Alexa AI Natural Understanding:
We hope that this dataset will enable researchers across the world to drive new advances in multilingual language understanding that expand the availability and reach of conversational-AI technologies.
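For context on that comparison, exact-match accuracy is typically a strict metric for joint intent classification and slot filling: a prediction counts as correct only if the predicted intent and every predicted slot match the reference exactly. A minimal sketch of such a metric, using a simplified record layout (the "intent" and "slots" keys here are illustrative assumptions, not the exact evaluation schema used by the MASSIVE code), could look like this:

```python
# Sketch of a strict exact-match metric for joint intent classification
# and slot filling. NOTE: the record structure (dicts with "intent" and
# "slots" keys) is a simplifying assumption for illustration only.

def exact_match_accuracy(predictions, references):
    """Fraction of examples where the intent AND all slots are correct."""
    assert len(predictions) == len(references)
    correct = 0
    for pred, ref in zip(predictions, references):
        if pred["intent"] == ref["intent"] and pred["slots"] == ref["slots"]:
            correct += 1
    return correct / len(references)

# Toy usage: only the first prediction matches both intent and slots.
refs = [
    {"intent": "alarm_set", "slots": {"time": "seven am"}},
    {"intent": "weather_query", "slots": {"place": "paris"}},
]
preds = [
    {"intent": "alarm_set", "slots": {"time": "seven am"}},
    {"intent": "weather_query", "slots": {"place": "london"}},
]
print(exact_match_accuracy(preds, refs))  # 0.5
```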
Virtual assistants such as Alexa rely on NLU models to act on a user's spoken instructions. Although these models have made great progress in recent years, training them requires large datasets of annotated utterances; the SLURP dataset, for example, contains almost 20,000. Collecting these datasets can be expensive and time-consuming, and consequently most virtual assistants support only a handful of languages; Alexa, for example, can understand only Arabic, German, English, French, Hindi, Italian, Japanese, Portuguese, and Spanish.
To help researchers train and evaluate NLU models for a larger set of languages, the Amazon team had the SLURP dataset translated into 50 additional languages. The researchers began by hiring translators from Mechanical Turk, choosing those who demonstrated fluency in their target language. The team also used Mechanical Turk to hire judges to assess the quality of the translations, and the judges' scores are included as annotations in the dataset. Overall, MASSIVE contains 587k examples for training, 104k for dev, 152k for test, and 153k unreleased examples reserved for the competition.
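As a rough sketch of how those split sizes might be tallied from the released files, assuming the data ships as one JSON-lines file per locale with a `partition` field marking the train/dev/test split (an assumption to verify against the actual release on GitHub), one could write:

```python
# Sketch: tally MASSIVE examples per partition across locale files.
# ASSUMPTION: data is distributed as one JSON-lines file per locale
# (e.g. en-US.jsonl) and each record has a "partition" field with
# values such as "train", "dev", or "test". Check the actual release
# for the exact layout and field names.
import json
from collections import Counter
from pathlib import Path

def count_partitions(data_dir):
    counts = Counter()
    for path in Path(data_dir).glob("*.jsonl"):
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                counts[record["partition"]] += 1
    return counts

print(count_partitions("massive_data"))
# Expected rough totals per the release: ~587k train, ~104k dev, ~152k test
```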
Lead author Jack FitzGerald joined a Hacker News discussion about the release. In response to some comments about the quality of the translations, FitzGerald replied:
Quality control was nontrivial, to put it succinctly, but we certainly always want to be better...Though we re-collected some utterances with low scores, we didn't have the budget to get perfect scores for all utterances. As such, we decided to include all utterances along with the scores from the 3 raters, such that users can perform filtering as they'd like. Some may want to keep the noise intact to help with training.
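Because the per-utterance judgments ship with the data, filtering is left to the user. A minimal sketch of such filtering, assuming each record carries a `judgments` list with one entry per rater holding numeric quality scores (the field name `grammar_score` below is illustrative, not a documented schema), might look like:

```python
# Sketch: keep only utterances whose translation-quality judgments
# meet a threshold.
# ASSUMPTION: each record has a "judgments" list with one entry per
# rater; the "grammar_score" field name and its scale are illustrative,
# so consult the dataset schema for the real fields.

def filter_by_judgments(records, min_score=3, min_raters=3):
    kept = []
    for record in records:
        judgments = record.get("judgments", [])
        if len(judgments) < min_raters:
            continue
        if all(j.get("grammar_score", 0) >= min_score for j in judgments):
            kept.append(record)
    return kept
```

Skipping a filter like this keeps the noisier utterances intact, which, as FitzGerald notes, some users may prefer for training.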
Multilingual AI models are an active research topic at many large tech companies. Earlier this year, InfoQ covered Meta's multilingual speech-recognition model XLS-R, which was trained on data from 128 languages. InfoQ also previously covered both baseline models evaluated on the MASSIVE dataset, XLM-R and mT5, developed by Meta and Google respectively, as well as models developed by Microsoft.
Tools and modeling code for the MASSIVE dataset are available on GitHub.