Natural Language Processing Content on InfoQ
-
Meta's Research SuperCluster for Real-Time Voice Translation AI Systems
A recent article from Engineering at Meta describes how the company is building its Research SuperCluster (RSC) infrastructure, which is used to advance real-time voice translation, language processing, computer vision, and augmented reality (AR).
-
Amazon Brings AI Assistant to Software Development as Part of Amazon Q Suite
Amazon has recently released Amazon Q Developer Agent, an AI-powered assistant that uses natural language input from developers to generate features, bug fixes, and unit tests within an integrated development environment (IDE). It employs large language models and generative AI to understand the developer's request and then generate the necessary code changes.
-
Google Text Embedding Model Gecko Distills Large Language Models for Improved Performance
Gecko is a text embedding model that Google created by distilling knowledge from large language models into a general-purpose model. Gecko is trained with a novel approach on a variety of tasks, including document retrieval, semantic similarity, and classification, and aims to be both general-purpose and highly performant.
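Gecko itself is served through Google's APIs rather than released as open weights, but the retrieval use case it targets can be illustrated with any text embedding model. The sketch below ranks documents against a query by cosine similarity; the embed() function is a hypothetical stand-in for an embedding model call, not Gecko's actual API.

```python
import numpy as np

def embed(texts):
    # Hypothetical stand-in for a call to a text embedding model such as Gecko;
    # returns one vector per input text. Random vectors keep the sketch self-contained.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 768))

def rank_by_similarity(query, documents):
    """Embed the query and documents, then rank documents by cosine similarity."""
    vectors = embed([query] + documents)
    q, docs = vectors[0], vectors[1:]
    q = q / np.linalg.norm(q)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ q
    return [(documents[i], float(scores[i])) for i in np.argsort(-scores)]

for doc, score in rank_by_similarity(
    "How do I reset my password?",
    ["Resetting your account password", "Quarterly sales report", "Password policy FAQ"],
):
    print(f"{score:.3f}  {doc}")
```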
-
Amazon Announces One Billion Parameter Speech Model BASE TTS
Amazon Science recently published their work on Big Adaptive Streamable TTS with Emergent abilities (BASE TTS). BASE TTS supports voice-cloning and outperforms baseline TTS models when evaluated by human judges. Further, Amazon's experiments show that scaling model and data size improves the subjective quality of the model's output.
-
Google Open-Sources AI Fine-Tuning Method Distilling Step-by-Step
A team from the University of Washington and Google Research recently open-sourced Distilling Step-by-Step, a technique for fine-tuning smaller language models. Distilling Step-by-Step requires less training data than standard fine-tuning and results in smaller models that can outperform few-shot prompted large language models (LLMs) that have 700x the parameters.
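The key mechanism is a multi-task fine-tuning objective: the small model learns both to predict the label and to reproduce the rationale that a large model generated for each example. Below is a minimal sketch of that combined loss, assuming a Hugging Face seq2seq model; the dataset fields, task prefixes, and rationale weight are illustrative choices, not the released code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
rationale_weight = 0.5  # illustrative value; treated as a hyperparameter in the paper

def distilling_step_by_step_loss(example):
    """Combined loss for one example: predict the label and generate the LLM rationale."""
    # Task 1: predict the label from the input question.
    label_inputs = tokenizer("[label] " + example["question"], return_tensors="pt")
    label_ids = tokenizer(example["label"], return_tensors="pt").input_ids
    label_loss = model(**label_inputs, labels=label_ids).loss

    # Task 2: reproduce the rationale distilled from the large model.
    rationale_inputs = tokenizer("[rationale] " + example["question"], return_tensors="pt")
    rationale_ids = tokenizer(example["rationale"], return_tensors="pt").input_ids
    rationale_loss = model(**rationale_inputs, labels=rationale_ids).loss

    return label_loss + rationale_weight * rationale_loss

loss = distilling_step_by_step_loss({
    "question": "A train travels 60 miles in 1.5 hours. What is its speed?",
    "rationale": "Speed is distance divided by time: 60 / 1.5 = 40.",
    "label": "40 mph",
})
loss.backward()  # in a real training loop, followed by an optimizer step
```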
-
Meta Open-Sources Multilingual Translation Foundation Model SeamlessM4T
Meta recently open-sourced Massively Multilingual & Multimodal Machine Translation (SeamlessM4T), a multilingual translation AI that can translate both speech audio and text data across nearly 100 languages. SeamlessM4T is trained on 1 million hours of audio data and outperforms the current state-of-the-art speech-to-text translation model.
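The released checkpoints can also be run through the Hugging Face transformers integration. A minimal text-to-translated-speech sketch, assuming the facebook/hf-seamless-m4t-medium checkpoint name and the language codes (eng, fra) from its model card:

```python
import scipy.io.wavfile as wavfile
from transformers import AutoProcessor, SeamlessM4TModel

checkpoint = "facebook/hf-seamless-m4t-medium"  # assumed Hugging Face checkpoint name
processor = AutoProcessor.from_pretrained(checkpoint)
model = SeamlessM4TModel.from_pretrained(checkpoint)

# Translate English text into spoken French (text-to-speech translation).
inputs = processor(text="Machine translation keeps getting better.",
                   src_lang="eng", return_tensors="pt")
audio = model.generate(**inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()
wavfile.write("translated_fra.wav", rate=model.config.sampling_rate, data=audio)
```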
-
Meta's Voicebox Outperforms State-of-the-Art Models on Speech Synthesis
Meta recently announced Voicebox, a speech generation model that can perform text-to-speech (TTS) synthesis in six languages, as well as edit and remove noise from speech recordings. Voicebox is trained on over 50k hours of audio data and outperforms previous state-of-the-art models on several TTS benchmarks.
-
Google's Speech AI AudioPaLM Performs Translation with Voice Transfer
Researchers at Google announced AudioPaLM, a large language model (LLM) that performs text-to-speech (TTS), automatic speech recognition (ASR), and speech-to-speech translation (S2ST) with voice transfer. AudioPaLM is based on the PaLM-2 LLM and outperforms OpenAI's Whisper on translation benchmarks.
-
Meta's Open-Source Massively Multilingual Speech AI Handles over 1,100 Languages
Meta AI open-sourced the Massively Multilingual Speech (MMS) model, which supports automatic speech recognition (ASR) and text-to-speech synthesis (TTS) in over 1,100 languages and language identification (LID) in over 4,000 languages. MMS can outperform existing models while covering nearly 10x as many languages.
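The MMS checkpoints are available through Hugging Face transformers, with per-language adapters for ASR. A minimal French transcription sketch, assuming the facebook/mms-1b-all checkpoint name and the adapter-loading interface described in the transformers documentation:

```python
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # assumed checkpoint name from the MMS release
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Switch to French by loading the corresponding language adapter and tokenizer vocabulary.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# `waveform` should be 16 kHz mono audio; one second of silence keeps the sketch standalone.
waveform = np.zeros(16000, dtype=np.float32)

inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.decode(torch.argmax(logits, dim=-1)[0]))
```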
-
Google's Universal Speech Model Performs Speech Recognition on Hundreds of Languages
Google Research announced Universal Speech Model (USM), a 2B parameter automatic speech recognition (ASR) model trained on over 12M hours of speech audio. USM can recognize speech in over 100 languages, including low-resource languages, and achieves new state-of-the-art performance on several benchmarks.
-
Stability AI Open-Sources 7B Parameter Language Model StableLM
Stability AI released two sets of pre-trained model weights for StableLM, a suite of large language models (LLM). The models are trained on 1.5 trillion text tokens and are licensed for commercial use under CC BY-SA-4.0.
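The weights can be loaded with standard Hugging Face tooling. A minimal generation sketch, assuming the stabilityai/stablelm-base-alpha-7b checkpoint name from the release and enough GPU memory for a 7B model in half precision:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-base-alpha-7b"  # assumed checkpoint name from the release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # device_map requires accelerate
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True,
                         temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```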
-
Microsoft Semantic Kernel Enables LLM Integration with Conventional Programs
Microsoft has open sourced Semantic Kernel (SK), a lightweight SDK enabling the integration of large language models (LLMs) with conventional programs, which can then leverage prompt templating, vectorized memory, intelligent planning, and other capabilities.
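SK's central pattern is registering prompt-template "semantic functions" alongside ordinary native functions so that either can call the other. The sketch below illustrates that pattern in plain Python rather than the Semantic Kernel API itself; complete() is a hypothetical stand-in for an LLM completion call.

```python
# Conceptual illustration of Semantic Kernel's pattern, not the SK API itself.

def complete(prompt: str) -> str:
    # Hypothetical stand-in for a call to an LLM completion service.
    return f"(LLM completion for: {prompt!r})"

def semantic_function(template: str):
    """Turn a prompt template into a callable, the way SK exposes templates as functions."""
    def run(**variables: str) -> str:
        return complete(template.format(**variables))
    return run

summarize = semantic_function("Summarize the following text in one sentence:\n{input}")

def word_count(text: str) -> int:
    """A 'native' function that conventional code or a planner can chain with semantic ones."""
    return len(text.split())

document = "Semantic Kernel lets conventional programs call LLM prompts as functions."
print(summarize(input=document))
print("Words in source:", word_count(document))
```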
-
Microsoft Open Sources AI Prompt Optimization Toolkit LMOps
Microsoft Research open sourced LMOps, a collection of tools for improving text prompts used as input to generative AI models. The toolkit includes Promptist, which optimizes a user's text input for text-to-image generation, and Structured Prompting, a technique for including more examples in a few-shot learning prompt for text generation.
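Promptist is published as a Hugging Face checkpoint built on GPT-2. A minimal sketch of optimizing a text-to-image prompt, assuming the microsoft/Promptist checkpoint name and the "Rephrase:" prompt format described on its model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

prompter = AutoModelForCausalLM.from_pretrained("microsoft/Promptist")  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained("gpt2")

plain_prompt = "a cat sitting on a windowsill"
input_ids = tokenizer(plain_prompt + " Rephrase:", return_tensors="pt").input_ids
outputs = prompter.generate(
    input_ids,
    do_sample=False,
    num_beams=8,
    max_new_tokens=75,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded.split("Rephrase:", 1)[-1].strip())  # the optimized text-to-image prompt
```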
-
Generating Text Inputs for Mobile App Testing Using GPT-3
A group of researchers from the Chinese Academy of Sciences and Monash University have presented a new approach to text input generation for mobile app testing based on a pre-trained large language model (LLM). Dubbed QTypist, the approach was evaluated on 106 Android apps with automated testing tools, showing a significant improvement in testing performance.
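The underlying idea is to turn the GUI context around a text field into a prompt and let the LLM propose a valid input value. A minimal sketch using the legacy OpenAI Completions API that corresponded to GPT-3; the prompt wording and field descriptions are illustrative reconstructions, not QTypist's actual templates.

```python
import openai  # legacy Completions API contemporary with GPT-3

def suggest_text_input(app_name, activity, field_hint, neighboring_labels):
    """Ask the LLM for a plausible, valid value for a form field, QTypist-style.

    The prompt wording below is an illustrative reconstruction, not the paper's template.
    """
    prompt = (
        f"App: {app_name}\n"
        f"Screen: {activity}\n"
        f"Nearby labels: {', '.join(neighboring_labels)}\n"
        f"The input field has the hint '{field_hint}'.\n"
        "Provide one realistic value a user would type into this field:"
    )
    response = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=16, temperature=0.7
    )
    return response["choices"][0]["text"].strip()

print(suggest_text_input(
    app_name="FlightBooker",
    activity="SearchFlightsActivity",
    field_hint="Departure city",
    neighboring_labels=["From", "To", "Date"],
))
```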
-
Google Publishes Technique for AI Language Model Self-Improvement
Researchers at Google and the University of Illinois at Urbana-Champaign (UIUC) have published a technique called Language Model Self-Improved (LMSI), which fine-tunes a large language model (LLM) on a dataset generated by that same model. Using LMSI, the researchers improved the performance of the LLM on six benchmarks and set new state-of-the-art accuracy records on four of them.
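LMSI combines chain-of-thought prompting with self-consistency: the model samples many reasoning paths per unlabeled question, majority-votes the final answers, and is then fine-tuned on the high-agreement outputs. Below is a minimal sketch of that filtering step; the sampling call is a canned stand-in, and the real method also keeps the generated reasoning paths rather than just the final answers.

```python
from collections import Counter

def sample_cot_answer(model, question, i):
    """Hypothetical stand-in for sampling one chain-of-thought completion from the LLM
    and extracting its final answer; indexing a canned list keeps the sketch deterministic."""
    samples = model[question]
    return samples[i % len(samples)]

def self_generate_training_data(model, questions, num_samples=10, min_agreement=0.7):
    """LMSI-style data generation: sample many answers per question, majority-vote them,
    and keep only high-agreement examples as pseudo-labels for fine-tuning."""
    dataset = []
    for question in questions:
        answers = [sample_cot_answer(model, question, i) for i in range(num_samples)]
        (best, count), = Counter(answers).most_common(1)
        if count / num_samples >= min_agreement:
            dataset.append({"question": question, "answer": best})
    return dataset

# Toy "model": the final answers the sampler would return for each question.
toy_model = {
    "What is 17 + 26?": ["43"] * 9 + ["44"],        # high agreement -> kept
    "Is P equal to NP?": ["yes"] * 5 + ["no"] * 5,  # low agreement -> filtered out
}
print(self_generate_training_data(toy_model, list(toy_model)))
```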