Natural Language Processing Content on InfoQ
-
Mistral AI Introduces Saba: Regional Language Model for Arabic and South Indian Languages
Mistral AI has introduced Mistral Saba, a 24-billion-parameter language model designed to improve AI performance in Arabic and several Indian-origin languages, particularly South Indian languages like Tamil.
-
UC Berkeley's Sky Computing Lab Introduces Model to Reduce AI Language Model Inference Costs
UC Berkeley's Sky Computing Lab has released Sky-T1-32B-Flash, an updated reasoning language model that addresses the common issue of AI overthinking. The model, developed through the NovaSky (Next-generation Open Vision and AI) initiative, "slashes inference costs on challenging questions by up to 57%" while maintaining accuracy across mathematics, coding, science, and general knowledge domains.
-
NVIDIA Unveils Hymba 1.5B: a Hybrid Approach to Efficient NLP Models
NVIDIA researchers have unveiled Hymba 1.5B, an open-source language model that combines transformer and state-space model (SSM) architectures to achieve unprecedented efficiency and performance. Designed with NVIDIA’s optimized training pipeline, Hymba addresses the computational and memory limitations of traditional transformers while enhancing the recall capabilities of SSMs.
-
Meta's Research SuperCluster for Real-Time Voice Translation AI Systems
A recent article from Engineering at Meta reveals how the company is building its Research SuperCluster (RSC), infrastructure used to advance real-time voice translation, language processing, computer vision, and augmented reality (AR).
-
Amazon Brings AI Assistant to Software Development as Part of Amazon Q Suite
Amazon has recently released Amazon Q Developer Agent, an AI-powered assistant that uses natural language input from developers to generate features, bug fixes, and unit tests within an integrated development environment (IDE). It employs large language models and generative AI to understand a developer's natural language request, and then generate the necessary code changes.
-
Google Text Embedding Model Gecko Distills Large Language Models for Improved Performance
Gecko is a text embedding model that Google created by distilling knowledge from large language models into a general-purpose model. Gecko is trained with a novel approach on a variety of tasks, including document retrieval, semantic similarity, and classification, and aims to be both broadly applicable and highly performant.
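Embedding models like Gecko map text to fixed-length vectors so that semantic similarity reduces to vector comparison, typically cosine similarity. The sketch below illustrates that comparison step only; the toy four-dimensional vectors stand in for real model output (actual Gecko embeddings have hundreds of dimensions and come from the model itself):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for model output.
query = [0.1, 0.3, 0.5, 0.1]
doc_relevant = [0.1, 0.25, 0.55, 0.1]
doc_unrelated = [0.9, 0.05, 0.0, 0.05]

# A retrieval system would rank documents by this score.
assert cosine_similarity(query, doc_relevant) > cosine_similarity(query, doc_unrelated)
```

In a retrieval or classification pipeline, every document is embedded once offline and queries are embedded at request time, so ranking is just the similarity computation shown here.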
-
Amazon Announces One Billion Parameter Speech Model BASE TTS
Amazon Science recently published their work on Big Adaptive Streamable TTS with Emergent abilities (BASE TTS). BASE TTS supports voice-cloning and outperforms baseline TTS models when evaluated by human judges. Further, Amazon's experiments show that scaling model and data size improves the subjective quality of the model's output.
-
Google Open-Sources AI Fine-Tuning Method Distilling Step-by-Step
A team from the University of Washington and Google Research recently open-sourced Distilling Step-by-Step, a technique for fine-tuning smaller language models. Distilling Step-by-Step requires less training data than standard fine-tuning and results in smaller models that can outperform few-shot prompted large language models (LLMs) that have 700x the parameters.
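Distilling Step-by-Step fine-tunes the small model on two tasks at once: predicting the label and generating the rationale, with a task prefix distinguishing the two. A minimal sketch of how such multi-task examples could be assembled; the `[label]`/`[rationale]` prefixes and the hand-written rationale are illustrative assumptions here (in the real method, rationales are extracted by few-shot prompting a large LLM):

```python
# Hypothetical record; in practice the rationale is produced by an LLM.
raw = [
    {"question": "Is 17 prime?", "label": "yes",
     "rationale": "17 has no divisors other than 1 and itself."},
]

def build_examples(record):
    """Expand one record into the two multi-task training examples:
    label prediction and rationale generation, separated by a task prefix."""
    return [
        {"input": "[label] " + record["question"], "target": record["label"]},
        {"input": "[rationale] " + record["question"], "target": record["rationale"]},
    ]

examples = [ex for r in raw for ex in build_examples(r)]
```

Because the rationale is a separate training target rather than part of the answer, the small model learns from the LLM's reasoning without having to emit rationales at inference time.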
-
Meta Open-Sources Multilingual Translation Foundation Model SeamlessM4T
Meta recently open-sourced Massively Multilingual & Multimodal Machine Translation (SeamlessM4T), a multilingual translation AI that can translate both speech audio and text data across nearly 100 languages. SeamlessM4T is trained on 1 million hours of audio data and outperforms the current state-of-the-art speech-to-text translation model.
-
Meta's Voicebox Outperforms State-of-the-Art Models on Speech Synthesis
Meta recently announced Voicebox, a speech generation model that can perform text-to-speech (TTS) synthesis in six languages, as well as edit and remove noise from speech recordings. Voicebox is trained on over 50k hours of audio data and outperforms previous state-of-the-art models on several TTS benchmarks.
-
Google's Speech AI AudioPaLM Performs Translation with Voice Transfer
Researchers at Google announced AudioPaLM, a large language model (LLM) that performs text-to-speech (TTS), automated speech recognition (ASR), and speech-to-speech translation (S2ST) with voice transfer. AudioPaLM is based on the PaLM-2 LLM and outperforms OpenAI's Whisper on translation benchmarks.
-
Meta's Open-Source Massively Multilingual Speech AI Handles over 1,100 Languages
Meta AI open-sourced the Massively Multilingual Speech (MMS) model, which supports automatic speech recognition (ASR) and text-to-speech synthesis (TTS) in over 1,100 languages and language identification (LID) in over 4,000 languages. MMS can outperform existing models and covers nearly 10x the number of languages.
-
Google's Universal Speech Model Performs Speech Recognition on Hundreds of Languages
Google Research announced Universal Speech Model (USM), a 2B parameter automated speech recognition (ASR) model trained on over 12M hours of speech audio. USM can recognize speech in over 100 languages, including low-resource languages, and achieves new state-of-the-art performance on several benchmarks.
-
Stability AI Open-Sources 7B Parameter Language Model StableLM
Stability AI released two sets of pre-trained model weights for StableLM, a suite of large language models (LLMs). The models are trained on 1.5 trillion text tokens and are licensed for commercial use under CC BY-SA-4.0.
-
Microsoft Semantic Kernel Enables LLM Integration with Conventional Programs
Microsoft has open-sourced Semantic Kernel (SK), a lightweight SDK that enables the integration of large language models (LLMs) with conventional programs, which can then leverage prompt templating, vectorized memory, intelligent planning, and other capabilities.
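Prompt templating, one of the capabilities listed above, lets conventional code assemble LLM prompts from program data. The toy renderer below is not Semantic Kernel's implementation; it only illustrates the idea, borrowing SK's `{{$variable}}` placeholder syntax. In real use, the rendered prompt would be sent to an LLM connector:

```python
import re

def render_prompt(template, variables):
    """Substitute {{$name}} placeholders (the variable syntax used by
    Semantic Kernel prompt templates) with the supplied values."""
    def repl(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{\$(\w+)\}\}", repl, template)

prompt = render_prompt(
    "Summarize the following text in one sentence:\n{{$input}}",
    {"input": "Semantic Kernel connects LLMs to conventional code."},
)
```

Raising on a missing variable, rather than leaving the placeholder in place, keeps a malformed prompt from silently reaching the model.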