InfoQ Homepage Deep Learning Content on InfoQ
-
QCon SF: Large Scale Search and Ranking Systems at Netflix
Moumita Bhattacharya spoke at QCon SF 2024 about state-of-the-art search and ranking systems. She gave an overview of the typical structure of these systems and followed with a deep dive into how Netflix created a single combined model to handle both tasks.
-
Meta AI Introduces Thought Preference Optimization Enabling AI Models to Think before Responding
Researchers from Meta FAIR, the University of California, Berkeley, and New York University have introduced Thought Preference Optimization (TPO), a new method aimed at improving the response quality of instruction-fine tuned LLMs.
-
PyTorch 2.5 Release Includes Support for Intel GPUs
The PyTorch Foundation recently released PyTorch version 2.5, which contains support for Intel GPUs. The release also includes several performance enhancements, such as the FlexAttention API, TorchInductor CPU backend optimizations, and a regional compilation feature which reduces compilation time. Overall, the release contains 4095 commits since PyTorch 2.4.
-
Meta Releases Llama 3.2 with Vision, Voice, and Open Customizable Models
Meta recently announced Llama 3.2, the latest version of Meta's open-source language model, which includes vision, voice, and open customizable models. This is the first multimodal version of the model, which will allow users to interact with visual data in ways like identifying objects in photos or editing images with natural language commands among other use cases.
-
Google Develops Voice Transfer AI for Restoring Voices
A team at Google Research developed a zero-shot voice transfer (VT) model that can be used to customize a text-to-speech (TTS) with a specific person's voice. This allows speakers who have lost their voice, for example from Parkinson's disease or ALS, to use a TTS device to replicate their original voice. The model also works across languages.
-
Google Announces Game Simulation AI GameNGen
A research team from Google recently published a paper on GameNGen, a generative AI model that can simulate the video game Doom. GameNGen can simulate the game at 20 frames-per-second (FPS) and in human evaluations was preferred only slightly less often than the actual game.
-
University Researchers Create New Type of Interpretable Neural Network
Researchers from Massachusetts Institute of Technology, California Institute of Technology, and Northeastern University created a new type of neural network: Kolmogorov–Arnold Networks (KAN). KAN models outperform larger perceptron-based models on physics modeling tasks and provide a more interpretable visualization.
-
University of Pennsylvania Researchers Develop Processorless Learning Circuitry
Researchers from the University of Pennsylvania have designed an electrical circuit, similar to a neural network, that can learn tasks such as nonlinear regression. The circuit operates at low power levels and can be trained without a computer.
-
Google's JEST Algorithm Automates AI Training Dataset Curation and Reduces Training Compute
Google DeepMind recently published a new algorithm for curating AI training datasets: multimodal contrastive learning with joint example selection (JEST), which uses a pre-trained model to score the learnability of batches of data. Google's experiments show that image-text models trained with JEST-curated data require 10x less computation than baseline methods.
-
Google Open Sources 27B Parameter Gemma 2 Language Model
Google DeepMind recently open-sourced Gemma 2, the next generation of their family of small language models. Google made several improvements to the Gemma architecture and used knowledge distillation to give the models state-of-the-art performance: Gemma 2 outperforms other models of comparable size and is competitive with models 2x larger.
-
OpenAI's CriticGPT Catches Errors in Code Generated by ChatGPT
OpenAI recently published a paper about CriticGPT, a version of GPT-4 fine-tuned to critique code generated by ChatGPT. When compared with human evaluators, CriticGPT catches more bugs and produces better critiques. OpenAI plans to use CriticGPT to improve future versions of their models.
-
Meta's Chameleon AI Model Outperforms GPT-4 on Mixed Image-Text Tasks
The Fundamental AI Research (FAIR) team at Meta recently released Chameleon, a mixed-modal AI model that can understand and generate mixed text and image content. In experiments rated by human judges, Chameleon's generated output was preferred over GPT-4 in 51.6% of trials, and over Gemini Pro in 60.4%.
-
Meta Open-Sources MEGALODON LLM for Efficient Long Sequence Modeling
Researchers from Meta, University of Southern California, Carnegie Mellon University, and University of California San Diego recently open-sourced MEGALODON, a large language model (LLM) with an unlimited context length. MEGALODON has linear computational complexity and outperforms a similarly-sized Llama 2 model on a range of benchmarks.
-
OpenAI Publishes GPT Model Specification for Fine-Tuning Behavior
OpenAI recently published their Model Spec, a document that describes rules and objectives for the behavior of their GPT models. The spec is intended for use by data labelers and AI researchers when creating data for fine-tuning the models.
-
University of Washington AI-Powered Headphones Let Users Listen to a Single Person in a Crowd
"Target speech hearing" is a new deep-learning algorithm developed at the University of Washington to allow users to "enroll" a speaker and cancel all environmental noise surrounding their voice.