Deep Learning Content on InfoQ
-
Meta's Chameleon AI Model Outperforms GPT-4 on Mixed Image-Text Tasks
The Fundamental AI Research (FAIR) team at Meta recently released Chameleon, a mixed-modal AI model that can understand and generate mixed text and image content. In experiments rated by human judges, Chameleon's generated output was preferred over GPT-4's in 51.6% of trials and over Gemini Pro's in 60.4%.
-
Meta Open-Sources MEGALODON LLM for Efficient Long Sequence Modeling
Researchers from Meta, University of Southern California, Carnegie Mellon University, and University of California San Diego recently open-sourced MEGALODON, a large language model (LLM) with an unlimited context length. MEGALODON has linear computational complexity and outperforms a similarly-sized Llama 2 model on a range of benchmarks.
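MEGALODON's linear-complexity claim comes from attending within fixed-size chunks of the sequence rather than over all token pairs, so cost grows with sequence length times chunk size instead of quadratically. The snippet below is a minimal illustration of chunk-wise attention in PyTorch, not Meta's implementation; the chunk size and the single-head, unmasked formulation are simplifying assumptions, and the mechanism MEGALODON uses to carry information across chunk boundaries is omitted.

```python
import torch

def chunked_self_attention(x, chunk_size=256):
    """Attention restricted to fixed-size chunks: cost is O(seq_len * chunk_size)
    rather than O(seq_len^2). Illustrative only; assumes seq_len is a multiple
    of chunk_size and uses a single unmasked head."""
    b, n, d = x.shape
    xc = x.view(b, n // chunk_size, chunk_size, d)     # split sequence into chunks
    scores = xc @ xc.transpose(-2, -1) / d ** 0.5      # per-chunk attention scores
    out = torch.softmax(scores, dim=-1) @ xc           # per-chunk weighted sum
    return out.view(b, n, d)

# A 4096-token sequence is processed as 16 independent 256-token chunks.
y = chunked_self_attention(torch.randn(1, 4096, 64))
print(y.shape)  # torch.Size([1, 4096, 64])
```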
-
OpenAI Publishes GPT Model Specification for Fine-Tuning Behavior
OpenAI recently published their Model Spec, a document that describes rules and objectives for the behavior of their GPT models. The spec is intended for use by data labelers and AI researchers when creating data for fine-tuning the models.
-
University of Washington AI-Powered Headphones Let Users Listen to a Single Person in a Crowd
"Target speech hearing" is a new deep-learning algorithm developed at the University of Washington to allow users to "enroll" a speaker and cancel all environmental noise surrounding their voice.
-
Stanford AI Index 2024 Report: Growth of AI Regulations and Generative AI Investment
Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) has published its 2024 AI Index annual report. The report identifies top trends in AI, such as 8x growth in Generative AI investment since 2022.
-
OpenAI Announces New Flagship Model GPT-4o
OpenAI recently announced the latest version of their GPT AI foundation model, GPT-4o. GPT-4o is faster than the previous version of GPT-4 and has improved capabilities in handling speech, vision, and multilingual tasks, outperforming all models except Google's Gemini on several benchmarks.
-
Apple Open-Sources One Billion Parameter Language Model OpenELM
Apple released OpenELM, a Transformer-based language model. OpenELM uses a layer-wise scaling strategy to allocate parameters more efficiently across its Transformer layers, and it outperforms similarly-sized models while requiring fewer training tokens.
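Layer-wise scaling means that instead of giving every Transformer layer the same number of attention heads and the same feed-forward width, shallower layers get smaller values and deeper layers larger ones, so the parameter budget is spread unevenly across depth rather than uniformly. The snippet below is a rough illustration of such an allocation rule; the minimum and maximum values are made up and do not come from Apple's released configurations.

```python
# Rough illustration of layer-wise scaling: shallow layers get fewer attention
# heads and narrower FFNs, deeper layers get more (values are illustrative).
def layer_config(layer_idx, num_layers, min_heads=4, max_heads=16,
                 min_ffn_mult=2.0, max_ffn_mult=4.0):
    frac = layer_idx / max(num_layers - 1, 1)
    heads = round(min_heads + frac * (max_heads - min_heads))
    ffn_mult = min_ffn_mult + frac * (max_ffn_mult - min_ffn_mult)
    return heads, ffn_mult

for i in range(0, 28, 9):
    print(i, layer_config(i, 28))
# e.g. layer 0 -> (4, 2.0), layer 27 -> (16, 4.0): heads and FFN multiplier grow with depth
```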
-
Meta Releases Llama 3 Open-Source LLM
Meta AI released Llama 3, the latest generation of their open-source large language model (LLM) family. The model is available in 8B and 70B parameter sizes, each with a base and instruction-tuned variant. Llama 3 outperforms other LLMs of the same parameter size on standard LLM benchmarks.
-
Ines Montani at QCon London: Economies of Scale Can’t Monopolise the AI Revolution
During her presentation at QCon London, Ines Montani, co-founder and CEO of explosion.ai (the maker of spaCy), stated that economies of scale are not enough to create monopolies in the AI space and that open-source techniques and models will allow everybody to keep up with the “Gen AI revolution”.
-
OpenAI Releases New Fine-Tuning API Features
OpenAI announced the release of new features in their fine-tuning API. The features give model developers more control over the fine-tuning process and better insight into model performance.
-
Stability AI Releases 3D Model Generation AI Stable Video 3D
Stability AI recently released Stable Video 3D (SV3D), an AI model that can generate 3D mesh object models from a single 2D image. SV3D is based on the Stable Video Diffusion model and produces state-of-the-art results on 3D object generation benchmarks.
-
Google Trains User Interface and Infographics Understanding AI Model ScreenAI
Google Research recently developed ScreenAI, a multimodal AI model for understanding infographics and user interfaces. ScreenAI is based on the PaLI architecture and achieves state-of-the-art performance on several tasks.
-
NVIDIA Announces Next-Generation AI Superchip Blackwell
NVIDIA recently announced their next-generation GPU architecture, Blackwell. Blackwell is the largest GPU ever built, with over 200 billion transistors, and can train large language models (LLMs) up to 4x faster than previous-generation hardware.
-
Meta Unveils 24k GPU AI Infrastructure Design
Meta recently announced the design of two new AI computing clusters, each containing 24,576 GPUs. The clusters are based on Meta's Grand Teton hardware platform, and one cluster is currently used by Meta for training their next-generation Llama 3 model.
-
Researchers Open-Source LLM Jailbreak Defense Algorithm SafeDecoding
Researchers from the University of Washington, the Pennsylvania State University, and the Allen Institute for AI have open-sourced SafeDecoding, a technique for protecting large language models (LLMs) against jailbreak attacks. SafeDecoding outperforms baseline jailbreak defenses without incurring significant computational overhead.
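Defenses in this family typically work by contrasting the original model's next-token distribution with that of a safety fine-tuned "expert" copy of the model, boosting tokens (such as refusals) that the expert prefers. The sketch below illustrates that kind of distribution mixing in generic PyTorch; it is not the released SafeDecoding code, and the mixing weight and toy vocabulary are assumptions.

```python
import torch

def safety_adjusted_distribution(base_logits, expert_logits, alpha=2.0):
    """Mix the base model's next-token distribution with a safety-tuned
    expert's, boosting tokens the expert prefers (e.g. refusal tokens).
    alpha is an illustrative weight, not a value taken from the paper."""
    p_base = torch.softmax(base_logits, dim=-1)
    p_expert = torch.softmax(expert_logits, dim=-1)
    p_new = torch.clamp(p_base + alpha * (p_expert - p_base), min=0.0)
    return p_new / p_new.sum(dim=-1, keepdim=True)

# Toy 5-token vocabulary: the expert puts most of its mass on token 0
# (imagine a refusal token), so the adjusted distribution shifts toward it.
base = torch.tensor([0.1, 2.0, 1.5, 0.3, 0.2])
expert = torch.tensor([3.0, 0.5, 0.4, 0.1, 0.1])
print(safety_adjusted_distribution(base, expert))
```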