Deep Learning Content on InfoQ
-
Microsoft Announces Small Language Model Phi-2
Microsoft Research announced Phi-2, a 2.7 billion-parameter Transformer-based language model. Phi-2 is trained on 1.4T tokens of synthetic data generated by GPT-3.5 and outperforms larger models on a variety of benchmarks.
-
Apple Open-sources Apple Silicon-Optimized Machine Learning Framework MLX
Apple's MLX combines familiar APIs, composable function transformations, and lazy computation to create a machine learning framework inspired by NumPy and PyTorch that is optimized for Apple Silicon. Implemented in Python and C++, the framework aims to provide a user-friendly and efficient solution to train and deploy machine learning models on Apple Silicon.
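Lazy computation means operations build up a deferred graph that is only evaluated when a result is actually needed, which lets a framework fuse and schedule work efficiently. A minimal plain-Python sketch of that idea (an illustration of the concept only, not MLX's actual API):

```python
class Lazy:
    """Defers a computation until explicitly evaluated, in the spirit of
    lazy array frameworks. Hypothetical helper, not part of MLX."""

    def __init__(self, fn, *deps):
        self.fn = fn          # the deferred operation
        self.deps = deps      # upstream Lazy nodes or plain values
        self._value = None
        self._done = False

    def eval(self):
        # Evaluate dependencies first, then this node, caching the result.
        if not self._done:
            args = [d.eval() if isinstance(d, Lazy) else d for d in self.deps]
            self._value = self.fn(*args)
            self._done = True
        return self._value


# Build a small computation graph; nothing runs yet.
x = Lazy(lambda: [1.0, 2.0, 3.0])
y = Lazy(lambda v: [e * e for e in v], x)   # elementwise square, deferred
z = Lazy(lambda v: sum(v), y)               # reduction, deferred

result = z.eval()  # all computation happens here, on demand
print(result)
```

The payoff of this pattern is that the framework sees the whole graph before executing anything, which is what enables MLX-style optimization for a specific backend such as Apple Silicon.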
-
Microsoft's Orca 2 LLM Outperforms Models That Are 10x Larger
Microsoft Research released its Orca 2 LLM, a fine-tuned version of Llama 2 that performs as well as or better than models that contain 10x the number of parameters. Orca 2 uses a synthetic training dataset and a new technique called Prompt Erasure to achieve this performance.
-
Google Launches New Multi-Modal Gemini AI Model
On December 6, Alphabet released the first phase of its next-generation AI model, Gemini. The effort was led by Google DeepMind under CEO Sundar Pichai. Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most widely used benchmarks for evaluating language model performance.
-
Stability AI Open-Sources Video Generation Model Stable Video Diffusion
Stability AI released the code and model weights for Stable Video Diffusion (SVD), a video generation AI model. When given an input image as context, the model can generate 25 video frames at a resolution of 576x1024 pixels.
-
Meta Announces Generative AI Models Emu Video and Emu Edit
Meta AI Research announced two new generative AI models: Emu Video, which can generate short videos given a text prompt, and Emu Edit, which can edit images given text-based instructions. Both models are based on Meta's Emu foundation model and exhibit state-of-the-art performance on several benchmarks.
-
Spotify Open-Sources Voyager Nearest-Neighbor Search Library
Spotify Engineering recently open-sourced Voyager, an approximate nearest-neighbor (ANN) search library. Voyager is based on the hierarchical navigable small worlds (HNSW) algorithm and is 10 times faster than Spotify's previous ANN library, Annoy.
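For context, libraries like Voyager approximate the exact k-nearest-neighbor search sketched below, which must scan every vector and so scales linearly with dataset size; HNSW trades a small amount of recall for much faster, sub-linear queries. A brute-force baseline in plain Python (illustrative only, not Voyager's API):

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, vectors, k=2):
    """Exact k-nearest-neighbor by linear scan: O(n) distance computations
    per query, which is what ANN indexes such as HNSW avoid."""
    order = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return order[:k]


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(knn([0.1, 0.1], vectors, k=2))  # indices of the two closest vectors
```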
-
xAI Introduces Large Language Model Grok
xAI, the AI company founded by Elon Musk, recently announced Grok, a large language model. Grok can access current knowledge of the world via the X platform and outperforms other LLMs of comparable size, including GPT-3.5, on several benchmarks.
-
AWS Unveils Gemini, a Distributed Training System for Swift Failure Recovery in Large Model Training
AWS and Rice University have introduced Gemini, a distributed training system that redefines failure recovery in large-scale deep learning. According to the research paper, Gemini checkpoints model state to CPU memory, achieving substantially faster failure recovery and addressing the high recovery costs and limited checkpoint storage capacity of existing approaches.
-
Microsoft Releases DeepSpeed-FastGen for High-Throughput Text Generation
Microsoft has announced the alpha release of DeepSpeed-FastGen, a system designed to improve the deployment and serving of large language models (LLMs). DeepSpeed-FastGen combines DeepSpeed-MII and DeepSpeed-Inference and is built on the Dynamic SplitFuse technique. The system currently supports several model architectures.
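Dynamic SplitFuse splits long prompt prefills into smaller chunks and fuses them with ongoing generation tokens, so each forward pass processes a roughly constant token budget. A toy scheduler sketching that idea (a deliberate simplification with assumed details, not DeepSpeed's implementation):

```python
def split_fuse_schedule(prompt_tokens, decode_seqs, budget=8):
    """Toy Dynamic SplitFuse-style scheduler (hypothetical simplification).
    Each forward pass carries one decode token per generating sequence plus
    prompt-prefill chunks, capped at `budget` tokens total."""
    assert budget > decode_seqs, "budget must leave room for prefill chunks"
    remaining = list(prompt_tokens)  # prefill tokens still owed per prompt
    passes = []
    while any(r > 0 for r in remaining):
        free = budget - decode_seqs           # room left after decode tokens
        batch = {"decode": decode_seqs, "prefill": []}
        for i, r in enumerate(remaining):
            if free == 0 or r == 0:
                continue
            take = min(r, free)               # split the prompt into a chunk
            batch["prefill"].append((i, take))
            remaining[i] -= take
            free -= take
        passes.append(batch)
    return passes


# Two prompts (10 and 3 tokens) fused with 2 decoding sequences, budget 8:
for p in split_fuse_schedule([10, 3], decode_seqs=2, budget=8):
    print(p)
```

Keeping every forward pass near the same size is what lets the serving system sustain high, predictable throughput instead of stalling on one long prefill.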
-
PyTorch 2.1 Release Adds Automatic Dynamic Shape Support and Distributed Training Enhancements
PyTorch Conference 2023 presented an overview of PyTorch 2.1. ExecuTorch was introduced to enhance PyTorch's performance on mobile and edge devices. The conference also had a focus on community with new members added to the PyTorch Foundation and a Docathon announced.
-
TorchServe Potentially Exposed to Remote Code Execution
Israel-based security company Oligo has uncovered multiple vulnerabilities in TorchServe, the tool used to serve PyTorch models, that could allow an attacker to run arbitrary code on vulnerable systems. The vulnerabilities have been promptly fixed in TorchServe version 0.8.2.
-
Stability AI Releases Generative Audio Model Stable Audio
Harmonai, the audio research lab of Stability AI, has released Stable Audio, a diffusion model for text-controlled audio generation. Stable Audio is trained on 19,500 hours of audio data and can generate 44.1kHz-quality audio in real time on a single NVIDIA A100 GPU.
-
Unpacking How Ads Ranking Works @ Pinterest: Aayush Mudgal at QCon San Francisco
At QCon San Francisco, Aayush Mudgal gave a talk on Pinterest's ads-ranking strategy. Pinterest performs both candidate retrieval and ranking, informed by users' interaction history and the content they are currently viewing. Neural networks produce embeddings for ads and users, so that ads whose embeddings are close to a user's embedding are likely to be relevant. Models are retrained and deployed daily.
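The embedding-based step described above can be sketched as scoring candidate ads by similarity to a user vector and sorting by that score; all names, dimensions, and vectors below are hypothetical:

```python
def dot(u, v):
    """Dot-product similarity between two equal-length embeddings."""
    return sum(a * b for a, b in zip(u, v))


def rank_ads(user_embedding, ad_embeddings):
    """Score each candidate ad against the user embedding and return
    ad ids sorted from most to least similar."""
    scores = {ad_id: dot(user_embedding, emb) for ad_id, emb in ad_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)


user = [0.9, 0.1, 0.0]                # hypothetical user embedding
ads = {
    "ad_gardening": [0.8, 0.2, 0.1],  # close to the user in embedding space
    "ad_travel":    [0.1, 0.9, 0.3],
    "ad_finance":   [0.0, 0.1, 0.9],
}
print(rank_ads(user, ads))
```

In production such scoring would run against embeddings produced by the trained neural networks, with retrieval narrowing the candidate set before this ranking step.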
-
Meta Open-Sources Multilingual Translation Foundation Model SeamlessM4T
Meta recently open-sourced Massively Multilingual & Multimodal Machine Translation (SeamlessM4T), a multilingual translation AI that can translate both speech audio and text data across nearly 100 languages. SeamlessM4T is trained on 1 million hours of audio data and outperforms the current state-of-the-art speech-to-text translation model.