Deep Learning Content on InfoQ
-
AMD Introduces Its Deep-Learning Accelerator Instinct MI200 Series GPUs
In its recent Accelerated Data Center Premiere Keynote, AMD unveiled its Instinct MI200 accelerator series: the high-end Instinct MI250X and the slightly lower-end Instinct MI250 GPUs. Designed with the CDNA 2 architecture and TSMC's 6nm FinFET lithography, the MI250X provides 47.9 TFLOPS of peak double-precision performance and memory capacity that allows training larger deep networks while minimizing model sharding.
-
Facebook Open-Sources GHN-2 AI for Fast Initialization of Deep-Learning Models
A team from Facebook AI Research (FAIR) and the University of Guelph has open-sourced an improved Graph HyperNetworks (GHN-2) meta-model that predicts initial parameters for deep-learning neural networks. GHN-2 executes in less than a second on a CPU and predicts values for computer vision (CV) networks that achieve up to 77% top-1 accuracy on CIFAR-10 with no additional training.
-
PyTorch 1.10 Release Includes CUDA Graphs APIs, Compiler Improvements, and Android NNAPI Support
PyTorch, Facebook's open-source deep-learning framework, announced the release of version 1.10 which includes an integration with CUDA Graphs APIs and JIT compiler updates to increase CPU performance, as well as beta support for the Android Neural Networks API (NNAPI). New versions of domain-specific libraries TorchVision and TorchAudio were also released.
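The CUDA Graphs integration lets a static training or inference step be captured once and then replayed with very little CPU launch overhead. A minimal sketch of the beta API, assuming a CUDA-capable GPU and a model with fixed input shapes (the model and sizes below are illustrative only):

```python
# Minimal sketch of PyTorch 1.10's beta CUDA Graphs API (torch.cuda.graph /
# torch.cuda.CUDAGraph); requires a CUDA-capable GPU.
import torch

model = torch.nn.Linear(64, 64).cuda()
static_input = torch.randn(8, 64, device="cuda")

# Warm up on a side stream before capture, as the PyTorch docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture the forward pass into a graph, then replay it with new data.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

static_input.copy_(torch.randn(8, 64, device="cuda"))
g.replay()  # re-runs the captured kernels against the updated static_input
print(static_output.shape)
```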
-
QCon Plus ML Panel Discussion: ML in Production - What's Next?
The recent QCon Plus online conference featured a panel discussion titled "ML in Production - What's Next?" Some key takeaways were that many ML projects fail in production because of poor engineering infrastructure and a lack of communication across disciplines, and that both model explainability and ML for edge computing are important technologies that are not yet mature.
-
Roland Meertens on the Unreasonable Effectiveness of Zero Shot Learning
At the recent QCon Plus online conference, Roland Meertens gave a talk on developing AI-based applications titled "The Unreasonable Effectiveness of Zero Shot Learning." He demonstrated two examples of using foundation models and zero-shot learning to rapidly deploy prototype applications and gather feedback without needing to collect large datasets or train models.
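As a rough illustration of the approach (not the speaker's own demos), a pretrained natural-language-inference model can act as a classifier with no task-specific training; the Hugging Face pipeline below is one assumed way to prototype this:

```python
# Minimal zero-shot classification sketch using the Hugging Face
# "zero-shot-classification" pipeline; the text and labels are hypothetical,
# and a default pretrained model is downloaded on first use.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The package arrived two weeks late and the box was damaged.",
    candidate_labels=["shipping problem", "product quality", "billing question"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```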
-
Francesca Lazzeri on What You Should Know before Deploying ML in Production
At the recent QCon Plus online conference, Dr. Francesca Lazzeri gave a talk on machine learning operations (MLOps) titled "What You Should Know before Deploying ML in Production." She covered four key topics: MLOps capabilities, open-source integrations, machine-learning pipelines, and the MLflow platform.
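For readers unfamiliar with MLflow, a minimal experiment-tracking sketch looks like the following; the parameter names and metric values are placeholders, not examples from the talk:

```python
# Minimal MLflow experiment-tracking sketch; runs are written to the local
# ./mlruns directory by default, and the values below are placeholders.
import mlflow

with mlflow.start_run(run_name="demo-run"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 5)
    for epoch in range(5):
        # In a real pipeline this metric would come from model evaluation.
        mlflow.log_metric("val_accuracy", 0.70 + 0.05 * epoch, step=epoch)
```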
-
BigScience Research Workshop Releases AI Language Model T0
The BigScience Research Workshop released T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. T0 can often outperform models 6x larger on the BIG-bench benchmark, and can outperform the 16x larger GPT-3 on several other NLP benchmarks.
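The models are published on the Hugging Face Hub, so zero-shot prompting follows the standard seq2seq pattern; a minimal sketch, assuming the "bigscience/T0_3B" checkpoint name and using a made-up prompt:

```python
# Minimal zero-shot prompting sketch with a T0 checkpoint from the Hugging
# Face Hub; "bigscience/T0_3B" is assumed to be the smallest released variant.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "bigscience/T0_3B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

prompt = "Is this review positive or negative? Review: the movie was a waste of time."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```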
-
Amazon Releases DL1 Instances Powered by Gaudi Accelerators
Amazon recently announced the general availability of the EC2 DL1 instances powered by Gaudi accelerators from Habana Labs. The new instances promise better price performance in training deep-learning models for use cases such as computer vision, natural language processing, autonomous vehicle perception, and recommendation engines.
-
Baidu Announces 11 Billion Parameter Chatbot AI PLATO-XL
Baidu recently announced PLATO-XL, an AI model for dialog generation, which was trained on over a billion samples collected from social media conversations in both English and Chinese. PLATO-XL achieves state-of-the-art performance on several conversational benchmarks, outperforming currently available commercial chatbots.
-
IBM Develops Hardware-Based Vector-Symbolic AI Architecture
IBM Research recently announced a memory-augmented neural network (MANN) AI system consisting of a neural network controller and phase-change memory (PCM) hardware. By performing analog in-memory computation on high-dimensional (HD) binary vectors, the system learns few-shot classification tasks on the Omniglot benchmark with only a 2.7% accuracy drop compared to 32-bit software implementations.
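As a rough, software-only illustration of the vector-symbolic idea (not IBM's in-memory hardware or training procedure), classes can be represented by bundling high-dimensional binary vectors and queried by Hamming distance:

```python
# Simplified, hypothetical sketch of hyperdimensional (vector-symbolic)
# classification: each class prototype is the element-wise majority
# ("bundle") of its examples' binary vectors, and a query is assigned to the
# prototype with the smallest Hamming distance.  IBM's system performs the
# analogous operations in analog phase-change memory.
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000  # high dimensionality makes random vectors nearly orthogonal

def random_hv(n):
    return rng.integers(0, 2, size=(n, DIM), dtype=np.uint8)

# Three classes with five stand-in example vectors each.
examples = {c: random_hv(5) for c in ["a", "b", "c"]}
prototypes = {c: (v.sum(axis=0) > v.shape[0] / 2).astype(np.uint8)
              for c, v in examples.items()}

# Classify a noisy copy of one class-"b" example by Hamming distance.
query = examples["b"][0].copy()
flip = rng.choice(DIM, size=DIM // 10, replace=False)  # 10% bit noise
query[flip] ^= 1
pred = min(prototypes, key=lambda c: int((query != prototypes[c]).sum()))
print("predicted class:", pred)
```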
-
Google's Gated Multi-Layer Perceptron Outperforms Transformers Using Fewer Parameters
Researchers at Google Brain have announced Gated Multi-Layer Perceptron (gMLP), a deep-learning model that contains only basic multi-layer perceptrons. Using fewer parameters, gMLP outperforms Transformer models on natural-language processing (NLP) tasks and achieves comparable accuracy on computer vision (CV) tasks.
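The core building block of gMLP is a spatial gating unit that replaces self-attention with a learned linear projection across token positions. A simplified PyTorch sketch of the idea follows; the dimensions are illustrative, not the paper's configuration:

```python
# Simplified sketch of gMLP's spatial gating unit (SGU): the channel
# dimension is split in half, one half is linearly mixed across the token
# (sequence) dimension, and the result gates the other half element-wise.
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    def __init__(self, d_ffn, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear mixing across token positions (the "spatial" projection).
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        nn.init.zeros_(self.spatial_proj.weight)  # near-identity gating at init
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x):                         # x: (batch, seq_len, d_ffn)
        u, v = x.chunk(2, dim=-1)
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v                              # (batch, seq_len, d_ffn // 2)

x = torch.randn(2, 16, 256)
print(SpatialGatingUnit(d_ffn=256, seq_len=16)(x).shape)  # torch.Size([2, 16, 128])
```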
-
TensorFlow Similarity Supports Fast Query Search Index on Pre-trained Models
Francois Chollet and his team recently released TensorFlow Similarity, a Python library for TensorFlow. Similarity learning is the task of finding items that resemble each other, from matching similar clothing in images to identifying people from face photos. Deep-learning models have used a method called contrastive learning to improve the accuracy and efficiency of learning similarity between images.
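The sketch below illustrates the general idea behind an embedding-based similarity index (it is not TensorFlow Similarity's actual API): items are embedded, L2-normalized, and queried by cosine similarity.

```python
# Generic embedding-lookup sketch: a stand-in embedding function, an
# L2-normalized index, and a cosine-similarity query.  The random projection
# stands in for a model trained with a contrastive loss.
import numpy as np

rng = np.random.default_rng(0)
projection = rng.standard_normal((64, 16))

def embed(items):
    # Placeholder for a trained embedding model.
    return items @ projection

catalog = rng.standard_normal((500, 64))           # items to index
index = embed(catalog)
index /= np.linalg.norm(index, axis=1, keepdims=True)

query = embed(catalog[42:43])                       # query with a known item
query /= np.linalg.norm(query)
scores = index @ query.ravel()                      # cosine similarities
print("top matches:", np.argsort(-scores)[:3])      # should include item 42
```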
-
Google's Dev Library is a Curated Collection of Projects about Google Tech
Google has launched a new initiative aimed at creating a curated collection of open source projects related to Google technologies. Google's Dev Library will not only contain code repositories, but also articles, tools, and tutorials collected from various Internet sources.
-
MIT Researchers Open-Source Approximate Matrix Multiplication Algorithm MADDNESS
Researchers at MIT's Computer Science & Artificial Intelligence Lab (CSAIL) have open-sourced Multiply-ADDitioN-lESS (MADDNESS), an algorithm that speeds up machine learning using approximate matrix multiplication (AMM). MADDNESS requires zero multiply-add operations and runs 10x faster than other approximate methods and 100x faster than exact multiplication.
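To give a flavor of lookup-table-based approximate matrix multiplication, here is a deliberately simplified stand-in, not the MADDNESS algorithm itself; MADDNESS replaces the nearest-prototype encoding step below with a learned, multiplication-free hashing tree.

```python
# Hypothetical illustration of table-based approximate matrix multiplication:
# each row of A is encoded as the id of its nearest prototype, and products
# with B are read from a precomputed table rather than recomputed.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 16))   # input rows to encode
B = rng.standard_normal((16, 8))      # fixed weight matrix

# Offline: pick K prototypes (here, random rows of A) and precompute their
# exact products with B.
K = 16
prototypes = A[rng.choice(len(A), K, replace=False)]
table = prototypes @ B                # (K, 8) lookup table

# Online: encode each row as its nearest prototype id, then "multiply" by
# table lookup only -- no multiply-adds involving B at query time.
dists = ((A[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)
approx = table[codes]

exact = A @ B
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```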
-
Stanford Research Center Studies Impacts of Popular Pretrained Models
Stanford University recently announced a new research center, the Center for Research on Foundation Models (CRFM), devoted to studying the effects of the large pretrained deep networks (e.g., BERT, GPT-3, CLIP) now being adopted by a growing number of machine-learning research institutions and startups.