GPU Content on InfoQ
-
AWS and NVIDIA to Collaborate on Next-Gen EC2 P5 Instances for Accelerating Generative AI
AWS and NVIDIA announced the development of a highly scalable, on-demand AI infrastructure that is specifically designed for training large language models and creating advanced generative AI applications. The collaboration aims to create the most optimized and efficient system of its kind, capable of meeting the demands of increasingly complex AI tasks.
-
NVIDIA Kubernetes Device Plug-in Brings Temporal GPU Concurrency
Starting with the v0.12 release, the NVIDIA GPU device plug-in framework supports time-sliced sharing between CUDA workloads on Kubernetes. This feature aims to prevent under-utilization of GPUs and makes it easier to scale applications by leveraging concurrently executing CUDA contexts.
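Concretely, time-slicing is enabled through a sharing configuration handed to the device plug-in, whose timeSlicing section tells the plug-in to advertise each physical GPU as several schedulable replicas. The sketch below mirrors the documented schema as a Python dict rather than the YAML file the plug-in actually reads, and the replica count of 4 is an arbitrary example:

    import json

    # Sketch of the device plug-in's time-slicing configuration, expressed as a
    # Python dict and printed for inspection. The key names follow the plug-in's
    # documented "sharing.timeSlicing" schema; in practice this content is given
    # to the plug-in as a YAML config file (for example, via a ConfigMap).
    time_slicing_config = {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [
                    # Advertise each physical GPU as 4 time-sliced replicas, so up
                    # to 4 pods can share the same device.
                    {"name": "nvidia.com/gpu", "replicas": 4},
                ]
            }
        },
    }

    print(json.dumps(time_slicing_config, indent=2))

With such a configuration in place, pods keep requesting the usual nvidia.com/gpu resource, but several of those requests can land on the same physical GPU and run their CUDA contexts in a time-sliced fashion.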
-
Asahi Linux Gets Alpha GPU Drivers on Apple Silicon
After two years of work to reverse engineer the Apple Silicon GPU instruction set and to implement the kernel driver, Asahi Linux has released an alpha-quality GPU driver that is already good enough to run a smooth desktop experience and some games, Asahi developers Alyssa Rosenzweig and Asahi Lina say.
-
Meta Has Developed AITemplate, Which Transforms Deep Neural Networks into C++ Code
Meta AI has developed AITemplate (AIT), a unified open-source system with separate acceleration back ends for AMD and NVIDIA GPU hardware. AIT is a two-part Python framework that transforms AI models into much faster C++ code: a front end that optimizes models through graph transformations, and back ends that generate the accelerated C++ code for each GPU target.
-
PrefixRL: Nvidia's Deep-Reinforcement-Learning Approach to Design Better Circuits
Nvidia has developed PrefixRL, an approach based on reinforcement learning (RL) for designing parallel-prefix circuits that are smaller and faster than those produced by state-of-the-art electronic-design-automation (EDA) tools.
-
NVIDIA Announces Next Generation AI Hardware H100 GPU and Grace CPU Superchip
At the recent GTC conference, NVIDIA announced their next generation processors for AI computing, the H100 GPU and the Grace CPU Superchip. Based on NVIDIA's Hopper architecture, the H100 includes a Transformer engine for faster training of AI models. The Grace CPU Superchip features 144 Arm cores and outperforms NVIDIA's current dual-CPU offering on the SPECrate 2017_int_base benchmark.
-
Ten Lessons from Three Generations of Tensor Processing Units
A recent report published by Google’s TPU group highlights ten takeaways from developing three generations of tensor processing units. The authors also discuss how their previous experience will affect the development of future tensor processing units.
-
Microsoft Introduces NVads A10 V5 Azure VMs in Preview for Graphics-Heavy Workloads
Microsoft recently announced the NVads A10 v5 series in preview. These virtual machines (VMs) are powered by NVIDIA A10 GPUs and AMD EPYC 74F3V (Milan) CPUs with a base frequency of 3.2 GHz and an all-core peak frequency of 4.0 GHz.
-
AMD Introduces Its Deep-Learning Accelerator Instinct MI200 Series GPUs
In its recent Accelerated Data Center Premiere keynote, AMD unveiled its Instinct MI200 accelerator series: the high-end Instinct MI250X and the slightly lower-end Instinct MI250 GPUs. Designed with the CDNA 2 architecture and TSMC's 6nm FinFET lithography, the MI250X provides 47.9 TFLOPS of peak double-precision performance and memory capacity that allows training larger deep networks by minimizing model sharding.
-
AWS Announces the Availability of EC2 Instances (G5) with NVIDIA A10G Tensor Core GPUs
Recently AWS announced the availability of new G5 instances, which feature up to eight NVIDIA A10G Tensor Core GPUs. These instances are powered by second-generation AMD EPYC processors.
-
Amazon Releases DL1 Instances Powered by Gaudi Accelerators
Amazon recently announced the general availability of the EC2 DL1 instances powered by Gaudi accelerators from Habana Labs. The new instances promise better price performance in training deep learning models for use cases such as computer vision, natural language processing, autonomous vehicle perception, and recommendation engines.
-
OpenAI Releases Triton, Python-Based Programming Language for AI Workload Optimization
OpenAI released Triton, an open-source programming language that enables researchers to write highly efficient GPU code for AI workloads. Triton is Python-compatible and allows new users to achieve expert-quality results in only 25 lines of code. Kernels are written in Python using Triton's libraries and are then JIT-compiled to run on the GPU.
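As a rough illustration of the programming model, in the style of Triton's introductory vector-addition tutorial, a kernel is an ordinary Python function decorated with @triton.jit and launched over a grid of program instances (the block size and tensor sizes below are arbitrary):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard against out-of-bounds accesses
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n_elements = out.numel()
        # Launch one program per BLOCK_SIZE elements; Triton JIT-compiles the
        # kernel for the target GPU on first use.
        grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
        return out

    x = torch.rand(98432, device="cuda")
    y = torch.rand(98432, device="cuda")
    assert torch.allclose(add(x, y), x + y)

The masked loads and stores let the final block safely run past the end of the input, replacing the explicit bounds checks a hand-written CUDA kernel would need.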
-
Microsoft Announces the General Availability of Azure ND A100 V4 Cloud GPU Instances
Microsoft recently announced the general availability of the Azure ND A100 v4 Cloud GPU instances, powered by NVIDIA A100 Tensor Core GPUs. These virtual machines (VMs) are targeted at customers with demanding, high-performance workloads such as artificial intelligence (AI) and machine learning (ML).
-
Deno 1.8 Ships with WebGPU Support, Dynamic Permissions, and More
Deno 1.8 recently shipped with plenty of new features, including WebGPU support, internationalization APIs, stabilized import maps, support for fetching private modules, and more. The Deno.permissions, Deno.link, and Deno.symlink APIs are now stable. Deno 1.8 additionally ships with TypeScript 4.2.
-
Is Julia Production Ready? Q&A with Bogumił Kamiński
On the heels of JuliaCon 2020, SGH Warsaw School of Economics professor and DataFrames.jl maintainer Bogumił Kamiński summarized the status of the language and its ecosystem, stating that Julia is finally production-ready. InfoQ took the chance to speak with Professor Kamiński.