GPU Content on InfoQ
-
AWS and NVIDIA to Collaborate on Next-Gen EC2 P5 Instances for Accelerating Generative AI
AWS and NVIDIA announced the development of a highly scalable, on-demand AI infrastructure that is specifically designed for training large language models and creating advanced generative AI applications. The collaboration aims to create the most optimized and efficient system of its kind, capable of meeting the demands of increasingly complex AI tasks.
-
NVIDIA Kubernetes Device Plug-in Brings Temporal GPU Concurrency
Starting with the v0.12 release, the NVIDIA GPU device plug-in framework supports time-sliced sharing between CUDA workloads on Kubernetes. This feature aims to prevent under-utilization of GPUs and makes it easier to scale applications by leveraging concurrently executing CUDA contexts.
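Concretely, time-slicing is enabled through a sharing configuration handed to the device plug-in, whose timeSlicing section tells the plug-in to advertise each physical GPU as several schedulable replicas. The sketch below mirrors the documented schema as a Python dict rather than the YAML file the plug-in actually reads, and the replica count of 4 is an arbitrary example:

    import json

    # Sketch of the device plug-in's time-slicing configuration, expressed as a
    # Python dict and printed for inspection. The key names follow the plug-in's
    # documented "sharing.timeSlicing" schema; in practice this content is given
    # to the plug-in as a YAML config file (for example, via a ConfigMap).
    time_slicing_config = {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [
                    # Advertise each physical GPU as 4 time-sliced replicas, so up
                    # to 4 pods can share the same device.
                    {"name": "nvidia.com/gpu", "replicas": 4},
                ]
            }
        },
    }

    print(json.dumps(time_slicing_config, indent=2))

With such a configuration in place, pods keep requesting the usual nvidia.com/gpu resource, but several of those requests can land on the same physical GPU and run their CUDA contexts in a time-sliced fashion.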
-
Asahi Linux Gets Alpha GPU Drivers on Apple Silicon
After two years of work to reverse engineer the Apple Silicon GPU instruction set and to implement the kernel driver, Asahi Linux has released an alpha-quality GPU driver that is already good enough to run a smooth desktop experience and some games, Asahi developers Alyssa Rosenzweig and Asahi Lina say.
-
Meta Has Developed AITemplate, Which Transforms Deep Neural Networks into C++ Code
Meta AI has developed AITemplate (AIT), a unified open-source system with separate acceleration back ends for AMD and NVIDIA GPU hardware. AIT is a two-part Python framework that transforms AI models into much faster C++ code: a front end that optimizes models through graph transformations, and back ends that generate the accelerated C++ code for each GPU target.
-
PrefixRL: Nvidia's Deep-Reinforcement-Learning Approach to Design Better Circuits
Nvidia has developed PrefixRL, an approach based on reinforcement learning (RL) for designing parallel-prefix circuits that are smaller and faster than those produced by state-of-the-art electronic-design-automation (EDA) tools.
-
NVIDIA Announces Next Generation AI Hardware H100 GPU and Grace CPU Superchip
At the recent GTC conference, NVIDIA announced their next generation processors for AI computing, the H100 GPU and the Grace CPU Superchip. Based on NVIDIA's Hopper architecture, the H100 includes a Transformer engine for faster training of AI models. The Grace CPU Superchip features 144 Arm cores and outperforms NVIDIA's current dual-CPU offering on the SPECrate 2017_int_base benchmark.
-
Ten Lessons from Three Generations of Tensor Processing Units
A recent report published by Google’s TPU group highlights ten takeaways from developing three generations of tensor processing units. The authors also discuss how their previous experience will affect the development of future tensor processing units.
-
Microsoft Introduces NVads A10 V5 Azure VMs in Preview for Graphics-Heavy Workloads
Microsoft recently announced the NVads A10 v5 series in preview. These virtual machines (VMs) are powered by NVIDIA A10 GPUs and AMD EPYC 74F3V (Milan) CPUs with a base frequency of 3.2 GHz and an all-core peak frequency of 4.0 GHz.
-
AMD Introduces Its Deep-Learning Accelerator Instinct MI200 Series GPUs
In its recent Accelerated Data Center Premiere keynote, AMD unveiled its Instinct MI200 accelerator series: the high-end Instinct MI250X and the slightly lower-end Instinct MI250 GPUs. Designed with the CDNA 2 architecture and TSMC's 6nm FinFET lithography, the MI250X provides 47.9 TFLOPS of peak double-precision performance and memory capacity that allows training larger deep networks by minimizing model sharding.
-
AWS Announces the Availability of EC2 Instances (G5) with NVIDIA A10G Tensor Core GPUs
Recently AWS announced the availability of new G5 instances, which feature up to eight NVIDIA A10G Tensor Core GPUs. These instances are powered by second-generation AMD EPYC processors.
-
Amazon Releases DL1 Instances Powered by Gaudi Accelerators
Amazon recently announced the general availability of the EC2 DL1 instances powered by Gaudi accelerators from Habana Labs. The new instances promise better price performance in training deep learning models for use cases such as computer vision, natural language processing, autonomous vehicle perception, and recommendation engines.
-
OpenAI Releases Triton, Python-Based Programming Language for AI Workload Optimization
OpenAI released Triton, an open-source programming language that enables researchers to write highly efficient GPU code for AI workloads. Triton is Python-compatible and allows new users to achieve expert-quality results in only 25 lines of code. Kernels are written in Python using Triton's libraries and are then JIT-compiled to run on the GPU.
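As a rough illustration of the programming model, in the style of Triton's introductory vector-addition tutorial, a kernel is an ordinary Python function decorated with @triton.jit and launched over a grid of program instances (the block size and tensor sizes below are arbitrary):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard against out-of-bounds accesses
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n_elements = out.numel()
        # Launch one program per BLOCK_SIZE elements; Triton JIT-compiles the
        # kernel for the target GPU on first use.
        grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
        return out

    x = torch.rand(98432, device="cuda")
    y = torch.rand(98432, device="cuda")
    assert torch.allclose(add(x, y), x + y)

The masked loads and stores let the final block safely run past the end of the input, replacing the explicit bounds checks a hand-written CUDA kernel would need.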
-
Microsoft Announces the General Availability of Azure ND A100 V4 Cloud GPU Instances
Microsoft recently announced the general availability of the Azure ND A100 v4 Cloud GPU instances, powered by NVIDIA A100 Tensor Core GPUs. These virtual machines (VMs) are targeted at customers with demanding, high-performance workloads such as artificial intelligence (AI) and machine learning (ML).
-
Deno 1.8 Ships with WebGPU Support, Dynamic Permissions, and More
Deno 1.8 recently shipped with plenty of new features, including WebGPU support, internationalization APIs, stabilized import maps, support for fetching private modules, and more. The Deno.permissions, Deno.link, and Deno.symlink APIs are now stable. Deno 1.8 additionally ships with TypeScript 4.2.
-
Is Julia Production Ready? Q&A with Bogumił Kamiński
On the heels of JuliaCon 2020, SGH Warsaw School of Economics professor and DataFrames.jl maintainer Bogumił Kamiński summarized the status of the language and its ecosystem, stating that Julia is finally production-ready. InfoQ took the chance to speak with Professor Kamiński.