NVIDIA NIM Now Available on Hugging Face with Inference-as-a-Service

Hugging Face has announced the launch of an inference-as-a-service capability powered by NVIDIA NIM. The new service gives developers easy access to NVIDIA-accelerated inference for popular AI models.

The new service allows developers to rapidly deploy leading large language models, such as the Llama 3 family and Mistral AI's models, with optimizations from NVIDIA NIM microservices running on NVIDIA DGX Cloud. This helps developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production.

The Hugging Face inference-as-a-service on NVIDIA DGX Cloud, powered by NIM microservices, offers streamlined access to compute resources optimized for AI deployment. The NVIDIA DGX Cloud platform is purpose-built for generative AI and provides scalable GPU resources that support every step of AI development, from prototype to production.

To use the service, users need access to an Enterprise Hub organization and a fine-grained token for authentication. The NVIDIA NIM endpoints for supported generative AI models can be found on each model's page on the Hugging Face Hub.

Currently, the service supports only the chat.completions.create and models.list APIs, but Hugging Face is working on extending this while adding more models. Usage of Hugging Face inference-as-a-service on DGX Cloud is billed by the compute time spent per request on NVIDIA H100 Tensor Core GPUs.
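
Because the service exposes an OpenAI-compatible API, it can be called with the OpenAI Python SDK. The following is a minimal sketch; the base_url and model ID are illustrative assumptions (the exact endpoint URL is listed on each supported model's Hub page), and a fine-grained Hugging Face token is passed as the API key:

import os
from openai import OpenAI

# Point the OpenAI client at the Hugging Face integration endpoint.
# NOTE: the base_url below is an assumption for illustration; the exact
# URL is shown on each supported model's page on the Hugging Face Hub.
client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key=os.environ["HF_TOKEN"],  # a fine-grained Hugging Face token
)

# models.list: enumerate the NIM-backed models currently available.
for model in client.models.list():
    print(model.id)

# chat.completions.create: run a chat completion against a hosted model.
completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model ID
    messages=[{"role": "user", "content": "What is NVIDIA NIM?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)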

Hugging Face is also working with NVIDIA to integrate the NVIDIA TensorRT-LLM library into Hugging Face's Text Generation Inference (TGI) framework to improve AI inference performance and accessibility. In addition to the new Inference-as-a-Service, Hugging Face also offers Train on DGX Cloud, an AI training service.

Clem Delangue, CEO of Hugging Face, posted on his X account:

Very excited to see that Hugging Face is becoming the gateway for AI compute!

Kaggle Master Rohan Paul also posted on X:

So, we can use open models with the accelerated compute platform of NVIDIA DGX Cloud for inference serving. The code is fully compatible with the OpenAI API, allowing you to use the openai SDK for inference.
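
Since the endpoint follows the OpenAI API shape, standard SDK features such as streaming should work unchanged. A brief sketch under the same assumptions as above (illustrative base_url and model ID):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",  # assumed endpoint, see above
    api_key=os.environ["HF_TOKEN"],
)

# stream=True yields incremental deltas, just as with the OpenAI-hosted API.
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model ID
    messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()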

At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework to accelerate developers’ ability to build highly accurate virtual worlds for the next evolution of AI.
