Model Inference Content on InfoQ
Meta Optimises AI Inference by Improving Tail Utilisation
Meta (formerly Facebook) has reported substantial improvements in the efficiency and reliability of its machine-learning model serving infrastructure by focusing on optimising tail utilisation.
JLama: The First Pure Java Model Inference Engine Implemented With Vector API and Project Panama
Karpathy's 700-line llama2.c inference implementation demystified how developers can interact with LLMs. Even before that, JLama had begun its journey towards becoming the first pure-Java inference engine for any Hugging Face model, from Gemma to Mixtral. Leveraging the new Vector API and a PanamaTensorOperations class with a native fallback, the library is available on Maven Central.