Hugging Face Content on InfoQ
-
Daggr Introduced as an Open-Source Python Library for Inspectable AI Workflows
The Gradio team has released Daggr, a new open-source Python library designed to simplify the construction and debugging of multi-step AI workflows. Daggr allows developers to define workflows programmatically in Python while automatically generating a visual canvas that exposes intermediate states, inputs, and outputs for each step in the pipeline.
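A minimal sketch of what a Daggr workflow definition might look like, based only on the description above; the `daggr` module name and the `Workflow`/`launch` calls are assumptions, not confirmed library API.

```python
# Hypothetical sketch: the daggr API below (Workflow, launch) is assumed
# from the announcement, not taken from the library's documentation.
import daggr

def retrieve(query: str) -> list[str]:
    """First step: fetch candidate documents for a query (stubbed)."""
    return [f"document about {query}"]

def summarize(docs: list[str]) -> str:
    """Second step: condense the retrieved documents (stubbed)."""
    return " ".join(docs)[:200]

# Each function becomes a node in the DAG; the generated visual canvas
# would then expose each node's inputs, outputs, and intermediate state.
workflow = daggr.Workflow(steps=[retrieve, summarize])
workflow.launch()
```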
-
Google BigQuery Adds SQL-Native Managed Inference for Hugging Face Models
Google has launched SQL-native managed inference for 180,000+ Hugging Face models in BigQuery. The preview release collapses the ML lifecycle into a unified SQL interface, eliminating the need for separate Kubernetes or Vertex AI management. Key features include automated resource governance via endpoint_idle_ttl and secure identity-based execution using existing data warehouse permissions.
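A sketch of the SQL-native flow using the official BigQuery Python client; the `CREATE MODEL` option names for Hugging Face managed inference (including how endpoint_idle_ttl is spelled in DDL) are assumptions based on the announcement, not verified syntax.

```python
# google-cloud-bigquery is the standard client for running BigQuery SQL
# from Python; only the OPTIONS block below is assumed, not verified.
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE OR REPLACE MODEL `my_dataset.hf_text_model`
REMOTE WITH CONNECTION `us.my_connection`
OPTIONS (
  hugging_face_model_id = 'google/gemma-2-2b-it',  -- assumed option name
  endpoint_idle_ttl = '30m'  -- endpoint torn down after 30 idle minutes
);
""").result()
```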
-
Hugging Face Releases FineTranslations, a Trillion-Token Multilingual Parallel Text Dataset
Hugging Face has released FineTranslations, a large-scale multilingual dataset containing more than 1 trillion tokens of parallel text between English and more than 500 other languages. The dataset was created by translating non-English content from the FineWeb2 corpus into English using Gemma 3 27B, with the full data-generation pipeline designed to be reproducible and publicly documented.
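Given the dataset's size, streaming a few rows is the practical way to inspect it; the repository ID, config, and split names below are assumptions to verify against the dataset card on the Hub.

```python
from datasets import load_dataset

# Assumed repo ID and config; streaming avoids downloading the full corpus.
ds = load_dataset("HuggingFaceFW/finetranslations", "fra_Latn",
                  split="train", streaming=True)

for row in ds.take(3):
    print(row)  # expect parallel fields: source text plus its English translation
```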
-
NVIDIA Releases Open Models, Datasets, and Tools across AI, Robotics, and Autonomous Driving
NVIDIA has released a set of open models, datasets, and development tools covering language, agentic systems, robotics, autonomous driving, and biomedical research. The update expands several existing NVIDIA model families and makes accompanying training data and reference implementations available through GitHub, Hugging Face, and NVIDIA’s developer platforms.
-
Transformers v5 Introduces a More Modular and Interoperable Core
Hugging Face has released the first release candidate for Transformers v5, marking a significant evolution from v4, released five years ago. The library has grown from a specialized model toolkit into a critical resource in AI development, reaching over three million installations daily and more than 1.2 billion total installs.
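The high-level `pipeline` API shown below is stable across v4 and v5, so it is a reasonable smoke test for the release candidate; the exact pre-release install command is an assumption.

```python
# pip install --pre transformers   # assumed way to pull the v5 RC
from transformers import pipeline

# pipeline() downloads a small default model for the task on first use.
classifier = pipeline("text-classification")
print(classifier("Transformers v5 keeps the high-level API familiar."))
```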
-
AnyLanguageModel: Unified API for Local and Cloud LLMs on Apple Platforms
Developers on Apple platforms often face a fragmented ecosystem when using language models. Local models via Core ML or MLX offer privacy and offline capabilities, while cloud services like OpenAI, Anthropic, or Google Gemini provide advanced features. AnyLanguageModel, a new Swift package, simplifies integration by offering a unified API for both local and remote models.
-
Meta and Hugging Face Launch OpenEnv, a Shared Hub for Agentic Environments
Meta's PyTorch team and Hugging Face have launched OpenEnv, an open-source platform for standardizing AI agent environments. The OpenEnv Hub features secure sandboxes that define the necessary tools and APIs for safe, predictable AI operation. Developers can explore, contribute, and refine environments, paving the way for scalable agent development in the open-source RL ecosystem.
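A sketch of the Gymnasium-style loop OpenEnv environments expose; the class and method names (`EchoEnv`, `from_docker_image`, `EchoAction`) follow the launch examples but should be treated as assumptions rather than verified API.

```python
# Assumed client API: each environment runs as an isolated HTTP server
# (here in a local container), which is what provides the safe sandbox.
from envs.echo_env import EchoEnv, EchoAction

env = EchoEnv.from_docker_image("echo-env:latest")
result = env.reset()                                # start an episode
result = env.step(EchoAction(message="hello"))      # act and observe
print(result.observation)
env.close()
```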
-
Hugging Face Introduces RTEB, a New Benchmark for Evaluating Retrieval Models
Hugging Face has unveiled the Retrieval Embedding Benchmark (RTEB), a new framework for assessing the real-world retrieval accuracy of embedding models. By combining public and private datasets, RTEB narrows the "generalization gap" between benchmark scores and performance on unseen data, so that reported results better reflect how models behave in critical sectors. Now live and open to contributions, RTEB aims to set a community standard for AI retrieval evaluation.
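For intuition, the "generalization gap" can be read as the drop between a model's score on public datasets (which it may have overfit) and on held-out private ones; the numbers below are made-up placeholders, purely for illustration.

```python
# Placeholder scores, not real benchmark results.
public_ndcg = {"legal": 0.71, "code": 0.68, "health": 0.74}
private_ndcg = {"legal": 0.62, "code": 0.60, "health": 0.69}

gap = {domain: round(public_ndcg[domain] - private_ndcg[domain], 3)
       for domain in public_ndcg}
print(gap)  # a large, consistent gap suggests overfitting to public benchmarks
```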
-
Hugging Face Introduces mmBERT, a Multilingual Encoder for 1,800+ Languages
Hugging Face has released mmBERT, a new multilingual encoder trained on more than 3 trillion tokens across 1,833 languages. The model builds on the ModernBERT architecture and is the first to significantly improve upon XLM-R, a long-time baseline for multilingual understanding tasks.
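Since mmBERT is an encoder, a typical use is extracting token embeddings with `transformers`; the checkpoint ID below is an assumption to verify on the Hub.

```python
from transformers import AutoModel, AutoTokenizer

model_id = "jhu-clsp/mmBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Bonjour tout le monde", return_tensors="pt")
hidden = model(**inputs).last_hidden_state  # one embedding per token
print(hidden.shape)
```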
-
Baidu’s PP-OCRv5 Released on Hugging Face, Outperforming VLMs in OCR Benchmarks
Baidu has released PP-OCRv5 on Hugging Face. The new optical character recognition (OCR) model is built to outperform large vision-language models (VLMs) in specialized text recognition tasks. Unlike general-purpose architectures such as Gemini 2.5 Pro, Qwen2.5-VL, or GPT-4o, which handle OCR as part of broader multimodal workflows, PP-OCRv5 is purpose-built for accuracy, efficiency, and speed.
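A minimal sketch using the `paddleocr` Python package; the two calls below follow PaddleOCR's long-standing API, though the PP-OCRv5 release may require a newer package version with slightly different options, so treat the details as assumptions.

```python
from paddleocr import PaddleOCR

# Downloads the detection and recognition models on first run.
ocr = PaddleOCR(lang="en")

# Returns, per detected line: bounding box, recognized text, confidence.
result = ocr.ocr("receipt.png")
for box, (text, score) in result[0]:
    print(f"{score:.2f}  {text}")
```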
-
Hugging Face Releases FinePDFs: a 3-Trillion-Token Dataset Built from PDFs
Hugging Face has unveiled FinePDFs, the largest publicly available corpus built entirely from PDFs. The dataset spans 475 million documents in 1,733 languages, totaling roughly 3 trillion tokens. At 3.65 terabytes in size, FinePDFs introduces a new dimension to open training datasets by tapping into a resource long considered too complex and expensive to process.
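At 3.65 terabytes, streaming is the sensible way to sample FinePDFs; the repository ID, config, and field name below are assumptions to check against the dataset card.

```python
from datasets import load_dataset

# Assumed repo ID and English config; streaming avoids a multi-terabyte download.
ds = load_dataset("HuggingFaceFW/finepdfs", "eng_Latn",
                  split="train", streaming=True)

for doc in ds.take(2):
    print(doc["text"][:200])  # "text" assumed to hold the extracted PDF content
```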
-
Hugging Face Introduces AI Sheets, a No-Code Tool for Dataset Transformation
Hugging Face has released AI Sheets, an open-source application designed to let users build, transform, and enrich datasets using AI models through a spreadsheet-like interface. The tool, available both on the Hub and for local deployment, allows users to experiment with thousands of open models, including OpenAI’s gpt-oss, without requiring code.
-
Hugging Face Releases Trackio, a Lightweight Open-Source Experiment Tracking Library
Hugging Face has introduced Trackio, a new open-source Python library for experiment tracking designed to be lightweight, transparent, and easy to integrate. Built as a drop-in replacement for Weights & Biases (wandb), Trackio offers local dashboards by default and seamless syncing with Hugging Face Spaces for sharing and collaboration.
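Because Trackio mirrors the wandb surface, switching an existing training script is advertised as a one-line import change; the three calls below follow the standard wandb-style pattern.

```python
import trackio as wandb  # the only line that changes in a wandb script

wandb.init(project="demo-run")
for step in range(3):
    wandb.log({"loss": 1.0 / (step + 1)})  # shows up in the local dashboard
wandb.finish()
```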
-
Hugging Face Launches Reachy Mini Robots for Human-Robot Interaction
Hugging Face has launched its Reachy Mini robots, now available for order. Designed for AI developers, researchers, and enthusiasts, the robots serve as a platform for experimenting with human-robot interaction and AI applications.
-
MiniMax Releases M1: a 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks
MiniMax has introduced MiniMax-M1, a new open-weight reasoning model designed to handle extended contexts and complex problem-solving with high efficiency. Built on top of the earlier MiniMax-Text-01, M1 combines a hybrid Mixture-of-Experts (MoE) architecture with a novel “lightning attention” mechanism.