Open Source Content on InfoQ
-
350PB, Millions of Events, One System: Inside Uber’s Cross-Region Data Lake and Disaster Recovery
Uber’s HiveSync is a sharded, cross-region batch replication system keeping Hive/HDFS data consistent across multiple regions. Handling 5M daily Hive events and 8PB of data replication, it uses event-driven jobs, hybrid RPC and DistCp strategies, DAG-based orchestration, and dynamic sharding, enabling disaster recovery, horizontal scaling, and 99.99% cross-region data accuracy.
-
Google Releases Gemma Scope 2 to Deepen Understanding of LLM Behavior
Gemma Scope 2 is a suite of tools designed to interpret the behavior of Gemma 3 models, enabling researchers to analyze emergent model behaviors, audit and debug AI agents, and devise mitigation strategies against security issues like jailbreaks, hallucinations, and sycophancy.
-
NVIDIA Releases Open Models, Datasets, and Tools across AI, Robotics, and Autonomous Driving
NVIDIA has released a set of open models, datasets, and development tools covering language, agentic systems, robotics, autonomous driving, and biomedical research. The update expands several existing NVIDIA model families and makes accompanying training data and reference implementations available through GitHub, Hugging Face, and NVIDIA’s developer platforms.
-
Microsoft Research Develops Novel Approaches to Enforce Privacy in AI Models
A team of AI researchers at Microsoft introduces two novel approaches for enforcing contextual integrity in large language models: PrivacyChecker, an open-source lightweight module that acts as a privacy shield during inference, and CI-CoT + CI-RL, an advanced training method designed to teach models to reason about privacy.
-
Open-Source Agent Sandbox Enables Secure Deployment of AI Agents on Kubernetes
The Agent Sandbox is an open-source Kubernetes controller that provides a declarative API for managing a single, stateful pod with stable identity and persistent storage. It is particularly well suited for creating isolated environments to execute untrusted, LLM-generated code, as well as for running other stateful workloads.
-
MinIO GitHub Repository in Maintenance Mode: What's Next for the Open Source Object Storage?
After a contentious license change and the removal of administrative features from the console, the company behind the popular open-source object storage server MinIO recently announced that the project will now enter maintenance mode. The change has prompted discussion in the community about the need for a fork, the challenges of open source projects, and the current alternatives.
-
Lessons Learned from Migrating a Legacy Test Suite to Gauge with Kotlin
Liran Yushinsky shared how his team replaced brittle Bash and kubectl tests with a unified Kotlin + Gauge framework. Using Fabric8, Terraform, and Ansible, they automated their test environments. Feedback loops dropped from hours to minutes, developers joined testing efforts, and shared ownership boosted quality and release speed.
-
MySQL Repository Analysis Reveals Declining Development and Shrinking Contributor Base
A recent report has analyzed the repository statistics of the MySQL server to evaluate the project's status, Oracle's commitment to MySQL, and the future of the community edition.
-
Google Launches Agent Development Kit for Go
Google has added support for the Go language to its Agent Development Kit (ADK), enabling Go developers to build and manage agents in an idiomatic way that leverages the language's strong concurrency and typing features.
-
Google Brings Colab Integration to Visual Studio Code
Google has announced the availability of a new Visual Studio Code extension that connects local notebooks to a Colab runtime. This allows developers to unify their previously separate local development setup and web-based Colab environment.
-
Google Launches Code Wiki, an AI-Driven System for Continuous, Interactive Code Documentation
Google has introduced Code Wiki, a new platform designed to keep software documentation continuously synchronized with the code it describes. The system generates a structured wiki for each repository, automatically updates it after every change, and powers an integrated chat interface that understands the entire codebase.
-
After Seven Years, Google Reinvents Android Navigation with Jetpack Navigation 3
Google has released the new Jetpack Navigation 3 library, which redesigns navigation handling in Android apps from the ground up. The new library gives full control over the back stack and integrates seamlessly with Jetpack Compose's state management.
-
Embedding Atlas: Apple’s Open-Source Tool for Exploring Large-Scale Embeddings Locally
Apple has introduced Embedding Atlas, a new open-source tool for visualizing and exploring large-scale embeddings interactively. Designed for researchers, data scientists, and developers, the platform provides a fast and intuitive way to analyze complex, high-dimensional data—from text embeddings to multimodal representations—without requiring any backend infrastructure or external data upload.
-
NVIDIA Introduces OmniVinci, a Research-Only LLM for Cross-Modal Understanding
NVIDIA has introduced OmniVinci, a large language model designed to understand and reason across multiple input types — including text, vision, audio, and even robotics data. The project, developed by NVIDIA Research, aims to push machine intelligence closer to human-like perception by unifying how models interpret the world across different sensory streams.
-
DeepSeek AI Unveils DeepSeek-OCR: Vision-Based Context Compression Redefines Long-Text Processing
DeepSeek AI has developed DeepSeek-OCR, an open-source system that uses optical 2D mapping to compress long text passages. This approach aims to improve how large language models (LLMs) handle text-heavy inputs.