Gemini Content on InfoQ
-
Google Supercharges Gemini 3 Flash with Agentic Vision
Google has added agentic vision to Gemini 3 Flash, combining visual reasoning with code execution to "ground answers in visual evidence". According to Google, this not only improves accuracy but, more importantly, unlocks entirely new AI-driven behaviors.
-
Google Introduces TranslateGemma Open Models for Multilingual Translation
Google has released TranslateGemma, a set of open translation models based on the Gemma 3 architecture, offering 4B, 12B, and 27B parameter variants designed to support machine translation across 55 languages and to run on platforms ranging from mobile and edge devices to consumer hardware and cloud accelerators.
-
Google and Retail Leaders Launch Universal Commerce Protocol to Power Next‑Generation AI Shopping
Google launched the Universal Commerce Protocol (UCP), an open standard co-developed with Shopify, Target, and others, enabling AI-driven shopping agents to complete tasks end-to-end from product discovery to checkout and post-purchase management. UCP aims to standardize commerce capabilities, support multiple payment providers, and expand globally, shaping the next generation of agentic commerce.
-
FACTS Benchmark Suite Introduced to Evaluate Factual Accuracy of Large Language Models
A new industry benchmark aimed at systematically evaluating the factual accuracy of LLMs has been released with the launch of the FACTS Benchmark Suite. Developed by the FACTS team in collaboration with Kaggle, the suite expands earlier work on factual grounding and introduces a broader, multi-dimensional framework for measuring how reliably language models produce factually correct responses.
-
AlphaEvolve Enters Google Cloud as an Agentic System for Algorithm Optimization
Google Cloud announced the private preview of AlphaEvolve, a Gemini-powered coding agent designed to discover and optimize algorithms for complex engineering and scientific problems. The system is now available through an early access program on Google Cloud, targeting use cases where traditional brute-force or manual optimization methods struggle due to vast search spaces.
-
Replit Introduces New AI Integrations for Multi-Model Development
Replit has introduced Replit AI Integrations, a feature that lets users select third-party models directly inside the IDE and automatically generate the code needed to run inference.
-
Google Unveils Project Suncatcher, Envisioning AI Models Running in Space
Google has unveiled Project Suncatcher, a research initiative exploring how solar-powered satellite constellations equipped with Tensor Processing Units (TPUs) could one day enable large-scale artificial intelligence computation in space.
-
Android GenAI Prompt API Enables Natural Language Requests with Gemini Nano
The ML Kit GenAI Prompt API, now available in alpha, enables Android developers to send natural language and multimodal requests to Gemini Nano running on-device, extending the text summarization and image description capabilities introduced with the initial GenAI release.
-
Genkit Extension for Gemini CLI Brings Framework-Aware AI Assistance to the Terminal
Google's Genkit Extension for Gemini CLI delivers framework-aware AI assistance directly in the terminal. It supports Genkit application development with context-aware code generation, debugging, and best-practice guidance, all without leaving the command line.
-
Google DeepMind Launches Gemini 2.5 Computer Use Model to Power UI-Controlling AI Agents
Google DeepMind has recently released the Gemini 2.5 Computer Use model, a specialized variant of its Gemini 2.5 Pro system designed to enable AI agents to interact directly with graphical user interfaces. The new model allows developers to build agents that can click, type, scroll, and manipulate interactive elements on web pages.
-
xAI Releases Grok 4 Fast with Lower Cost Reasoning Model
xAI has introduced Grok 4 Fast, a new reasoning model designed for efficiency and lower cost.
-
Baidu’s PP-OCRv5 Released on Hugging Face, Outperforming VLMs in OCR Benchmarks
Baidu has released PP-OCRv5 on Hugging Face, a new optical character recognition (OCR) model built to outperform large vision-language models (VLMs) in specialized text recognition tasks. Unlike general-purpose architectures such as Gemini 2.5 Pro, Qwen2.5-VL, or GPT-4o, which handle OCR as part of broader multimodal workflows, PP-OCRv5 is purpose-built for accuracy, efficiency, and speed.
-
Kaggle Introduces Game Arena to Benchmark AI Models in Strategic Games
Kaggle, in collaboration with Google DeepMind, has introduced Kaggle Game Arena, a platform designed to evaluate artificial intelligence models by testing their performance in strategy-based games.
-
Android Studio Narwhal Extends Gemini AI Capabilities
The latest Android Studio Narwhal 3 Feature Drop introduces enhancements aimed at boosting developer productivity, including support for resizable Compose previews, new app Backup & Restore tools, and expanded Gemini capabilities such as automatic code generation from UI screenshots.
-
Google Launches Gemini 2.5 Flash Image with Advanced Editing and Consistency Features
Google released Gemini 2.5 Flash Image (nicknamed nano-banana), its newest image generation and editing model. The system introduces several upgrades over earlier Flash models, including character consistency across prompts, multi-image fusion, precise prompt-based editing, and integration of world knowledge for semantic understanding.