AI, ML & Data Engineering Content on InfoQ
-
Google Researchers Propose Bayesian Teaching Method for Large Language Models
Google Research has proposed a training method that teaches large language models to approximate Bayesian reasoning by learning from the predictions of an optimal Bayesian system. The approach focuses on improving how models update beliefs as they receive new information during multi-step interactions.
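As a rough illustration of the belief updating described above, the sketch below applies Bayes' rule over several rounds of evidence. It is not Google's training method; it only shows what an ideal Bayesian updater computes. The hypothesis names and likelihood values are hypothetical, chosen for the example.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return the posterior over hypotheses after one observation.

    prior[h] is the current belief in hypothesis h; likelihood[h] is the
    probability of the new observation if h were true.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical setup: two guesses about what a user prefers.
belief = {"prefers_cheap": 0.5, "prefers_fast": 0.5}

# Each interaction round yields an observation with these (assumed) likelihoods.
rounds = [
    {"prefers_cheap": 0.8, "prefers_fast": 0.2},  # user picked the cheaper option
    {"prefers_cheap": 0.8, "prefers_fast": 0.2},  # user picked the cheaper option again
]

for likelihood in rounds:
    belief = bayes_update(belief, likelihood)

print(belief)
```

Each round uses the previous posterior as the new prior, which is the multi-step behaviour the summary refers to.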
-
DoorDash Builds LLM Conversation Simulator to Test Customer Support Chatbots at Scale
DoorDash engineers built a simulation and evaluation flywheel to test large language model customer support chatbots at scale. The system generates multi-turn synthetic conversations using historical transcripts and backend mocks, evaluates outcomes with an LLM-as-judge framework, and enables rapid iteration on prompts, context, and system design before production deployment.
-
AWS Launches Strands Labs for Experimental AI Agent Projects
Amazon Web Services has introduced Strands Labs, a new GitHub organization created to host experimental projects related to agent-based AI development.
-
Claude Opus 4.6 Introduces Adaptive Reasoning and Context Compaction for Long-Running Agents
Anthropic’s Claude Opus 4.6 introduces "Adaptive Thinking" and a "Compaction API" to address context rot in long-running agents. The model supports a 1M token context window with 76% multi-needle retrieval accuracy. While the model leads agentic coding benchmarks, independent tests show a 49% detection rate for binary backdoors, highlighting the gap between state-of-the-art claims and production security.
-
AI-Powered Bot Compromises GitHub Actions Workflows across Microsoft, DataDog, and CNCF Projects
An AI-powered bot, hackerbot-claw, exploited GitHub Actions workflows across Microsoft, DataDog, and CNCF projects over 7 days using 5 attack techniques. The bot achieved remote code execution (RCE) in 5 of 7 targets, stole a GitHub token from awesome-go (140k stars), and fully compromised Aqua Security's Trivy. The campaign included the first documented AI-on-AI attack, in which the bot attempted prompt injection against Claude Code.
-
GitLab Suggests AI Can Detect Vulnerabilities But It's AI Governance That Determines Risk
Artificial intelligence is rapidly transforming how software vulnerabilities are detected, but questions about who governs the risks AI exposes, and how those risks are acted on, are becoming increasingly urgent, according to a new blog post by GitLab.
-
Cloudflare Releases Experimental Next.js Alternative Built with AI Assistance
Cloudflare released vinext, an experimental Next.js reimplementation built on Vite by one engineer, with AI guidance, in one week for $1,100. Early benchmarks show 4.4x faster builds, but Cloudflare cautions that it is untested at scale and still lacks static pre-rendering. Reaction on Hacker News was skeptical, with commenters noting that Vite does the heavy lifting. Despite its experimental status, vinext is already running on CIO.gov.
-
Google BigQuery Previews Cross-Region SQL Queries for Distributed Data
Google Cloud has recently announced the preview of a global queries feature for BigQuery. The new option lets developers run SQL queries across data stored in different geographic regions without first moving or copying the data to aggregate the results.
-
Scaling Human Judgment: How Dropbox Uses LLMs to Improve Labeling for RAG Systems
To improve the relevance of responses produced by Dropbox Dash, Dropbox engineers began using LLMs to augment human labeling, which plays a crucial role in identifying the documents that should be used to generate the responses. Their approach offers useful insights for any system built on retrieval-augmented generation (RAG).
-
New Research Reassesses the Value of AGENTS.md Files for AI Coding
Despite widespread industry recommendations, a new ETH Zurich paper concludes that AGENTS.md files may often hinder AI coding agents. The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to non-inferable details, such as highly specific tooling or custom build commands.
-
OpenAI Secures AWS Distribution for Frontier Platform in $110B Multi-Cloud Deal
OpenAI's $110B funding deal makes AWS the exclusive third-party distributor for the Frontier agent platform, introducing an architectural split: Azure retains stateless API exclusivity, while AWS gains stateful runtime environments via Bedrock. The deal expands the existing $38B AWS agreement by $100B and commits 2GW of Trainium capacity.
-
QCon AI Boston’s Early Program Focuses on the Engineering Work behind Production AI
As teams move AI from pilots to production, the hard problems shift from demos to dependability. The first confirmed talks for QCon AI Boston (June 1–2) focus on context engineering, agent explainability, reasoning beyond basic RAG, evaluation, governance, and the platform infrastructure needed to run AI reliably under real-world constraints.
-
GitHub Data Shows AI Tools Creating "Convenience Loops" That Reshape Developer Language Choices
GitHub’s Octoverse 2025 report reveals a "convenience loop" where AI coding assistants drive language choice. TypeScript’s 66% surge to the #1 spot highlights a shift toward static typing, as types provide essential guardrails for LLMs. While Python leads in AI research, the industry is consolidating around stacks that minimize AI friction, creating a barrier for new, niche languages.
-
Cloudflare Debuts Markdown for Agents and Content Signals to Guide AI Crawlers
Cloudflare has introduced “Markdown for Agents,” a feature that lets AI crawlers request Markdown versions of web pages. The company pairs the feature with a proposed “Content Signals” mechanism that lets publishers declare whether their content may be used for AI training, search indexing or inference.
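A minimal sketch of how a crawler might ask for a Markdown version of a page, assuming the request is expressed through standard HTTP content negotiation with an `Accept: text/markdown` header. The summary above does not state the mechanism, so treat the header value and the helper name `build_markdown_request` as assumptions. The example only builds the request and does not send it.

```python
import urllib.request


def build_markdown_request(url: str) -> urllib.request.Request:
    """Build a GET request that prefers Markdown and falls back to HTML.

    Assumption: the server honours the Accept header; a server that does
    not support Markdown responses would simply return HTML.
    """
    headers = {"Accept": "text/markdown, text/html;q=0.8"}
    return urllib.request.Request(url, headers=headers)


# example.com is a placeholder URL, not a site known to support the feature.
req = build_markdown_request("https://example.com/")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

To fetch the page, the request object can be passed to `urllib.request.urlopen`; checking the response's `Content-Type` header would show which representation the server actually returned.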
-
Google Launches Automated Review Feature in Gemini CLI Conductor
Google has enhanced its Gemini CLI extension, Conductor, by adding support for automated reviews. The company says this update allows Conductor "to go beyond just planning and execution into validation", enabling it to check AI-generated code for quality and adherence to guidelines, strengthening confidence, safety, and control in AI-assisted development workflows.