DevOps Content on InfoQ
-
How Meta Uses Precision Time Protocol to Handle Leap Seconds
For systems that require strict synchronization—like distributed databases, telemetry pipelines, or event-driven architectures—handling leap seconds incorrectly can lead to data loss, duplication, or inconsistencies. As such, managing leap seconds accurately ensures system reliability and consistency across environments that depend on high-precision time.
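The article covers Meta's PTP-based approach in depth; purely as an illustration of why many operators "smear" the extra second rather than step the clock, the sketch below spreads one positive leap second linearly across an assumed 17-hour window. The window length, function names, and dates are assumptions for this example, not Meta's implementation.

```python
from datetime import datetime, timedelta, timezone

# Illustrative parameters only: smear one positive leap second linearly
# over a 17-hour window that ends at the leap-second boundary.
LEAP_BOUNDARY = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
SMEAR_WINDOW = timedelta(hours=17)
SMEAR_START = LEAP_BOUNDARY - SMEAR_WINDOW

def smear_offset(linear_utc: datetime) -> timedelta:
    """Correction to subtract from a clock that ignores the leap second,
    so it slows down gradually instead of repeating a second at the boundary."""
    if linear_utc <= SMEAR_START:
        return timedelta(0)
    if linear_utc >= LEAP_BOUNDARY:
        return timedelta(seconds=1)
    fraction = (linear_utc - SMEAR_START) / SMEAR_WINDOW  # 0.0 .. 1.0
    return timedelta(seconds=fraction)

def smeared_time(linear_utc: datetime) -> datetime:
    # Monotonic: never repeats or skips a timestamp, and rejoins UTC
    # after the boundary without a backward step.
    return linear_utc - smear_offset(linear_utc)

print(smeared_time(datetime(2016, 12, 31, 20, 0, tzinfo=timezone.utc)))
```

Consumers reading the smeared clock never see the same timestamp twice, which is what protects downstream deduplication and ordering logic.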
-
QCon London 2025: Insights from 20+ Years in Mission-Critical Infrastructure
Matthew Liste, head of infrastructure at American Express, shared insights at QCon London 2025 on building robust cloud platforms in financial services. With 20+ years of experience, he emphasized stability, security, scalability, the value of interchangeable components, and long-term sustainability, urging professionals to maintain focus and foster a strong team culture for platform engineering.
-
Lessons on How to Get Timeouts, Retries and Idempotency Right from Sam Newman at QCon London
At QCon London, Sam Newman, the architect often credited with coining the term microservices, went back to basics to underline the three critical things to get right when working with distributed systems: timeouts, retries, and idempotency. Throughout the talk, he presented mechanisms that make distributed systems more robust.
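As a minimal sketch of how the three concerns combine on the calling side (the endpoint, retry budget, and backoff values below are illustrative assumptions, not examples from the talk), a client can pair a per-request timeout with a bounded retry loop and send the same idempotency key on every attempt, so the server can safely deduplicate replays:

```python
import time
import uuid
import requests  # third-party HTTP client, used here for brevity

def submit_payment(url: str, payload: dict,
                   timeout_s: float = 2.0,
                   max_attempts: int = 3) -> dict:
    # One idempotency key for all attempts: the server can recognise
    # retries of the same logical request instead of charging twice.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.post(url, json=payload,
                                 headers=headers, timeout=timeout_s)
            if resp.status_code < 500:
                resp.raise_for_status()  # 4xx: do not retry, surface the error
                return resp.json()
        except (requests.Timeout, requests.ConnectionError):
            pass  # ambiguous outcome: safe to retry only because of the key
        if attempt < max_attempts:
            time.sleep(0.2 * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError("payment request failed after retries")
```

The key design point is that a timeout leaves the outcome unknown, so retries are only safe when the server can detect and ignore duplicates.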
-
QCon London 2025: Distributed Event-Driven Architectures across Multi-Cloud Boundaries
At QCon London 2025, Teena Idnani from Microsoft addressed the rise of multi-cloud adoption, revealing that 89% of organizations embrace this strategy. Using the fictional FinBank, she showcased practical strategies to overcome latency, resilience, event ordering, and duplication challenges, emphasizing the importance of security, observability, and continuous team education.
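One recurring answer to the duplication and ordering challenges she discussed is an idempotent consumer that records what it has already applied. The sketch below is a generic illustration using assumed field names (event_id, sequence, amount), not FinBank's or Microsoft's design:

```python
from dataclasses import dataclass, field

@dataclass
class AccountProjection:
    balance: float = 0.0
    last_sequence: int = -1
    seen_event_ids: set = field(default_factory=set)

def apply_event(state: AccountProjection, event: dict) -> AccountProjection:
    # Duplicate delivery: the same event_id may arrive from more than one
    # cloud or broker; apply it at most once.
    if event["event_id"] in state.seen_event_ids:
        return state
    # Out-of-order delivery: ignore anything older than what has already
    # been applied (a real system might buffer and reorder instead).
    if event["sequence"] <= state.last_sequence:
        return state
    state.balance += event["amount"]
    state.last_sequence = event["sequence"]
    state.seen_event_ids.add(event["event_id"])
    return state
```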
-
Optimize AI Workloads: Google Cloud’s Tips and Tricks
Google Cloud has announced a suite of new tools and features designed to help organizations reduce costs and improve efficiency of AI workloads across their cloud infrastructure. The announcement comes as enterprises increasingly seek ways to optimize spending on AI initiatives while maintaining performance and scalability.
-
QCon London: a Three-Step Blueprint for Managing Open Source Risk
At QCon London 2025, Johnson Matthey's vulnerability manager, Celine Pypaert, discussed managing open-source dependency risks while maintaining momentum in innovation. She described a three-part blueprint for handling the security challenges that arise with the now widespread use of open-source dependencies.
-
QCon London 2025: Kraken Technology's Approach to Renewable Energy Management
Kevin Bowman of Kraken Technology described how serverless cloud solutions are transforming the UK's power grid management amid a 40% surge in renewable energy. By leveraging intelligent control systems, battery storage and microservices, Kraken optimizes energy flow and grid stability while advocating for consumer cooperation and continued investment in cloud technologies for future resilience.
-
QCon London: Monzo's Recipe for Developer Experience: Assemble, Build, Communicate
In a talk at QCon London 2025, Fabien Deshayes described how Monzo has created and optimised its developer experience teams. Deshayes outlined techniques for building an effective Developer Experience platform, focusing on three key aspects: assembling effective teams, building impactful products, and communicating value across the organisation.
-
How Meta is Using a New Metric for Developers: Diff Authoring Time
Diff Authoring Time (DAT) is a new metric developed by engineers at Meta to measure how long it takes developers to submit changes, known as "diffs," to the codebase. By tracking the time from the initiation of a code change to its submission, DAT offers insights into the efficiency of the development process and helps identify areas for improvement.
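As a back-of-the-envelope illustration of the definition (the event names and timestamps below are assumptions, not Meta's instrumentation), DAT is simply the elapsed time between the first recorded activity on a diff and its submission:

```python
from datetime import datetime, timedelta

def diff_authoring_time(events: list[dict]) -> timedelta:
    """Elapsed time from the first recorded activity on a diff
    (e.g. first edit) to its submission for review."""
    started = min(e["timestamp"] for e in events if e["type"] == "started")
    submitted = min(e["timestamp"] for e in events if e["type"] == "submitted")
    return submitted - started

# Example: a diff started at 09:00 and submitted at 11:30 has a DAT of 2.5 hours.
events = [
    {"type": "started", "timestamp": datetime(2025, 4, 1, 9, 0)},
    {"type": "submitted", "timestamp": datetime(2025, 4, 1, 11, 30)},
]
print(diff_authoring_time(events))  # 2:30:00
```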
-
AWS CodeBuild Adds Parallel Test Execution for Faster CI
AWS CodeBuild now supports parallel test execution, significantly reducing build times by allowing concurrent test suite runs across multiple environments. This feature addresses long CI pipeline cycles that impede productivity and increase costs. With intelligent test distribution and automatic result merging, developers can enhance efficiency and streamline feedback loops.
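CodeBuild handles the distribution and result merging itself; purely to illustrate the general idea of deterministic test sharding (this is not CodeBuild's algorithm or configuration syntax), a suite can be split so each parallel worker runs a stable slice of the test files:

```python
import hashlib

def shard_for(test_file: str, total_shards: int) -> int:
    # Stable hashing keeps each test file on the same shard across builds,
    # which makes per-shard timing data meaningful for rebalancing.
    digest = hashlib.sha256(test_file.encode()).hexdigest()
    return int(digest, 16) % total_shards

def select_tests(all_tests: list[str], shard_index: int, total_shards: int) -> list[str]:
    return [t for t in all_tests if shard_for(t, total_shards) == shard_index]

# Example: worker 2 of 4 runs only its slice of the suite.
tests = [f"tests/test_module_{i}.py" for i in range(10)]
print(select_tests(tests, shard_index=2, total_shards=4))
```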
-
How SREs and GenAI Work Together to Decrease eBay's Downtime: an Architect's Insights at KubeCon EU
During his KubeCon EU keynote, Vijay Samuel, Principal MTS Architect at eBay, shared his team’s experience of enhancing incident response capabilities by incorporating ML and LLM building blocks. They realised that GenAI is not a silver bullet but can help engineers work through complex incident investigations by explaining logs, traces, and dashboards.
-
How Observability Can Improve the UX of LLM Based Systems: Insights of Honeycomb's CEO at KubeCon EU
During her KubeCon Europe keynote, Christine Yen, CEO and co-founder of Honeycomb, provided insights on how observability can help cope with the rapid shifts introduced by the integration of LLMs into software systems, which has transformed not only the way we develop software but also the way we release it. She explained how to adapt the development feedback loop based on production observations.
-
Cloudflare Enables Remote Hosting for Model Context Protocol (MCP) Servers
Cloudflare has launched support for remotely hosted Model Context Protocol (MCP) servers, making MCP-based AI integrations more accessible to developers. The capability allows AI applications to interact with external services, with simplified deployment, built-in OAuth for security, and an expanded range of use cases.
-
GitLab Introduces GitLab Duo with Amazon Q
The integration of Amazon Q Developer with GitLab, introduced as GitLab Duo with Amazon Q, embeds generative AI capabilities directly into GitLab, enabling developers to receive AI-driven assistance for tasks such as feature development, code upgrades, reviews, and unit testing.
-
Azure Container Apps Serverless GPUs Reach General Availability with NVIDIA NIM Support
Azure has launched Serverless GPUs for Azure Container Apps, enabling scalable, on-demand execution of AI workloads using NVIDIA A100 and T4 GPUs. The feature supports NVIDIA NIM microservices, simplifying deployment and management while optimizing costs. Developers can focus on their applications while Azure manages the infrastructure, offering a flexible solution for diverse AI scenarios.