DevOps Content on InfoQ
-
How Meta Uses Precision Time Protocol to Handle Leap Seconds
For systems that require strict synchronization—like distributed databases, telemetry pipelines, or event-driven architectures—handling leap seconds incorrectly can lead to data loss, duplication, or inconsistencies. As such, managing leap seconds accurately ensures system reliability and consistency across environments that depend on high-precision time.
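The article covers Meta's PTP-based approach in depth; purely as an illustration of why many operators "smear" the extra second rather than step the clock, the sketch below spreads one positive leap second linearly across an assumed 17-hour window. The window length, function names, and dates are assumptions for this example, not Meta's implementation.

```python
from datetime import datetime, timedelta, timezone

# Illustrative parameters only: smear one positive leap second linearly
# over a 17-hour window that ends at the leap-second boundary.
LEAP_BOUNDARY = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
SMEAR_WINDOW = timedelta(hours=17)
SMEAR_START = LEAP_BOUNDARY - SMEAR_WINDOW

def smear_offset(linear_utc: datetime) -> timedelta:
    """Correction to subtract from a clock that ignores the leap second,
    so it slows down gradually instead of repeating a second at the boundary."""
    if linear_utc <= SMEAR_START:
        return timedelta(0)
    if linear_utc >= LEAP_BOUNDARY:
        return timedelta(seconds=1)
    fraction = (linear_utc - SMEAR_START) / SMEAR_WINDOW  # 0.0 .. 1.0
    return timedelta(seconds=fraction)

def smeared_time(linear_utc: datetime) -> datetime:
    # Monotonic: never repeats or skips a timestamp, and rejoins UTC
    # after the boundary without a backward step.
    return linear_utc - smear_offset(linear_utc)

print(smeared_time(datetime(2016, 12, 31, 20, 0, tzinfo=timezone.utc)))
```

Consumers reading the smeared clock never see the same timestamp twice, which is what protects downstream deduplication and ordering logic.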
-
QCon London 2025: Insights from 20+ Years in Mission-Critical Infrastructure
Matthew Liste, head of infrastructure at American Express, shared insights at QCon London 2025 on building robust cloud platforms in financial services. With 20+ years of experience, he emphasized stability, security, scalability, the value of interchangeable components, and long-term sustainability, urging professionals to maintain focus and foster a strong team culture for platform engineering.
-
Lessons on How to Get Timeouts, Retries and Idempotency Right from Sam Newman at QCon London
At QCon London, Sam Newman, the architect often credited with coining the term microservices, went back to basics to underline the three critical things to get right when working with distributed systems: timeouts, retries, and idempotency. Throughout the talk, he presented mechanisms that make distributed systems more robust.
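As a minimal sketch of how the three concerns combine on the calling side (the endpoint, retry budget, and backoff values below are illustrative assumptions, not examples from the talk), a client can pair a per-request timeout with a bounded retry loop and send the same idempotency key on every attempt, so the server can safely deduplicate replays:

```python
import time
import uuid
import requests  # third-party HTTP client, used here for brevity

def submit_payment(url: str, payload: dict,
                   timeout_s: float = 2.0,
                   max_attempts: int = 3) -> dict:
    # One idempotency key for all attempts: the server can recognise
    # retries of the same logical request instead of charging twice.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.post(url, json=payload,
                                 headers=headers, timeout=timeout_s)
            if resp.status_code < 500:
                resp.raise_for_status()  # 4xx: do not retry, surface the error
                return resp.json()
        except (requests.Timeout, requests.ConnectionError):
            pass  # ambiguous outcome: safe to retry only because of the key
        if attempt < max_attempts:
            time.sleep(0.2 * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError("payment request failed after retries")
```

The key design point is that a timeout leaves the outcome unknown, so retries are only safe when the server can detect and ignore duplicates.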
-
QCon London 2025: Distributed Event-Driven Architectures across Multi-Cloud Boundaries
At QCon London 2025, Teena Idnani from Microsoft addressed the rise of multi-cloud adoption, revealing that 89% of organizations embrace this strategy. Using the fictional FinBank, she showcased practical strategies to overcome latency, resilience, event ordering, and duplication challenges, emphasizing the importance of security, observability, and continuous team education.
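One recurring answer to the duplication and ordering challenges she discussed is an idempotent consumer that records what it has already applied. The sketch below is a generic illustration using assumed field names (event_id, sequence, amount), not FinBank's or Microsoft's design:

```python
from dataclasses import dataclass, field

@dataclass
class AccountProjection:
    balance: float = 0.0
    last_sequence: int = -1
    seen_event_ids: set = field(default_factory=set)

def apply_event(state: AccountProjection, event: dict) -> AccountProjection:
    # Duplicate delivery: the same event_id may arrive from more than one
    # cloud or broker; apply it at most once.
    if event["event_id"] in state.seen_event_ids:
        return state
    # Out-of-order delivery: ignore anything older than what has already
    # been applied (a real system might buffer and reorder instead).
    if event["sequence"] <= state.last_sequence:
        return state
    state.balance += event["amount"]
    state.last_sequence = event["sequence"]
    state.seen_event_ids.add(event["event_id"])
    return state
```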
-
Optimize AI Workloads: Google Cloud’s Tips and Tricks
Google Cloud has announced a suite of new tools and features designed to help organizations reduce costs and improve efficiency of AI workloads across their cloud infrastructure. The announcement comes as enterprises increasingly seek ways to optimize spending on AI initiatives while maintaining performance and scalability.
-
QCon London: a Three-Step Blueprint for Managing Open Source Risk
At QCon London 2025, Johnson Matthey's vulnerability manager, Celine Pypaert, discussed managing open-source dependency risks while maintaining momentum in innovation. She described a three-part blueprint for handling the security challenges that arise with the now widespread use of open-source dependencies.
-
QCon London 2025: Kraken Technology's Approach to Renewable Energy Management
Kevin Bowman of Kraken Technology described how serverless cloud solutions are transforming the UK's power grid management amid a 40% surge in renewable energy. By leveraging intelligent control systems, battery storage and microservices, Kraken optimizes energy flow and grid stability while advocating for consumer cooperation and continued investment in cloud technologies for future resilience.
-
QCon London: Monzo's Recipe for Developer Experience: Assemble, Build, Communicate
In a talk at QCon London 2025, Fabien Deshayes described how Monzo has created and optimised its developer experience teams. Deshayes outlined techniques for building an effective Developer Experience platform, focusing on three key aspects: assembling effective teams, building impactful products, and communicating value across the organisation.
-
How Meta is Using a New Metric for Developers: Diff Authoring Time
Diff Authoring Time (DAT) is a new metric developed by engineers at Meta to measure how long it takes developers to submit changes, known as "diffs," to the codebase. By tracking the time from the initiation of a code change to its submission, DAT offers insights into the efficiency of the development process and helps identify areas for improvement.
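As a back-of-the-envelope illustration of the definition (the event names and timestamps below are assumptions, not Meta's instrumentation), DAT is simply the elapsed time between the first recorded activity on a diff and its submission:

```python
from datetime import datetime, timedelta

def diff_authoring_time(events: list[dict]) -> timedelta:
    """Elapsed time from the first recorded activity on a diff
    (e.g. first edit) to its submission for review."""
    started = min(e["timestamp"] for e in events if e["type"] == "started")
    submitted = min(e["timestamp"] for e in events if e["type"] == "submitted")
    return submitted - started

# Example: a diff started at 09:00 and submitted at 11:30 has a DAT of 2.5 hours.
events = [
    {"type": "started", "timestamp": datetime(2025, 4, 1, 9, 0)},
    {"type": "submitted", "timestamp": datetime(2025, 4, 1, 11, 30)},
]
print(diff_authoring_time(events))  # 2:30:00
```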
-
AWS CodeBuild Adds Parallel Test Execution for Faster CI
AWS CodeBuild now supports parallel test execution, significantly reducing build times by allowing concurrent test suite runs across multiple environments. This feature addresses long CI pipeline cycles that impede productivity and increase costs. With intelligent test distribution and automatic result merging, developers can enhance efficiency and streamline feedback loops.
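CodeBuild handles the distribution and result merging itself; purely to illustrate the general idea of deterministic test sharding (this is not CodeBuild's algorithm or configuration syntax), a suite can be split so each parallel worker runs a stable slice of the test files:

```python
import hashlib

def shard_for(test_file: str, total_shards: int) -> int:
    # Stable hashing keeps each test file on the same shard across builds,
    # which makes per-shard timing data meaningful for rebalancing.
    digest = hashlib.sha256(test_file.encode()).hexdigest()
    return int(digest, 16) % total_shards

def select_tests(all_tests: list[str], shard_index: int, total_shards: int) -> list[str]:
    return [t for t in all_tests if shard_for(t, total_shards) == shard_index]

# Example: worker 2 of 4 runs only its slice of the suite.
tests = [f"tests/test_module_{i}.py" for i in range(10)]
print(select_tests(tests, shard_index=2, total_shards=4))
```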
-
How SREs and GenAI Work Together to Decrease eBay's Downtime: an Architect's Insights at KubeCon EU
During his KubeCon EU keynote, Vijay Samuel, Principal MTS Architect at eBay, shared his team’s experience of enhancing incident response capabilities by incorporating ML and LLM building blocks. They realised that GenAI is not a silver bullet but can help engineers work through complex incident investigations by explaining logs, traces, and dashboards.
-
How Observability Can Improve the UX of LLM Based Systems: Insights of Honeycomb's CEO at KubeCon EU
During her KubeCon Europe keynote, Christine Yen, CEO and co-founder of Honeycomb, provided insights on how observability can help cope with the rapid shifts introduced by the integration of LLMs into software systems, which has transformed not only the way we develop software but also the way we release it. She explained how to adapt the development feedback loop based on production observations.
-
Cloudflare Enables Remote Hosting for Model Context Protocol (MCP) Servers
Cloudflare has launched support for remotely hosted Model Context Protocol (MCP) servers, making MCP-based AI integrations more accessible to developers. The capability allows AI applications to interact with external services, with simplified deployment, built-in OAuth for security, and an expanded range of use cases.
-
GitLab Introduces GitLab Duo with Amazon Q
The integration of Amazon Q Developer with GitLab, introduced as GitLab Duo with Amazon Q, embeds generative AI capabilities directly into GitLab, enabling developers to receive AI-driven assistance for tasks such as feature development, code upgrades, reviews, and unit testing.
-
Azure Container Apps Serverless GPUs Reach General Availability with NVIDIA NIM Support
Azure has launched Serverless GPUs for Azure Container Apps, enabling scalable, on-demand execution of AI workloads using NVIDIA A100 and T4 GPUs. The feature supports NVIDIA NIM microservices, simplifying deployment and management while optimizing costs. Developers can focus on their applications while Azure manages the infrastructure, offering a flexible solution for diverse AI scenarios.