Resilience Content on InfoQ

Articles


Culture & Methods

Write More, Talk Less: Building Organizational Resilience through Documentation and InnerSource

Better documentation and knowledge sharing create transparency that aids onboarding, prevents turnover disruption, and withstands reorganizations. Practices that help include communicating asynchronously, creating incentives for documentation, making docs discoverable, understanding team members' preferences, and providing dedicated writing time. InnerSource may help as well.

David Grizzanti on Dec 20, 2023

DevOps

Debugging Production: eBPF Chaos

This article shares insights into learning eBPF, a cloud-native technology that aims to improve observability and security workflows. You'll learn how chaos engineering can help and get an insight into eBPF-based observability and security use cases. Breaking them in a professional way also inspires new ideas for chaos engineering itself.

Michael Friedrich on Jun 20, 2023

Culture & Methods

How We Improved Application’s Resiliency by Uncovering Our Hidden Issues Using Chaos Testing

This article lists the chaos testing principles outlined by Netflix and explains the advantages and disadvantages of chaos testing, so that readers can decide whether to adopt it. It also explains why teams should convince management to run chaos tests, weighing the benefits against the risks.

Vipin Jain on Dec 20, 2022

Cloud

How Do We Utilize Chaos Engineering to Become Better Cloud-Native Engineers?

Engineers these days are closer to the product and to customer needs, but there is still a long way to go. Companies still struggle to get engineers close enough to their customers to understand their business impact in depth: what do they solve, how do they influence the customer, and what is their impact on the product?

Eran Levy on Jun 13, 2022

Culture & Methods

Chaos Engineering and Observability with Visual Metaphors

This article introduces a new actor for visualising chaos engineering and observability: metaphors. It provides the conceptual foundations of chaos engineering and observability, presents the state of the art in visualisation techniques available on the market, and shows how treemaps, gauge charts, geocentric and city metaphors can enrich the spectrum of visual strategies for observing chaos.

Yury Niño Roa on May 02, 2022

DevOps

DevOps and Cloud InfoQ Trends Report - July 2021

This article summarizes how we see the "cloud computing and DevOps" space in 2021, which focuses on fundamental infrastructure and operational patterns, the realization of patterns in technology frameworks, and the design processes and skills that a software architect or engineer must cultivate.

Matt Campbell, Steef-Jan Wiggers, Shaaron A Alvares, Helen Beal, Daniel Bryant, Lena Hall, Rupert Field, Aditya Kulkarni, Jared Ruckle, Renato Losio, Holly Cummins on Jul 19, 2021

Culture & Methods

Building Reliable Software Systems with Chaos Engineering

Advances in large-scale, distributed software systems are changing the game for software engineering. As an industry, we are quick to adopt practices that improve flexibility and feature velocity. If we can move quickly, can we do so without breaking things? Chaos Engineering practices can be used to navigate complexity and build more reliable systems.

Ben Linders, Casey Rosenthal on Jun 08, 2021

Culture & Methods

Continuous Learning as a Tool for Adaptation

The fifth and capstone article in a series on how software companies adapted and continue to adapt to enhance their resilience explores key themes with a special view on the practicality of organizational resilience. It also provides practical guidance to engineering leadership and recommendations on how to create this investment.

Nora Jones on May 07, 2021

Architecture & Design

Software Architecture and Design InfoQ Trends Report—April 2021

An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.

Thomas Betts, Holly Cummins, Daniel Bryant, Eran Stiller on Apr 19, 2021

Culture & Methods

Designing & Managing for Resilience

The fourth article in a series on how software companies adapted and continue to adapt to enhance their resilience explores the strategies used by engineering leaders to help create the conditions for sustained resilience. It provides stories, examples, and strategies towards designing an organizational structure to support resilient performance and managing for resilience.

Laura Maguire on Apr 15, 2021

Culture & Methods

Adaptive Frontline Incident Response: Human-Centered Incident Management

The third article in a series on how software companies adapted and continue to adapt to enhance their resilience zeros in on the sources that comprise most of your company’s adaptive resources: your frontline responders. In this article, we draw on our experiences as incident commanders with Twilio to share our reflections on what it means to cultivate resilient people.

Emily Ruppe, Ryan McDonald on Feb 05, 2021

Culture & Methods

Learning from Incidents

Jessica DeVita (Netflix) and Nick Stenning (Microsoft) have been working on improving how software teams learn from incidents in production. In this article, they share some of what they’ve learned from the research community in this area, and offer some advice on the practical application of this work.

Jessica DeVita, Nick Stenning on Jan 27, 2021
