Site Reliability Engineering Content on InfoQ
-
Mastering Impact Analysis and Optimizing Change Release Processes
Dynamic IT professional with a proven track record in optimizing production processes and analyzing outages in complex systems handling millions of TPS. The recent CrowdStrike outage highlights the importance of continuous improvement and adherence to best practices. Passionate about elevating operational excellence through strategic reviews and effective process enhancements.
-
How Platform and Site Reliability Engineering Are Evolving DevOps
Companies are now looking to grow and more effectively manage DevOps with platform engineering and site reliability engineering roles. No one has these roles perfectly carved out right now — there’s just too much to do and not enough people to do it — but knowing where these three disciplines do and don’t overlap will help organizations evolve and take advantage when they are ready.
-
Data-Driven Decision Making - Software Delivery Performance Indicators at Different Granularities
There is no straightforward, industry-standard process for optimizing a software delivery organization. Getting the organization to analyze the data and act on it is a difficult undertaking. This article presents insights into how a socio-technical framework for optimizing a software delivery organization has been set up and brought to the point of regular use.
-
AIOps: Site Reliability Engineering at Scale
AIOps can simplify and streamline processes, which can reduce the mental burden on employees while improving communication and collaboration between departments.
-
Assessing Organizational Culture to Drive SRE Adoption
SRE adoption is greatly influenced by organizational culture. This article describes how to assess the organizational culture in terms of production operations at the beginning of the SRE transformation. It provides a roadmap of small culture changes accumulating over time, and shows how leadership facilitated the necessary culture changes.
-
Environment-as-a-Service (EaaS) as a Technique to Raise Productivity in Teams
In essence, EaaS addresses developer productivity issues by providing settings that make it simple for developers to test and mimic real-world uses of their system. This article discusses the benefits of EaaS.
-
The Hows and Whys of Effective Production-Readiness Reviews
At QCon Plus November 2021, Nora Jones, CEO and founder of Jeli, talked about how to build production readiness reviews (PRR) with emphasis on context and psychological safety. Her talk focused on the particulars of a PRR process that relates to incidents.
-
Employing Team-Based Agile Coaching to Establish SRE in an Organization
Establishing SRE in a software delivery organization typically requires a socio-technical transformation. Operations teams need to learn how to provide a scalable SRE infrastructure to enable development teams to run their services efficiently. This paper presents how agile coaching has been employed to run an SRE transformation in a 25-team-strong product delivery organization.
-
Establishing a Scalable SRE Infrastructure Using Standardization and Short Feedback Loops
This article explores an SRE implementation where the operations team builds and runs the SRE infrastructure and the development teams build and run the services leveraging the SRE infrastructure. This SRE solution enables the software delivery organization to scale the number of services in operation without linearly scaling the number of people required to operate the services.
-
DevOps and Cloud InfoQ Trends Report – June 2022
This article summarizes how we see the "cloud computing and DevOps" space in 2022, focusing on fundamental infrastructure and operational patterns, the realization of patterns in technology frameworks, and the design processes and skills that a software architect or engineer must cultivate.
-
InfoQ Mobile and IoT Trends Report 2022
This report summarizes the views of the InfoQ editorial team and of several practitioners from the software industry about emerging trends in a number of areas that we collectively label the mobile and IoT space. This is a rather heterogeneous space comprising devices and gadgets from smartphones to smart watches, from IoT appliances to smart glasses, voice-driven assistants, and so on.
-
Improving Speed and Stability of Software Delivery Simultaneously at Siemens Healthineers
In this article, we focus on the software delivery process at Siemens Healthineers Digital Health. The process is subject to the strict regulations that apply in the medical industry. We show our journey of transforming the process towards speed and stability. Both measures improved at the same time during the transformation, confirming research from the “Accelerate” book.