Observability Content on InfoQ
-
Cloud Native and Kubernetes Observability: Expert Panel
InfoQ recently caught up with Observability experts to discuss several topics: what Observability really entails, the misconceptions and challenges that users face, the open standards that are influencing the industry, and why interest in this area has grown of late.
-
Site Reliability Engineering Experiences at Instana
With the popularity of distributed architectures, distributed databases, containers and container orchestrators, an approach that emphasizes automation and a culture of collaboration is a natural fit for modern day operations. Site Reliability Engineering takes engineering practices that have been established and proven in software engineering and applies them to the field of operations.
-
Software Architecture and Design InfoQ Trends Report—April 2021
An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.
-
Piercing the Fog: Observability Tools from the Future
Gaining visibility into distributed systems and how they are performing is challenging. Despite all the observability tools available for site reliability, debugging remains incredibly difficult, and many SREs would agree that their debugging processes have only marginally improved. This article explores how observability for troubleshooting could be done from the user’s point of view.
-
Instrumenting the Network for Successful AIOps
AIOps platforms empower IT teams to quickly find the root issues that originate in the network and disrupt running applications. AI/ML algorithms need access to high quality network data to determine what went wrong and where. Network visibility starts from TAPs around network equipment, and teams can add application instrumentation and logs as data sources for complete insights.
-
Load Testing APIs and Websites with Gatling: It’s Never Too Late to Get Started
Conducting load tests against APIs and websites can both validate performance after a long stretch of development and provide useful feedback on how to improve an app’s scaling capabilities and performance. Engineers should avoid building “the cathedral” of load testing and ending up with little time left to improve performance overall. Write the simplest possible test and iterate from there.
-
Realtime APIs: Mike Amundsen on Designing for Speed and Observability
In a recent apidays webinar, Mike Amundsen, trainer and author of the recent O’Reilly book “API Traffic Management 101”, presented “High Performing APIs: Architecting for Speed at Scale”. Drawing on recent research by IDC, he argued that organisations will have to drive systemic changes in order to meet the upcoming increase in demand for business services consumed via APIs.
-
Understandability: The Most Important Metric You’re Not Tracking
Understandability is the concept that a system should be presented so that an engineer can easily comprehend it. The more understandable a system is, the easier it will be for engineers to change it in a predictable and safe manner. A system is understandable if it meets the following criteria: complete, concise, clear, and organized.
-
The Fundamental Truth behind Successful Development Practices: Software is Synthetic
Software systems are creative compounds, emergent and generative; the product of complex interactions between people and technology. They are different from the orderly, analytic worlds that our school-age selves expect to find. Because they are so full of complexity and uncertainty, we need a different way to arrive at a solution.
-
Q&A with Tyler Treat on Microservice Observability
Tyler Treat attempts to disambiguate the concepts of Observability and Monitoring. He discusses how the complexity of elastic systems produces more unknowns that require a discovery-based approach. InfoQ recently sat down with Treat to discuss observability and monitoring, and he shared some challenges and best practices for introducing observability concepts.
-
Sustainable Operations in Complex Systems with Production Excellence
Successful long-term approaches to production ownership and DevOps require cultural change in the form of production excellence. Teams are more sustainable if they have well-defined measurements of reliability, the capability to debug new problems, a culture that fosters spreading knowledge, and a proactive approach to mitigating risk.
-
DevOps and Cloud InfoQ Trends Report - February 2019
An overview of how the “cloud computing” and DevOps space is evolving in 2019, including updates on Kubernetes, Chaos Engineering, service meshes and more.