Monitoring Content on InfoQ
-
Amazon CloudWatch Dashboards Supports Sharing
AWS recently introduced the ability to share Amazon CloudWatch Dashboards with users who do not have access to the AWS account. This feature opens up new use cases for dashboards, including sharing metrics and information on big screens, or embedding real-time information in public pages.
-
Netflix Presents Telltale, an Application Health Monitoring Tool
The Netflix Engineering team recently blogged about Telltale, a monitoring and alerting tool that uses a variety of data sources to learn the typical health of an application. Telltale shows only the relevant data from the application, along with information about important events, such as nearby deployments and regional traffic evacuations.
-
Focused on Observability: CNCF Publishes Latest Technology Radar
CNCF released their second quarterly technology radar focused on Observability. The goal of the radar is to “share what tools are actively being used by end users, the tools they would recommend, and their patterns of usage” when adopting cloud-native technologies.
-
Observability Strategies for Distributed Systems - Lessons Learned at InfoQ Live
A good observability strategy makes it easy for teams to share their data, and uses data from across a distributed system to determine whether business goals are being achieved. These were some of the ideas discussed during the InfoQ Live roundtable discussion on observability patterns for distributed systems, held on August 25.
-
Rookout CTO Discusses Understandability, Architecture Styles, and Live Debugging
In a recent InfoQ podcast, Liran Haimovitch, CTO at Rookout, discussed the concept of “understandability” and how this relates to building modern software systems. Building on the concepts introduced in his recent InfoQ article, he also discussed how complexity impacts a system’s understandability, and the benefits of live debugging tooling.
-
InfoQ Live Virtual Event on Aug 25th: Session Spotlights and Roundtables
The inaugural InfoQ Live (Aug 25th) is a one-day virtual learning event that deep-dives into building and operating microservices and distributed systems. Discover practical strategies for the current environment that you can put into use straight away. Join world-class practitioners for inspiration, connections, and actionable ideas. See the InfoQ Live full schedule and the speaker line-up.
-
Brenda - an Artificial Intelligence Team Member
Brenda uses artificial intelligence with machine learning to monitor the infrastructure, do quality assurance checks and support troubleshooting, handle alerts and communicate critical issues, and apply auto-healing. At Swiss Testing Day 2020, Sree Rama Murthy Pakkala and Collin Mendons from Swisscom will talk about Brenda, an AI/ML framework that helps their teams increase quality.
-
Metrics Collection at Scale: Learning from Uber's M3
In a recent InfoQ podcast, Rob Skillington, co-founder and CTO at Chronosphere, shared his experience and opinions on the topic of observability in modern distributed systems. Key topics covered: metrics collection at scale, multi-dimensional metrics and high cardinality, the importance of the developer experience, and the value of open standards, such as OpenMetrics.
-
Applying Observability to Ship Faster
To get fast feedback, ship work often, as soon as it is ready, and use automated systems in the live environment to test the changes. Monitoring can be used to verify that things are working and to raise an alarm if they are not. Shipping fast in this way can result in having fewer tests and can make you more resilient to problems.
-
How Netlify’s Infrastructure Team Improved Observability While Increasing Deployment Speed
Netlify's infrastructure team shared how they increased customer deployment speeds by up to 2x by optimizing their deployment algorithm, and how they increased observability into their systems in the process.
-
Moogsoft Adds Virtual Network Operations Centre Capability
AIOps platform vendor Moogsoft has announced the release of Moogsoft Enterprise 8.0, featuring a capability for technology teams to build a virtual Network Operations Centre (NOC). Moogsoft Enterprise consolidates monitoring tools with the intention of helping technology teams reduce noise, prioritize incidents, reduce escalations, and ensure uptime.
-
Splunk Launches New Release of SignalFx APM
Splunk, a platform for searching, monitoring, and examining machine-generated big data, has launched a new release of its application monitoring tool, SignalFx Microservices APM™. The new release combines NoSample™ tracing, open-standards-based instrumentation, and artificial intelligence (AI)-driven directed troubleshooting from SignalFx and Omnition into a single solution.
-
Instana Launches Context Guide: Enabling Visual Navigation of Infrastructure & Services
Instana, a provider of automated application performance management (APM) solutions for microservices, has launched the Instana Context Guide, which provides GUI-based access to the company’s underlying system model, called the Dynamic Graph. Instana’s solution discovers application service components and application infrastructure, including cloud infrastructure.
-
Periskop: SoundCloud's Exception Monitoring Service
SoundCloud's engineering team wrote about their exception monitoring software called Periskop, which collects and aggregates exceptions across servers and reports to a central server for analysis.
-
Grafana Labs Announces GA of Cortex v1.0 and Discusses Architectural Changes
Grafana Labs, the company behind popular open-source monitoring projects Grafana and Loki, announced the General Availability of Cortex v1.0. Cortex is a clustered Prometheus implementation that includes features such as horizontal scalability, multi-tenancy, durability, and long-term storage.