Monitoring Content on InfoQ
-
How Observability Impacts Testing: Q&A with Amy Phillips at QCon London
Observability gives you a picture of the system’s current health and can replace certain types of testing. For low-risk application areas you can rely on observability instead of testing, as long as your continuous delivery pipeline gives fast feedback and lets you release changes quickly.
-
Monitoring Distributed Task Queues at MeilleursAgents
MeilleursAgents, a website that lets property sellers list their property and get an estimated price for it, shared details of how their Celery-based distributed task queue is monitored. A combination of Python, StatsD, Bucky, Graphite and Grafana forms the pipeline that monitors task lifecycle and execution rates.
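
As a rough illustration of the first hop in such a pipeline, the sketch below emits task lifecycle counters and execution timings in the plain-text StatsD format over UDP. It is not MeilleursAgents' code: the `StatsdClient` class, the `celery` prefix, the metric names and the `127.0.0.1:8125` address are all assumptions made for the example, and a StatsD-compatible listener (such as Bucky) is assumed to be running there.

```python
import socket
import time
from contextlib import contextmanager


def format_counter(name, value=1):
    """Render a StatsD counter line, e.g. 'celery.task.started:1|c'."""
    return f"{name}:{value}|c"


def format_timer(name, millis):
    """Render a StatsD timer line, e.g. 'celery.task.runtime:42|ms'."""
    return f"{name}:{millis}|ms"


class StatsdClient:
    """Hypothetical fire-and-forget StatsD client (illustration only)."""

    def __init__(self, host="127.0.0.1", port=8125, prefix="celery"):
        # Host, port and prefix are assumed values, not taken from the article.
        self._addr = (host, port)
        self._prefix = prefix
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line):
        try:
            self._sock.sendto(line.encode("ascii"), self._addr)
        except OSError:
            # Metrics must never break the task itself, so errors are dropped.
            pass

    def incr(self, name):
        """Count one occurrence of a lifecycle event."""
        self._send(format_counter(f"{self._prefix}.{name}"))

    @contextmanager
    def timed(self, name):
        """Time the wrapped block and report it in milliseconds."""
        start = time.monotonic()
        try:
            yield
        finally:
            elapsed_ms = int(round((time.monotonic() - start) * 1000))
            self._send(format_timer(f"{self._prefix}.{name}", elapsed_ms))


if __name__ == "__main__":
    stats = StatsdClient()
    stats.incr("task.started")
    with stats.timed("task.runtime"):
        time.sleep(0.01)  # stand-in for the real task body
    stats.incr("task.succeeded")
```

In a Celery application these calls would typically be made from task signal handlers, so that every task reports its start, outcome and duration without changes to the task code itself.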
-
How MakeMyTrip Monitors Its Large-Scale E-Commerce Website
MakeMyTrip, an online travel company, talks about their monitoring philosophy and setup in a series of articles. The hybrid infrastructure is monitored across the stack by mostly open source tools.
-
How ING Bank Does SRE
Janna Brummel and Robin van Zijll, from ING Netherlands, talked at the Velocity conference in London about how poor availability from their internet banking systems prompted the bank to implement an SRE culture. A centralized SRE team was set up in the Netherlands to provide tooling, consulting and education on reliability to product teams (known as BizDevOps squads internally).
-
Monitoring Microservices - A Prediction for 2018
The monitoring and distributed tracing of microservices has been a recognised challenge for a number of years. Péter Márton, CTO of RisingStack, recently wrote an article describing experiences with various approaches, including the OpenTracing initiative; it offers recommendations and example code, and makes a prediction or two about the future.
-
Observability and the Monitoring of Cloud-Native Applications
Cindy Sridharan summarizes her thoughts on observability and its relevance in monitoring cloud-native applications in her recent article. Observability is a philosophy that encompasses monitoring, log aggregation, metrics and distributed tracing to gain deeper, ad-hoc insights into a system.
-
CNCF Adds Security, Service Mesh and Tracing Projects: Docker Notary, Lyft Envoy and Uber Jaeger
The Cloud Native Computing Foundation (CNCF) has announced the addition of four new hosted projects over the past month: Docker’s Notary, The Update Framework (TUF), Lyft’s Envoy, and Uber’s Jaeger.
-
Monitoring Cloudflare's Global Network Using Prometheus
Matt Bostock’s SREcon 2017 Europe talk covers how Prometheus, a metric-based monitoring tool, is used to monitor the globally distributed infrastructure and network of Cloudflare, a CDN, DNS and DDoS mitigation provider.
-
Amazon CloudWatch Dashboards Gains API and CloudFormation Support
Amazon Web Services (AWS) recently added programmatic creation and manipulation of CloudWatch dashboards and widgets to support use cases such as dynamic resource lifecycle tracking and consistent cross-account dashboard maintenance.
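
To make the "dynamic resource lifecycle tracking" use case more concrete, here is a minimal sketch that generates a dashboard body with one CPU widget per EC2 instance. The `build_dashboard_body` helper, the instance IDs, the region and the two-column layout are assumptions for illustration; the resulting JSON string is what one would pass as the dashboard body to the CloudWatch `PutDashboard` API (for example via boto3's `put_dashboard(DashboardName=..., DashboardBody=...)`), which this sketch deliberately does not call.

```python
import json


def build_dashboard_body(instance_ids, region="us-east-1"):
    """Return a CloudWatch dashboard body (JSON string) for the given instances.

    Hypothetical helper: lays widgets out two per row on the 24-column grid.
    """
    widgets = []
    for index, instance_id in enumerate(instance_ids):
        widgets.append({
            "type": "metric",
            "x": (index % 2) * 12,   # left or right half of the grid
            "y": (index // 2) * 6,   # a new row every two widgets
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]
                ],
                "period": 300,
                "stat": "Average",
                "region": region,
                "title": f"CPU {instance_id}",
            },
        })
    return json.dumps({"widgets": widgets})


if __name__ == "__main__":
    # Placeholder instance IDs; in practice these would come from an inventory.
    print(build_dashboard_body(["i-0123456789abcdef0", "i-0fedcba9876543210"]))
```

Regenerating the body whenever instances are launched or terminated, and pushing it with the same dashboard name, keeps the dashboard in step with the fleet.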
-
NGINX Releases Microservices Platform, OpenShift Ingress Controller, and Service Mesh Preview
NGINX Inc has released the NGINX Application Platform which aims to be a “one stop shop” for microservice developers; a Kubernetes Ingress Controller solution for load balancing on the Red Hat OpenShift Container Platform; and an implementation of NGINX as a service proxy for the Istio service mesh control plane.
-
Amazon CloudWatch Events Gains Cross-Account Event Delivery
Amazon Web Services (AWS) recently added cross-account event delivery to Amazon CloudWatch Events to support use cases such as the tracking of events across an entire organization and the handling of events in separate accounts to implement advanced security schemes.
-
Why the JVM is a Good Choice for Serverless Computing: John Chapin Discusses AWS Lambda at QCon NY
At QCon New York John Chapin presented “Fearless AWS Lambdas”, and not only argued that the JVM is a good platform on which to deploy serverless code, but also provided guidance on extracting the best performance from Java-based AWS Lambda functions.
-
A Comparison of Mapping Approaches for Distributed Cloud Applications
An application map is a topology view of the components of a distributed application and the network or interprocess interactions between them. A recent article gives an overview of application mapping approaches adopted by various tools like AppDynamics, OpenTracing and Netsil.
-
AWS Lambda Support Added to AWS X-Ray Distributed Tracing Service
Following the General Availability (GA) release of the AWS X-Ray distributed tracing service in April, Amazon has added AWS Lambda support to X-Ray. Function invocations and associated metadata can now be recorded, displayed graphically in the AWS Console, and analysed for debugging or fault resolution.
-
Metrics Collection and Monitoring at Robinhood Engineering
The Robinhood server operations team published a series of articles talking about their metrics collection, monitoring and alerting infrastructure. OpenTSDB, Grafana, Kafka and Riemann form the core of the stack, with Kafka acting as a proxy layer from which the data is pushed into Riemann for stream processing of the metrics and into OpenTSDB for storage.