Monitoring Content on InfoQ

News

Culture & Methods

Testing Complex Distributed Systems at FT.com: Sarah Wells Shares Lessons Learned

The complexity in complex distributed systems isn’t in the code; it’s between the services or functions. Testing means balancing finding problems against delivering value, said Sarah Wells at the European Testing Conference. Testers often have the best understanding of what the system does; they have a good hypothesis about what went wrong, and are able to validate it pretty quickly.

Ben Linders
on Feb 14, 2019
DevOps

Adopting Envoy as a Service-to-Service Proxy at Reddit

Reddit introduced Envoy into their backend framework as a service-to-service proxy to support their ongoing architectural improvements. By adopting Envoy as a service-to-service Layer 4/Layer 7 proxy, they saw significant improvements in observability, ease of adoption, and performance.

Matt Campbell
on Jan 30, 2019
Development

The Evolution of Full Cycle Developers at Netflix: Greg Burrell at QCon SF

At QCon San Francisco, Greg Burrell talked about the journey towards “full cycle developers” within the Netflix edge engineering team. Following the principle of “operate what you build”, developers within this team chose to take on more operational responsibility for their services, and were facilitated by comprehensive tooling, training and management support.

Daniel Bryant
on Jan 06, 2019
DevOps

Shipping More Safely by Encouraging Ownership of Deployments

Many incidents happen during or right after a release, argues Charity Majors, CEO at Honeycomb. She believes that stronger ownership of the deployment process by developers will ensure it is executed regularly and reduce risk. She argues for investment in tooling, high observability during and after release, and small, frequent releases to minimize the impact caused by shipping new code.

Matt Campbell
on Dec 31, 2018
Cloud

Amazon Introduces AWS Cloud Map: "Service Discovery for Cloud Resources"

In a recent blog post, Amazon introduced a new service called AWS Cloud Map, which discovers and tracks cloud resources. With the rise of microservice architectures, managing their dynamic resources has become increasingly difficult. Using AWS Cloud Map, developers can monitor the health of databases, queues, microservices, and other cloud resources with custom names.

Kent Weare
on Dec 27, 2018
DevOps

Grafana Adds Log Data Correlation to Time Series Metrics

The Grafana team announced an alpha version of Loki, their logging platform that ties in with other Grafana features like metrics query and visualization. Loki adds a new client agent, promtail, and server-side components for log metadata indexing and storage.

Hrishikesh Barua
on Dec 26, 2018
DevOps

Inside Stack Overflow’s Monitoring Systems

Nick Craver, architecture lead at Stack Exchange, wrote about their monitoring systems in a recent article. He discussed the philosophy and motivation behind their monitoring strategy and talked about their toolset - mainly Bosun, Grafana and Opserver.

Hrishikesh Barua
on Dec 21, 2018
DevOps

Scaling Observability at Uber: Building In-House Solutions, uMonitor and Neris

Uber’s infrastructure consists of thousands of microservices supporting mobile applications, infrastructure, and internal services. To provide high observability of these services, Uber’s Observability team built two in-house monitoring solutions: uMonitor for time-series metrics-based alerting, and Neris for host-level checks and metrics.

Matt Campbell
on Dec 20, 2018
DevOps

Q&A with the Creator of Checkless, a Low-Cost, Simple Site Monitoring Tool

Steve Elliott wanted a simple, cheap way to monitor uptime for his websites. He found most off-the-shelf tooling to be either too complex or too costly. This led him to build Checkless, a serverless tool that can monitor sites for uptime via ping-based checks and, depending on your usage, can potentially be free to use.

Matt Campbell
on Sep 19, 2018
AI, ML & Data Engineering

Confluent Platform 5.0 Supports LDAP Authorization and MQTT Proxy for IoT Integration

Confluent Platform 5.0, the enterprise streaming platform built on Apache Kafka, supports LDAP authorization, Kafka topic inspection, and Confluent MQTT Proxy for Internet of Things (IoT) integration.

Srini Penchikala
on Sep 17, 2018
DevOps

Pinterest Switches from OpenTSDB to Their Own Time Series Database

The Pinterest engineering team has used OpenTSDB for storing and querying metrics since 2014. Recently, they developed and switched to their own time series database called Goku to mitigate various performance issues in OpenTSDB caused by a growth in the amount of metrics data.

Hrishikesh Barua
on Sep 16, 2018
DevOps

Auth0's Move to a Single-Cloud Architecture on AWS

Auth0, a provider of authentication, authorization and single sign-on services, moved their infrastructure from multiple cloud providers (AWS, Azure and Google Cloud) to just AWS. An increasing dependency on AWS services necessitated this, and today their systems are spread across four AWS regions, with services replicated across zones.

Hrishikesh Barua
on Aug 25, 2018
DevOps

Prometheus Monitoring Platform "Graduates" from the Cloud Native Computing Foundation (CNCF)

On August 9th, the Cloud Native Computing Foundation (CNCF) announced that the open source monitoring toolkit Prometheus has graduated from its incubation status. In order to achieve this rating, projects must demonstrate growth, documentation, organized governance processes, and commitment to community sustainability and inclusivity.

Kent Weare
on Aug 19, 2018
DevOps

Uber Open Sources Its Large Scale Metrics Platform M3

Uber’s engineering team released M3, the metrics platform it has been using internally for some years, as open source. The platform was built to replace its Graphite-based system, and provides cluster management, aggregation, collection, storage management, a distributed time series database (TSDB) and a query engine with its own query language, M3QL.

Hrishikesh Barua
on Aug 18, 2018
DevOps

How Coinbase Handled Scaling Challenges on Their Cryptocurrency Trading Platform

Coinbase, a digital currency exchange, faced scaling challenges on their platform during the 2017 cryptocurrency boom. The engineering team resolved them by upgrading and optimizing MongoDB and segregating traffic for hotspots, and built capture and replay tools to prepare for future surges.

Hrishikesh Barua
on Aug 12, 2018
