Performance & Scalability Content on InfoQ
-
Scaling Graphite at Booking.com
Booking.com's engineering team scaled their Graphite deployment from a small cluster to one that handles millions of metrics per second. Along the way, they modified and optimized Graphite's core components - the carbon-relay and carbon-cache, and the rendering API.
-
Scaling Apache Kafka at Pinterest
Pinterest uses Apache Kafka to transport data for real-time streaming applications, logging, and visibility metrics for monitoring. Hosted on AWS, Pinterest’s Kafka installation uses the MirrorMaker and DoctorKafka tools for replication and high availability.
-
The Evolution of Uber’s 100+ Petabyte Big Data Platform
Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, standard transfer format and a custom library for incremental updates are the key components of the platform.
-
Scaling Global Traffic at Dropbox with Edge Locations and GSLB
The Dropbox engineering team shared their experience of architecting and scaling their global network of edge locations. The edge locations run a custom stack of nginx and IPVS and connect to the Dropbox backend servers over Dropbox's backbone network. A combination of GeoDNS and BGP Anycast ensures availability and low latency for end users.
-
Supercharging Marketo's Campaign Engine at Reactive Summit
Marketo is a marketing automation platform that executes over 20 billion customer-defined actions per month. Apurva Pawar, Daniel Pugliese, Dennis Bronnikov and Pei-Chiang Ma from Marketo’s engineering team explained at Reactive Summit how they rewrote the core of their system with Akka and a reactive approach.
-
Amazon S3 Increases Request Rate Performance and Drops Randomized Prefix Requirement
Amazon Web Services (AWS) recently announced significantly increased S3 request rate performance and the ability to parallelize requests to scale to the desired throughput. Notably, this performance increase also "removes any previous guidance to randomize object prefixes" and enables the use of "logical or sequential naming patterns in S3 object naming without any performance implications".
-
Facebook Open Sources LogDevice - a Distributed Data Store for Log Storage
Facebook open sourced their internal distributed log storage project called LogDevice. It offers high write availability using replication, durable log storage and recovery from failure.
-
How Coinbase Handled Scaling Challenges on Their Cryptocurrency Trading Platform
Coinbase, a digital currency exchange, faced scaling challenges on their platform during the 2017 cryptocurrency boom. To resolve them, the engineering team focused on upgrading and optimizing MongoDB and on segregating traffic for hotspots. They also built capture and replay tools to prepare for future surges.
-
Hyperledger Adds "Caliper" to Measure Blockchain Performance across Implementations
On March 19th, Hyperledger announced that Caliper had been accepted by the Technical Steering Committee as a Hyperledger project. Hyperledger Caliper is a blockchain benchmark tool that allows projects to consistently track performance characteristics across different blockchain implementations.
-
How Booking.com Uses Kubernetes for Machine Learning
Sahil Dua explained at the QCon London conference how Booking.com used Kubernetes to scale machine learning (ML) models that recommend destinations and accommodation to their customers. In particular, he stressed how Kubernetes' elasticity and its avoidance of resource starvation on containers help them run computationally intensive (and data-intensive), hard-to-parallelize machine learning models.
-
Handling Traffic Spikes from Global Events at Facebook Live
Facebook Live’s engineers talked about how they scale their systems to handle traffic from both predicted and unpredicted events. Unpredicted events are handled by their globally distributed architecture, while predicted ones involve careful advance planning and load testing.
-
Smart Replies for Member Messages at LinkedIn
LinkedIn has launched a new natural language processing (NLP) recommendation engine that provides members with smart-reply recommendations for messages. The engineering team documented the model and infrastructure development process in detail in a recent blog post.
-
GoTo Copenhagen 2017: How Shopify Powers Online Commerce
Simon Hørup, senior product engineering lead at Shopify, gave an overview at GoTo Copenhagen 2017 of how Shopify is architected to support large sales. This included their OpenResty-configured NGINX instances, shop and pod isolation architecture, failover strategies and more.
-
AWS Offers 4TB Memory Virtual Machines
AWS now offers the largest cloud virtual machine in terms of memory, after launching the x1e.32xlarge, a new memory-optimized EC2 instance type. AWS customers can use this new instance type in their production environments to handle the high memory requirements of software like SAP HANA and their in-memory databases.
-
Personalized Notifications at Twitter
Gary Lam, staff engineer at Twitter, spoke about personalized notifications at QCon London 2017. He gave a high-level overview of their personalization and recommendation algorithms and explained how they work at scale despite the large volumes of data and the bi-modal nature of Twitter.