InfoQ Homepage Performance & Scalability Content on InfoQ
-
Pinterest Open-Sources a Production-Ready PubSub Java Client for Kafka, Flink, and MemQ
Pinterest open-sourced its generic PubSub client library, PSC, which has been heavily used in production for a year and a half. The library helped the engineering teams by increasing developer velocity, and the scalability and stability of services using it. Over 90% of Java applications have migrated to PSC with minimal changes.
-
Uber Improves Resiliency of Microservices with Adaptive Load Shedding
Uber created a new load-shedding library for its microservice platform, serving over 130 million customers and handling aggregated peaks of millions of requests per second (RPSs). The company replaced the solution based on QALM with Cinnamon library, which, in addition to graceful degradation, can dynamically and continuously adjust the capacity of the service and the amount of load shedding.
-
How RevenueCat Manages Caching for Handling over 1.2 Billion Daily API Requests
RevenueCat extensively uses caching to improve the availability and performance of its product API while ensuring consistency. The company shared its techniques to deliver the platform, which can handle over 1.2 billion daily API requests. The team at RevenueCat created an open-source memcache client that provides several advanced features.
-
Discord Scales to 1 Million+ Online MidJourney Users in a Single Server
Discord optimized its platform to serve over one million online users in a single server while maintaining a responsive user experience. The company evolved the guild component, which is responsible for fanning out billions of message notifications, in a series of performance and scalability improvements supported by system observability and performance tuning.
-
lastminute.com Improves Search Scalability Using Microservices with RabbitMQ and Redis
The team at lastminute.com rearchitected the search result aggregation process by breaking up the single service into multiple ones and introducing asynchronous integration. Developers used RabbitMQ for messaging and Redis for storing results from data suppliers. The revised architecture improved scalability and deployability and reduced resource utilization.
-
Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs
Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. The company considered different storage technologies and decided to combine the relational database and the object store to strike a balance between querybility and scalability while keeping the costs down.
-
Why LinkedIn chose gRPC+Protobuf over REST+JSON: Q&A with Karthik Ramgopal and Min Chen
LinkedIn announced that it would be moving to gRPC with Protocol Buffers for the inter-service communication in its microservices platform, where previously an open-source Rest.li framework was used with JSON as a primary serialization format. InfoQ contacted Karthik Ramgopal and Min Chen to learn more about the decision and company motivations behind it.
-
Automated Horizontal Scaling with Amazon Aurora Limitless Database
AWS recently announced the preview of Amazon Aurora Limitless Database, a new capability supporting automated horizontal scaling to process millions of write transactions per second and manage petabytes of data in a single Aurora database.
-
LinkedIn Migrates Espresso to HTTP2 and Reduces Connections by 88% and Latency by 75%
LinkedIn was able to dramatically improve the scalability and performance of its Espresso database by migrating it from HTTP1.1 to HTTP2, resulting in a reduction in the number of connections, latency, and garbage collection times. To achieve these gains, the team had to optimize the Netty’s default HTTP2 stack to make it fit their needs.
-
How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions
HubSpot adopted routing messages over multiple Kafka topics (called swimlanes) for the same producer to avoid the build-up in the consumer group lag and prioritize the processing of real-time traffic. Using a combination of automatic and manual detection of traffic spikes, the company ensures the majority of customers’ workflows execute without delays.
-
Partitioned Namespaces for Azure Service Bus Premium Are Now Generally Available
During the recent Ignite conference, Microsoft announced the general availability (GA) of partitioned namespaces feature for Azure Service Bus, which allows customers to use partitioning for the premium messaging tier.
-
Microsoft Refreshes its Well-Architected Framework
Microsoft recently announced a comprehensive refresh of the Well-Architected Framework (WAF) for designing and running optimized workloads on Azure.
-
AWS Restructures and Consolidates Its Well-Architected Framework
AWS published a new set of updates to its Well-Architected Framework, with changes across all six pillars of the framework. The performance efficiency and operational excellence pillars have been restructured and consolidated to reduce the number of best practices. Other pillars received improved implementation guidance, including recommendations and steps on reusable architecture patterns.
-
How DoorDash Rearchitected its Cache to Improve Scalability and Performance
DoorDash rearchitected the heterogeneous caching system they were using across all of their microservices and created a common, multi-layered cache providing a generic mechanism and solving a number of issues coming from the adoption of a fragmented cache.
-
Monzo Employs Targeted Traffic Shedding against Stampeding Herd Effect from the Mobile App
Monzo developed a solution for shedding traffic in case its platform comes under intense and unexpected load that could lead to an outage. Traffic spikes can be generated by the mobile app and triggered by push notifications or other bursts in user activity. The solution can reduce the read traffic by almost 50% with 90% overall accuracy without noticeable customer impact.