Scalability Content on InfoQ
-
Stripe Rearchitects Its Observability Platform with Managed Prometheus and Grafana on AWS
Stripe replaced its observability platform, which used a third-party vendor solution, with a new architecture utilizing managed services on AWS. The company made the move due to scalability limits, reliability issues, and increasing costs while transitioning to microservices. The migration involved dual-writing metrics, translating assets, validation, and user training.
-
Netflix’s Pushy: Evolution of Scalable WebSocket Platform That Handles 100Ms Concurrent Connections
Netflix shared details on the evolution of Pushy, a WebSocket messaging platform that supports push notifications and inter-device communication across many different devices for the company’s products. Netflix’s engineers implemented many improvements across the Pushy ecosystem to ensure the platform's scalability and reliability and support new capabilities.
-
How Amazon Aurora Serverless Manages Resources and Scaling for Fleets of 10K+ Instances
AWS engineers published a paper describing the evolution and latest design of resource management and scaling for the Amazon Aurora Serverless platform. Aurora Serverless uses a combination of components at different levels to create a holistic approach for dynamically scaling and adjusting resources to satisfy the needs of customer workloads.
-
Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day
Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose KDS, primarily because of its much lower cost. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.
-
Microsoft Introduces the Public Preview of Flex Consumption Plan for Azure Functions at Build
At the annual Build conference, Microsoft announced the Flex Consumption plan for Azure Functions, which offers fast and large elastic scaling, instance size selection, private networking, availability zones, and greater control over concurrency.
-
QCon London: Scaling Microservices Architecture and Technology Organization at Trainline
During the recent QCon London conference, Trainline’s CTO spoke about the evolution of the company’s system architecture and organizational structure over the last five years. The company had to adapt to market changes and growing customer expectations by improving the performance and reliability of its technology platform.
-
QCon London: How Duolingo Sent 4 Million Push Notifications in 6 Seconds During the Super Bowl Break
As part of the Super Bowl marketing campaign, Duolingo sent out 4 million mobile push notifications when the company’s five-second ad aired during the commercial break. At QCon London, Duolingo’s engineers presented the asynchronous AWS architecture responsible for broadcasting messages to millions of users across seven US cities.
-
Hashnode Creates Scalable Feed Architecture on AWS with Step Functions, EventBridge and Redis
Hashnode created a scalable event-driven architecture (EDA) for composing feed data for thousands of users. The company used serverless services on AWS, including Lambda, Step Functions, EventBridge, and Redis Cache. The solution leverages the Distributed Map feature of Step Functions, which enables high-concurrency processing.
-
Uber Builds Scalable Chat Using Microservices with GraphQL Subscriptions and Kafka
Uber replaced a legacy architecture built using the WAMP protocol with a new solution that takes advantage of GraphQL subscriptions. The main drivers for creating a new architecture were challenges around reliability, scalability, and observability/debuggability, as well as technical debt impeding the team’s ability to maintain the existing solution.
-
Discord Scales to 1 Million+ Online MidJourney Users in a Single Server
Discord optimized its platform to serve over one million online users in a single server while maintaining a responsive user experience. The company evolved the guild component, which is responsible for fanning out billions of message notifications, in a series of performance and scalability improvements supported by system observability and performance tuning.
-
lastminute.com Improves Search Scalability Using Microservices with RabbitMQ and Redis
The team at lastminute.com rearchitected the search result aggregation process by breaking up the single service into multiple ones and introducing asynchronous integration. Developers used RabbitMQ for messaging and Redis for storing results from data suppliers. The revised architecture improved scalability and deployability and reduced resource utilization.
-
Griffin 2.0: Instacart Revamps Its Machine Learning Platform
Instacart created a next-generation platform based on its experience with the original Griffin machine-learning platform. The company wanted to improve user experience and help manage all ML workloads. The revamped platform leverages the latest developments in MLOps and introduces new capabilities for current and future applications.
-
Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs
Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. The company considered different storage technologies and decided to combine the relational database and the object store to strike a balance between queryability and scalability while keeping costs down.
-
Why LinkedIn chose gRPC+Protobuf over REST+JSON: Q&A with Karthik Ramgopal and Min Chen
LinkedIn announced that it would be moving to gRPC with Protocol Buffers for inter-service communication in its microservices platform, which previously used the open-source Rest.li framework with JSON as the primary serialization format. InfoQ contacted Karthik Ramgopal and Min Chen to learn more about the decision and the company's motivations behind it.
-
Automated Horizontal Scaling with Amazon Aurora Limitless Database
AWS recently announced the preview of Amazon Aurora Limitless Database, a new capability supporting automated horizontal scaling to process millions of write transactions per second and manage petabytes of data in a single Aurora database.