Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Scaling Content on InfoQ

  • OpenAI Scales Single Primary PostgreSQL Instance to Millions of Queries per Second for ChatGPT

    OpenAI described how it scaled PostgreSQL to support ChatGPT and its API platform, handling millions of queries per second for hundreds of millions of users. By running a single-primary PostgreSQL deployment on Azure with nearly 50 read replicas, optimizing query patterns, and offloading write-heavy workloads to sharded systems, OpenAI maintained low-latency reads while managing write pressure.

  • Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025

    At QCon San Francisco, Netflix engineers presented their service-level prioritized load shedding strategy for improving reliability during traffic spikes. By prioritizing high-value requests and automating management across microservices, they protect user experience and system stability. Key takeaways stress prioritization, automation, and structured load shedding for resilience.

  • Advanced Autoscaling Helps Companies Reduce AWS Costs by 70%

    The next generation of Kubernetes autoscaling techniques and tools is enabling organisations to make substantial cost savings in their cloud infrastructure. Svetlana Burninova recently used Karpenter to build a multi-architecture EKS cluster and managed a 70% reduction in cost whilst also improving performance.

  • Amazon DocumentDB Serverless: Auto-Scaling Database Solution for Variable Workloads

    AWS has launched Amazon DocumentDB Serverless, an auto-scaling database solution compatible with MongoDB and tailored for variable workloads. Although marketed as "serverless," it behaves more like an auto-scaling service, with pricing starting at $30/month. Aimed at enterprises and SaaS vendors, it handles spikes in demand, particularly for AI-driven applications.

  • Inflection Points in Engineering Productivity for Improving Productivity and Operational Excellence

    As companies grow, investing in custom developer tools may become necessary. Initially, standard tools suffice, but as companies scale in engineers, maturity, and complexity, industry tools may no longer meet needs. Inflection points, such as a crisis, hyper-growth, or reaching a new market, often trigger investments, providing opportunities for improving productivity and operational excellence.

  • Lessons Learned from Growing an Engineering Organization

    As their organization grew, Thiago Ghisi's work as director of engineering shifted from being hands-on in emergencies to designing frameworks and delegating decisions. He suggested treating changes as experiments, documenting reorganizations, and using a wave-based communication approach to gather feedback, ensuring people feel heard and invested.

  • Optimizing Amazon ECS with Predictive Scaling

    Amazon Web Services (AWS) recently released Predictive Scaling for Amazon ECS, an advanced scaling policy that employs machine learning (ML) algorithms to anticipate demand surges, ensuring applications remain highly available and responsive while minimizing resource overprovisioning.

  • Staying Innovative on a Journey from Start-Up to Scale-Up

    As ClearBank grew, it faced the challenge of maintaining its innovative culture while integrating more structured processes to manage its expanding operations and ensure regulatory compliance. Within boundaries of accountability and responsibility, teams were given space to evolve their own areas, experiment, and continuously improve.

  • Deezer Optimizes Kubernetes Autoscaling with Custom Metrics

    Popular music streaming service Deezer has written about using custom metrics to enable auto-scaling in its Kubernetes infrastructure. Server utilisation and performance issues made it challenging to scale applications to an appropriate size and number of replicas, and Kubernetes' HPA scaling alone didn't solve these issues, so Deezer turned to custom metrics.

  • Kubernetes Autoscaler Karpenter Reaches 1.0 Milestone

    Amazon Web Services (AWS) has released version 1.0 of Karpenter, an open-source Kubernetes cluster auto-scaling tool. This release marks Karpenter's graduation from beta status and introduces stable APIs and several new features. Karpenter, initially launched in November 2021, has evolved into a comprehensive Kubernetes-native node lifecycle manager.

  • How Tech-Enabled Networks of Software Teams Work

    To maintain agility at scale, software teams can use technological and organizational solutions to reduce dependencies and work autonomously. According to Fabrice Bernhard, collaboration technology can be leveraged to create a distributed network of teams. To empower their teams, leaders can support them with a systematic problem-solving culture aimed at delivering good products to customers.

  • How to Build Large Scale Cyber-Physical Systems

    To build large-scale safety-critical systems, we need to decompose the system into smaller solvable problems, resolve what is known, and resolve unknowns through experiments, Robin Yeman argued. She suggested investing early in test environments for both software and hardware so that development can be test-driven from the start, increasing the safety, security, reliability, and availability of the systems.

  • Expedia Open-Sources Container-Startup-Autoscaler (CSA) for Scaling Kubernetes Workloads

    Expedia's Performance and Reliability team has recently open-sourced its container-startup-autoscaler (CSA). It is a Kubernetes controller leveraging the In-Place Update of Pod Resources feature to dynamically adjust CPU and/or memory resources of containers during startup based on user-defined startup/post-startup configurations.

  • DigitalOcean Introduces CPU-Based Autoscaling for its App Platform

    DigitalOcean has launched automatic horizontal scaling for its App Platform PaaS, aiming to free developers from having to scale services up or down manually based on CPU load.

  • How to Create a UI That's Both Robust and User Friendly

    The key challenge in building UIs is balancing ease of use and maintainability with scale and complexity. It requires thoughtful component design and an understanding of common usage paths to create a UI that's both robust and user-friendly. Automation can be a game-changer when it comes to improving efficiency and consistency in your codebase.
