TikTok Owner Open-Sources Next Gen Kubernetes Federation Tool

ByteDance, the company behind popular global platforms like TikTok, has unveiled KubeAdmiral, its next-generation cluster federation system for Kubernetes. It is designed to manage multiple clusters with efficiency and effectiveness comparable to "a seasoned navy admiral commanding a fleet". At ByteDance, KubeAdmiral runs more than 10 million pods across dozens of federated Kubernetes clusters.

Built on the foundations of the older KubeFed system, KubeAdmiral offers enhanced multi-cluster orchestration and scheduling capabilities tailored to mainstream business scenarios. At ByteDance it currently manages over 100,000 microservices with more than 10,000,000 pods across dozens of federated Kubernetes clusters, and handles 30,000 upgrade and scaling operations daily. It maintains a stable deployment rate of 95-98%, significantly reducing the need for manual intervention.

Having run KubeFed for a number of years, ByteDance witnessed phenomenal growth in its cloud-native infrastructure. The surge in the size and number of Kubernetes clusters at ByteDance, coupled with the diversification of workloads beyond stateless microservices to include stateful services, storage, offline, and machine learning jobs, prompted the need for a more robust and scalable cluster federation system.

In a comprehensive post, ByteDance's Gary Liu details KubeAdmiral's objectives and functionality.

KubeAdmiral Architecture

One of the standout features of KubeAdmiral is its rich multi-cluster scheduling capabilities. The scheduler, often referred to as the "brain" of KubeAdmiral, computes the desired placement of workloads in member clusters. Unlike its predecessor KubeFed, KubeAdmiral has comprehensive scheduling semantics: it supports more flexible and fine-grained cluster selection via labels and taints, and scores clusters based on load and affinity. Automatic dependency scheduling is also a notable feature, allowing dependencies such as ConfigMaps to automatically follow their Deployment to the corresponding member clusters.
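The dependency-following idea can be illustrated with a minimal Python sketch. It is not KubeAdmiral's implementation: the `Workload` class, the `follower_placements` function and the cluster names are hypothetical, and the sketch assumes the scheduler has already computed where each Deployment runs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workload:
    """Hypothetical stand-in for a federated Deployment."""
    name: str
    clusters: set[str]  # placement already computed by the scheduler
    config_maps: list[str] = field(default_factory=list)  # ConfigMaps it references


def follower_placements(workloads: list[Workload]) -> dict[str, set[str]]:
    """Place each referenced ConfigMap in every cluster hosting a workload that uses it."""
    placements: dict[str, set[str]] = {}
    for workload in workloads:
        for config_map in workload.config_maps:
            placements.setdefault(config_map, set()).update(workload.clusters)
    return placements


web = Workload("web", {"cluster-a", "cluster-b"}, ["web-config", "shared-config"])
api = Workload("api", {"cluster-c"}, ["shared-config"])
placements = follower_placements([web, api])
```

In this sketch a ConfigMap shared by two Deployments ends up in the union of their clusters, so neither Deployment is missing a dependency after it is scheduled.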

Taking inspiration from kube-scheduler's design, KubeAdmiral introduces a flexible framework dividing scheduling into four distinct stages: Filter, Score, Select, and Replica. Each stage is handled by individual plugins, promoting modularity and allowing users to add or remove their own scheduling plugins without affecting others. This design simplifies the scheduler logic and reduces overall complexity.
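As a rough illustration of the four stages, the following Python sketch chains Filter, Score, Select and Replica steps with pluggable functions. Only the stage names come from the description above. Every class, function and field name is an assumption made for this example, and the even replica split is a deliberately simple placeholder rather than KubeAdmiral's actual logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Cluster:
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    taints: set[str] = field(default_factory=set)
    free_cpu: int = 0  # hypothetical spare capacity, in cores


@dataclass
class Workload:
    name: str
    replicas: int
    cluster_selector: dict[str, str] = field(default_factory=dict)
    tolerations: set[str] = field(default_factory=set)
    max_clusters: int = 2


FilterPlugin = Callable[[Workload, Cluster], bool]
ScorePlugin = Callable[[Workload, Cluster], float]


def labels_match(w: Workload, c: Cluster) -> bool:
    return all(c.labels.get(k) == v for k, v in w.cluster_selector.items())


def taints_tolerated(w: Workload, c: Cluster) -> bool:
    return c.taints <= w.tolerations


def free_capacity(w: Workload, c: Cluster) -> float:
    return c.free_cpu


def schedule(workload: Workload, clusters: list[Cluster],
             filters: list[FilterPlugin], scorers: list[ScorePlugin]) -> dict[str, int]:
    # Filter: drop clusters rejected by any filter plugin.
    feasible = [c for c in clusters if all(f(workload, c) for f in filters)]
    # Score: sum the scores from every score plugin.
    scored = {c.name: sum(s(workload, c) for s in scorers) for c in feasible}
    # Select: keep the best-scoring clusters, breaking ties by name.
    selected = sorted(scored, key=lambda n: (-scored[n], n))[:workload.max_clusters]
    if not selected:
        return {}
    # Replica: split evenly, giving any remainder to the best-scoring clusters.
    base, extra = divmod(workload.replicas, len(selected))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(selected)}


clusters = [
    Cluster("cluster-a", {"region": "eu"}, set(), 40),
    Cluster("cluster-b", {"region": "eu"}, {"gpu-only"}, 100),
    Cluster("cluster-c", {"region": "eu"}, set(), 20),
    Cluster("cluster-d", {"region": "us"}, set(), 80),
]
workload = Workload("web", replicas=5, cluster_selector={"region": "eu"})
plan = schedule(workload, clusters, [labels_match, taints_tolerated], [free_capacity])
```

Because each stage only consumes the output of the previous one, a plugin can be added to or removed from the `filters` or `scorers` list without touching the others, which is the modularity the design aims for.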

KubeAdmiral also addresses challenges related to unschedulable workloads by automatically migrating these to other clusters, ensuring that temporary issues such as node outages, resource shortages and affinity problems don't compromise reliability. The system also introduces dynamic weight scheduling based on real-time cluster resource utilization, achieving dynamic load balancing across member clusters.
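One way to picture dynamic weight scheduling is to split replicas in proportion to each cluster's currently available resources. The Python sketch below does this with integer arithmetic and largest-remainder rounding. The function name, the millicore figures and the rounding rule are assumptions for illustration, not details taken from KubeAdmiral.

```python
from __future__ import annotations


def split_by_available(replicas: int, available: dict[str, int]) -> dict[str, int]:
    """Split replicas across clusters in proportion to spare capacity.

    `available` maps a cluster name to its spare capacity (assumed here to be
    non-negative millicores). Leftover replicas go to the clusters with the
    largest remainders, ties broken by name.
    """
    total = sum(available.values())
    if total <= 0:
        raise ValueError("no cluster has spare capacity")
    plan = {name: replicas * free // total for name, free in available.items()}
    remainders = {name: replicas * free % total for name, free in available.items()}
    leftover = replicas - sum(plan.values())
    for name in sorted(remainders, key=lambda n: (-remainders[n], n))[:leftover]:
        plan[name] += 1
    return plan


# The same workload is placed differently as utilization shifts between clusters.
before = split_by_available(10, {"cluster-a": 5000, "cluster-b": 3000, "cluster-c": 1000})
after = split_by_available(10, {"cluster-a": 1000, "cluster-b": 3000, "cluster-c": 5000})
```

Recomputing the weights from live utilization on each scheduling pass means a cluster that fills up receives fewer new replicas, which is the load-balancing effect described above.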

KubeFed's replica rescheduling algorithm could often lead to skewed and inconsistent utilization, depending on how frequently a cluster was deployed to. KubeAdmiral instead uses a more refined algorithm based on real-time cluster utilization. It also refines KubeFed's approach to rescheduling pods for optimum utilization by reducing pod churn and minimizing the changes made to rebalance clusters.
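The article does not describe how KubeAdmiral limits churn, so the following Python sketch shows just one possible technique, chosen for illustration: stepping from the current placement toward a desired one while relocating at most a fixed number of replicas per pass. The `rebalance` function and the `max_moves` parameter are hypothetical.

```python
from __future__ import annotations


def rebalance(current: dict[str, int], desired: dict[str, int], max_moves: int) -> dict[str, int]:
    """Move toward `desired`, relocating at most `max_moves` replicas."""
    if sum(current.values()) != sum(desired.values()):
        raise ValueError("current and desired must hold the same number of replicas")
    names = sorted(set(current) | set(desired))
    plan = {n: current.get(n, 0) for n in names}
    # Positive surplus: cluster holds too many replicas; negative: too few.
    surplus = {n: plan[n] - desired.get(n, 0) for n in names}
    for _ in range(max_moves):
        donor = max(names, key=lambda n: surplus[n])
        receiver = min(names, key=lambda n: surplus[n])
        if surplus[donor] <= 0:
            break  # already at the desired placement
        plan[donor] -= 1
        plan[receiver] += 1
        surplus[donor] -= 1
        surplus[receiver] += 1
    return plan


current = {"cluster-a": 8, "cluster-b": 1, "cluster-c": 1}
desired = {"cluster-a": 4, "cluster-b": 3, "cluster-c": 3}
partial = rebalance(current, desired, max_moves=2)
full = rebalance(current, desired, max_moves=10)
```

Capping the moves per pass trades convergence speed for stability: the placement approaches the target over several passes, and only the replicas that must move are ever restarted.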

KubeAdmiral also goes beyond its predecessors by using native Kubernetes APIs throughout, rather than a separate federation API, for cluster management. This gives users of KubeAdmiral federated clusters a consolidated view of deployments across multiple clusters, rather than having to look at each individual cluster in turn. Using the native Kubernetes API also helps new users transition from a single-cluster to a multi-cluster architecture. Further details on these improvements are given in the associated blog post.

Having been incubated within ByteDance, where it serves as a crucial component of the internal PaaS platform, KubeAdmiral has now been officially open-sourced on GitHub. Looking ahead, ByteDance plans to continue refining KubeAdmiral, focusing on improving orchestration and scheduling capabilities for stateful and job-like workloads. The company also aims to enhance the user experience, optimize logging and metrics, and explore features like one-click migration from a single cluster.

ByteDance encourages developers and the broader community to explore KubeAdmiral, try it out, and provide valuable feedback. They extend an invitation to join the KubeAdmiral community and contribute to the ongoing evolution of this innovative cluster federation system.
