The Cloud Native Computing Foundation (CNCF) has announced that CubeFS, an open-source distributed storage system, has reached graduation status. The project was started in 2017 and supports multiple access protocols, including POSIX, HDFS, S3, and its own REST API. Its key target use cases are big data, AI/LLM applications, container platforms, and databases.
The key subsystems of CubeFS are:
- Resource management: Monitors data and metadata nodes while handling volume and partition information.
- Metadata: Uses memory-based shards with MultiRaft for high availability and consistency, supporting expansion through splitting.
- Data storage: Offers both multi-replica (multi-copy) and erasure-coding modes to balance performance and cost.
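The cost side of that last trade-off can be illustrated with some simple arithmetic. The sketch below is not CubeFS code: the class names, the 3-copy replication factor, and the 6+3 erasure-coding layout are all illustrative assumptions, and it assumes an erasure code (such as Reed-Solomon) in which any `data_shards` out of `data_shards + parity_shards` shards are enough to rebuild the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    """Hypothetical multi-copy scheme: every byte is stored `copies` times."""
    copies: int

    def raw_bytes_per_user_byte(self) -> float:
        # Each user byte occupies one byte on every replica.
        return float(self.copies)

    def tolerated_failures(self) -> int:
        # Data survives as long as at least one copy remains.
        return self.copies - 1


@dataclass(frozen=True)
class ErasureCoding:
    """Hypothetical k+m erasure-coding scheme (k data shards, m parity shards)."""
    data_shards: int
    parity_shards: int

    def raw_bytes_per_user_byte(self) -> float:
        # k shards hold the data itself; m more shards hold parity.
        return (self.data_shards + self.parity_shards) / self.data_shards

    def tolerated_failures(self) -> int:
        # Assumes any k shards suffice, so up to m shards can be lost.
        return self.parity_shards


def raw_capacity_needed(user_bytes: float, scheme) -> float:
    """Raw disk capacity required to store `user_bytes` under `scheme`."""
    return user_bytes * scheme.raw_bytes_per_user_byte()


if __name__ == "__main__":
    user_tb = 100  # illustrative figure, in terabytes
    for scheme in (Replication(copies=3), ErasureCoding(data_shards=6, parity_shards=3)):
        print(
            scheme,
            f"raw capacity: {raw_capacity_needed(user_tb, scheme)} TB,",
            f"tolerates {scheme.tolerated_failures()} lost copies/shards",
        )
```

Under these assumed parameters, erasure coding needs half the raw capacity of three-way replication. The price is typically paid elsewhere, in extra computation and network traffic when encoding and reconstructing data, which is why replication is often preferred for latency-sensitive data and erasure coding for large, colder data sets.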
Version 1.0.0 of CubeFS was released as open source in March 2019. The project entered the CNCF Sandbox in December 2019, moved to the Incubator in 2022, and graduated as a full CNCF project in January 2025. Since joining the CNCF in 2019, CubeFS's pool of contributors has diversified significantly, growing from 27 people working for just five companies to nearly 400 people across 42 companies. Across its installed user base, CubeFS manages approximately 350 petabytes of data and supports hundreds of thousands of clients. It is now used in over 200 organisations, including JD.com, NetEase, and Shopee, as well as the Chinese mobile phone manufacturers OPPO, Xiaomi, and Meizu. These companies have implemented CubeFS across commerce, cloud storage, and online media streaming applications.
In the announcement, Chris Aniszczyk, CTO of the CNCF, explains: "Large-scale organisations like OPPO are already turning to CubeFS to run machine learning platforms in production and using AI training." He notes that "the stability, reliable performance, and active community of CubeFS have built the trust of adopters, and we look forward to seeing how adoption develops as AI and ML continue to drive the growth of data."
To achieve graduation as a CNCF project, CubeFS had to significantly improve its governance and code of conduct and complete a comprehensive security audit. This audit included threat modelling, a supply-chain security review, and code assessment for security vulnerabilities. In the graduation announcement, Haifeng Liu, CubeFS creator and maintainer, expressed confidence in the project's placement within CNCF: "Having worked with CNCF through other projects like Kubernetes and Vitess, I know it is the ideal home for open source cloud native projects. We look forward to making CubeFS the best open source unstructured data storage service for enterprises across both private and public cloud services."
Within the CNCF landscape for distributed cloud-native storage, CubeFS sits alongside a growing set of similar open-source projects, with Rook/Ceph, Longhorn, and OpenEBS all commanding significant user bases. Rook is a Kubernetes Operator that deploys and manages Ceph, a distributed storage platform offering file, block, and object storage. OpenEBS offers similar functionality across a wide array of storage options. Longhorn is also a distributed storage platform for Kubernetes, but it focuses only on block storage.
CubeFS is mentioned in the Q3 2024 CNCF Technology Landscape Radar. The radar's authors suggest that companies in the batch/AI/ML area should look at adopting CubeFS, and they rank it highly for usefulness and maturity.
In a blog post, Benjamin Arntzen, a self-described distributed storage enthusiast, shares his experience exploring CubeFS, which he was encouraged to try by its status as a CNCF project that supports exabyte-scale deployments. Arntzen explains the benefits of several key features in CubeFS, such as its native S3 gateway, Kubernetes persistent storage support, volume management, and horizontally scalable metadata storage that eliminates the need for large controller nodes. While CubeFS meets most of his requirements for distributed storage (high availability, data resilience, self-healing, performance, and erasure coding), it currently lacks storage tiering: the ability to intelligently distribute data across different storage types (SSDs, HDDs, NVMe). However, he notes that the CubeFS team is actively developing this feature.
The major challenge Arntzen encountered concerned security. The default CubeFS deployment lacks proper authentication, leaving systems vulnerable on untrusted networks. To address this, he modified CubeFS to integrate authentication and encryption, and he has shared this code on GitHub.
The project's plans for 2025 and beyond include optimising costs for the metadata service, a tiered storage implementation, distributed cache acceleration, and improved issue-tracking capabilities through call-chain tracing. CubeFS is available for download now, and its source code is on GitHub.