StorageOS aims to make container storage flexible by providing a single view of the underlying storage and exposing APIs for automation.
StorageOS is a virtualization layer that offers a unified view of a pool of available storage, and this view is exposed as a volume. StorageOS itself runs as a container and provides a Docker volume plugin for accessing the volume. The volume can also be accessed directly from outside the container. The storage pool can span many containers running StorageOS. InfoQ got in touch with Alex Chircop, Founder and CTO at StorageOS, to learn more about the technology behind the solution. According to Chircop, StorageOS "can access different types of storage on the backend through a single layer. The virtualization engine currently supports physical and virtual disks. We are looking to add functionality to use object stores such as S3."
A distributed storage system has to deal with both fault tolerance and latency. To achieve fault tolerance, “the pool is protected through erasure coding and replicas. Erasure coding is used within a node to protect against disk failures and replicas are used across nodes to protect against node failures,” says Chircop.
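The two protection levels Chircop describes can be illustrated with a minimal Python sketch. It uses single XOR parity, the simplest form of erasure coding, to rebuild one lost disk within a node, and plain full copies to stand in for replicas across nodes. This is not StorageOS's actual scheme, which the article does not detail; the function names, block contents and node names are all hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# Within one node: three data blocks on three disks, parity on a fourth disk.
disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(disks)

# Simulate losing the second disk and rebuilding its block locally.
recovered = rebuild_missing([disks[0], disks[2]], parity)

# Across nodes: keep full copies so that losing a whole node is survivable.
# Node names are hypothetical.
replicas = {"node-a": list(disks), "node-b": list(disks)}
```

Single parity tolerates only one failed disk per stripe; production erasure codes typically tolerate more, at the cost of extra parity blocks.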
StorageOS volumes can span multiple hosts, so to reduce latency they are created on the nodes where the container is instantiated. The software is designed to be optimized for Solid State Drives (SSDs), Chircop says, by using an SSD-aware layout and reducing write amplification in the algorithms. Write amplification is a problem encountered when writing to SSDs: rewriting data involves re-reading already written data, updating it and writing it to a new location, so the device physically writes more data than the host asked it to. SSDs here also include Non-Volatile Memory Express (NVMe) devices. NVMe is a PCI Express-based specification for accessing non-volatile storage media like SSDs.
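Write amplification is usually expressed as a ratio: bytes physically written to flash divided by bytes the host asked to write. The following Python sketch computes that ratio under a deliberately pessimistic assumption, namely that every update forces a whole block to be rewritten. The 256 KiB block size and the helper names are hypothetical, and real SSD controllers behave in more nuanced ways.

```python
def write_amplification(host_bytes, flash_bytes):
    """Ratio of bytes physically written to flash to bytes the host wrote."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


def flash_bytes_for_update(update_bytes, block_bytes):
    """Bytes written if each update rewrites every block it touches in full.

    Worst-case simplification: assumes the update is block-aligned and that
    the whole block is copied to a new location.
    """
    blocks_touched = -(-update_bytes // block_bytes)  # ceiling division
    return blocks_touched * block_bytes


# A 4 KiB host write landing in a hypothetical 256 KiB block.
host = 4 * 1024
flash = flash_bytes_for_update(host, 256 * 1024)
factor = write_amplification(host, flash)
```

Under these assumptions, small random updates are the expensive case, which is why layouts that batch writes into larger sequential runs help reduce the ratio.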
Stateless applications are better suited to containers than stateful applications, since the latter need persistent storage. Existing storage architectures are not amenable to automation. One of the stated aims of StorageOS is to achieve the same flexibility in operations that compute has in container environments. For example, the StorageOS Docker plugin provisions storage on the fly and is directly integrated with APIs and the control plane. A “docker run” command can provision and mount the storage as part of starting up the specified container. There are also plans to integrate with Kubernetes.
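To make the “docker run” flow concrete, here is a small Python sketch that assembles such a command line. The `--volume-driver` and `-v` flags are standard Docker CLI options, but the driver name `storageos`, the image, the volume name and the mount path are assumptions chosen for illustration, not values confirmed by StorageOS.

```python
import shlex


def docker_run_args(image, volume, mount_path, driver="storageos"):
    """Build a docker run argv that mounts a named volume via a volume plugin.

    The default driver name is an assumption; substitute the name under
    which the plugin is actually registered.
    """
    return [
        "docker", "run",
        "--volume-driver", driver,       # volume plugin that backs the volume
        "-v", f"{volume}:{mount_path}",  # named volume mounted at mount_path
        image,
    ]


# Hypothetical example: a database container with its data on a plugin volume.
cmd = docker_run_args("postgres", "pgdata", "/var/lib/postgresql/data")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would start the container on a host where Docker and the plugin are installed. The sketch only builds and prints the command.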
To integrate better with orchestration pipelines, StorageOS has a feature called labels. A label can indicate a location, a specific app, or an environment like QA or staging. Tagging a volume with a label activates the corresponding feature on that volume.
Docker recently acquired Infinit, a startup that has a portable distributed filesystem or storage layer. How does StorageOS differ from their offering? Chircop responded:
The lack of persistent container storage is a problem in the enterprise. Docker accelerating stateful distributed storage is an easy starting point for customers, which moves them to an enterprise persistent container storage requirement sooner. Infinit has a distributed filesystem, whilst databases and message queues need fast, deterministic performance and consistency guarantees which are key features of the StorageOS architecture.
StorageOS can also integrate with public clouds like AWS. Since it is deployed as a container, it can be installed on any platform which runs containers. Encryption can be enabled to comply with data privacy requirements.