Google has improved the file system implementation in gVisor, the open source isolation layer used in its commercial container-oriented offerings, such as App Engine, Cloud Run, and Cloud Functions. According to Google engineers Ayush Ranjan and Fabricio Voznika, the new gVisor file system, dubbed VFS2, may improve the performance of file-intensive workloads by approximately 50%-75%.
The main goal of gVisor is to provide an isolation layer between a container and the underlying kernel, which is shared by all containers running on the same node. To prevent a malicious or vulnerable container from jeopardizing the security of a whole node, gVisor implements a large portion of the Linux system call surface and includes an Open Container Initiative-compliant runtime called runsc that provides an isolation boundary between the application and the host kernel.
Since the gVisor kernel cannot be trusted, it doesn’t have direct access to the file system. Instead, file system operations are brokered by a proxy (called Gofer) that is isolated from the possibly malicious workload. Operations like open, create, and stat are forwarded to the proxy, which vets them and then executes them on the sandbox's behalf.
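The brokering pattern described above can be illustrated with a minimal Go sketch. This is not gVisor's Gofer code: the `request` and `proxy` types, the allow-list of operations, and the `/srv/rootfs` directory are all hypothetical, and the sketch only shows the vetting step (it never touches the real file system).

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// errDenied is returned when the proxy refuses a request.
var errDenied = errors.New("operation denied by proxy")

// request is a hypothetical message sent by the untrusted sandbox.
type request struct {
	op   string // e.g. "open", "create", "stat"
	path string // path as seen from inside the sandbox
}

// proxy is a hypothetical stand-in for a file proxy. It confines every
// request to root and only accepts operations listed in allowed.
type proxy struct {
	root    string
	allowed map[string]bool
}

// vet checks a request and returns the host path the proxy would act on.
// Cleaning the path as a rooted path means ".." cannot climb above "/",
// so the joined result always stays under p.root.
func (p *proxy) vet(r request) (string, error) {
	if !p.allowed[r.op] {
		return "", fmt.Errorf("%w: %q", errDenied, r.op)
	}
	clean := path.Clean("/" + r.path)
	return path.Join(p.root, clean), nil
}

func main() {
	p := &proxy{
		root:    "/srv/rootfs", // assumed directory, for illustration only
		allowed: map[string]bool{"open": true, "create": true, "stat": true},
	}
	for _, r := range []request{
		{op: "stat", path: "/foo/bar/baz"},
		{op: "open", path: "../../etc/passwd"},
		{op: "mount", path: "/foo"},
	} {
		hostPath, err := p.vet(r)
		fmt.Println(r.op, r.path, "->", hostPath, err)
	}
}
```

The point of the design is that the sandbox only ever names an operation and a path; the decision about what actually happens on the host is made by a separate component that the workload cannot influence directly.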
Google engineers discovered that the way the gVisor Gofer file system handled path resolution was detrimental to performance: it delegated resolution to the underlying file system using one RPC call per path component. This was especially costly for workloads performing frequent file operations, such as build tasks or Python and NodeJS programs with a large number of imports.
Addressing this challenge required enabling gVisor’s Sentry with the ability to delegate path resolution directly to the file system. [...] As an example, in VFS1 stat(/foo/bar/baz) generates at least three RPC to the gofer (foo, bar, baz) whereas VFS2 only generates one.
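To make the difference in round trips concrete, here is a toy Go model of the two resolution strategies. It is only a sketch: `fakeGofer`, `walkOne`, and `walkAll` are hypothetical names that count calls against an in-memory set of paths, and they do not correspond to gVisor's actual interfaces.

```go
package main

import (
	"fmt"
	"strings"
)

// fakeGofer is a hypothetical stand-in for the file proxy. It only counts
// round trips and answers from an in-memory set of existing paths.
type fakeGofer struct {
	rpcs  int
	paths map[string]bool
}

func newFakeGofer() *fakeGofer {
	return &fakeGofer{paths: map[string]bool{
		"/foo":         true,
		"/foo/bar":     true,
		"/foo/bar/baz": true,
	}}
}

// walkOne resolves one child name under dir and costs one round trip.
func (g *fakeGofer) walkOne(dir, name string) (string, bool) {
	g.rpcs++
	child := dir + "/" + name
	return child, g.paths[child]
}

// walkAll resolves a complete path and costs one round trip.
func (g *fakeGofer) walkAll(path string) bool {
	g.rpcs++
	return g.paths[path]
}

// components splits a slash-separated path into its non-empty parts.
func components(path string) []string {
	return strings.FieldsFunc(path, func(r rune) bool { return r == '/' })
}

// statPerComponent models the VFS1-style walk: one request per component.
func statPerComponent(g *fakeGofer, path string) bool {
	dir := ""
	for _, name := range components(path) {
		child, ok := g.walkOne(dir, name)
		if !ok {
			return false
		}
		dir = child
	}
	return true
}

// statDelegated models the VFS2-style walk: the whole path in one request.
func statDelegated(g *fakeGofer, path string) bool {
	return g.walkAll("/" + strings.Join(components(path), "/"))
}

func main() {
	a, b := newFakeGofer(), newFakeGofer()
	statPerComponent(a, "/foo/bar/baz")
	statDelegated(b, "/foo/bar/baz")
	fmt.Printf("per-component walk: %d round trips\n", a.rpcs)
	fmt.Printf("delegated walk:     %d round trip(s)\n", b.rpcs)
}
```

In this model the per-component walk costs one round trip per path element, so its cost grows with path depth, while the delegated walk stays constant. That is why deeply nested source trees and import-heavy programs are the workloads most affected.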
Google also took the chance to create a new protocol for communication between the gVisor sandbox and the Gofer. Called LISAFS (Linux Sandbox File system protocol), the new protocol reduces both the number of RPC calls and their memory usage, improving walks across multiple path components and speeding up file I/O.
Thanks to these changes, say Ranjan and Voznika, the overhead introduced by runsc was reduced by 50%-75% according to a number of different metrics.
The largest improvements were measured when using a bind mount, as opposed to hosting the source code in the root file system or an in-memory file system. These results were obtained by running the official Bazel benchmarks to build gRPC and Abseil.
The benchmark results were largely confirmed by empirical data showing that Google App Engine cold start times improved by more than 25% across the whole platform, a figure that includes all kinds of workloads, not only file system-intensive ones.