Based on observed behaviour on Android and Chrome OS, Google began working on a new page reclamation strategy for its Linux-based OSes, aimed at improving how the virtual memory subsystem reclaims unused memory pages. More recent work shows the new multi-generational LRU (MGLRU) policy can benefit server environments, too.
Google's research into how the Linux kernel manages memory overcommit originated from the analysis of both servers equipped with hundreds of gigabytes of memory and personal and mobile devices. In both cases, a Google engineer came to the conclusion that:
The current page reclaim is too expensive in terms of CPU usage and often making poor choices about what to evict. We would like to offer a performant, versatile and straightforward augment.
Two tenets of the current LRU-like implementation of page replacement in the Linux kernel came under scrutiny: classifying pages into active and inactive lists, and scanning those lists incrementally to find candidates for eviction. Both lead to a number of inefficiencies, according to Google engineers.
In particular, incremental scans using rmap (the kernel's reverse mapping from physical pages to the page table entries that map them) resulted in high CPU usage and reduced performance under memory pressure, since many pages must be scanned to find enough pages to reclaim. On the other hand, reasoning in terms of active and inactive pages did not appear to be useful for job scheduling in server environments and led to biased page eviction on Android and Chrome OS, with a negative impact on UI rendering.
The new policy, MGLRU, instead uses generation numbers to move beyond the active/inactive distinction, and replaces incremental scans with differential scans via page tables. Roughly, this means pages are grouped into generations, with each generation comprising all pages referenced since the previous generation. Generations are discovered using differential scans. Older generations are marked evictable and are eventually evicted through an aging process that takes into account whether a page has been used since the last scan.
The cost of each differential scan is roughly proportional to the number of referenced pages it discovers. Unless address spaces are extremely sparse, page tables usually have better memory locality than the rmap.
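To make the generation idea more concrete, here is a minimal Python sketch of generation-based aging and eviction. It is a toy model written for illustration, not the kernel's implementation: the class `ToyGenerationalLRU`, its methods, and the `referenced` set (which stands in for the accessed bits a page-table scan would find) are all hypothetical names, and placing newly faulted pages in the youngest generation is a simplifying assumption.

```python
class ToyGenerationalLRU:
    """Toy model of generation-based page aging (illustrative only)."""

    def __init__(self):
        self.max_seq = 0         # number of the youngest generation
        self.page_gen = {}       # page id -> generation number
        self.referenced = set()  # pages touched since the last scan
                                 # (stand-in for page-table accessed bits)

    def touch(self, page):
        """Simulate a memory access to `page`."""
        if page not in self.page_gen:
            # Assumption: a new page joins the youngest generation.
            self.page_gen[page] = self.max_seq
        self.referenced.add(page)

    def age(self):
        """Open a new generation holding every page referenced since the
        previous scan. The loop visits only referenced pages, mirroring the
        idea that scan cost tracks the number of referenced pages found."""
        self.max_seq += 1
        for page in self.referenced:
            self.page_gen[page] = self.max_seq
        self.referenced.clear()

    def evict(self, count):
        """Evict up to `count` pages, oldest generation first, skipping
        pages that have been used since the last scan."""
        candidates = sorted(
            (p for p in self.page_gen if p not in self.referenced),
            key=lambda p: self.page_gen[p],
        )
        victims = candidates[:count]
        for page in victims:
            del self.page_gen[page]
        return victims


lru = ToyGenerationalLRU()
for page in ("a", "b", "c"):
    lru.touch(page)
lru.age()             # a, b and c all move to generation 1
lru.touch("a")
lru.age()             # only a moves to generation 2
print(lru.evict(1))   # ['b']: b sits in the oldest generation
```

In this sketch, a page that keeps being referenced is carried forward into each new generation, while a page that stops being used falls behind and becomes the first eviction candidate. The real policy tracks considerably more state than a single dictionary.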
According to Google's initial benchmarks, based on the MGLRU roll-out to tens of millions of Chrome OS users and about a million Android users, the new policy led to 59% fewer OOM kills on Chrome OS and 18% fewer on Android, along with improvements in other UX metrics.
Since the initial patch, submitted in March 2021, Google engineers have kept working on MGLRU to improve its performance and extend it to additional architectures. The latest patch, submitted in the first days of 2022, includes benchmarks for the most popular open-source memory-hungry applications, such as Apache Hadoop, Memcached, MongoDB, PostgreSQL, and others.
An independent lab evaluated MGLRU with the most widely used benchmark suites for the above applications. They posted 960 data points along with kernel metrics and perf profiles collected over more than 500 hours of total benchmark time. Their final reports show that, with 95% confidence intervals (CIs), the above applications all performed significantly better for at least part of their benchmark matrices.
Linus Torvalds endorsed the Google engineers' work on MGLRU, observing:
So I personally think this is worth going with, partly simply due to the reported improvements that have been measured. But also to a large extent because the whole notion of doing multi-generational LRU isn't exactly some wackadoodle crazy thing. We already do active vs inactive, the whole multi-generational thing just doesn't seem to be so "far out".
While the outlook is positive, it is not yet clear whether MGLRU will make it into Linux 5.17 or a later version. InfoQ will keep reporting on the progress of this new Linux feature as more details become available.