
TCMalloc, Google's Customized Memory Allocator for C and C++, Now Open Source


According to Google, TCMalloc can be used as a replacement for the default C and C++ memory allocators, providing greater efficiency at scale and better support for parallelism.

To clear up any ambiguity, it is worth noting that this is actually the second time Google has open-sourced its memory allocator. Google first released it in 2005 as part of Google Performance Tools, along with several other tools, including a memory profiler, a heap checker aimed at ensuring heap consistency, and the Perl-based pprof profile analyzer and visualizer. Over time, though, the version used internally at Google diverged from the external one, so Google is now open-sourcing its current version of TCMalloc, which contains several improvements such as per-CPU caches, sized delete, fast/slow path improvements, and more.

This repository is Google’s current implementation of TCMalloc, used by ~all of our C++ programs in production. The code is limited to the memory allocator implementation itself.

As hinted above, TCMalloc includes implementations for the C *alloc family and for C++ ::operator new and ::operator delete. These provide a number of optimizations over their counterparts in the C and C++ standard libraries. For example, TCMalloc obtains memory from the OS in fixed-size "pages", which simplifies bookkeeping. Additionally, some of those pages are dedicated to objects of a specific size, e.g., all 16-byte objects, which simplifies obtaining and releasing that memory. Finally, commonly used objects are cached for speed.
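The idea of dedicating memory to objects of specific sizes and caching freed objects can be illustrated with a toy allocator. This is a minimal, single-threaded sketch and not TCMalloc's actual implementation: the size-class table, the `ToyAllocator` class, and all function names are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical size classes, for illustration only; TCMalloc's real table
// is different and much larger.
constexpr std::array<std::size_t, 5> kSizeClasses = {8, 16, 32, 64, 128};

// Returns the index of the smallest class that fits `size`, or -1 if the
// request is too large for any class.
int SizeClassFor(std::size_t size) {
  for (std::size_t i = 0; i < kSizeClasses.size(); ++i) {
    if (size <= kSizeClasses[i]) return static_cast<int>(i);
  }
  return -1;
}

// Toy allocator: one free list per size class, so a freed block is reused
// by the next request that maps to the same class.
class ToyAllocator {
 public:
  ~ToyAllocator() {
    for (auto& list : free_lists_) {
      for (void* p : list) std::free(p);
    }
  }

  void* Allocate(std::size_t size) {
    const int cls = SizeClassFor(size);
    if (cls < 0) return std::malloc(size);  // Large request: bypass the cache.
    auto& list = free_lists_[static_cast<std::size_t>(cls)];
    if (!list.empty()) {  // Fast path: reuse a cached block.
      void* p = list.back();
      list.pop_back();
      return p;
    }
    // Slow path: fetch fresh memory, rounded up to the class size.
    return std::malloc(kSizeClasses[static_cast<std::size_t>(cls)]);
  }

  // The caller passes the size back, in the spirit of C++ sized delete.
  void Deallocate(void* p, std::size_t size) {
    assert(p != nullptr);
    const int cls = SizeClassFor(size);
    if (cls < 0) {
      std::free(p);
      return;
    }
    free_lists_[static_cast<std::size_t>(cls)].push_back(p);
  }

  // Number of cached blocks in the class that `size` maps to.
  std::size_t CachedBlocks(std::size_t size) const {
    const int cls = SizeClassFor(size);
    return cls < 0 ? 0 : free_lists_[static_cast<std::size_t>(cls)].size();
  }

 private:
  std::array<std::vector<void*>, kSizeClasses.size()> free_lists_;
};
```

In this sketch a 13-byte request is rounded up to the 16-byte class, and once that block is deallocated, the next request of up to 16 bytes gets the same block back without touching the underlying allocator. A production allocator adds per-thread or per-CPU caches and page-level management on top of this basic idea.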

TCMalloc also supports telemetry extensions via MallocExtension, which can be useful for gathering heap profiles and snapshots to investigate memory behaviour.

A number of configuration options are available to tune TCMalloc performance. In particular, you can define the logical page size, which can be 4KiB, 8KiB, 32KiB, or 256KiB. Larger page sizes reduce the probability of requiring a new page allocation from the OS, thus speeding up operation at the cost of higher memory consumption. It is also possible to set the cache size in either per-thread or per-CPU mode, the latter being the default. As with page sizes, larger cache sizes improve performance at the cost of memory. Finally, you can tune how aggressively memory is released, which also affects performance in several ways.
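The page-size trade-off can be made concrete with some simple arithmetic. The following sketch uses a deliberately simplified model that assumes a single object size, pages holding only whole objects, and no allocator metadata; `PageCost` and `CostFor` are hypothetical names, not part of TCMalloc.

```cpp
#include <cassert>
#include <cstddef>

struct PageCost {
  std::size_t page_requests;  // Pages that must be obtained for the workload.
  std::size_t slack_bytes;    // Bytes reserved in those pages but not used.
};

// Estimates the cost of storing `object_count` objects of `object_size`
// bytes in pages of `page_size` bytes, packing whole objects per page.
PageCost CostFor(std::size_t object_size, std::size_t object_count,
                 std::size_t page_size) {
  assert(object_size > 0 && object_size <= page_size);
  const std::size_t per_page = page_size / object_size;
  const std::size_t pages = (object_count + per_page - 1) / per_page;
  const std::size_t slack = pages * page_size - object_count * object_size;
  return {pages, slack};
}
```

Under this model, 1,000 objects of 48 bytes need 12 pages of 4KiB with 1,152 bytes left unused, but a single 256KiB page with 214,144 bytes left unused: fewer page requests, more reserved memory.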

The following diagram shows the TCMalloc architecture, which is thoroughly described in the relevant document:

TCMalloc can only be built using Bazel, the open-source version of Google's internal build system, which might come as an unwelcome surprise to developers using other build systems. Bazel is available in binary form for macOS, Ubuntu, Fedora, and Windows, though, so this should not be a major hindrance.
