Key Takeaways
- The generational hypothesis is key to efficient modern garbage collection
- HotSpot counts the number of collections an object has survived to implement generational GC
- The Parallel collector is still the most widely used Java GC
- The algorithmic complexity of GC is difficult to reason about - big-O analysis is a poor guide to real collector behaviour
- Compacting collectors (like ParallelOld) behave completely differently to in-place collectors
In Java 8, the default garbage collector for the old generation of the HotSpot VM is called ParallelOld. In Java 11, this default has been changed to the G1 collector instead.
NOTE: Technically, the collector switch happened with Java 9, but major further enhancements to G1 were made in Java 10 and 11 and in practice, very few companies are running anything other than the LTS releases of Java.
In this article, we will discuss some basics of garbage collection theory, and how they are implemented in HotSpot’s collectors. This will pave the way for a discussion of why the default has been changed and other recent changes in Java’s approach to garbage collection.
Basic Concepts
Garbage collection is a system "house-keeping" activity, separate from the main processing of the application, that seeks to determine memory that is no longer in use and automatically recover it for reuse.
This definition, due to Dijkstra, makes it clear that while reference counting is a form of automatic memory management, it is not a form of garbage collection.
Reference counting works by updating per-object metadata as the program runs (e.g. when setting a field of reference type to a new value). The work needed to update the metadata takes place on the application thread and so cannot be cleanly divided into a separate activity.
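To make that concrete, here is a minimal sketch of the bookkeeping that reference counting implies, using a hypothetical Ref wrapper class. In real reference-counted systems this work is emitted by the compiler as part of every pointer write, and a production scheme would also have to deal with cycles and thread safety:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// A minimal sketch of reference counting. The Ref wrapper is hypothetical:
// real RC systems do this bookkeeping in compiler-emitted write barriers,
// not library code, and must also handle cycles and concurrency.
final class Ref<T> {
    private static final Map<Object, Integer> COUNTS = new IdentityHashMap<>();

    private T referent;

    Ref(T initial) {
        referent = initial;
        if (initial != null) COUNTS.merge(initial, 1, Integer::sum);
    }

    // The key point: every write of a reference does metadata work
    // on the application thread itself
    void set(T newValue) {
        T old = referent;
        if (newValue != null) COUNTS.merge(newValue, 1, Integer::sum);
        referent = newValue;
        if (old != null && COUNTS.merge(old, -1, Integer::sum) == 0) {
            COUNTS.remove(old);
            // the count hit zero: the object could be reclaimed right here,
            // inline on the mutator thread - there is no separate GC activity
        }
    }
}
```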
Practical GC algorithms proceed by starting from GC roots – a set of objects that are known to be live – and determining all live objects by following pointers.
These tracing collectors implement graph theory algorithms to partition heap memory into live and reclaimable sets.
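As an illustration, here is a minimal sketch of the marking phase of a tracing collector, using a toy object graph in which each node knows its outgoing references (a real collector works on raw heap memory, not a Java-level graph):

```java
import java.util.*;

// A minimal sketch of tracing from GC roots over a toy object graph.
final class TraceSketch {
    static final class Node {
        final List<Node> refs = new ArrayList<>();
    }

    // Partition the heap into live (the returned set) and reclaimable (the rest)
    static Set<Node> markFromRoots(Collection<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (live.add(n)) {           // first visit: mark as live
                worklist.addAll(n.refs); // follow pointers
            }
        }
        return live; // anything not in this set is garbage
    }
}
```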
In modern GC literature, the words concurrent and parallel are both used to describe collection algorithms. They sound as though they should be synonyms, but in fact they have completely different meanings:
- Concurrent - GC threads can run while application threads are running
- Parallel - Multiple threads are used to execute the garbage collection algorithm
The terms are not equivalent. Instead, they can be thought of as the duals of two other terms - concurrent is the opposite of stop-the-world and parallel is the opposite of single-threaded.
Real production collectors will have a number of phases within the GC duty cycle, and each phase may display a mixture of characteristics.
For example, it would be entirely possible to have a phase which is single-threaded and concurrent, or parallel and stop-the-world.
NOTE: Concurrent collectors are much more complex than stop-the-world collectors. They are also expensive in terms of the computation expended, and they come with other caveats about their behaviour.
Some more GC terms that you should know:
- Exact - An exact collector has enough type information that it can always tell the difference between an int and a pointer that must be traversed.
- Evacuating - Moves (evacuates) all live objects to another region of memory. At the end of the collection cycle, the source memory region is totally empty, and can be reused.
- Compacting - At the end of the collection cycle, surviving objects are arranged as a single contiguous block at the start of the memory region with the rest of the area free for reuse.
The dual of an exact GC scheme is a conservative scheme. These lack the information of an exact scheme and as a result are typically more wasteful of memory.
Some sources also refer to moving collectors - which includes both compacting and evacuating algorithms. However, the differences between the two subtypes are so significant that grouping them together is not often useful.
Collectors that do not move objects are referred to as in-place. These algorithms need to maintain a free list of available memory blocks, deal with the resulting memory fragmentation, and coalesce adjacent free blocks as they are released.
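A minimal sketch of such a free list - assuming free blocks are simply (offset, size) pairs in a flat address space - might look like this. Production free lists are usually segregated by size class and considerably more sophisticated:

```java
import java.util.Map;
import java.util.TreeMap;

// A minimal sketch of a free list with coalescing of adjacent blocks.
final class FreeList {
    private final TreeMap<Long, Long> blocks = new TreeMap<>(); // offset -> size

    FreeList(long totalSize) {
        blocks.put(0L, totalSize); // initially, one big free block
    }

    // First-fit allocation: take the first block that is large enough
    long allocate(long size) {
        Long found = null;
        for (Map.Entry<Long, Long> e : blocks.entrySet()) {
            if (e.getValue() >= size) {
                found = e.getKey();
                break;
            }
        }
        if (found == null) return -1; // fragmentation: no single block is big enough
        long remaining = blocks.remove(found) - size;
        if (remaining > 0) blocks.put(found + size, remaining);
        return found;
    }

    // Return a block to the free list, coalescing with its neighbours
    void free(long offset, long size) {
        Map.Entry<Long, Long> next = blocks.ceilingEntry(offset);
        if (next != null && offset + size == next.getKey()) {
            size += next.getValue();        // merge with the following block
            blocks.remove(next.getKey());
        }
        Map.Entry<Long, Long> prev = blocks.floorEntry(offset);
        if (prev != null && prev.getKey() + prev.getValue() == offset) {
            blocks.put(prev.getKey(), prev.getValue() + size); // merge with the preceding block
        } else {
            blocks.put(offset, size);
        }
    }
}
```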
Practical Considerations in HotSpot
Let's start by considering some basic facts that follow from the definitions:
- Objects allocated by a moving collector do not have stable addresses over their lifetime.
- Compacting collectors will avoid memory fragmentation.
- Evacuating collectors can be written to avoid fragmentation, and achieve partial compaction of the surviving objects.
- If the heap consists of only a single memory pool, it cannot be collected by an evacuating algorithm.
The Generational Hypothesis is an observed runtime behaviour of object-oriented systems, and roughly divides objects into two classes - short-lived temporary objects, and longer-lived objects that participate in the working set of the program.
NOTE: Generational collectors are not guaranteed to always be more efficient than non-generational, but in practice, almost all applications benefit from generational GC.
The mark-sweep-compact nomenclature (e.g. as per Blackburn and McKinley) for GC algorithms is:
- Marking: Identifying liveness via a trace of the object graph.
- Sweeping: Identifying free space while leaving live objects in place.
- Evacuating: Freeing space by moving survivors to another pool.
- Compacting: Freeing space by moving survivors within the same pool.
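To make the last of these concrete, here is a minimal sketch of sliding compaction over a toy heap of object slots. A real compactor must also update every reference to each object it moves - a step elided here:

```java
// A minimal sketch of compaction within a single pool, over a toy
// heap-of-slots representation. Reference updating is deliberately omitted.
final class CompactSketch {
    static final class Obj {
        boolean live; // stands in for the reachability info a real trace computes
    }

    // Slide survivors down to the start of the pool; the tail becomes free space.
    // Returns the new "top" - everything at or above it is reusable.
    static int compact(Obj[] pool, int top) {
        int dest = 0;
        for (int i = 0; i < top; i++) {
            if (pool[i] != null && pool[i].live) {
                pool[dest++] = pool[i]; // move survivor towards the start
            }
        }
        for (int i = dest; i < top; i++) {
            pool[i] = null; // the rest of the area is free for reuse
        }
        return dest;
    }
}
```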
In generational GC, the young and the old collectors are often completely different algorithms.
One consequence of this is that it is hard to accurately label the collectors where different phases employ different algorithms. For example, in CMS the young generation is collected by evacuation, and the old generation is collected by in-place mark-sweep, with a fallback to a mark-compact if the concurrent collection fails (e.g. due to fragmentation).
Young Collections in HotSpot
In HotSpot, the traditional collectors divide memory into 4 memory pools named Eden, Survivor 0, Survivor 1, and Tenured. The first three of these are collectively grouped into Young space and Tenured is regarded as Old space.
The young spaces are then collected during a Young collection. This uses parallel, stop-the-world evacuation to collect the young generation by moving surviving objects into one of the survivor spaces.
The collection algorithm marks the live data in the currently active memory pool and then evacuates it to the inactive pool. At the end of the collection the roles of the two survivor spaces are reversed - the active pool becomes inactive (i.e. empty) and the inactive pool becomes active. This is sometimes called a hemispheric collection.
The hemispheric approach can be wasteful with memory. A single-pass algorithm cannot know upfront how much of the memory area being collected is still live. This means that the to-space (the area to which objects are being evacuated) must be as large as the area being cleaned - so the algorithm needs double the actual survivor size.
It also means that half of the space sits empty at any given moment. These properties would make it a bad fit for the old generation on modern workloads, where the working set may be quite large: in fact no production HotSpot collector uses hemispheric collection for the old generation.
Instead, hemispheric collection is used for the young generation. It fits well for workloads where the generational hypothesis applies - i.e. areas that are mostly composed of garbage. The collector benefits from the fact that surviving objects can be promoted out of the young generation into the old generation.
Another major advantage of evacuating collectors is how they handle free space. The simplest approach is to use a top of area pointer to act as a free list. As live data is evacuated it will be compacted "naturally" - essentially for free.
The evacuating approach, which is typical of the OpenJDK young generational collectors, uses a tracing pass. However, the collection is performed in just a single pass - no separate mark, sweep or compact phases.
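Putting those two ideas together - hemispheric evacuation plus a bump pointer - a minimal, purely illustrative sketch might look like this (the names fromSpace, toSpace and top are assumptions of the sketch, not HotSpot internals):

```java
import java.util.Arrays;

// A minimal sketch of hemispheric evacuation with a "top of area" pointer,
// over a toy heap of object slots. HotSpot's real survivor-space handling
// is considerably more involved.
final class CopyingSketch {
    static final class Obj {
        boolean live; // stands in for the reachability info a real trace computes
    }

    private Obj[] fromSpace = new Obj[1024]; // active hemisphere
    private Obj[] toSpace   = new Obj[1024]; // inactive hemisphere - sits empty
    private int top = 0; // bump pointer: the entire "free list" is one pointer

    // Allocation is just a pointer bump - essentially free
    int allocate(Obj o) {
        if (top == fromSpace.length) return -1; // full: a collection is needed
        fromSpace[top] = o;
        return top++;
    }

    // Single pass: evacuate survivors, then swap the roles of the two spaces
    void collect() {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            Obj o = fromSpace[i];
            if (o != null && o.live) {
                toSpace[newTop++] = o; // survivors land contiguously: compaction for free
            }
        }
        Arrays.fill(fromSpace, 0, top, null); // the source region is now totally empty
        Obj[] tmp = fromSpace;                // swap: inactive becomes active
        fromSpace = toSpace;
        toSpace = tmp;
        top = newTop;
    }
}
```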
Consequences of the Generational Hypothesis
Object lifetimes are not normally known in advance, and in real applications they can change dynamically. Tracking the actual age of an object in wall-clock time is therefore neither practical nor productive.
Instead, HotSpot records the number of collections that an object has survived. This takes just a few bits of metadata in the object header, and when the object has survived sufficient collections, it is physically moved (promoted) to the old generation and managed by a different collector.
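A minimal sketch of this aging logic might look like the following. The fixed threshold is illustrative only - HotSpot adapts the actual tenuring threshold at runtime, and it can be capped with the -XX:MaxTenuringThreshold flag:

```java
// A minimal sketch of age-based promotion. The threshold here is an
// illustrative constant, not HotSpot's adaptively computed value.
final class AgingSketch {
    static final int MAX_AGE = 15;            // HotSpot stores the age in 4 header bits
    static final int TENURING_THRESHOLD = 6;  // illustrative value only

    static final class Header {
        int age; // collections survived - not wall-clock time
    }

    // Called for each object that survives a young collection;
    // returns true when the object should be promoted to the old generation
    static boolean survivedCollection(Header h) {
        h.age = Math.min(h.age + 1, MAX_AGE);
        return h.age >= TENURING_THRESHOLD;
    }
}
```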
This mechanism has an interesting interaction with the application’s allocation rate. If the allocation rate rises then the young generation will fill more quickly - but the expected lifetime of "short-lived objects" (measured in milliseconds) will remain constant.
This can lead to more objects surviving GC cycles, which in turn can lead to the young generation's survivor spaces overflowing with objects that are not yet eligible for promotion to the old generation. In this circumstance, the JVM has no choice other than to promote some objects early - leading to the effect known as premature promotion.
Many of these objects are actually short-lived and will die quickly after arriving in the old generation. Unfortunately, the JVM has no mechanism to collect them other than waiting until the next collection of the old generation.
Algorithmic Complexity of Garbage Collection
It is quite common for developers to want to apply complexity analysis ("big-O notation" as it is sometimes called) to garbage collection algorithms. However, in practice, this approach is not actually very satisfactory.
Naively, it might be supposed that the time complexity of the mark and compact phases would be linear in the size of the live set, whilst the sweeping phase would be linear in the overall heap size.
However, even leaving aside the issue that the separation of phases may not be clean in actual implementations (as for the HotSpot young generational collectors discussed above), a deeper problem exists.
This is the simple fact that garbage collectors are, by their very nature, among the most general purpose algorithms in use. This means that the inherent assumption in big-O analysis - that what matters is the limiting behaviour as the data set size increases - is simply not valid.
GC algorithms for production systems are required to behave acceptably across an entire range of possible inputs and workloads. Their asymptotic behaviour is not a suitable proxy for their overall performance at all.
Put another way, the live set and heap size can vary essentially independently (e.g. due to differing object graph topologies). This means that the multiplicative scaling factors can cause very different effects for different workloads.
For example, compaction requires copying bytes, so although a compaction phase would be linear in the size of the live set, one of the other factors would be the size of the objects that need to be moved. In the case of a large number of arrays with many elements, this could quickly swamp all other considerations for pause time.
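As a rough, purely illustrative calculation (the figures are assumed, not measured): if a collection has to copy 2 GB of surviving array data and the memory subsystem can move data at around 10 GB/s, then the copying alone accounts for roughly 200 ms of pause time, no matter how few individual objects are involved.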
There are also well-known second-order effects for the various different forms of GC algorithms. For example, when performing compaction on a memory area which has very few surviving objects ("sparse heap") then the live data will be coalesced into a much denser area. This area will then be much less sparse for subsequent GC cycles, provided the objects are truly long-lived.
Comparing this to an in-place collector, like CMS, we can see that long-lived objects will remain sparsely distributed for the entire lifetime of the program. In fact, over time, the free space will become more and more fragmented and the management of the list of free memory blocks (the free list) will become more and more expensive.
Overall, the time and space cost models for the different approaches to GC are very different and naive algorithmic complexity is just not that useful. In HotSpot's case, the ultimate failure mode of an in-place collector is to fall back to a compacting collector if insufficient contiguous space can be found. This is a significant behavioural effect, and not one that is covered by a simplistic approach.
Summary
We’ve discussed some of the fundamental aspects of garbage collection as it is implemented in the Java virtual machine. Garbage collection is a mature field of computer science, and the collectors present in HotSpot are robust and well-tested and should behave well over a very large class of workloads. Most Java applications just don't need to worry too much about their GC behaviour.
For the small percentage of cases which are sensitive to the behaviour of GC, a deeper understanding of the principles of garbage collection (and how it is implemented in the JVM) is a very useful tool for the developer to have in their toolkit.
In recent releases of Java the garbage collection subsystem has become an active area of improvement once again. To fully understand these changes, however, it is necessary to have a good grasp of the basics. A forthcoming article will discuss these updates in detail and explain, for example, why the default collector has been changed and what this means for teams that are upgrading to Java 11.
About the Author
Ben Evans is a co-founder of jClarity, a JVM performance optimization company. He is an organizer for the LJC (London's JUG) and a member of the JCP Executive Committee, helping define standards for the Java ecosystem. Ben is a Java Champion; 3-time JavaOne Rockstar Speaker; and author of "The Well-Grounded Java Developer", the new edition of "Java in a Nutshell", and "Optimizing Java". He is a regular speaker on the Java platform, performance, architecture, concurrency, startups and related topics. Ben is sometimes available for speaking, teaching, writing and consultancy engagements - please contact him for details.