I have introduced and discussed the Garbage First Garbage Collector here on InfoQ in a couple of previous articles - G1: One Garbage Collector To Rule Them All and Tips for Tuning the Garbage First Garbage Collector.
Today I would like to discuss JEP 248, the proposal to make G1 the default GC, targeted for OpenJDK 9. As of OpenJDK 8, the throughput collector (also known as Parallel GC), and more recently ParallelOld GC (meaning that both -XX:+UseParallelGC and -XX:+UseParallelOldGC are enabled), has been the default GC for OpenJDK. Anyone wanting to use a different garbage collection algorithm has to enable it explicitly on the command line. For example, if you wanted to employ G1 GC, you would need to select it on the command line using -XX:+UseG1GC.
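If you are unsure which collector a given JVM actually ended up with, the standard java.lang.management API can report it at runtime. Below is a minimal sketch; the exact bean names in the comment are what recent HotSpot builds typically report and may vary by release.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShowActiveGC {
    public static void main(String[] args) {
        // Parallel GC typically registers "PS Scavenge" and "PS MarkSweep";
        // G1 registers "G1 Young Generation" and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " (collections so far: " + gc.getCollectionCount() + ")");
        }
    }
}
```

Running this class with and without -XX:+UseG1GC makes the difference in the registered collectors immediately visible.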
The proposal to make G1 GC the default GC for OpenJDK 9 has been a major source of community concern. It has given rise to some lively discussions and eventually led to the original proposal being updated to include a clause providing the ability to revert to Parallel GC as the default.
So, Why G1 GC?
You may be familiar with the software optimization tradeoff: software can be optimized for latency, throughput or footprint. The same is true of GC optimization, and this is reflected in the various popular GCs. You could focus on two of those three, but trying to optimize for all three is enormously difficult. OpenJDK HotSpot GC algorithms are each geared towards optimizing one of the three - for example, Serial GC is optimized for minimal footprint, Parallel GC is optimized for throughput, and the (mostly) Concurrent Mark and Sweep GC (commonly known as CMS) is optimized for minimizing GC-induced latencies and providing improved response times. So, why do we need G1?
G1 GC comes in as a long term replacement for CMS. CMS in its current state has a pathological issue that leads to concurrent mode failures, eventually resulting in a full heap compacting collection. You can tune CMS to postpone this currently single-threaded full heap compacting fallback collection, but ultimately it can't be avoided. In the future, the fallback collection could be improved to employ multiple GC threads for faster execution; but even then, a full compacting collection can't be avoided.
Another important point is that even for the well-seasoned GC engineer, maintenance of CMS has proven to be very challenging; one of the goals for the active HotSpot GC maintainers has been simply to keep CMS stable.
Also, CMS GC, Parallel GC and G1 GC are all implemented with different GC frameworks. The cost of maintaining three different GCs, each using its own distinct GC framework, is high. It seems to me that G1 GC's regionalized heap framework, where the unit of collection is a region and various such regions make up the generations within the contiguous Java heap, is where the future is heading - IBM has its Balanced GC, Azul has C4, and most recently there is the OpenJDK proposal called Shenandoah. It wouldn't be surprising to see a similar regionalized heap-based implementation of a throughput GC, which could offer the throughput and adaptive sizing benefits of Parallel GC. The number of GC frameworks used in HotSpot could thus potentially be reduced, lowering the cost of maintenance and in turn enabling more rapid development of new GC features and capabilities.
G1 GC became fully supported in OpenJDK 7 update 4, and since then it’s been getting better and more robust with massive help from the OpenJDK community. To learn more about G1, I highly recommend the earlier mentioned InfoQ articles, but let me summarize a few key takeaways:
- G1 GC provides a regionalized heap framework.
- This helps provide immense tunability to the generations, since the unit of collection (a region) is now smaller than the generation itself, and increasing/decreasing the generation size is as simple as adding/removing a region from the free regions list. Note: Even though the entire heap is contiguous, the regions in a particular generation don't have to be contiguous.
- G1 GC is designed on the principle of collecting the most garbage first.
- G1 has distinct collection sets (CSets) for young and mixed collections (for more information please refer to this article). For mixed collections, the collection set is comprised of all the young regions and a few candidate old regions. The concurrent marking cycle helps identify these candidate old regions, and they are effectively added to the mixed collection set. The tuning switches available for the old generation in G1 GC are more direct, more numerous, and provide more control than the limited sizing tunables offered in Parallel GC or the sizing and 'initiation of marking' threshold settings offered in CMS. The future that I envision here is an adaptive G1 GC that can predictively optimize the collection set and the marking threshold based on the stats gathered during the marking and collection cycles.
- An evacuation failure in G1 GC is also (so to speak) a “tunable”. Unlike CMS, fragmentation in G1 is not something that accumulates over time and leads to expensive collections and concurrent mode failures. In G1, fragmentation is minimal and controlled by tunables. Some fragmentation is also introduced by very large objects that don't follow the normal allocation path. These very large objects (also known as 'humongous objects') are allocated directly out of the old generation into regions known as 'humongous regions'. (Note: To learn more about humongous objects and humongous allocations please refer to this article; see also the region-size sketch after this list.) But when these humongous objects die, they are collected and the fragmentation dies with them. In its current state, G1 can still at times be a bit of a heap region and heap occupancy tuning nightmare, especially when you are working with restricted resources; but again, making the G1 algorithm more adaptive would spare the end user from encountering such failures.
- G1 GC is scalable!
- The G1 GC algorithm is designed with scalability in mind. Compared with Parallel GC, it scales with your heap size and load without much of a compromise in your application's throughput.
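To make the region and humongous-object points above more concrete, here is a rough, illustrative sketch of how region size relates to heap size and to the humongous-allocation threshold. The constants (a target of roughly 2048 regions, power-of-two region sizes between 1 MB and 32 MB, and the half-region humongous threshold) reflect documented G1 behavior, but this is an approximation for illustration, not the HotSpot implementation; -XX:G1HeapRegionSize lets you set the region size explicitly.

```java
public class G1RegionSketch {
    static final long MIN_REGION_BYTES = 1L << 20;   // 1 MB
    static final long MAX_REGION_BYTES = 32L << 20;  // 32 MB
    static final long TARGET_REGION_COUNT = 2048;    // G1 aims at roughly this many regions

    // Approximation of the default region size G1 would pick for a given heap size.
    static long defaultRegionSize(long heapBytes) {
        long size = Long.highestOneBit(heapBytes / TARGET_REGION_COUNT); // round down to a power of two
        return Math.max(MIN_REGION_BYTES, Math.min(MAX_REGION_BYTES, size));
    }

    // Objects spanning half a region or more bypass the young generation and are
    // allocated directly into dedicated humongous regions.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long heap = 8L << 30;                    // assume -Xmx8g for illustration
        long region = defaultRegionSize(heap);   // roughly 4 MB here
        System.out.printf("region size: %d MB, is a 3 MB array humongous? %b%n",
                region >> 20, isHumongous(3L << 20, region));
    }
}
```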
Why Now?
The proposal is targeted for OpenJDK 9. OpenJDK 9 general availability is targeted for September of 2016, which is still a year away. The hope is that the OpenJDK community members who choose to work with early access builds and release candidates will test the feasibility of G1 GC as the default GC, help with providing timely feedback, and even contribute code changes.
Also, the only end users impacted are those who do not set an explicit GC today; those who do set a GC explicitly on the command line are not affected by this change. Users who do not set a GC explicitly will get G1 GC instead of Parallel GC, and if they want to continue to use Parallel GC, they merely have to set -XX:+UseParallelGC (the current default, which enables parallel GC threads for young collections) on their JVM command line. Note: Since the introduction of -XX:+UseParallelOldGC in JDK 5 update 6, setting -XX:+UseParallelGC on the JVM command line of any recent build also enables -XX:+UseParallelOldGC, so parallel GC threads are employed for full collections as well. Hence, if you are working with builds newer than JDK 6, setting either of these command line options will give you the same GC behavior as before.
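One way to confirm how these flags resolve on your own build, whichever of them you pass, is to query the HotSpot-specific diagnostic MXBean for the resulting flag values. A minimal sketch, assuming a HotSpot-based JDK where com.sun.management.HotSpotDiagnosticMXBean is available (JDK 7 and later):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class CheckGCFlags {
    public static void main(String[] args) {
        // HotSpot-specific diagnostic bean; exposes the effective value of each VM flag.
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : new String[] {"UseParallelGC", "UseParallelOldGC", "UseG1GC"}) {
            System.out.println(flag + " = " + hs.getVMOption(flag).getValue());
        }
    }
}
```

Launching this with just -XX:+UseParallelGC on a recent build should show both parallel flags reported as true.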
When Would You Choose G1 GC over Parallel GC?
As mentioned in this article, Parallel GC doesn't do incremental collection, hence it ends up sacrificing latency for throughput. For larger heaps, as the load increases, the GC pause times will often increase as well, possibly compromising your latency-related service level agreements (SLAs).
G1 may help deliver your response time SLAs with a smaller heap footprint, since G1’s mixed collection pauses should be considerably shorter than the full collections in Parallel GC.
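If you are unsure whether Parallel GC pauses are actually threatening your SLAs, the GarbageCollectorMXBeans offer a coarse first approximation before you turn to full GC logs. A minimal sampling sketch follows; note that the cumulative counters can include concurrent work for some collectors, so treat the averages as indicative only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadSampler {
    public static void main(String[] args) throws InterruptedException {
        long startWallMs = System.currentTimeMillis();
        while (true) {
            Thread.sleep(10_000); // sample every 10 seconds
            long gcTimeMs = 0, gcCount = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                gcTimeMs += Math.max(0, gc.getCollectionTime());  // cumulative ms; -1 if unsupported
                gcCount  += Math.max(0, gc.getCollectionCount());
            }
            long wallMs = System.currentTimeMillis() - startWallMs;
            System.out.printf("GC: %d collections, %d ms total, avg %.1f ms, ~%.2f%% of wall time%n",
                    gcCount, gcTimeMs,
                    gcCount == 0 ? 0.0 : (double) gcTimeMs / gcCount,
                    100.0 * gcTimeMs / wallMs);
        }
    }
}
```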
When Would You Choose G1 GC over CMS GC?
In its current state, a tuned G1 can and will meet latency SLAs that a CMS GC can't, due to CMS's fragmentation and concurrent mode failures. Worst case pause times with mixed collections are expected to be better than the worst case full compaction pauses that CMS will encounter. As mentioned earlier, one can postpone but not prevent the fragmentation of a CMS heap. Some developers working with CMS have come up with workarounds to combat the fragmentation issue by allocating objects in similar sized chunks. But those are workarounds built around CMS; the inherent nature of CMS is that it is prone to fragmentation and will eventually need a full compacting collection. I am also aware of companies like Google that build and run their own private JDK from OpenJDK sources, with specific source code changes to suit their needs. For example, in an effort to reduce fragmentation, a Google engineer has mentioned that they have added a form of incremental compaction to their (private) CMS GC's remark phase and have also made their CMS GC more stable (see: http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-July/019534.html).
Note: Incremental compaction comes with its own costs. Google probably added incremental compaction after weighing the benefits to their specific use-case.
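For completeness, the "similar sized chunks" workaround mentioned above usually amounts to pooling fixed-size buffers so that long-lived allocations promoted to the old generation are uniform in size and fit CMS free-list holes more readily. The sketch below is a hypothetical, simplified shape of such a pool (the 64 KB chunk size is an arbitrary illustrative choice), not any particular company's implementation.

```java
import java.util.ArrayDeque;

public class ChunkPool {
    public static final int CHUNK_SIZE = 64 * 1024; // every allocation is the same size

    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    // Hand out a recycled chunk if available, otherwise allocate a new uniform-size one.
    public synchronized byte[] borrow() {
        byte[] chunk = free.poll();
        return chunk != null ? chunk : new byte[CHUNK_SIZE];
    }

    // Return a chunk to the pool so future borrowers reuse it instead of allocating.
    public synchronized void release(byte[] chunk) {
        if (chunk.length == CHUNK_SIZE) {
            free.push(chunk);
        }
    }
}
```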
Why Did The JEP Become Such A Hot Topic?
Many OpenJDK community members have voiced their concern over whether G1 is ready for prime time. Members have provided their observations on their experience with G1 in the field. Ever since G1 was fully supported, it has been touted as a CMS replacement. But the community has concerns that with this JEP it now feels like G1 is in fact replacing Parallel GC, not CMS. Hence, it is widely believed that while there may be data comparing CMS to G1 (due to businesses migrating from CMS to G1) there is not sufficient data comparing Parallel GC (the current default) to G1 (the proposed default). Also, field data seems to indicate that most businesses are still using the default GC, and so will definitely observe a change in behavior when G1 becomes the default GC.
There have also been observations that G1 has exhibited some important (albeit very hard to reproduce) index corruption issues, and such issues need to be studied and rectified before G1 is made the default.
There are also others who ask whether we still need a single default GC at all, rather than one selected via “ergonomics”. (For example, since Java 5, if your system is identified as a “server-class” machine, the default changes to the server VM instead of the client VM; ref: http://docs.oracle.com/javase/7/docs/technotes/guides/vm/server-class.html.)
Summary
After much back and forth, Charlie Hunt, Performance Architect at Oracle, summarized the discussion and proposed the following plan moving forward (Note: The excerpt below is referenced from here: http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-June/018804.html):
- “Make G1 the default collector in JDK 9, continue to evaluate G1 and enhance G1 in JDK 9
- Mitigate risk by reverting back to Parallel GC before JDK 9 goes “Generally Available” (Sept 22, 2016 [1]) if warranted by continuing to monitor observations and experiences with G1 in both JDK 9 pre-releases and latest JDK 8 update releases.
- Address enhancing ergonomics for selecting a default GC as a separate JEP if future observations suggests it’s needed.”
Also, Staffan Friberg of the Java SE Performance team at Oracle urged the community to help gather data points for key metrics. I have paraphrased Staffan's message for conciseness:
- the startup time: to ensure that G1's infrastructural complexity doesn't introduce much delay during Java Virtual Machine (JVM) initialization;
- the throughput: G1 is going head to head with the throughput GC. G1 also has pre- and post-write barriers, so the throughput metric is key to understanding how much overhead the barriers impose on the application;
- the footprint: G1 has remembered sets and collection sets that do increase the footprint. Data gathered from the field should provide enough information to understand the impact of the increased footprint;
- the out-of-box performance: businesses that go with the default GC often also go with the out-of-box performance provided by that GC, so it is important to understand G1's out-of-box performance. Here GC ergonomics and adaptiveness play an important part. (A minimal measurement sketch for some of these data points follows this list.)
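For the startup and footprint data points, the platform MXBeans offer a rough starting point; the sketch below is illustrative only. Note that G1's remembered sets live in native memory, so their cost shows up in process RSS rather than in these beans, and barrier (throughput) overhead needs an application-level benchmark that this sketch does not attempt.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class StartupAndFootprint {
    public static void main(String[] args) {
        // Called at the top of main(), JVM uptime approximates launch-to-main latency.
        long startupMs = ManagementFactory.getRuntimeMXBean().getUptime();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();

        // Committed sizes give a first look at footprint under the collector in use;
        // compare process RSS between collectors to capture native overhead as well.
        System.out.printf("startup ~%d ms, heap committed %d MB, non-heap committed %d MB%n",
                startupMs, heap.getCommitted() >> 20, nonHeap.getCommitted() >> 20);
    }
}
```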
Staffan also helped identify that business applications currently employing the default GC algorithm will be the ones impacted by the change in the default GC. Similarly, scripts that don't specify a GC, or interfaces that specify just the Java heap and generation sizes on the command line, will be affected by the change in the default GC algorithm.
Acknowledgement
I would like to extend my gratitude to Charlie Hunt for his review of this article.
About the Author
Monica Beckwith is a Java Performance Consultant. Her past experiences include working with Oracle/Sun and AMD, optimizing the JVM for server-class systems. Monica was voted a Rock Star speaker @JavaOne 2013 and was the performance lead for the Garbage First Garbage Collector (G1 GC). You can follow Monica on Twitter at @mon_beck.