Aleksey Shipilëv, performance and OpenJDK developer at Red Hat, has filed a new JEP draft to create a no-op garbage collector; that is, a GC that doesn't actually reclaim memory. This collector is aimed at JVM implementers and researchers and, to a lesser extent but perhaps more interestingly for the public, at ultra-performant applications that generate little to no garbage. If the JEP goes ahead, the new GC would be available alongside the existing ones, and would have no effect unless explicitly activated.
Garbage collection and Java performance are always complex topics, and to help clarify things InfoQ reached out to Java Champions and performance experts Martijn Verburg and Kirk Pepperdine. We also talked to Remko Popma, who leads the transformation to a garbage-free Log4j, about how this objective can be achieved. Verburg and Popma confirmed that, in their view, the main beneficiaries of a no-op GC, or Epsilon GC as it has been called, would be GC developers and performance researchers. Epsilon GC can serve as the control against which the performance of other garbage collectors is measured. As a simplified example, an application could first be run with a no-op GC to establish a baseline with minimal GC overhead (barring other considerations about memory allocation and mutation control). If the same application is then run with the same workload but with different GC algorithms configured, the difference in performance would indicate the impact that each garbage collector has on the application. This will help GC developers and performance researchers understand the behaviour of garbage collectors in a more isolated manner.
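The comparison described above can be sketched with a small probe that runs a fixed workload and reports how much collection activity the JVM recorded. This is only an illustration, not part of the JEP draft: the class name `GcOverheadProbe` and its workload are hypothetical, and the figures come from the standard `GarbageCollectorMXBean` interface, which reports whatever the active collector chooses to expose.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: run it unchanged under each collector being compared.
class GcOverheadProbe {

    /** Sum of collection counts across all collectors registered in this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    /** Approximate accumulated collection time, in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) total += millis;
        }
        return total;
    }

    /** Illustrative workload: allocates short-lived 1 KB arrays and returns a checksum. */
    static long workload(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] scratch = new byte[1024];
            scratch[i % scratch.length] = (byte) i;
            checksum += scratch[i % scratch.length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long collectionsBefore = totalCollections();
        long gcMillisBefore = totalCollectionMillis();
        long startNanos = System.nanoTime();

        // Kept small (about 50 MB in total) so that a collector which never
        // reclaims memory would not exhaust a modest heap.
        long checksum = workload(50_000);

        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
        System.out.println("checksum       = " + checksum);
        System.out.println("elapsed (ms)   = " + elapsedMillis);
        System.out.println("collections    = " + (totalCollections() - collectionsBefore));
        System.out.println("GC time (ms)   = " + (totalCollectionMillis() - gcMillisBefore));
    }
}
```

The class would be launched once per collector with the same arguments, for example with the existing `-XX:+UseSerialGC`, `-XX:+UseParallelGC` or `-XX:+UseG1GC` options, and the elapsed times compared against the run in which no collection takes place. A serious measurement would use a benchmarking harness and a realistic workload rather than a single timed loop.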
"I think that this is actually a great step forwards for allowing more accurate benchmarking of various parts of the JVM (such as the existing JIT C1/C2 compilers, the possible shift to Graal, etc). It will really add extra longevity for the JVM." Martijn Verburg
On the other hand, ultra-performant applications could also benefit from Epsilon GC. There is a rare breed of applications and libraries, like the aforementioned Log4j, that have been implemented in such a way that they produce no garbage, and therefore have no need for a garbage collector; for this type of application, performance could be improved by removing the overhead of the collector. However, as Popma highlighted, building a library that can run with Epsilon GC "would take significant engineering effort to make sure an application's memory is managed carefully enough that it won't run out", and even then a risk and benefit assessment has to be made to ascertain whether the gains obtained by choosing a no-op GC justify the difficulty of achieving a zero-garbage state.
It might seem difficult, however, to envision how an application can be written to produce no garbage. Although the topic is much more complex than can be explained in this article, the following considerations may make it easier to understand:
- Memory is managed through two different mechanisms in the JVM: the heap and the stack; this is why there are two different errors regarding a lack of memory (OutOfMemoryError and StackOverflowError). Memory placed on the stack is only visible to the current thread and only during execution of the current method; therefore, when the current thread leaves the current method, this memory is automatically released without the need for a garbage collector. Memory in the heap, however, is accessible by the entire application at any point, which means an independent garbage collector needs to verify when a piece of memory is no longer in use and can be reclaimed.
- Local variables of primitive type are allocated on the stack, and therefore put no pressure on the garbage collector. If one were to write code using mostly primitive types, there would be fewer objects for the garbage collector to look after.
- Producing no garbage is not the same as producing no objects; objects can still be created without the need of a garbage collector to look after them:
  - An application or library could produce a number of objects at the beginning and then reuse them constantly; this requires the developer to understand the memory footprint of the application very well.
  - Sometimes the JIT compiler can determine that a particular object is not going to be used outside of a method; the technique is known as escape analysis. When the object is found not to escape the method, it can be allocated on the stack as opposed to the heap, and is therefore eliminated automatically as soon as the current method finishes.
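The first of these techniques, creating objects up front and reusing them, can be sketched as follows. This is a minimal illustration under stated assumptions, not how Log4j or any other library actually does it: `Event`, `EventPool` and `PoolDemo` are hypothetical names, and a real implementation would also have to deal with thread safety and with references that outlive their release.

```java
import java.util.ArrayDeque;

/** Hypothetical mutable message; its state lives in primitive fields. */
final class Event {
    long timestamp;
    int code;

    /** Overwrites the previous contents instead of creating a new object. */
    Event set(long timestamp, int code) {
        this.timestamp = timestamp;
        this.code = code;
        return this;
    }
}

/** Fixed-size pool: every Event is created once, in the constructor. */
final class EventPool {
    private final ArrayDeque<Event> free;

    EventPool(int capacity) {
        free = new ArrayDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.push(new Event());
        }
    }

    /** Hands out a pre-allocated Event, failing fast rather than allocating more. */
    Event acquire() {
        Event event = free.poll();
        if (event == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return event;
    }

    /** Returns an Event to the pool so that it can be reused. */
    void release(Event event) {
        free.push(event);
    }

    int available() {
        return free.size();
    }
}

class PoolDemo {
    public static void main(String[] args) {
        EventPool pool = new EventPool(4);
        long sum = 0;
        // A million logical "events", but only the four objects created above.
        for (int i = 0; i < 1_000_000; i++) {
            Event event = pool.acquire().set(System.nanoTime(), i & 0xFF);
            sum += event.code;
            pool.release(event);
        }
        System.out.println("sum of codes = " + sum);
        System.out.println("events still available = " + pool.available());
    }
}
```

The cost is visible even in a sketch this small: the caller has to size the pool correctly and must remember to release every object, which is precisely the kind of manual bookkeeping that automatic memory management normally removes.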
Although all of this is possible, Pepperdine pointed out that this is a very unnatural way of writing code that implies losing many of the benefits that Java provides, while Verburg also indicated that memory management is precisely one of the main reasons why Java has been so successful in the industry. In addition to this, we need to bear in mind that, despite its name, the garbage collector doesn't only have the task of reclaiming unused memory, but also allocating new blocks, and Epsilon GC would still have to do this. Pepperdine uses this argument to suggest that, at least in theory, there would be no significant difference in performance between having Epsilon GC and having any other garbage collection algorithm tuned to the point where it doesn't actually do any garbage collection.
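To illustrate why a collector that never reclaims memory still has work to do, allocation in such a system can be modelled as a pointer that only ever moves forward. The sketch below is a toy model, not Epsilon's or HotSpot's actual implementation; `BumpAllocator` is a hypothetical name and the "heap" is just a range of offsets.

```java
/** Toy model of an allocate-only heap: hands out offsets and never reclaims them. */
final class BumpAllocator {
    private final long capacity;
    private long top; // offset of the next free byte

    BumpAllocator(long capacityBytes) {
        this.capacity = capacityBytes;
    }

    /** Returns the offset of the new block, or fails once the arena is full. */
    long allocate(long sizeBytes) {
        if (sizeBytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        if (sizeBytes > capacity - top) {
            // With nothing to reclaim, exhaustion is final.
            throw new OutOfMemoryError("toy heap exhausted");
        }
        long address = top;
        top += sizeBytes; // "bump" the pointer past the new block
        return address;
    }

    long used() {
        return top;
    }
}

class BumpAllocatorDemo {
    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(1024);
        long allocations = 0;
        try {
            while (true) {
                heap.allocate(100);
                allocations++;
            }
        } catch (OutOfMemoryError e) {
            System.out.println("allocations before exhaustion = " + allocations);
            System.out.println("bytes used = " + heap.used());
        }
    }
}
```

In this model every allocation costs a comparison and an addition regardless of whether anything is ever collected, which is in line with Pepperdine's argument that the allocation path remains even when reclamation is taken away.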
However, even considering all these caveats, both Pepperdine and Verburg confirmed that there is a small audience for which this kind of behaviour could be very useful, although, according to Pepperdine, the small size of that audience casts doubt on whether the feature is worthwhile. The vast majority of applications need to reclaim memory at some point, and therefore need a functional garbage collector.
"Reasonable GC pause times are simply not an issue for most applications, so why give up all the benefits of Java for a questionable performance win." Kirk Pepperdine
Pepperdine also reminded us that every new feature adds maintenance costs to OpenJDK, and therefore OpenJDK developers should take the big picture into consideration when adding them. Oracle has been reducing the number of available garbage collectors precisely to reduce maintenance costs, and therefore adding a GC that will only be useful to a small percentage of users may not represent the right investment. Shipilëv did seem to consider these points when filing the JEP draft though, and the preliminary analysis seems to indicate that the maintenance overhead would be minimal in this case, judging by the contents of the JEP draft and by the prototype that has already been provided. In fact, both Pepperdine and Verburg pointed out that, given Shipilëv's experience, the fact that he is leading this initiative is reason enough to be optimistic about it.
On the other hand, Pepperdine also emphasised the fact that, although OpenJDK is the reference implementation of the JVM, there are no compliance requirements for the garbage collector, which means vendors can implement their own algorithms and still be fully compatible with standard Java. This could cause a division of opinions among the public: while some might think that implementing algorithms for niche markets is more appropriate for commercial versions of the JVM, others might consider this a very useful addition to OpenJDK.
@shipilev Zing actually has a no GC option. It's very useful for testing. Some people would love that option on OpenJDK.
— Nitsan Wakart (@nitsanw) February 12, 2017
Popma also recommended at least considering the use of commercial JVMs when performance becomes critical, since the cost of a license for one of these products could, overall, be lower than the cost of having engineering staff select and tune a particular GC algorithm. However, even if someone were set on using only OpenJDK technologies, both Popma and Verburg mentioned the currently-in-development Shenandoah GC, which aims to deliver ultra-low pause times for very large heaps (100GB or more). In either case, the consensus among the experts seemed to be that, when it comes to application performance, a carefully-selected GC algorithm will almost always be better than no GC at all.
It is still early days for this proposal, and it needs to be reviewed and polished before it even becomes an official JEP. If and when this happens, a target version will eventually be added to it. Although at this time we can only speculate about what this target version might be, Verburg thinks that it is reasonable to expect Epsilon GC to be ready for Java 10 or 11. If nothing else, it would at least help clarify what the interface of a GC should be, contributing to a more modular JVM.