Standardizing Native Java: Aligning GraalVM and OpenJDK

Key Takeaways

  • Native Java is essential for Java to remain relevant in the evolving cloud world.
  • Native Java is not a solved problem yet.
  • The development lifecycle needs to adapt as well.
  • Standardization through Project Leyden is key to the success of native Java.
  • Native Java needs to be brought into OpenJDK to enable co-evolution with other ongoing enhancements.

This article is part of the article series "Native Compilation Boosts Java". You can subscribe to receive notifications about new articles in this series via RSS.

Java dominates enterprise applications. But in the cloud, Java is more expensive than some competitors. Native compilation with GraalVM makes Java in the cloud cheaper: It creates applications that start much faster and use less memory.

So native compilation raises many questions for all Java users: How does native Java change development? When should we switch to native Java? When should we not? And what framework should we use for native Java? This series will provide answers to these questions.

 

Native Java Is Essential for Java to Remain Relevant in the Evolving Cloud World

Java has been the language of choice for enterprise applications and networked services for over two decades. It has an immensely rich ecosystem of middleware, libraries, and tools and an extremely large community of experienced developers. This makes it an obvious choice for developing cloud-based applications or moving existing Java applications to the cloud. However, there is a mismatch between the historical development of Java and its runtime and today’s cloud requirements. So Java needs to change to stay relevant in the cloud! Native Java is the most promising option here. Let’s explain the mismatch between traditional Java and the cloud.

The Java Virtual Machine (JVM) uses adaptive just-in-time (JIT) compilation to maximize the throughput of long-running processes. Peak throughput has been the priority, memory was assumed to be cheap and expandable, and startup times didn’t matter much.

Now infrastructure like Kubernetes or OpenShift, and cloud offerings from Amazon, Microsoft, or Google, scale through small, cheap containers with few CPU and memory resources. Because these containers start more often, the fixed JVM startup costs become much more significant as a percentage of the total runtime. And Java applications still need memory for JIT compilation. So, how can Java applications run efficiently in containers?

Firstly, Java applications increasingly run as microservices, performing less work than the monoliths they replace. That’s why they have smaller application data sets and need less memory.

Secondly, frameworks like Quarkus and Micronaut replaced the resource-hungry dynamic injection and transformation at startup with offline transformation to inject required services.

Thirdly, fixing long JVM startup times has proven very hard. It would be best to avoid recomputing the relevant compiled code, metadata, and constant object data at every startup. The OpenJDK project has attempted this several times, most notably with the jaotc AOT compiler and Class Data Sharing. But jaotc has been abandoned, and Class Data Sharing is still a work in progress. OpenJ9, a Java implementation different from OpenJDK, has had some notable success with AOT compilation. But this has not achieved wide adoption.

These optimizations are difficult because the JDK runtime is also an abstraction and portability layer over the underlying hardware and operating system. Precomputation risks folding in build-time assumptions that are no longer valid at runtime. This problem is arguably the biggest challenge for native Java. That’s why prior efforts have focused on pre-generating content for the same runtime. Still, Java’s dynamic nature creates two more blocker issues.

Firstly, the JVM and JDK maintain a relatively rich metadata model compared to other AOT-compiled languages. Retaining information about class structure and code allows for compilation and recompilation of the code base as new classes are loaded into the runtime. That’s why for some applications, metadata footprint is significant relative to application data footprint.

The second issue is that most pre-generated code and metadata linkage must be indirect so it can be rewritten for changes later. The cost is twofold: loading the pre-generated content into memory requires linking of references, and execution uses indirect accesses and control transfers which slow down the application.

Native Java offers to remedy all these problems with one significant simplification: not supporting a dynamically evolving application. This strategy delivers fast startup and small footprint through a tightly linked, small executable with all initial code, data, and metadata precomputed at build time. This is indeed a solution, but it comes with costs that need to be understood. It also does not solve the problem of matching build-time assumptions to runtime configuration.

Native Java Is Not a Solved Problem Yet

At first glance, packaging appears to be the major difference between GraalVM Native and the JVM. A JVM application needs a Java runtime for the target host, including the Java binary, various JVM libraries, the JDK runtime classes, and the application JAR files.

By contrast, GraalVM Native takes all those JARs as a build-time input and throws in the JDK runtime classes plus some extra Java code providing JVM-equivalent functionality. It compiles and links all this into a native binary for one target CPU and operating system. The binary does not need class loading or a JIT compiler.

This full AOT compilation has two critical details: Firstly, it requires complete knowledge of all classes whose methods will execute. Secondly, it needs detailed knowledge of the target CPU and operating system. Both requirements raise significant challenges.

The Closed-World Assumption

The first requirement is known as the Closed-World Assumption. That closed world should only include code that will actually run. Closing an application starts by identifying which classes and methods are explicitly referenced from the main method of the entry class. That is a relatively straightforward, if complex, analysis of all bytecode on the classpath and in the JDK runtime. Unfortunately, tracing references to classes, methods, and fields by name is not enough.

Linkage – Java provides indirect linkage of classes, methods, and fields without explicitly mentioning their names. What actually gets linked can depend on arbitrarily complex application logic, opaque to AOT analysis. Methods like Class.forName() can load classes, possibly with a name computed at runtime. Fields and methods can be accessed using reflection or method and var handles, once again perhaps derived from computed names. A smart analysis might detect cases where string literals are used – but not computed values.
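
To make this concrete, here is a minimal sketch of both cases. The class ComputedLinkageDemo and its method names are hypothetical, chosen only for illustration; the reflection calls themselves are standard JDK API. In loadLiteral, the target class appears as a string literal that an analysis could in principle resolve. In loadComputed and invokeByName, the names only exist once the program runs.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; the names here are illustrative and not from any framework.
class ComputedLinkageDemo {

    // The class name is a string literal, so a static analysis can see the target.
    static Class<?> loadLiteral() {
        try {
            return Class.forName("java.util.ArrayList");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The class name is assembled from input, so no literal names the target.
    static Class<?> loadComputed(String simpleName) {
        try {
            return Class.forName("java.util." + simpleName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The method name is also a runtime value, so the call target is unknown at build time.
    static Object invokeByName(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String simpleName = args.length > 0 ? args[0] : "LinkedList";
        System.out.println(loadLiteral().getName());
        System.out.println(loadComputed(simpleName).getName());
        System.out.println(invokeByName("native", "length"));
    }
}
```

On the dynamic JVM, any class on the classpath can be reached this way. In a closed world, such targets generally have to be declared to the build ahead of time, and a name the build never saw is likely to fail only at runtime.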

Bytecode Generators – A worse problem is that a class may be defined via application-generated bytecode based on input data or the runtime environment. A related issue is the runtime transformation of bytecode. At best, it might be possible to modify some of these applications with an AOT-compiled equivalent. However, this is impossible for a whole class of applications.

Loader & Module Delegation – It is not just a matter of what types or code are available either. Even when it is known precisely which classes might be loaded, application logic can determine linkage and visibility of classes. Once again, such applications cannot use AOT compilation.

Resources & Service Loading – A similar difficulty arises when loading classpath resources. It is possible to identify resources in classpath JARs and put them into the native binary. But it may not be clear which ones are actually going to be used, and they may have a computed name. This is particularly important because it affects the JDK runtime’s model for service provision, including dynamic loading of functions such as the FileSystemProvider or LocaleServiceProvider implementations. An AOT compiler could compile in support for every option, but at the expense of the executable's size and memory footprint.
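
A small sketch of both kinds of lookup, assuming nothing beyond the JDK: installedSchemes asks the runtime which FileSystemProvider implementations it discovered, and hasBundle probes for a resource whose name is only known at runtime. ServiceDiscoveryDemo, hasBundle, and the messages_*.properties naming scheme are hypothetical.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for illustration only.
class ServiceDiscoveryDemo {

    // Collects the URI schemes of all file system providers the runtime discovered.
    // Which providers exist beyond the default one depends on the runtime image and classpath.
    static List<String> installedSchemes() {
        List<String> schemes = new ArrayList<>();
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            schemes.add(provider.getScheme());
        }
        return schemes;
    }

    // The resource name is computed from input, so a build cannot tell
    // from the code alone which files must be bundled.
    static boolean hasBundle(String languageTag) {
        String name = "messages_" + languageTag + ".properties"; // hypothetical naming scheme
        return ClassLoader.getSystemResource(name) != null;
    }

    public static void main(String[] args) {
        System.out.println("providers: " + installedSchemes());
        System.out.println("bundle for 'fr' present: " + hasBundle("fr"));
    }
}
```

Both answers are decided by what is present when the program runs. A native build has to fix them in advance, either by including everything or by being told which providers and resources matter.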

Impact of Closed-World Requirement on Developers

All this means that developers now must guarantee that all required code for the target system is available at build time. GraalVM’s recent change in how it handles classes missing from the classpath is an example of this additional burden. Originally, classes missing at build time would abort the build. The --allow-incomplete-classpath option worked around this, turning build-time configuration errors into runtime ones. GraalVM recently made this workaround the default behavior. While this may smooth the onboarding of an application to native Java, the resulting runtime errors extend the compile-test-exception-fix cycle.

And then there are "day 2" costs of the closed world. Monitoring tools typically instrument classes at runtime. That instrumentation can, in theory, happen at build time. But this may be difficult or even impossible, especially with code specific to the current runtime configuration or data input at runtime. Monitoring is improving for native executables, but today, developers can’t rely on their usual tools and workflows to monitor native deployments.

Build Time vs. Runtime Compiler Configuration

The second requirement is a common problem for AOT compilation: It either aims for specific hardware and runtime capabilities of the target environment or generates vanilla code for a target environment range. This adds complexity to the compilation process: developers now must select and configure compiler options at build time that would normally be defaulted or configured at program start.

This is not simply a matter of targeting, say, Linux or hardware vector support, as with other AOT-compiled languages like C or Go. It also requires advance configuration of specific Java choices, such as the garbage collector or the application locale.

These latter choices are needed because compiling all functionality into the generated executable would make it much larger and slower than dynamic Java. The JIT compiler generates code for the specific capabilities of the current hardware and runtime environment. By contrast, an AOT compiler would have to introduce conditional code or generate multiple variants of compiled methods to allow for all possible cases.

Impact of Build-Time Compiler Configuration Requirement on Developers

AOT compilation makes Continuous Integration (CI) more complex. Looking to support Linux deployments on both x86-64 and aarch64? That doubles the compile time in the CI system. Also building native executables for developers on Windows and macOS? Another doubling of the CI compile time. All this increases the time till the pull request is ready to merge. 

And it will only get worse in the future. Testing out a different GC policy? That’s a full compile cycle rather than a command-line switch. Validating the effects of Compressed References on the application’s heap size? That’s another full compile cycle.

These continuous paper cuts in the development cycle rob developers of their joy. They slow experimentation and make gathering results onerous. Deployment times increase, which delays both getting changes into production and recovering from outages.

Build-Time Initialization

There is actually a critical third requirement for reducing startup time and footprint with AOT compilation. Native executables have no classloaders or JIT compiler, a more lightweight VM, and less metadata for classes and code. But AOT compilation does not necessarily mean fewer classes or methods: for the most part, the JVM runtime already only loads the needed code. So, the AOT compiler will not substantially reduce the amount of code at runtime or the time it takes to run it. That reduction requires a more aggressive policy that either removes code or replaces it with an equivalent that needs less space and time to execute.

The most crucial innovation of AOT compilation does precisely that: Most of the JVM’s work during application startup is initialization code for the static JDK runtime state — much of it repeated exactly the same every time. Computing this state at build time and including it in the native executable can improve startup enormously. The same applies to middleware and application state.

So build-time initialization avoids runtime work by doing it at build time. But it also allows dropping that build-time initialization code from the native executable as it only runs at build time. In many cases, that has the knock-on effect of removing other methods and classes because they were only called during startup. This combined effect reduces startup time and footprint in GraalVM the most.

Unfortunately, build-time initialization faces just as many, if not more, problems as the first two requirements. Most static initialization is simple, setting a field to a constant or the result of some determinate computation. The value is the same in any runtime environment on any hardware.

But some static fields depend on runtime specifics. Static initializers can execute arbitrary code, including code that depends on the precise order or timing of initialization, the hardware or operating system configuration, or even on data input to the application. If build-time initialization is impossible, then runtime initialization steps in. This is a per-class decision: Just one field that cannot be initialized at build time moves the whole class to runtime initialization.

Static field values can also depend on other static fields. So, validating build-time initialization requires a global analysis, not a local one.
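
As a rough illustration of why the analysis has to be global, consider the sketch below. All three classes are hypothetical. PoolConfig.POOL_SIZE looks like a simple local computation, yet its value is only as portable as HostInfo.CPUS, which depends on the machine that runs the initializer.

```java
// Hypothetical classes for illustration; none of these names come from a real library.
class PoolConfig {
    // Reads like a local computation, but the value comes from HostInfo,
    // so it is only as build-time safe as HostInfo.CPUS is.
    static final int POOL_SIZE = HostInfo.CPUS * 2;

    // Depends only on a constant, so it is the same on every machine.
    static final int QUEUE_BYTES = SafeConstants.BUFFER_SIZE * 8;

    public static void main(String[] args) {
        System.out.println("POOL_SIZE=" + POOL_SIZE + " QUEUE_BYTES=" + QUEUE_BYTES);
    }
}

class SafeConstants {
    // A determinate value: identical in any runtime environment.
    static final int BUFFER_SIZE = 4 * 1024;
}

class HostInfo {
    // Depends on the machine that executes this initializer.
    static final int CPUS = Runtime.getRuntime().availableProcessors();
}
```

If HostInfo were initialized on a large build server, POOL_SIZE would presumably carry that machine's CPU count into every deployment. Looking at PoolConfig alone would not reveal this.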

Impact of Build-Time Initialization on Developers

While build-time initialization is a superpower of native Java, it can easily become a continuous source of complexity for developers. Each build-time initialized static field forces build-time initialization to move like a wave through the reachable classes required to create its value.

To show an example:

class A implements IFoo {
  static final String name = "A";

  void foo() { … }
}

class B extends A {
  private static final Logger LOGGER = Logger.getLogger(B.class.getName());

  static int important_constant = calculate_constant();

  ...
}

class BTIExample {
  static final B myB = new B();
}

Suppose the BTIExample class is initialized at build time. This requires all its superclasses and implemented interfaces and the classes referenced by its static initializers to be build-time initialized: BTIExample, Object, B, A, IFoo, Logger, String, Logger’s superclasses. And the classes used in the calculate_constant() method and in Logger.getLogger(), as well as in B’s constructor (not shown), also need to be build-time compatible.
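
The order in which the dynamic JVM initializes these classes hints at the chain a build-time initializer has to follow. The sketch below is a simplified, runnable variant of the example above, not the original code: InitLog is a hypothetical helper that records initialization, the Logger field is left out, the method bodies are invented, and the names are adapted to Java conventions.

```java
import java.util.ArrayList;
import java.util.List;

class BTIExample {
    // Creating a B forces B, and before it A, to be initialized first.
    static final B myB = new B();

    static { InitLog.ORDER.add("BTIExample"); }

    // Touching this method triggers initialization of BTIExample.
    static List<String> initializationOrder() {
        return InitLog.ORDER;
    }

    public static void main(String[] args) {
        // prints [A, B.calculateConstant, B, BTIExample]
        System.out.println(initializationOrder());
    }
}

// Hypothetical helper that records which initializers have run, in order.
class InitLog {
    static final List<String> ORDER = new ArrayList<>();
}

interface IFoo {
    void foo();
}

class A implements IFoo {
    static { InitLog.ORDER.add("A"); }

    public void foo() { }
}

class B extends A {
    // Arbitrary code runs as part of static initialization.
    static int importantConstant = calculateConstant();

    static { InitLog.ORDER.add("B"); }

    static int calculateConstant() {
        InitLog.ORDER.add("B.calculateConstant");
        return 42; // placeholder value for illustration
    }
}
```

Every entry in that list, plus everything those initializers call, would have to be safe to run on the build machine for BTIExample to be initialized at build time.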

Changes to any one of these classes, or to the classes they depend on, may make it impossible to build-time initialize BTIExample. Build-time initialization can be viewed as a viral condition propagating through the class’s dependency graph. Innocent-seeming bug fixes, refactorings, or library upgrades can force classes into build-time initialization to support another class or limit a class to runtime initialization instead.

But build-time initialization can also capture too much information about the build environment. One example is capturing an environment variable representing the build machine and storing it in a static field of a build-time initialized class. This is routinely done in existing Java classes to ensure a consistent view of the value and avoid repeatedly fetching it. But in native Java, this may pose a security risk.
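
The pattern in question can look as harmless as the following sketch. BuildInfo and its members are hypothetical, and USER is just an example variable. The field is read once when the class is initialized, which on the dynamic JVM happens inside the running process.

```java
import java.util.Objects;

// Hypothetical class for illustration only.
class BuildInfo {
    // Read once, when the class is initialized. Under build-time initialization,
    // that moment would be on the build machine, not on the deployment host.
    static final String CAPTURED_USER = System.getenv("USER");

    // Read on every call, so it always reflects the running process.
    static String currentUser() {
        return System.getenv("USER");
    }

    static boolean capturedMatchesCurrent() {
        return Objects.equals(CAPTURED_USER, currentUser());
    }

    public static void main(String[] args) {
        // On the dynamic JVM both reads happen in the same process, so they agree.
        System.out.println("captured matches current: " + capturedMatchesCurrent());
    }
}
```

If such a class were initialized at build time, the captured value would be stored in the executable and shipped with it, which is exactly the kind of state that needs review before it is allowed into an image.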

The Development Lifecycle Needs to Adapt as Well

Native Java does not just change application deployment. The whole development process changes: As a developer, you need to think about adopting new frameworks, minimizing reflection and other dynamic behaviors, and taking maximum advantage of build-time initialization. You also need to examine your build and test processes.

You need to think about how your libraries are initialized because the build-time initialization of one library may require (or be blocked by!) another library. Every piece of state captured at build time needs validation to ensure that it does not capture security-sensitive information and that it is valid for all future executions.

Moving work to build time also means that building your application will take longer, locally and in CI. AOT compilation requires machines with lots of CPU and memory to fully analyze every element of your program. Native Java explicitly requires this tradeoff — the compilation time does not go away. It just moves from JIT compilation at runtime to AOT compilation at build time. It also takes a lot longer as the closed-world analysis and validation of build-time initialization are much more complex than JIT compilation.

As the article on Quarkus has shown, it is best to run your tests on the dynamic JVM. This exercises your business logic, runs unit and integration tests, and ensures all the pieces run correctly on the dynamic JVM.

Yet it is still essential to test the native executable: As the other articles have shown, the closed-world version of your application, built by your framework with the help of GraalVM Native Image, may be missing some pieces. And native Java does not promise bug-for-bug compatibility with the dynamic JVM. 

Now unit tests typically can’t run against the separate, native executable as required methods may not be registered for reflection or may have been eliminated from the native code. But including the unit tests in the native executable keeps additional methods alive, increasing the file size and security attack surface. After all, no one ships their unit tests on the dynamic JVM!

So testing has to be done on both the dynamic JVM and the native executable. What will you do for day-to-day development? Just test on the dynamic JVM? Compile to native before opening a pull request? Changes here will affect your inner loop speed.

And speaking of speed, those longer compile times will affect how quickly your CI pipeline runs. Does a longer build and longer test cycle change your DevOps metrics like mean-time-to-recovery (MTTR)? Maybe. Does it increase your CI costs due to more powerful machines for compilation? Maybe. Does it complicate using existing Application Performance Monitoring (APM) tools (like Datadog) and other instrumentation agents? Certainly.

There are tradeoffs here. Moving work to build time (and, by extension, to development time) is a choice: It provides excellent runtime benefits, but the costs don’t just disappear. Adopting native Java requires a lot of change. The benefits, while impressive, aren’t worth it for every use case. Think hard and prepare to make changes not just to your software but also to how you work.

Standardization Through Project Leyden Is Key To the Success of Native Java

One of Java’s superpowers is something quite boring: standardization. It is the stable bedrock of the entire Java ecosystem. Standardization ensures that, regardless of how many different implementations of the JDK exist, there is conceptually one Java language and one Java runtime model to target. The same application will run on any JDK — whether it is an OpenJDK derivative or an independent implementation like Eclipse OpenJ9. This builds confidence in developers: the platform "just works". Coupled with Java’s long-standing commitment to backward compatibility — that old Java 1.4 JAR file still runs today on Java 18 — we’ve seen the ecosystem of frameworks, libraries, and applications flourish. Java’s continued, careful, standards-backed evolution has been key to that growth.

Java’s standardization makes a promise that your application will keep running. That it is worth investing in this ecosystem. That libraries continue to work without needing significant changes for every release. That frameworks don’t have to reinvest in recreating themselves with every release. That application developers can focus on adding business value rather than continuously adapting to incompatible changes. Not all programming language ecosystems make these guarantees. And with Java releases coming fast and frequently, these guarantees are critical for Java to keep evolving without losing you, the developer.

These promises sound great. But how does "standardization" actually work here? When we talk about "the Java standard", we’re really talking about the Java Language Specification (JLS) and the JVM Specification (JVMS), as well as the comprehensive Javadoc specification of core JDK runtime classes. These two specifications are central to Java’s guarantees as they, rather than an implementation, define what it means to be "Java". They define the language and the runtime behavior in sufficient detail so that implementors can create independent implementations of the JVM and Java source compiler. In contrast, many languages have a standard but treat it more as documentation of what their implementation does than as directives for what it has to do.

Each JDK release is based on an accompanying update to the specifications. This regular revision creates two critical Java deliverables: a clear statement of the definitive behavior of Java and the JDK for a given release and a visible account of the differences in behavior between releases. This clarity and visibility are vital for anyone implementing and maintaining Java applications, middleware, and libraries.

Updates to the Java specification are more crucial than even new OpenJDK releases. It takes incredible care that each new feature evolves the specification coherently without breaking existing applications. Exhaustive test suites check the conformance of implementations to the specification. Critically, these tests are based on the specification, not the implementation.

Native Java has existed outside this specification process and so splits the ecosystem today. It is not intentional, but the split will remain if native Java’s evolution continues independently from the rest of the Java platform. Now there’s no avoiding that framework developers and library authors have to work hard to paper over the split. They do so by relying only on features that work in native Java, eschewing most uses of dynamic features like reflection, MethodHandles, or even dynamic class loading. This effectively forms a subset of dynamic Java. Coupled with the changed semantics of features, like Finalizers, Signal Handlers, or class initialization, the outcome is a growing divergence between Java on the dynamic VM and native Java.

And there is no guarantee that an application built with native Java today will build and behave the same way with the next release of native Java. Major behaviors — such as the --allow-incomplete-classpath change discussed earlier and the change from build-time to runtime initialization as the default — flip their defaults between releases. The choices are practical and pragmatic decisions to grow adoption at the expense of current users. These aren’t inherently bad decisions. But they undermine the stability of the native Java ecosystem as they undercut the promises of Java standardization.

Many behaviors — especially of critical features like build-time initialization — are still in flux for native Java. And that’s fine! Even dynamic Java is changing, too! What’s missing with native Java is a definitive statement of what should work, what shouldn’t work, and how that is changing. If the boundary of native Java just had wide margins, that would be ok. But we don’t really know where the boundary lies. And it is shifting in unknown ways.

This lack of standardization is not just a problem for framework and library authors, as these seemingly pragmatic changes affect the stability of native Java and its guarantees. Application developers feel this pain when they need to validate their application for every release — particularly its resource usage. Now dynamic Java also needs some validation of new releases. But that usually requires a highly specific user response and imposes only marginal performance costs. Native Java may need continuous tuning efforts or suffer deployment cost increases — a tax that developers will pay on every update.

Project Leyden has the mandate to address Java’s "slow startup time, slow time to peak performance, and large footprint", as Mark Reinhold said in April 2020. Initially, its mandate focused on bringing the "concept of static images to the Java Platform and the JDK". Now Mark’s recent posting has reaffirmed the focus on those same pain points while recognizing a "spectrum of constraints" between the fully dynamic JVM and native Java. The goals of Leyden are to explore this spectrum and to identify and quantify how intermediate positions between a fully AOT compiled native image and a fully dynamic, JIT-compiled runtime can deliver incremental improvements to footprint and startup time, while still allowing users the option to retain some of the dynamic behaviors their applications require.

Leyden will extend the existing Java specifications to support the different points on this spectrum, including eventually the closed-world constraint required by native Java. All key properties of native Java — the closed-world assumption, build-time compilation, and build-time initialization — will be explored, given precise, defined semantics, and brought under the standardization process. While Project Leyden has only recently created its mailing list, explorations have happened around these themes across the Java ecosystem and the GraalVM community.

Bringing native Java into the existing Java standardization process through Project Leyden will provide those same solid foundations that allowed traditional Java, its ecosystem, libraries, and middleware to flourish. Standardization is the remedy to the technical debt growing between the native and dynamic Java. The incremental path outlined by Leyden will help ease the migration pain for all developers.

Native Java Needs to Be Brought into OpenJDK for Co-evolution with Ongoing Enhancements

This series of articles has demonstrated the benefits of native Java. It is essential for Java to remain relevant in the cloud. There’s a lot of good for the Java community in this. And yet, native Java requires massive changes in how applications are developed and deployed. As it exists outside the stability guarantees of the core platform and standardization process, it risks forking the definition of Java.

Meanwhile, dynamic Java continues to evolve in OpenJDK. There are major projects underway: Loom adds lightweight threads and structured concurrency, Valhalla introduces types that "code like a class and work like an int", Panama improves how Java works with non-Java code, and Amber releases smaller features that make developers more productive.  

These projects bring new capabilities to Java, increasing the dynamic nature of the platform through further use of MethodHandles, invokedynamic, and runtime code generation. They are designed to fit together into a coherent whole. Native Java isn’t yet part of this coherent whole. Bringing native Java into OpenJDK through Project Leyden enables a co-evolution of dynamic Java, its new features, and native Java.

This co-evolution is critical to the long-term success of native Java. Without it, native Java will perpetually lag behind dynamic Java. A recent example of such lagging support is broken JSON serialization of Java Records in native Java when annotations are used. But GraalVM currently also misses out on the opportunity to influence the design of new Java features. Minor tweaks and adaptations to the specifications can be the difference between an easy and efficient native implementation, an implementation that’s expensive in memory use and compile time, and something that simply can’t be implemented in native Java.

Native Java has been remarkably successful at tracking the platform so far. But it has only succeeded by adapting the Java platform and core JDK libraries using Substitutions: these are companion Java classes that modify classes to work with native Java. But they risk breaking the invariants of the code they modify. Substitutions are an eminently pragmatic solution when the original code can’t be changed. But they don’t scale. And they suffer from the same issues as "monkey-patching" solutions in other languages — powerful but dangerous. They can become incorrect due to changes of the classes they modify. A recent example was a JDK runtime Substitution that became invalid after JDK changes. The Quarkus team thankfully caught and fixed this. 

Bringing native Java into OpenJDK provides an opportunity to "do better", modifying the Java platform instead of using tricks like Substitution — updating not only the JDK class libraries directly but potentially also the programming model. It ensures that existing OpenJDK projects examine the whole platform — both dynamic and native use cases — when features are developed. And it ensures that applications benefit from native Java being a first-class citizen in the platform, bringing better solutions to both deployment models.

Conclusion

Java has been the dominant enterprise language for the past 20 years, building on the stability provided by its standardization process. Co-evolution between the language, the runtime, and the libraries, all historically chasing the rapid hardware improvements thanks to Moore’s law, has simplified the work of developers as they strove to eke the most performance out of their applications.

Native Java has risen to adapt Java for resource-constrained cloud deployments. But it now stands at a crossroads. It can continue to evolve on its own and risk diverging from dynamic Java with each release until it effectively becomes a separate entity with its own stakeholders, community, and libraries. Alternatively, native Java can join under the banner of the Java standards and evolve with the rest of the platform into something that benefits all use cases. This would bring stability to the capabilities of native Java and allow common deployment practices to emerge.

As Project Leyden begins taking shape, we expect it to become a place where dynamic and native Java converge toward a common future of fast startup and a smaller footprint for all Java users. Today, GraalVM continues to be the pragmatic choice for native Java. In a not-too-distant tomorrow, a single Java specification will determine what your program means when running anywhere across that spectrum from dynamic to native, independent of the underlying implementation.

 
