Oracle has announced the 1.0 release of GraalVM, a polyglot virtual machine and platform. The initial release can run Java and other JVM languages (via bytecode), includes full support for JavaScript and Node.js, and offers beta support for Ruby, Python, R and LLVM bitcode.
The overall platform comprises a number of components:
- Graal - a JIT compiler written in Java
- SubstrateVM - a lightweight wrapper to abstract away the execution container
- Truffle - a toolkit and API for building language interpreters
The overall intent is to provide a polyglot language execution environment that can be embedded within another execution container: either OpenJDK or another host, such as an Oracle or MySQL database.
InfoQ spoke to Thomas Wuerthinger, research director at Oracle Labs, for an in-depth interview about GraalVM.
InfoQ: Can you explain the history of GraalVM? Where did it come from?
Thomas Wuerthinger: The big idea was that a compiler didn't have to have any built-in knowledge of the language semantics. The reason for this was that it was very clear from the beginning that groups within Oracle had interest in a very diverse range of languages including JavaScript, R and Ruby.
There were two key areas we felt we could improve upon with some exploration:
- The common belief of VM architects throughout the software industry at that time was that a language VM needed to be designed specifically with the semantics of the language in mind if you wanted to get optimal performance.
- Starting in Java 7, the Java architects worked to support other languages on the JVM, compiled to Java bytecode, by adding `invokedynamic` to the JVM specification. The GraalVM team felt that because Java bytecode had too much Java semantics built into it, a longer-term effort could produce a much more efficient implementation, and a 100% compatible one at that.
We also had the goal of demonstrating a fully multilingual engine with competitive performance, and we started collaborations with universities to work on other languages.
A group at Purdue did an implementation of R, and a group at UC Irvine did an implementation of Python in 2012-2014. A student intern worked on Ruby as a thesis project, and it had enough weird and interesting properties that we thought it was a good proof point that our multilingual compiler was fully general. Later we brought those language projects into Oracle Labs to get an industrial-quality implementation for those languages.
With the release of GraalVM 1.0, we feel that we have demonstrated both that a multilingual VM with competitive performance is possible, and that the best way to build one isn't through a language-specific bytecode like that of Java or the Microsoft CLR.
InfoQ: What are the key technologies that make up GraalVM? How will they help developers?
Wuerthinger: The key technology idea in GraalVM is called "Partial Evaluation", which is a technique for converting interpreters into compilers automatically. This allows you to build new languages very quickly. Language developers can focus on just building an Abstract Syntax Tree (AST) interpreter, and do not have to worry about writing a code generator or other low-level runtime functionality like a garbage collector.
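To make partial evaluation a little more concrete: specializing an interpreter to one fixed program lets all the work that depends only on the program (which node type comes next, which subexpressions are constant) happen once, leaving a residual program that does only the run-time work. GraalVM does this automatically inside the compiler on Truffle ASTs; the sketch below is a hand-staged version in plain Java, assuming a hypothetical three-node expression language (`Lit`, `Var`, `Add`) and using lambdas as the residual code. It illustrates the idea only and is not GraalVM's implementation or the Truffle API.

```java
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical three-node expression language (not GraalVM or Truffle API).
interface Expr {}
record Lit(int value) implements Expr {}
record Var(String name) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}

class StagedInterpreter {
    // Ordinary AST interpreter: inspects the node type on every evaluation.
    static int interpret(Expr e, Map<String, Integer> env) {
        if (e instanceof Lit lit) return lit.value();
        if (e instanceof Var v) return env.get(v.name());
        if (e instanceof Add add) return interpret(add.left(), env) + interpret(add.right(), env);
        throw new IllegalArgumentException("unknown node: " + e);
    }

    // Specialize the interpreter to a fixed AST. Everything that depends only on
    // the program (node dispatch, constant subexpressions) is done here, once;
    // the returned lambda tree is the residual program, which only needs env.
    static ToIntFunction<Map<String, Integer>> specialize(Expr e) {
        if (e instanceof Lit lit) {
            int value = lit.value();
            return env -> value;
        }
        if (e instanceof Var v) {
            String name = v.name();
            return env -> env.get(name);
        }
        if (e instanceof Add add) {
            // Both operands known before run time: fold the sum now.
            if (add.left() instanceof Lit a && add.right() instanceof Lit b) {
                int sum = a.value() + b.value();
                return env -> sum;
            }
            ToIntFunction<Map<String, Integer>> l = specialize(add.left());
            ToIntFunction<Map<String, Integer>> r = specialize(add.right());
            return env -> l.applyAsInt(env) + r.applyAsInt(env);
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }
}

class PartialEvalDemo {
    public static void main(String[] args) {
        // a + (2 + 3)
        Expr program = new Add(new Var("a"), new Add(new Lit(2), new Lit(3)));
        Map<String, Integer> env = Map.of("a", 10);
        System.out.println(StagedInterpreter.interpret(program, env));             // 15
        System.out.println(StagedInterpreter.specialize(program).applyAsInt(env)); // 15
    }
}
```

Both calls produce 15 for `a = 10`; the difference is that `specialize` folds `2 + 3` and resolves all node dispatch before any environment is supplied, which is the effect partial evaluation achieves without the language developer writing this staging code by hand.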
Another key technology we built into GraalVM is logical/physical data separation. This allows data in any memory layout to be made to appear as an ordinary object of the language. Most VMs have a specific layout for objects (with object headers, often called "boxes"), and for data to have the semantics of an object in the language, it has to have that same layout.
This means that in most VMs you have to allocate memory and copy data before it can act as an "object".
In GraalVM, you can have zero-cost interop between languages, because a piece of data from one language can be made to behave like a JavaScript object or an R object or whatever, without the need for additional object creation and data copies.
This also allows, for example, exposing data from a custom binary format in the form of high-level objects to a program without ever actually allocating those high-level objects.
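A rough plain-Java analogue of logical/physical data separation: the program works with something that looks like a point object with `x` and `y`, while the data never leaves its packed binary form and no per-record object is allocated. `PointView` and the 8-byte record format are hypothetical; in GraalVM the separation is provided by the VM and its interop layer, so guest-language code would see an ordinary object rather than an explicit view class like this one.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical binary format: a packed array of points, each record holding
// two little-endian 32-bit ints (x, then y), 8 bytes per record.
final class PointView {
    private static final int RECORD_SIZE = 8;
    private final ByteBuffer data;
    private int offset; // byte offset of the record this view currently denotes

    PointView(ByteBuffer data) {
        // duplicate() shares the same bytes (no copy); setting the byte order on
        // the duplicate leaves the caller's buffer untouched.
        this.data = data.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    }

    // Re-aim the same view at record i: nothing is copied or allocated.
    PointView at(int i) {
        this.offset = i * RECORD_SIZE;
        return this;
    }

    // Logical fields, read straight from the physical bytes on each access.
    int x() { return data.getInt(offset); }
    int y() { return data.getInt(offset + 4); }
}

class DataViewDemo {
    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        raw.putInt(1).putInt(2).putInt(30).putInt(40); // two packed records
        PointView view = new PointView(raw);
        long sum = 0;
        for (int i = 0; i < 2; i++) {
            PointView p = view.at(i);
            sum += p.x() + p.y();
        }
        System.out.println(sum); // 73
    }
}
```

The design choice is to keep one reusable view and move it across records, so reading a million records costs no allocations beyond the view itself.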
Another key technology is our LLVM interpreter. GraalVM includes an interpreter for LLVM bitcode, the output of the popular LLVM compiler infrastructure, which works well with languages that target native binaries (like C++, Fortran, Rust, and so forth).
One of the key problems in implementing languages is that they usually have libraries that are implemented in native code (usually C/C++), and they use an API that exposes some amount of the language internals. So, if you want to be compatible with a language ecosystem, you have to support its native libraries.
One of the problems with implementations like JRuby or Nashorn is that they cannot run modules that contain native parts, which limits their compatibility.
A final and separate technology piece of GraalVM that was added later is called SubstrateVM. The idea here was to make VMs embeddable on a mix-and-match basis. A conventional language VM like Java HotSpot wants to control everything: which threads are scheduled, how memory allocations are parcelled out to those threads, access to the OS, etc.
We wanted a technology layer that would allow an underlying system to provide some of those services, with SubstrateVM making up the missing pieces in terms of what is needed to support a dynamic JIT compiler (like a garbage collector). When you are running in a database, for example, the database wants to control memory management, thread allocations, and so forth.
SubstrateVM is the "wafer-thin" VM that can sit on something like a database, so when you run the Graal compiler there, the database can specify the maximum heap size rather than the Java mechanism, and the database can manage code artefacts rather than the JVM class loader, and so forth.
One of the important ways SubstrateVM operates is by doing ahead-of-time compilation of Java code (such as the Graal compiler itself) directly into a native binary, using a closed-world assumption (all the classes to be used must be known at compile time). So, when you deploy the Graal compiler in a database, we take Graal and some set of GraalVM languages and compile them into a native shared library, and they then don't have to be compiled at runtime.
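The closed-world assumption is easiest to see by contrast. In the plain-Java sketch below (class and method names are hypothetical), `direct()` names `ArrayList` in its code, so a whole-program analysis at build time can tell that the class is needed. `reflective()` loads a class whose name only exists at run time, so such an analysis cannot infer it; in a closed-world build, that kind of dynamic use typically has to be made known to the build ahead of time. On a regular JVM both paths simply work.

```java
import java.util.ArrayList;
import java.util.List;

class ClosedWorldDemo {
    // Statically reachable: an ahead-of-time compiler analysing this method
    // can see that ArrayList (and its constructor) is needed.
    static List<String> direct() {
        return new ArrayList<>();
    }

    // Not statically resolvable: which class gets loaded depends on a string
    // that only exists at run time, so a closed-world analysis cannot infer it.
    static Object reflective(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(direct().getClass().getName());
        // The class name is assembled from program input.
        String name = "java.util." + (args.length > 0 ? args[0] : "LinkedList");
        System.out.println(reflective(name).getClass().getName());
    }
}
```

Run with no arguments on a standard JVM, `main` prints `java.util.ArrayList` followed by `java.util.LinkedList`.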
InfoQ: What is Truffle? Why is it needed?
Wuerthinger: Truffle is the API you have to write a GraalVM interpreter to, when you are building a new GraalVM language. The way interpreters work is that they first parse source code into what is called an Abstract Syntax Tree or AST. For example, if you wrote code like:
c = a + b
then the AST would contain two nodes: one for the add operator, with child nodes for `a` and `b`, and a second one for the assignment, with child nodes for `c` and the add-operator node.
The interpreter internals, such as the AST node classes, need to extend Truffle's base classes and interfaces; if they do, the Graal compiler will watch the activity of the language interpreter to learn the semantics of the language, so that a compiler for that language can be generated.
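The AST described above for `c = a + b` can be sketched as a handful of node classes, each with an `execute` method that evaluates its children. This is a hypothetical plain-Java sketch: real Truffle nodes would extend Truffle's own `Node` base class and evaluate against a Truffle frame rather than a `Map`, and here the assignment target `c` is stored as a variable name instead of a separate child node, as a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical AST node classes for the statement  c = a + b  (not the Truffle API).
abstract class AstNode {
    // Evaluate this node against a frame of local variables.
    abstract int execute(Map<String, Integer> frame);
}

// Leaf node: reads a local variable such as a or b.
final class ReadNode extends AstNode {
    private final String name;
    ReadNode(String name) { this.name = name; }
    @Override int execute(Map<String, Integer> frame) { return frame.get(name); }
}

// The add operator: its children are the two operand nodes.
final class AddNode extends AstNode {
    private final AstNode left, right;
    AddNode(AstNode left, AstNode right) { this.left = left; this.right = right; }
    @Override int execute(Map<String, Integer> frame) {
        return left.execute(frame) + right.execute(frame);
    }
}

// The assignment: holds the target variable c and the add node as its value child.
final class AssignNode extends AstNode {
    private final String target;
    private final AstNode value;
    AssignNode(String target, AstNode value) { this.target = target; this.value = value; }
    @Override int execute(Map<String, Integer> frame) {
        int result = value.execute(frame);
        frame.put(target, result);
        return result;
    }
}

class AstDemo {
    public static void main(String[] args) {
        AstNode program = new AssignNode("c", new AddNode(new ReadNode("a"), new ReadNode("b")));
        Map<String, Integer> frame = new HashMap<>(Map.of("a", 2, "b", 3));
        program.execute(frame);
        System.out.println(frame.get("c")); // 5
    }
}
```

Executing the root node walks the tree recursively; a tree of `execute` calls like this is exactly the kind of structure that partial evaluation can specialize into straight-line compiled code.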
InfoQ: One of GraalVM's most visible components is the Graal compiler - a Java-based JIT compiler, known simply as "Graal". Can you speak to some of the difficulties that are present in C2 (the existing C++ JIT in OpenJDK) that Graal is looking to move past? Is it expected that Graal will replace C2 / Tiered Compilation as the default in the mainline OpenJDK?
Wuerthinger: The Graal compiler is available as an experimental compiler in Java 10, and Oracle is optimistic that it can become the default in a future release.
The big advantage that a compiler like Graal has is that it is written in Java and compiles itself, which makes it possible to write Graal in a much higher-level way than working directly in C++, as C2 does. For example, this makes it easier for Graal to do more aggressive inlining and escape analysis, which allows more object allocations to be removed from the program automatically. In general, the more abstractions your program has, the more the Graal compiler helps you, so you see the biggest performance benefits in things like streams and lambdas in Java 8, or in Scala.
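The kind of code that benefits from escape analysis is easy to picture. In the sketch below (hypothetical names, plain Java), the `Point` created in `distanceSquared` never leaves the method, and the stream pipeline creates one short-lived `Point` per element; after inlining, an escape-analysing JIT can in principle replace such objects with their fields and skip the allocation. Whether a particular compiler (Graal or C2) actually does so depends on its inlining decisions and is not visible in the program's output, which is the same either way.

```java
import java.util.stream.IntStream;

// A small immutable value type.
record Point(int x, int y) {}

class EscapeDemo {
    // The Point is never stored, returned or passed on, so escape analysis can
    // prove no other code ever sees it and may replace it with two int locals
    // (scalar replacement), removing the allocation.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x() * p.x() + p.y() * p.y();
    }

    // Lambda- and stream-heavy code: each element becomes a temporary Point.
    // Removing those allocations requires inlining the whole pipeline first.
    static int sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(i -> new Point(i, i))
                .mapToInt(p -> p.x() * p.y())
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(3, 4)); // 25
        System.out.println(sumOfSquares(3));       // 14
    }
}
```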
InfoQ: Oracle is heavily marketing GraalVM as a polyglot VM. How does the content of the 1.0 release differ from what was shipped as part of Java 10? Should, for example, JRuby developers choose one release over the other? Is GraalVM seen as a separate project, or a part of OpenJDK? Will future releases of Graal be tied to the Java release cadence, or is it seen as independent?
Wuerthinger: GraalVM and Java are 100% independent projects with a large open source presence and separate release cadences, and they both grab pieces of technology from each other. Java 10 is grabbing the compiler from GraalVM (as an experimental option), and GraalVM incorporates the bulk of Java 8 as one of its supported languages. GraalVM incorporates lots of languages, not just Java, so we use technology from Node.js, JRuby, and the GNU R implementation, among others.
As in most technology producer/consumer relationships across organizations, the consumer waits for new technology from others to stabilize before including it. In the same way, the Java Platform Group has to be conservative about including new technology until it is sure it won't break existing deployments, and it has lots of backwards and forwards compatibility concerns, so there is going to be some lag.
InfoQ: IBM's OpenJ9 runtime takes a rather different approach from GraalVM's. Can you explain the main differences between OpenJ9 / OMR as you see them, and speak to what makes GraalVM unique?
Wuerthinger: OpenJ9 is trying to speed up existing single-language runtimes (like Ruby MRI) by adding some limited just-in-time compilation functionality to them. The advantage of their approach is that it is easier to fit into an existing runtime while staying compatible. The major disadvantage, however, is that the benefits are much more limited.
First, the performance benefit of their Ruby prototype is very small compared to the order-of-magnitude improvement that Ruby achieves when running on GraalVM. The reason is that GraalVM relies much more heavily on profiling and speculative optimizations, while OMR keeps the data structures of the existing interpreter.
Second, they require you to write a code generator and cannot automatically derive the compiled code from the interpreter via partial evaluation.
Third, there is no built-in physical/logical data separation.
This leads to the fourth and most important point: their approach offers no way to achieve efficient language interoperability.
OMR can give a marginal improvement to existing single-language runtimes like Ruby MRI that are not already well optimized. GraalVM uses a completely new way to write languages, with an order-of-magnitude improvement and polyglot capabilities to run any set of languages with zero-overhead interoperability.
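The profiling and speculation mentioned in the first point can be pictured at the level of a single AST node: the node records what kinds of values it has seen, optimistically assumes they will stay the same, and switches to a slower, general version when that assumption fails. The class below is a hypothetical plain-Java sketch of such a self-specializing node for a dynamically typed `+`; its names, states and JavaScript-like fallback semantics are all assumptions. In GraalVM, compiled machine code built on such a speculation would additionally be deoptimized when the speculation fails, which plain Java cannot show.

```java
// Hypothetical self-specializing "+" node for a dynamically typed language.
// Plain Java, not the Truffle DSL; the state field stands in for node rewriting.
final class SpecializingAdd {
    enum State { UNINITIALIZED, INT, GENERIC }

    private State state = State.UNINITIALIZED;

    Object execute(Object left, Object right) {
        switch (state) {
            case UNINITIALIZED:
                // First execution: profile the operand types and pick a specialization.
                state = (left instanceof Integer && right instanceof Integer)
                        ? State.INT : State.GENERIC;
                return execute(left, right);
            case INT:
                // Fast path, valid only while "both operands are ints and the sum fits" holds.
                if (left instanceof Integer a && right instanceof Integer b) {
                    try {
                        return Math.addExact(a, b);
                    } catch (ArithmeticException overflow) {
                        // Overflow breaks the speculation; fall through and generalize.
                    }
                }
                state = State.GENERIC; // rewrite to the general version, permanently
                return executeGeneric(left, right);
            default:
                return executeGeneric(left, right);
        }
    }

    // Slow but general semantics (JavaScript-like: numbers add as doubles,
    // anything else concatenates as strings).
    private static Object executeGeneric(Object left, Object right) {
        if (left instanceof Number a && right instanceof Number b) {
            return a.doubleValue() + b.doubleValue();
        }
        return String.valueOf(left) + right;
    }

    State state() { return state; }
}

class SpeculationDemo {
    public static void main(String[] args) {
        SpecializingAdd add = new SpecializingAdd();
        System.out.println(add.execute(1, 2) + " " + add.state());                 // 3 INT
        System.out.println(add.execute(Integer.MAX_VALUE, 1) + " " + add.state()); // 2.147483648E9 GENERIC
        System.out.println(add.execute("a", 1) + " " + add.state());               // a1 GENERIC
    }
}
```

The node never goes back from `GENERIC` to `INT`; keeping specializations monotonic like this is a common way to avoid a node flipping back and forth between versions.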
InfoQ: What about wider community involvement in Graal? Twitter are on record as being early adopters and running Graal in production - who else is already using it? What has been your experience of working with non-Oracle developers? What does the future development of Graal look like? Will it adopt a similar model to OpenJDK as a whole?
Wuerthinger: OpenJDK has a very different mission than GraalVM.
OpenJDK has a large surface area for developers, since it is creating new language features and new libraries all the time. You have to change your code to deal with many of the changes to OpenJDK, and the mission is to get community consensus on which changes are worthwhile. The OpenJDK community is very large and there are a lot of opinions, and their process helps get to community consensus.
GraalVM is about making code in languages and libraries that already exist run more efficiently, in more places, and interoperate more, and not to change language semantics (for the most part). The only major piece of developer interface it has is the interoperability API that allows you to invoke foreign language functions in other runtimes, and we try to stay below the waterline as much as possible.
The community for each language that GraalVM supports comes to its own conclusions on how it should evolve.
As with OpenJDK, GraalVM has a lot of great contributions from vendors like Red Hat, Intel and AMD that are quite substantial. Red Hat took the initiative a while ago to build an ARM backend for GraalVM without much input from us, and that turned out to be very useful for us. You may be aware that Oracle invested in an ARM vendor recently, so now we are paying more attention to that ecosystem.
Those vendors believe that GraalVM will get enough uptake from others to warrant the investments, and their interest isn't primarily for their own usage. As you point out, Twitter has been contributing to GraalVM because they are using it themselves. There are some other major technology vendors using it that aren't going public the way Twitter has, and we have internal users at Oracle.
InfoQ: Oracle is usually reticent to discuss future roadmaps. Can you give us any indication of where the development track of GraalVM might take us in the next 18 months?
Wuerthinger: Well, the GraalVM team is in Oracle Labs, so we have kind of the opposite problem where we have lots of research projects that are public that may or may not go anywhere beyond a paper in an academic conference. Oracle Labs has a cloud of graduate student collaborators looking to try out ideas using GraalVM.
We have one student who was thinking about linking GraalVM into Chromium so that you could use any language in the browser and not just JavaScript, but I think it is very clear that it is unlikely ever to be released as an Oracle product. You can certainly look for academic papers based on GraalVM to see some other crazy ideas. We've done investigations into some other languages as research projects that may or may not go anywhere, based on community interest.
What I can tell you about the roadmap is that we are working on supporting more platforms. The initial release was only for Linux/x86 and Mac/x86, to minimise the "release stress". We are actively working on Windows support and also on support for the ARM platform (together with Red Hat). There is also quite a lot of stabilisation and completeness work needed for the languages that are currently marked as "experimental", i.e., R, Ruby, and Python.
In terms of integrations, there are a number of ways that we are going to integrate GraalVM more deeply into MySQL and the Oracle RDBMS that you will see in the next 18 months. But what GraalVM will look like in 18 months will also depend very much on overall community feedback and contributions.