
How the HotSpot and Graal JVMs Execute Java Code


Summary

James Gough explores the subsystems involved in interpreting, compiling, and executing a Hello World application. He dives into JIT compilation and the arrival of the JVM Compiler Interface to explore how optimizations are applied to boost performance. He discusses HotSpot, then explores Graal and the JVM ecosystem to discover the performance benefits of a platform 25 years in the making.

Bio

James Gough is an executive director and developer at Morgan Stanley, where he’s focused on building customer-facing technology. A Java developer and author, he first became interested in Java during his degree program; after graduating, he became a member of the London Java Community. He is a regular conference speaker and spent four years teaching Java and C++ around the world.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Gough: My Twitter handle is @Jim__Gough. I'm a university graduate from Warwick. I was really interested in compilers and performance. That was it. That was where it ended. Then I started going to the London Java Community. I did a variety of different things in the LJC. One of those was helping to design and test JSR-310, the Date and Time library. I was also a developer and a trainer for a couple of years. While I was doing that, I met someone called Ben Evans. We wrote a book together called "Optimizing Java." Through the education at the LJC, I went back and wrote about something that was really an interest of mine. It was a really cool book to work on. In my day job, I work on API gateways and API platforms at Morgan Stanley. I'm also an occasional Maven hacker. If you see various banter around Maven and Gradle on Twitter, that's why.

Exploring the JVM

What about this talk? There have actually been some excellent talks about GraalVM and how it works for creating native images, compiling them, and getting really good startup times. I'm going to take it from a slightly different perspective. We're going to look at this from the existing JVM that we use and know today. We're going to look at the different pieces involved in the story of executing Java, and where Graal and ahead-of-time compilation come in to complement that.

Building Java Applications - A Simple Example

This is the area where we spend most of our time: writing Java source code. Not many people use Javac directly anymore; it's hidden away behind all the build tools that we use. Fundamentally, you're creating a class file. Let's use a really simple example that will run through the whole talk: a very noisy Hello World. There's not too much Java code to understand in here. I think the point was made earlier in the containers talk: it doesn't necessarily have to be Java to benefit from a lot of the performance work at the VM level that we'll talk about. I have a public static void main, which calls printInt. It does that 1 million times. Inside printInt, I print "Hello World" plus a number to standard error. There are a couple of things in there; some people who've played around with this before will be able to spot why I'm doing that. I'll get into the reason for it being so verbose as we go through.
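For reference, a minimal reconstruction of the example as described (the class layout and names are my assumptions, not the talk's actual source):

    public class HelloWorld {

        public static void main(String[] args) {
            for (int i = 0; i < 1_000_000; i++) {
                printInt(i);
            }
        }

        private static void printInt(int value) {
            // Printing to System.err is deliberate; the reason for the
            // verbosity is explained later in the talk.
            System.err.println("Hello World " + value);
        }
    }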

You take your Java source code, you run Javac, and you get a class file. You can inspect the contents of the class file using Javap. We're not going to go into too much detail; there are some good talks on bytecode. Inside HelloWorld, the first thing that's happened is Javac has effectively added a default constructor. The default constructor runs an invokespecial on java.lang.Object's <init>. That's something Javac has added into your class for you. The next thing is the main. There are lots of different bytecode instructions here, where we load the integer, store it, and do a compare against a million; if it's less than a million, carry on going and invokestatic. We still have gotos. All that time when we thought we'd got rid of gotos, they're still there. The point of this slide is really to show you that Javac as a tool doesn't do very much. It just takes your code and converts it to bytecode. There's not a lot of magic compilation going on behind the scenes like there would be if you ran something like GCC. We will look at the printInt method, because this is curious. We get hold of the PrintStream. We do an invokedynamic on makeConcatWithConstants. That's going on behind the scenes. Then we invoke the virtual method println to print out the new value.
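You can reproduce this inspection yourself. Roughly, it looks like this (the constant pool indices and exact listing vary by JDK version, so treat this as illustrative):

    javac HelloWorld.java
    javap -c HelloWorld

    private static void printInt(int);
      Code:
         0: getstatic     // Field java/lang/System.err:Ljava/io/PrintStream;
         3: iload_0
         4: invokedynamic // InvokeDynamic #0:makeConcatWithConstants:(I)Ljava/lang/String;
         9: invokevirtual // Method java/io/PrintStream.println:(Ljava/lang/String;)V
        12: return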

That's curious, because if we were to take the same example and look at it on Java 8, you'd actually see different bytecode being created. If you look into the bytecode for Java 8, you'll see we're creating a new StringBuilder, loading in the Hello World, and invoking append on the StringBuilder. This is weird. How many people have used the plus operator inside a for loop, and the first thing you've seen in a code review is, you should replace that with a StringBuilder, because StringBuilders perform better there? You only create one object and you won't have lots of strings floating around. That's weird now, because that advice just doesn't hold true anymore. Where you potentially would have gotten better performance on Java 9+, by hand-rolling a StringBuilder maybe you've limited yourself, because the plus operator now compiles to an invokedynamic that the platform is free to optimize. There isn't much that goes on in Javac, but as the platform evolves, different things will actually appear here.
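In other words, on Java 8 the concatenation in printInt compiles as if you had written the StringBuilder chain by hand; roughly this (an illustrative desugaring, not literal compiler output):

    // What "Hello World " + value effectively becomes under Java 8's javac:
    System.err.println(new StringBuilder()
            .append("Hello World ")
            .append(value)
            .toString());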

Classloaders

We've got our class file, and everything so far has happened outside the JVM. We now want to effectively get it into our environment. We use the classloader. The classloader plays an important part in our story, because it's generally not known what's going to happen when the JVM starts up. In Java, classes are loaded just before they're needed. You've probably seen this when you've hit a ClassNotFoundException or a NoClassDefFoundError. Or maybe you don't see that because your build tool is amazing and hides it away from you, but it's still there. What happens is that the class file gets mapped into a Klass object. This is the C++ representation. The instance methods are held in the klassVtable. Then you get static variables, initialized once and held in the instanceKlass. That's an interesting representation. The main thing we're doing here is taking what was in your class file and mapping it into what is effectively a cache of your methods and your static data, which you will then run in interpreted mode.

Demo

One way I want to show you that classloading is a little bit magical is by writing a custom classloader. This is something you can do; there are various implementations of this around. This is something I've got on GitHub. It does a couple of weird things, deliberately, because I used to use it to teach Java classloaders. The functionality is: you take a Java file and you drag it into the classes folder. Inside the classloader, it uses some of the tooling to run Javac from inside the Java process. It takes the .java file, compiles it to a class file, then loads it into your system. The JVM cannot know anything about the class in advance. That's the thing I'm trying to get across: you can't know anything about it.

Let's have a look. We'll hit play. I've set up this watcher here, watching the folder. I've put some reflection code in here, of course, because it wouldn't be a Java demo without a little bit of reflection. We try to load demo and get a ClassNotFoundException. What I'm going to do is take this demo.java file, which is effectively just a getA that returns 42, plus a setter, and drag it into this classes folder here. It's attempted to load demo.java. What is this thing? Then it's compiled it to a class file, and now the class is loaded. If I go in here and type in demo, I can now see that I've got getA and setA. I showed this as an example in a presentation, and somebody said, "That means I can write Java code and put it into the JVM at runtime, and nobody will know anything about it from production controls." It's like, maybe you want to step away from that slightly. It's really just created the .class file. Javac is not that smart. You can use it in lots of smart ways. There's more to the story. Actually, the biggest part of the Java story starts after classloaders.
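The core trick of a classloader like that, compiling with the in-process compiler API and then defining the class, looks roughly like this (a simplified sketch of the idea; the names here are mine, not the actual repository's):

    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class CompilingClassLoader extends ClassLoader {
        private final Path classesDir;

        public CompilingClassLoader(Path classesDir) {
            this.classesDir = classesDir;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            try {
                // Run Javac from inside the Java process.
                JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
                Path source = classesDir.resolve(name + ".java");
                if (javac.run(null, null, null, source.toString()) != 0) {
                    throw new ClassNotFoundException(name);
                }
                // Load the freshly produced .class bytes and define the class.
                byte[] bytes = Files.readAllBytes(classesDir.resolve(name + ".class"));
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }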

Interpreting Bytecode

If you were to go to a talk about Java 20 years ago, maybe a bit longer, you'd probably be in one of those talks about Java being really slow. This is where it got its reputation for being quite slow: it was originally just an interpreted language. Lots of work has gone on in the background to improve that performance, and we'll talk about some of those techniques as we go along. Initially, your bytecode is fully interpreted. I say fully interpreted loosely, because there are so many options that change that behavior; in the default JIT-compiling JVM, it's not this fully interpreted picture to start with.

You have this conversion of bytecode instructions to machine instructions, which uses a template interpreter. There was a great talk by Alex Blewitt about HotSpot under the hood, which goes into a lot of detail if you're interested in how bytecode instructions get converted into machine instructions on the fly. The idea is that you don't spend time compiling code that you only use once or twice. Think of traditional applications like Spring, where the application does lots of reflection at startup to boot itself up. A lot of those methods may be used just a handful of times, so there's maybe no benefit to compiling them at the beginning. We'll talk about the benefits you get from compiling everything upfront versus at runtime. There's a trade-off, as with everything; there's no free lunch. Interpreting allows us to monitor the behavior of our application and make decisions about how we're going to compile, and how we're going to apply some critical optimizations that will improve the performance of our Java code.

The HotSpot Compiler

That's where it gets interesting. We've got this information, so what do we do with it? There's the JIT compiler. The JIT compiler, or the C2 compiler if you're going to be specific in the HotSpot case, observes the code as it's executing. Don't confuse this with complicated profiling, because lots of profilers exist; this is purely tracing through the invocations you've done. It captures various counts, and then you trigger a compilation when you meet a threshold. Everything is done on a method-by-method level. We'll explore that through one of the examples we'll look at. Essentially, you're trying to find the hot points in your application, where you'll benefit from potentially bringing some of the bytecode together and eliminating some pieces before you emit your compiled instructions at the end.
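You can watch those threshold-triggered decisions being made. A minimal way to try it (a standard HotSpot flag; the exact output columns vary between JDK versions):

    java -XX:+PrintCompilation HelloWorld

    # Each line of output shows a method being queued and compiled as its
    # invocation counters cross the threshold, along with its tier level.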

Utilizing that profile is really key. At the beginning, as we've just seen with that example, Java doesn't necessarily know anything about what you're trying to do. That's intentional, because things can be pulled in from anywhere, so it generally doesn't know. We want to emit, essentially, machine code instructions to replace the bytecode we started out with. The application keeps running the interpreted bytecode in the meantime. If something is being used a lot, you take it to one side. You asynchronously dispatch it onto another thread and say, "Compile me this method." It goes off in the background and compiles it. When it's ready, it does some pointer swiveling and replaces it in the running application. Historically, our main option for JIT compilation has been the C2 compiler, which is the second level of compilation; we're talking about server-mode compilation. It's implemented in C++. The engine that's doing all the compilation has, historically, been written in C++.

Challenges with the C2 Compiler

There are challenges with C++. We've got to paint a little bit of a picture here, because we've got quite far with what we have. What do we see when we build with C++, maybe the C++ of 20 years ago, as an example? You've got the problem that it's unsafe. If you've ever worked with C++ and you accidentally dereference something, or delete something you shouldn't, you end up with a segmentation fault. I'm not going to say you don't get those in Java, because I certainly remember times when you do; that's probably partly down to the unsafe nature of the language behind the scenes. Partly, that's because there's a mixture of the raw pointers used in C++ within the compiler and Ordinary Object Pointers, or OOPs. An OOP is what's used in the C++ code to represent an object instance that's actually allocated in Java, which means it's managed by the garbage collector. If it's managed by the garbage collector, and you go and delete it from C++, and then Java tries to delete it, things go really bad.

This dialect of C++ is also legacy. That's not saying C++ is legacy, because I know Stroustrup quite well and he'd probably tell me off for that. What I mean by legacy is that this dialect is very difficult to maintain. It's not like modern C++ 11/14/17; it's much older than that. There's actually custom memory allocation, malloc routines, written in the background for the libraries. Again, the legacy points to the [inaudible 00:14:36] mixture of trying to take those two languages and put them together. Also, the tooling is a little bit tricky with C++, especially this old dialect. That combines to make it difficult to make changes. What we're actually describing is the story that happened 20 years ago, when we started using Java in the first place. Java was always this blue-collar language to replace C++, or to help with those problems that didn't need deterministic performance. Now we've got similar problems inside the JVM itself.

One thing to say, though, is that it's been built upon over 20 years. It's a really awesome piece of engineering. Now there are a couple of other things. When we originally had Java and the JVM, it was all about Java. There are so many other dynamic languages running on the JVM now that the story changes a little, and some of the things we want to be able to alter become very difficult. The other side of this is that Java is now fast enough to be a compiler in its own right, which is also cool.

Evolving the JIT Compiler

How do we evolve this in the application? This is great, we've got the C2 compiler, but what do we do next? This is where the idea of the JVM compiler interface comes in. The JVM compiler interface is a mechanism that allows you to replace the compiler within the JVM. This is where we can now potentially use either the C++ one, or something written in Java. That's where we start to see a bit of a difference in the story, one we wouldn't have had if I were presenting this three or four years ago.

JVM Compiler Interface (JVMCI)

What does the JVM compiler interface actually do? It gives you access to the VM structures, so the compiler can see the fields, the methods, and the profile information that's been captured. Then it gives you a mechanism to install the code that you produced with the compiler. It's interesting, because when you start to think about what a compiler is, what does the interface actually look like? Fundamentally, it's a method that takes a bytestream and returns a bytestream. Its interface is simple; what it does behind the scenes is the interesting part. The idea is that you're producing machine code at the method level. JEP 261, the module system, is also used to do some of the security and isolation parts, especially in the Graal implementation. Security becomes an interesting point when you start saying, I'm going to just plug in some code and run it on the JVM, for obvious reasons.
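Conceptually, that contract is tiny. A simplified sketch of the shape (this is not the real jdk.vm.ci API, which passes structured metadata objects rather than raw arrays):

    // Conceptual sketch only. ProfileData is a hypothetical stand-in for the
    // profiling information gathered while the method was interpreted.
    interface ProfileData {}

    interface PluggableJitCompiler {
        // In: the bytecode of one method, plus its interpreter profile.
        // Out: machine code the VM can install in its code cache.
        byte[] compileMethod(byte[] bytecode, ProfileData profile);
    }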

Graal as a JIT

What about Graal? We've been looking at Graal as a VM. What I'm going to show you is some Graal compiler code, debugged live, so you can see the part of Graal that we use here: the Graal compiler, which generates these machine code instructions. Fundamentally, in terms of how JIT compilation works, its structure isn't doing anything super revolutionary. You may have heard the term Sea-of-Nodes: the idea is that you build a graph that represents your program as it's running, you do optimizations on that graph, then you take the output graph and emit the actual machine code from it. It uses a combination of what's actually going on and what the profile looks like. It runs through a series of phases, and you end up with your output code, which is a lot quicker than being interpreted. Implementing a JIT in Java is actually a fairly compelling story. You've got language-level safety and expressiveness. It's pretty easy to debug, as I'm hopefully going to show you. There are also really good tools and great IDE support, because ultimately it's using all of the Java tooling that you probably use for your day-to-day development.

Getting Started with Graal

How do you get started with this? I'm going to write a post on all the different steps you'd have to take. Effectively, there's a command-line tool for Graal called mx. In the example I'm using, the mx tool allows you to run builds of the Graal compiler. I've pulled the GraalVM project locally onto my laptop, and I'm working specifically within the Graal compiler area. You can then run mx build, which builds the Graal compiler and gives you a JAR file. You can also run mx ideinit, which sets up the IDE projects; if you're going to do the debugging stuff, that makes it super easy. Then you can run mx -d vm, which basically says: I want to run a VM, and I want you to install the locally built Graal compiler into it. You'll see it then replace the compiler and plug it in.
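Roughly, that setup sequence looks like this (the repository layout and commands may have moved around since the talk, so treat the exact paths as assumptions):

    git clone https://github.com/oracle/graal.git
    cd graal/compiler
    mx build      # build the Graal compiler locally
    mx ideinit    # generate IDE project files, handy for debugging
    mx -d vm ...  # run a JVM with the local Graal installed; -d makes it
                  # wait for a debugger to attach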

I've got a bunch of options, of course. I'm unlocking the experimental VM options. I'm enabling the JVM compiler interface, and I'm telling the VM to use the JVMCI compiler as well. That gives me, essentially, the JIT running in Graal. If you were to run that on OpenJDK 13, you'd be running the Graal compiler in the background, because that all ships with Java now. Just to limit the scope of what we're doing, I'm saying I only want to compile HelloWorld and PrintStream's println. There's a reason for that; I'll show you why when we finish the debug. I'm also passing in a parameter to say, dump out everything that Graal is doing. That's for another tool that we'll look at, too. There's a tool that also ships, an Oracle enterprise tool called Ideal Graph Visualizer. It allows you to observe the compilation, the Sea-of-Nodes, at the different phases you go through.
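Put together, the flags look roughly like this (flag names as they existed around JDK 13; the CompileOnly pattern and the Graal dump property are from memory, so double-check them against your JDK):

    java -XX:+UnlockExperimentalVMOptions \
         -XX:+EnableJVMCI \
         -XX:+UseJVMCICompiler \
         -XX:CompileOnly=HelloWorld,java/io/PrintStream.println \
         -Dgraal.Dump \
         HelloWorld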

Exploring Inlining

We're going to explore inlining. If you've not heard of inlining before: in C++, there's a keyword called inline, and it's up to the compiler whether it actually pulls things together. The inlining mechanism in Java is different; there's no keyword to supply. What it does is explore the current execution chain and look for opportunities to bring those methods together. Think about what happens in our example: we call printInt, which pushes a new stack frame with all the locals, the return address, and the parameters. Then when we call println, it does the same again. You keep invoking and expanding your stack, and when you return, you collapse those frames away. Looking for opportunities to inline is one of the biggest performance wins you can get in a Java application. Effectively, you end up with the bytecode, or the machine code that comes out of it all, chunked together.
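Conceptually, after inlining printInt into main, the hot loop behaves as if the call had been written out by hand (an illustrative before-and-after, not actual compiler output):

    // Before: every iteration pays for a printInt stack frame.
    for (int i = 0; i < 1_000_000; i++) {
        printInt(i);
    }

    // After inlining, conceptually: no printInt frame at all. The body is
    // merged into main, which also lets later optimizations see across the
    // old method boundary.
    for (int i = 0; i < 1_000_000; i++) {
        System.err.println("Hello World " + i);
    }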

Live Debug

I'm going to fire up this debug example. It has exactly the same output that's on the slides, just so I don't end up typing something completely wrong. It's now waiting for my IDE to connect, so I'm going to go over to IntelliJ. Just so you know, the Graal compiler doesn't have lots of arbitrary System.out.println calls in it; that's me adding things in here to show as an example. This is the first thing: I've got a compilation request. The JVM has gone, there's something interesting here for you to compile. We can see down at the bottom that it's the printInt function. The printInt function is effectively the one we're looking at. If we look at our bigger display of the output, that's effectively what's happening there.

We'll just spend a couple of minutes exploring this, so you can get an idea of how you can go in and actually debug the compiler. We can step into the compile method. What really happens then is a bunch of startup. We have a check to see if we're shutting down. We grab hold of the method. We wrap the task up inside a compilation task. We do some setup and some debug. Then, eventually, we get down to actually running the compilation. If we look at the thread monitoring on the left here, you can see that I'm in the JVMCI compiler thread. This is separate from, essentially, the running application. It will be paused in the background while I'm doing this, obviously, because it's just debug mode. If you ran your application normally, it would still be interpreted while the compilation you've asked for happens. You can have a look in here. We grab hold of the runtime compiler. We step through this a little bit. It has a look to see if this is on-stack replacement. We go through here, and we're effectively creating the wrapper for running this compilation.

We'll just go into this next piece, because we start to get to something a little bit more interesting. We'll skip over this part, into the actual compiler. This is where we start to see the first part of what I was talking about. The first thing that happens is we create that graph; the graph is basically the bytecode and how it all connects together. We grab hold of that; that's the first thing we do. It takes a little while to process that piece. We get a compilation result, and then we can go into the actual compilation here. The interesting thing then is the actual compilation phase. We'll just look at inlining in detail, and not the pieces around it. Here, you can see the main things that get processed. You've got this idea of suites. The compilation suites are effectively the different phases that we're going to go through; we've got this idea of high-tier, mid-tier, and low-tier. I'll show you a slide that expands on what each of those does. We have our profiling info; that's the profile information we mentioned that it needs in order to actually do the compilation. Then we've got some options here as well. You'll see that if we step through this, it effectively goes through and applies all the different phases. Actually, if you were to go into your IDE and just type in phase, you'd see all the different phases that are there. The one I've highlighted for us to look at is the inlining phase.

The interesting thing here is that before we get to the inlining phase, there are two things happening: this idea of frontend compilation and the backend. The frontend compilation is basically taking that Sea-of-Nodes and canonicalizing it, applying all the optimizations, so you get the nice tree that you actually want to compile. The backend emitter then converts it into the target platform you've got running. For me, that will be the AMD64 Mac one, essentially. If we go on, we'll see that we get into inlining data; that's actually inside the inlining phase. You can go and have a look at the phases and what they're doing. We can look through this and see what actually goes on. We go into effectively looking at and traversing the different nodes that are available. We ask if something was inlined, as a build check. It's interesting: all of the behavior within the actual data node itself, the inlining data, is nicely encapsulated, and all those other good things we wanted to see.

If we pop into here, we can have a look and see that it basically goes through. It captures what the current invocation is. As we go through this function, it asks if I should go into the next invoke. What you end up doing is going through, having a look, and then I capture what the CallSite holder is, and then what the invoke target was, as well. If you've got some info, as part of this, if you get inline info, then you can potentially look at doing the inlining. What I'll do is I'll just press play here, just to allow it to go through. We'll just have a quick look at the output. We were missing the backend there, and it installs it into the code cache. That's just returning out. Then we can actually go and look at the Hello main, just very briefly.

What we can see here is that we step through each of the different phases. We looked at println. We looked at printInt, but we made a decision not to inline at this stage. That decision is an interesting one, because it will probably have more effect if we inline it directly into main rather than trying to do some inlining here. The other interesting thing is that println itself can't be inlined, which is why it can't do anything at this stage. You might be wondering, where's all the rest of the stuff? Because I've told the compiler to limit itself to just those two things, it's not going along the println chain, which has a load of buffers and other things we could inline.

We're just going to skip through the compilation pieces. If we jump back over through this, we end up in a situation where we actually have something that we can inline. Essentially, as it goes through, it can do the inline, then start to process that and output the corresponding code. You get the decision from the policy that it should inline. Then, behind the scenes, it goes in, canonicalizes the nodes, and pulls that whole chain of methods together.

I think I've left the last debug in. If not, we'll return to the slides, which show the machine code itself. This is where we're emitting the backend. We've got the AMD64 HotSpot backend. If we step through that, we should be able to see the compilation result. Inside here, you can see things like the source mapping and the data section itself, which has all of the different pieces inside it. You can actually go in and look at all those things. It's quite nice.

However, I should have started this by saying you probably won't want to do this that often. You probably just want to see what decisions the compiler has made behind the scenes. I have another script in here, which does the example-inline. I've taken off the compilation limit; it's going to go through and effectively run the whole thing. If we do example-inline, and we pipe that into less, you'll see that you actually start to invoke a lot more. You've got main. You've got PrintStream. You can see that list going down. If you run this with no limits turned on, so instead of "only compile this stuff" it's "go knock yourself out," you actually start to see something quite interesting. You end up in a situation where all these odd things start to get compiled. They're not really that odd, because we're familiar with them in Java, but I certainly didn't have any HashMaps in my application. What's going on there? Actually, if I go and have a look to see whether there's any Hello World in there: nothing. It's still my "Decided to Inline" output, but it's for something else.

The reason this is happening, the reason you see all this, is because Graal is actually compiling itself. One of the things you sometimes see when you're using Graal as a JIT compiler is that, because it's written in Java, it obviously starts out interpreted. If it's interpreted, you need to compile it to make it quicker. What does it compile? It compiles itself. You end up with this slowdown at the beginning while the compiler is compiling itself. It actually never got to compiling Hello World for me, because it ran out of time; the program had finished in the interpreter. Just to not keep you hanging around, I have another run where I've increased the iteration count to 10 million. When you go to 10 million and have a look, eventually we do get our Hello World. You start to see all of the different pieces getting pulled in. You've got StringConcatHelper, stringSize, mixLen. You've got the probability. All of those get folded in. You do pay this startup cost because it's Java, and it's slow, at least until it's compiled. That's the key with Graal as the compiler. There are some tricks we can look at that maybe make that trade-off a little bit easier for you.

Compilation Tiers and Phases

What do these phases look like? We have this high-tier, mid-tier, and low-tier, and you can actually step into those. We're going to focus on the high-tier. One of the things I'm planning to write is a more detailed exploration of each of these phases, stepping through some of the graphs you can see. I'll show you the graph that came out of the application when we get back out of this. We've looked at inlining; there are a couple of other optimizations worth knowing about, and we'll explore those now. One interesting thing is that a lot of canonicalization phases happen, just to rebalance the tree into a known format before it goes through the phases. You also see dead code elimination feature twice. Dead code elimination is really interesting.

Dead Code Elimination

What dead code elimination does is not like when you've written something in Java and the IDE goes, "You can just delete that, because you did something stupid." It's much more the idea that, at runtime, you maybe never need that branch of code, or you may never use that method. If you just compile all of it, you potentially make the compiled part of your application much bigger than it needs to be. Alex did a great talk about low-level CPU optimizations. What you can sometimes get by pushing in a new function that you didn't know about is a change in the way your application profiles. The nice thing you see in Java is that anything you're not using, anything irrelevant, any branches you don't need, will be removed. That removal of branches reduces the branch-prediction work, and all these other things; it ends up having a really good effect on your overall application. This happens twice: once at the beginning, and once more after some of the other phases have run later. It's determined at runtime, by that profiling information. That's why the profiling information is so key.
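A contrived illustration of the runtime flavor of this (my own example, not from the talk): if the profile shows a branch is never taken, the compiled code can omit it entirely and fall back to the interpreter in the unlikely event it's ever needed:

    static boolean verbose = false;  // never set to true in this run

    static void work(int i) {
        if (verbose) {
            // The profile says this branch never executes, so the JIT can
            // leave it out of the compiled code entirely and deoptimize
            // back to the interpreter if that ever changes.
            System.err.println("working on " + i);
        }
        // ... the hot path compiles smaller, staying friendlier to the
        // instruction cache and the branch predictor
    }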

Loop Unrolling

Another one is unrolling loops. Loops themselves require back branching and branch prediction, as well as the actual overhead of doing the counting and everything else. What if you could unroll those and make your application simpler? When you use an int and iterate over a for loop, the compiler can remove all of the iteration and just lay the code out linearly. You end up with all of the instructions together, which removes a lot of the back branching, and the CPU's branch prediction having to work really hard. The other thing you can remove is safepoint checks. Safepoint checks are triggered within the JVM when it's looking to do GC, for instance; it will only do that when your application is in a safe state. Safepoints also have a big overhead, so if you can remove those, that's a good thing.
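For illustration (my own sketch, not from the talk; assume data is an int[] in scope), an int-counted loop with a small, known bound:

    // Before: counter, compare, back branch, and safepoint polls.
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += data[i];
    }

    // After unrolling, conceptually: the iterations are laid out linearly,
    // with no counter, no compare, no back branch, and no per-iteration
    // safepoint check.
    int unrolled = data[0] + data[1] + data[2] + data[3];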

There's a little benchmark here, straight out of "Optimizing Java." It's basically the same loop body, once with an int counter and once with a long. If you run it with the int, you end up with a throughput of 2400 operations per second; with a long, that drops off to 1469. You lose a lot of performance there, just by how you write your loops. I was talking to a guy called Chris Newland about loop unrolling. I was kind of, "I'm not really sure. I see the benefit with the benchmark, but how complicated can it get?" He said he'd actually seen a doubly nested for loop doing ray tracing that had been loop unrolled. It can get down to doing some really heavy things.
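The shape of the comparison is roughly this (my reconstruction of the loop shapes; the book's actual benchmark wraps them in a JMH harness, and doWork and limit here are placeholders):

    // int counter: recognized as a counted loop, so it's eligible for
    // unrolling and safepoint-check elimination.
    for (int i = 0; i < limit; i++) {
        doWork();
    }

    // long counter: historically not treated as a counted loop, so it keeps
    // its per-iteration safepoint polls and misses the unrolling win.
    for (long l = 0; l < limit; l++) {
        doWork();
    }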

Escape Analysis

Escape analysis is a really interesting one as well. Escape analysis was introduced back in Java 6, so it's been around for a little while now. It analyzes code to check whether an object ever leaves the scope of a method, or is stored in some global variable. The nice thing about escape analysis is that it goes back to that age-old interview question: in Java, where do you allocate an object? Obviously, you could just say the heap, and you'd be about 80% right, but now there's so much more to that story. What escape analysis does, if your object is small enough and doesn't escape, is effectively stack allocate that object. Going back to the C++ style of automatic memory management: once the stack frame goes away, the object goes away. Why would you want to do that? One of the biggest subsystems in the JVM is GC. If GC is under pressure, as it often is from lots of objects that are created, lose their references, and die quite quickly, you end up having to do more GC cycles. If you can stop putting those objects on the heap, you can potentially prevent GC cycles. Going on from that interview question, now you can talk about Valhalla and other things as well, which makes the conversation even more interesting if you're in an interview. It's a nice optimization that most people just got for free. One of the things I've certainly taken away from this conference is that the more you're on the latest version of Java, the more performance benefits you're getting, essentially, for free.
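A small illustration (my own example): the Point below never leaves distanceSquared and is never stored anywhere global, so the JIT can avoid the heap allocation, typically by replacing the fields with plain values in registers:

    final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);      // never escapes this method
        return p.x * p.x + p.y * p.y;   // no heap allocation, no GC pressure
    }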

Monomorphic Dispatch

The other difference between Java and C++ is the idea that when you call a method in Java, the method is always virtual. What I mean by that is, because Java is an OO language by default, with Object always at the root, it has to look up where the corresponding implementation lives before it can invoke it. In C++, you can do static binding, which means the call goes to the function at its memory address directly; there isn't this intermediate lookup. Because we do that lookup all the time in Java, one thing HotSpot wants to do is optimize it away. You don't want levels of indirection: as soon as you're indirecting through a pointer, you're losing locality, and it's difficult to do things with. What HotSpot does is look at the vtable structure, a data structure that effectively points at implementations. Once it's done the lookup the first time, it can collapse that into a direct call. It can just go, it's over there, because I can infer that from the way your application is running. It's interesting because it relies on tracing, and sometimes it gets this wrong. Classloading, and the magical features of classloading, mean you can end up in a situation where that's an optimistic optimization, essentially. What the JVM can also do, which I like about the JVM, is that if it realizes it's made an optimization that's no longer valid, it can revert to interpreting and recompile within your running application. It has a way to back out if it's not sure. The other thing it can do is bimorphic dispatch: if a call site has two observed targets, it can handle that. Once you get to megamorphic, you lose the optimization; there are diminishing returns as you go through. People have experimented with trimorphic as well.
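For illustration (my own sketch): if profiling only ever sees one concrete type at a call site, HotSpot can devirtualize the call, guarded by a cheap type check that deoptimizes if another type ever shows up:

    interface Shape {
        double area();
    }

    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            // If only one implementation, say Circle, ever reaches this call
            // site, the virtual call can be collapsed into a direct, inlinable
            // call, with a guard for any other type appearing later.
            total += s.area();
        }
        return total;
    }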

Code Cache

What happens then is your emitted code, effectively your machine code, gets output into the code cache. Then, as you're running through, it replaces the interpreted code in the running application. The other key thing here is this idea of on-stack replacement. Think about it: most Java applications run in some form of loop, because otherwise they would just stop. The idea is that if your main function is busy, as it is in Hello World there, then you want to be able to replace the main interpreted loop with a compiled one. That's where on-stack replacement comes in: it can replace the code being interpreted with the compiled version while it's still on the stack.

There's a bigger story starting to emerge as well: AOT, ahead-of-time compilation. It's really great to do JIT, because when we JIT, we have all the information about how our application is working and what it's doing. But it comes at a cost: you've got slow startup time, and you often have a spike in memory usage at the beginning of your application while you're creating all those temporaries to wire up Spring, or whatever else. If you're running on cloud, that slow startup time and that memory spike might not be the factors you want to optimize for. This is where you get into a trade-off: JIT is really cool for long-running applications but has slow startup, while AOT is pretty good in cases where your application runs in an almost consistent way.

Ahead of Time Compilation

What could we do here? We can actually use ahead-of-time compilation as well; we can use AOT plus JIT. We've seen in some of the talks about GraalVM how you can supply a profile of your application to effectively get some better optimizations there. Inside the JVM itself, there is a tool called jaotc, the Java ahead-of-time compiler. It's used to generate the target code for either a class or a module. The idea is that the JVM treats the AOT code as an extension of the code cache. You can stuff precompiled classes or precompiled modules into the JVM and still benefit from JIT. There are a couple of different things you could potentially do there.

Someone asked a great question when I gave this presentation in New York: what if somebody tries to give you something that doesn't work? Maybe it's some random code, or maybe it just wasn't compiled in the same way. The way you compile AOT code is quite specific. What it does is use a fingerprinting technique: it looks at the fingerprint on the class, compares it with the one that's currently running, and if they're different, it won't use it at runtime. It will reject it and go back to interpreted mode. The jaotc tool takes a class file and gives you a shared object, like in this example.

In this example, I'm just doing a very simple Hello World. I'm outputting it to libHelloWorld.so. Then I have to use ExperimentalVMOptions and AOTLibrary: pass in the AOTLibrary and say, HelloWorld. This slide makes it look really cool, and that works fine. When you compile these things, though, you usually have to indicate what GC you're using; that's one option. You have to supplement it with the other flags you use at runtime as well. It becomes a really big potential inconsistency between what you compile and what you run. Obviously, the first thing we say is, that sounds like Javac: where's my build tool? Actually, build tool support for this isn't there yet. If you wanted to do a mixture of saying, I'd like to compile this module, but run the JVM as you normally would, there's no Maven plugin that does that.
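End to end, the flow on the slide looks roughly like this (jaotc was an experimental tool, so treat the exact flag spelling as approximate and check it against your JDK):

    javac HelloWorld.java
    jaotc --output libHelloWorld.so HelloWorld.class
    java -XX:+UnlockExperimentalVMOptions \
         -XX:AOTLibrary=./libHelloWorld.so \
         HelloWorld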

Something I'm experimenting with at the moment comes back to the beginning of the talk, where I mentioned that one of the things I like to do is hack around on Apache Maven. I'm working on a plugin for jaotc that will enable you to do that consistent build and run with shared objects, mixing it in with ahead-of-time and JIT compilation as well. There's always a trade-off; obviously, you have to balance these things out. You'll probably have to find a balance between what makes sense to be ahead-of-time compiled, maybe the stuff you use at startup, versus what you want to leave to be JIT-compiled.

The Bigger Picture

We've talked a lot about the JVM: how it works, how code gets interpreted, and everything else. Garbage collection is something we haven't really talked about much, other than that it exists and you need to think about it. A lot of the reason I give this talk is purely out of compiler interest, and because I like fiddling around and showing what's going on inside the JVM. If you have a performance issue, please don't start there. There are other places where your time will be better spent, like looking at the overall profile of your application. Most of the performance problems you have are going to be network, database, or I/O related. Look at those first. Then the other thing is, what are you deploying on? Obviously, cloud starts to complicate that story a little bit.

Questions and Answers

I'll just show you the IGV tool, so you can see it. You can see a bunch of the stuff it was throwing out as we were going through. You can have a look at what the high-tier tree looks like after it's done, and then what it simplifies down to. I say simplify; it's still pretty complicated. Probably, looking at printInt makes a little more sense, because it's a little bit smaller.

Participant 1: Is there any work on taking the code cache from an existing running JVM and then using it for ahead-of-time compilation?

Gough: Yes. Certainly, from what we've heard in different talks, specifically with GraalVM, you can supply a profile to the ahead-of-time compiler so it can make better decisions about the compilation it does. Within some of the other JVMs as well, there's an ability to effectively dump the profiling information you get from the running VM, and you can supply that to some of the different tools. I haven't experimented with that on jaotc yet, but I'm pretty sure there is support for it across a lot of the toolchains. I know, for example, OpenJ9 does it, which is another JVM.

Participant 2: It seems that Oracle now essentially has two different JVMs, or two different JIT compilers? How do they compare feature-wise, in terms of the optimizations they can do, and maybe performance-wise, in steady state, obviously.

Gough: Certainly, the JVM as an ecosystem, at least within the scope of this talk, is the same. The difference is the pluggable JIT compiler: you've got an option of HotSpot's C2, or you've got an option of Graal. One of the things you see with HotSpot, because it's in C++, is that it doesn't need to compile itself, so you'll probably see faster startup times from HotSpot. With Graal, where things start to become quite interesting is with the more dynamic languages that run on the JVM, which do a lot of boxing and things like that. Where previously you would have been slow, you can potentially get better results from being able to write compiler phases customized to that particular language. For example, some Scala applications running on the JVM have seen on the order of 30% to 40% improvements by using Graal.

Participant 3: Twitter.

Gough: Probably Twitter, actually.

Participant 2: Do you know if Oracle has any plans to kill one? Obviously, which one? Probably, HotSpot?

Gough: I would say they're complementary technologies. I don't work for Oracle, so I couldn't say what's in their plans. I reckon, crystal ball, they'll both be around for at least the next couple of years. The aim of this talk is to show some of the new experimental stuff, but I think we have to bear in mind that a lot of the applications we run are super fast as a result of HotSpot. A lot of the stuff we talked about is still experimental.

Participant 2: Do you know what happened to Excelsior JET? They were developing an ahead-of-time compiler, and they stopped around the time that Graal was announced.

Gough: I don't know, I'm afraid.

 


 

Recorded at: Oct 05, 2020
