
Understanding JIT Optimizations by Decompilation


Summary

Chris Seaton shows how he and his team have developed a pseudo-code decompiler for optimized Java code, and how it helps them understand what the Java JIT compiler is doing in order to improve their code.

Bio

Chris Seaton is a Researcher (Senior Staff Engineer) at Shopify, where he works on the Ruby programming language, and a Visitor at the University of Manchester. He was formerly a Research Manager at the Oracle Labs Virtual Machine Research Group, where he led the TruffleRuby implementation of Ruby, and worked on other language and virtual machine projects.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Seaton: I'm Chris Seaton. I'm a senior staff engineer at Shopify, which is a Canadian e-commerce company. I work on optimizing the Ruby programming language. I do that by working on a Ruby implementation on top of Java and the JVM and Graal called TruffleRuby, which is why I'm here today. I'm the founder of TruffleRuby. I write about compilers and optimizations and data structures. I maintain rubybib.org, which is a list of academic writing on Ruby. In this talk, when I talk about compilers, I mean the just-in-time or JIT compiler, that's the compiler that runs as your program is running. I don't mean javac in this context. I spend a lot of time trying to understand what the Java compiler is doing with my code, and how I can change my code to get the result I want out of the compiler. I also spend a lot of time trying to teach other people to do this.

Tools for Understanding the Compiler

There are a few tools for understanding the compiler. You can look at the assembly code that's produced by the compiler. You can use a tool like JITWatch to look at the logs that the compiler produces as it produces the code. The ultimate option is to reach into the compiler and actually look at the data structures it uses to understand, optimize, transform, and compile your code. All these options are quite complicated and they aren't very accessible. I'm experimenting with some new ways to understand what the compiler is doing, including by trying to give you back pseudo Java code from the compiler after it's finished running, or part of the way through its run. The idea is that anyone who can understand Java, which should be most Java programmers, can look at what the compiler is doing in terms of Java code, which they already understand.

At Shopify, I maintain a tool called Seafoam, to help us look at these data structures, and to do this decompilation from optimized Java back to pseudo Java. It works within the context specifically of Graal, so if you're not a Graal user already, it may not be immediately applicable to you, but maybe it's another good reason to experiment with adopting Graal. Using it, we can gain a bit more of an understanding of what the JIT compiler really does. I'm always amazed that people argue online about what the JIT compiler does do or doesn't do for some given code. Let's simply dive in and check.

What the Just-In-Time (JIT) Compiler Does

Most of the time, your Java program will start as source code in Java files on disk. You'd normally run those through the Java compiler, javac, to produce bytecode, which is a machine-readable representation of your program. Not everyone's aware that there's a second compiler, the just-in-time compiler. This takes your bytecode while the program is running, and converts it to machine code which can run natively on your processor. There's also another compiler you can use, an ahead-of-time compiler that produces the same machine code, but instead of keeping it in memory, like the JIT does, it can write it out to disk as an executable file, or a library, or something like that. It's been an option for a long time, but it's getting more popular these days due to native-image, which is part of the GraalVM.
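
If you just want to see the JIT at work, an easy first step is the standard HotSpot flag -XX:+PrintCompilation, which logs each method as it is compiled at runtime. A minimal sketch, assuming a class named Example (the class name is hypothetical):

    java -XX:+PrintCompilation Example

Each line of output names a method that has just been JIT-compiled, along with details such as its compilation level.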

In this talk, the compiler we're interested in, and the configuration we're interested in, is using the JIT compiler to compile bytecode to machine code at runtime. Some of the ideas apply the same for AOT as well, but we'll keep it simple for this talk. We had a little arrow there for the JIT, but really, the JIT is a big thing. It's very important for getting the performance you'd like out of your Java application, and it does a lot of things. One of the problems with it is that it's a bit of a black box. If you're not sure why it's giving you the machine code it is, why it's optimizing in the way it is, or what it's going to do with a given program, it's quite hard to figure out, because it tends to be seen as a monolith. It's quite hard to see inside. Really, there are lots of things going on in there, multiple processes. It parses the bytecode, so it re-parses it like it parsed your Java source code originally. It produces machine code. In the middle, it uses a data structure called a graph, which is what this talk is about. It's about looking inside the JIT compiler at that data structure.

Why Would We Want To Do This?

Why would we want to do this? For one thing, just out of interest: it's interesting to see how these programs work, especially if you spend a lot of time using the Java JIT, to see how it's running and why. You may want to understand what the JIT is doing for your program for your actual work. You may want to figure out why it isn't optimizing as you were expecting. If you're trying to get a particular level of performance out of your program, you may want to understand why the JIT compiler is doing what it is in order to get the best out of it. Or perhaps you're working on a language that runs on top of the JVM. For example, I work on TruffleRuby, which is a Ruby implementation, but it runs on the JVM, which is why I'm speaking in a JVM track at a conference. Or you may be working on the Java compiler yourself; obviously, that's a bit more niche, but there are people doing that. We can also use it to resolve online discussions where people are guessing what the JVM and the JIT do, and we can find out for real by actually looking inside and asking the JIT what it does. Nobody is advocating that analyzing what the Java JIT is doing should be a normal part of your daily workflow. It can be useful sometimes, though.

The GraalVM Context

This talk is all in the context of the GraalVM. The GraalVM is an implementation of the JVM, plus a lot more. It runs Java. It also runs other languages, such as JavaScript, such as Python, such as Ruby, such as R, such as Wasm, and some more. It also gives you some new ways to run Java code, such as the native-image tool I mentioned earlier, which allows you to compile your Java code to machine code ahead of time. This is available from graalvm.org. If you're not using GraalVM, then a lot of this won't be applicable, I'm afraid. Maybe it's another good reason to go and look at GraalVM if you haven't done it already.

Assembly Output

Understanding what the JIT compiler does: we said that the output of the JIT compiler is machine code, so the simplest thing we can do is look at the machine code. A human-readable version of machine code is called assembly code. If we use two options, -XX:+UnlockDiagnosticVMOptions and -XX:+PrintAssembly, then the JVM will print out the assembly for us every time the JIT runs. This option depends on a library called hsdis, which isn't included with the JVM. It can be a little bit annoying to build, which is unfortunate. You can't just use these flags out of the box and get actual assembly. That's what it would look like. It gives you some comments which help you orientate yourself, but it's pretty hard to understand what's been done to optimize here. It's definitely hard to understand why. This is the most basic of tools.
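
For reference, the invocation being described is roughly this, assuming a class named Example and an hsdis library that the JVM can find:

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Example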

A better tool is something like Chris Newland's JITWatch. Again with DiagnosticVMOptions unlocked, you can use TraceClassLoading and LogCompilation, and the JIT will write out a log of what it's done, and to some extent, why it's done it. Then you can use JITWatch to open this log. It's a graphical program, though it can run headless, and it will do a lot to explain what's going on. For example, in this view, it's showing us the source code, the corresponding bytecode, and the assembly. If we zoom in, you're still getting the same assembly output here, but now you get a bit more information about which machine instructions correspond back to which bytecode and which line in the program. This is a better option. I'm not going to talk more about JITWatch here, but it's got loads of really useful tools. I would consider using JITWatch most of the time.
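
A sketch of the flags being described, again assuming a class named Example; with these flags HotSpot writes a compilation log (typically a hotspot_pid<pid>.log file, though the exact name can vary by JVM version), which you can then open in JITWatch:

    java -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation Example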

Problems with Assembly and JIT Logs

What are the problems with assembly and these JIT logs, though? You're only seeing the input and the output, still, not really the bit in the middle. JITWatch will show you the bytecode; some people think that's the bit in the middle, but really, it's the input to the JIT compiler, and then it shows you the output as well, the assembly code. You're trying to understand what was done and why by looking at the lowest level representation. When you look at assembly, most information is gone, so it's not useful for answering some detailed questions. Assembly code is very verbose as well, so it's hard to work with.

Graphs

We said in the middle of the JIT compiler is this data structure, and this is a compiler graph. That's graph as in nodes and edges, not graph as in a chart or something like that. It's this data structure we're going to look at. We're actually going to reach inside the JIT, and we're going to look at this data structure in order to understand what the compiler is doing and why.

How to Get Compiler Graphs

How can we get the compiler to give us its internal data structure? Graal's got a simple option, -Dgraal.Dump=:1. The :1 is a notation you can use to specify what things you want; it's got some complexity, but :1 gives you what you probably want for most purposes. Here's an interesting thing: why is this a -D system property? That's because Graal is more or less just another Java library, so you can communicate with it using system properties like you would with any other Java library or application. This then prints out the graphs when the compiler runs.
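
Concretely, the invocation looks something like this (the class name is hypothetical); running on GraalVM, this dumps graph files in the .bgv format, typically into a graal_dumps directory:

    java -Dgraal.Dump=:1 Example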

What to Do With Graphs

What can we do with these graphs? Like JITWatch, there's a tool called the Ideal Graph Visualizer, usually shortened to IGV. This lets you load up the graphs into a viewer and analyze them. This is a tool from Oracle. It's part of the GraalVM project. It's being maintained by them at the moment. We can zoom in on the graph. I'll explain what this graph means when I start to talk about the tool I'm specifically using. This is what Ideal Graph Visualizer will show you. At Shopify where I work, we use a tool which prints out the graph to an SVG file or a PDF or a PNG. That's what we're going to use instead. It's just the same data structure, it just looks a bit different and it's generated by an open source program instead. Seafoam is this work-in-progress tool for working with Graal graphs, and we can produce these images from them.
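
As a sketch of how we use it (the file name and graph index here are hypothetical, and the exact command syntax may differ between Seafoam versions), Seafoam takes a dumped .bgv file plus a graph index and renders the graph:

    seafoam graph.bgv:0 render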

What else do we do with these graphs? How do we read them and understand them? Here's a simple example. I've got an example arithmetic operator, a method: it takes an x, it takes a y, and it returns the result of adding together the x and the y, and they're both integers. To read this graph, we've got boxes, which are nodes, and we've got edges, which are lines between them. It's a flowchart, basically. In this case, P(0) is the first parameter and P(1) is the second parameter. They are the x and y for this ADD operation. The 4 just means it's node number 4; all the nodes are numbered. The result of that flows into returning, so we return the result of adding parameter 0 and parameter 1. Then, separately, we have a start node. What we do is we run from the start node to the return node. Then every time we need a result, we run whatever feeds into that result. It's a flowchart, and it's a data graph, and it's a control flow graph at the same time. It'll become a bit clearer when we look at a slightly larger example. I've got a website where I talk about how to look at these graphs and how to understand them a bit more.
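
The method being described is presumably something like this minimal sketch (the name is assumed):

    static int add(int x, int y) {
        return x + y;
    }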

A slightly more concrete example is an example compare operator. This compares x and y and returns true if x is less than or equal to y. We have our less-than operator here. Again, we have the two parameters. Notice this is less than rather than less than or equal to: the compiler is using a less-than and has swapped the operands around. Instead of saying x is less than or equal to y, it's saying y is less than x. The reason it does that is something called canonicalization. The compiler tries to use one representation to represent as many different types of programs as possible. It uses one comparison operator if it can, so it uses less-than, rather than using both less-than and less-than-or-equal. That returns a condition, and Booleans in Java are 0 or 1 under the hood. We then say if it's true, return 0, if it's false, return 1. Again, we have a start node and a return node.
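
A sketch of the compare example (the name is assumed); note how the graph ends up computing the canonicalized form:

    static boolean lessThanOrEqual(int x, int y) {
        return x <= y;  // canonicalized by the compiler to !(y < x)
    }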

It starts to get clearer how these edges work when we start to talk about local variables. Here we do a is x plus y, and then we do a times 2 plus a. If you notice here, we have x plus y, and then that is used twice. This represents the value of a, but it's never stored in something called a in the graph; it simply becomes edges. Anything that uses a simply gets connected to the expression which produces a. Also notice that the multiply by 2 has been converted by the compiler into a left shift by one, which is a compiler optimization.
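
A sketch of the local variable example (the name is assumed):

    static int example(int x, int y) {
        int a = x + y;     // 'a' becomes edges in the graph, not a store
        return a * 2 + a;  // a * 2 is turned into a << 1
    }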

The red line is the control flow, and it becomes more complicated if we have some control flow, such as an if. The red line diverges, so now there are two paths to get down to the return, depending on which side of the if was taken. The reason for the StoreField in here is just to make sure there's something that has to happen, so the if sticks around and doesn't get optimized away. Then you see the if takes a condition. As I said, because Booleans are represented as 0 or 1 in Java, it actually compares the parameter against 0, comparing it against false.
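
A sketch of the control flow example (the names are assumed); the field stores are there only so each branch has a side effect and the if can't be optimized away:

    class Example {
        int field;

        void branch(boolean condition) {
            if (condition) {
                field = 1;  // StoreField on the true path
            } else {
                field = 2;  // StoreField on the false path
            }
        }
    }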

Why This Can Be Hard

Why could this be hard? This is a useful way to look at programs, but I've shown fairly small programs so far. There are lots of reasons why this gets really hard really quickly. This is still a trivial Java method written out as a graph, and I can't even put it on one slide. It gets so complicated, so quickly, that it becomes almost impossible to read; graphs get very large very quickly. Graphs are non-linear as well. This is an extract from a graph. You can't read this. You can't read it from top to bottom very well. You can't read it from left to right. It's just an amorphous blob; we call it a sea of nodes, or a soup of nodes. If you notice, there are things beneath the return, but obviously, they aren't run after the return, so it can be hard to read.

They're inherently cyclic. They're not trees, they're not ASTs. They're graphs with cycles. Here, this code has a loop in it. It loops from this node here, back up to this one, and runs in a circle; this is a while loop or something like that. I think humans just aren't particularly great at understanding cycles; trying to reason about where in the program you are within the cycle is complicated. They can also just be hard to draw. Even when they're not large, they can be tricky to draw. This is an example from IGV, the Ideal Graph Visualizer, the other tool. It has lots of edges crossing over each other when ideally they wouldn't. This is part of the reason why we built Seafoam at Shopify. Laying out these graphs can be very tricky, and it gets trickier as they get more non-trivial.

What Could We Do Instead, Decompilation?

What could we do instead? This is the idea. I'm floating it at this conference and in some other venues. How about we decompile these graphs? The JIT compiler takes these graphs that it's using to compile your Java code, and it produces machine code from them. Perhaps we could take the graphs and produce something else instead. Perhaps we can produce some pseudo Java code. Something that's readable, like Java is. Not a graphical representation, but still allows us to understand what the compiler is doing by looking at similar things within the compiler.

Here's a simple example. We've got the same arithmetic operator from before. What I'm doing now is decompiling that to a pseudocode. It's the same operations we saw before, but now things are written out like a normal program I'd understand. T1 is parameter 0, so x. T2 is parameter 1, so y. Then t4 is t1 plus t2, and then we return t4. This is all within something we call a block, so there's a label like you would have with gotos in C and things like that. It's a pseudocode; it's not really Java code. This helps understand what's going on. Now the graph we have is much more linear, and you can read it from top to bottom and left to right, like you would do with Java code.
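
Roughly, the decompiled pseudocode being described looks like this (the exact output format may differ):

    b0:
      t1 = p(0)      // x
      t2 = p(1)      // y
      t4 = t1 + t2
      return t4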

If we look at an example with control flow, here we have statements such as if t6 then goto b1, else goto b2, where t6 is a comparison of the parameters, such as t1 == t5. You can see which code is run within each of those blocks. What I'm trying to do over time is restore a structured if as well, so you see if and a curly brace for the true case, and then else and a curly brace for the false case.

Problems That Can Be Solved With This

What problems can be solved with this? This is the so what, and the interesting bit from this whole talk. We said we'd like to understand what the JIT compiler is doing, gain more knowledge of how the JIT compiler works and what it does, and maybe resolve some queries in the workplace or online about what the JIT compiler is doing or why. We can give a couple of concrete examples here that I've seen people actually debate and not know how to answer without looking at what the compiler is actually doing.

Lock Elision

Lock elision: you may not know what happens if you have two synchronized blocks next to each other that synchronize on the same object. Will Java release the lock between these two blocks? It's synchronizing on one object, and then it's synchronizing on the same object again. Will it acquire and release the monitor twice, or will it acquire it once and keep going? You may think, why would anyone write code like this in the first place? Code like this tends to end up like that after inlining, for example: if you have two synchronized methods, and they're called one after the other, then will Java release the lock between them or will it keep hold of it? We can answer this for ourselves using Seafoam, this tool we're talking about. If we look at the graph decompiled to pseudocode before optimizations are applied, we can see a MonitorEnter, which is the start of a synchronized block, and a MonitorExit, which is the end of a synchronized block. We can see the StoreField inside it. Then we acquire it again, so we MonitorEnter again, we store the second field, then we MonitorExit again. At the start of compilation, there are two separate synchronized blocks: we acquire the lock once, release it, then acquire it again and release it.
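
A minimal sketch of the kind of code in question (the names are assumed):

    class Example {
        int first;
        int second;

        void writeBoth() {
            synchronized (this) {
                first = 1;   // MonitorEnter, StoreField, MonitorExit
            }
            synchronized (this) {
                second = 2;  // MonitorEnter, StoreField, MonitorExit
            }
        }
    }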

The first thing the compiler does is it lowers the program. This means it replaces some high-level operations with some lower-level operations. The program gets a little bit more complicated before it gets simpler, but we can still see here we've got an enter and an exit, and an enter and an exit. It's just that there's some more stuff about how to use the object that's been expanded out. Then as the compiler runs, we can look at the graph, the data structures of the compiler, at a slightly later point. We can see it's actually combined the two synchronized blocks, because now there's only one MonitorEnter and one MonitorExit, and the two field writes are right next to each other. So we can answer this for ourselves: yes, the Java JIT compiler, or at least Graal, and I think HotSpot does as well, will keep the lock while it runs two back-to-back synchronized blocks on the same object. We can answer that for ourselves by using decompilation and compiler graphs to look at what it's doing and why.

Escape Analysis

Another example: say we've got a vector object, so it's got fields x and y. Let's say we've written everything in quite a functional way, so the fields are final, and adding produces a new vector. Then if we want to sum two vectors but only get the x component, we would do an add and then get x. The query is, does this allocate a temporary vector object? Some people will say, yes, it will. Some people will say, no, it won't, the JIT compiler will get rid of it. Let's find out by asking the JIT compiler. Again, this is covered in a blog post in much more depth. Here we go. When the JIT compiler starts running, before it starts optimizing, we can see it creates a new vector object. We can see it stores into the vector, and then it loads out just x to return it. It returns t10, and t10 is loading out the x from the object it just allocated. If we let the compiler run, and we let escape analysis run, which is an optimization to get rid of object allocations, we can see all it does is take the two vectors in: you can see it loads x from the first one, loads x from the second one, adds them, and returns the result. We wrote a method which looks like it's allocating objects, which looks a bit wasteful, but the JIT compiler can show us that it is removing that allocation. It's actually doing what you might do if you manually optimized it.
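
A sketch of the vector example (the names are assumed):

    final class Vector {
        final double x;
        final double y;

        Vector(double x, double y) {
            this.x = x;
            this.y = y;
        }

        Vector add(Vector other) {
            return new Vector(x + other.x, y + other.y);  // allocates a temporary
        }

        static double sumX(Vector a, Vector b) {
            return a.add(b).x;  // escape analysis removes the temporary allocation
        }
    }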

If you still want to see assembly, Seafoam can also show you that: it includes a tool called cfg2asm. This tool doesn't need that annoying hsdis library, and it will give you assembly output as well. So we can see the assembly if we want to with our tool, but we can also use it to answer questions like, will the JIT compiler combine my synchronized blocks? Will the JIT compiler remove the allocation of this object, which I think isn't needed?

Summary

That's just a little jaunt through the Graal JIT compiler and the Graal graphical intermediate representation, and Seafoam and decompilation, and how I think it can be used. It can also be used for other applications, such as looking at how Ruby is compiled by TruffleRuby, or looking at how your code is compiled by ahead-of-time or AOT compilers, like native-image from the GraalVM if you were using that. It's a prototype development tool, not a product. It's open source and on GitHub.

Questions and Answers

Beckwith: I think maybe the audience would like a little bit of background on hsdis, because I know you spoke about it. It's basically associated with the HotSpot disassembly and that's why it's HS for HotSpot. Would you like to provide how it's different for different architectures and how it's dependent on disassembly?

Seaton: It's effectively a plug-in architecture. The idea is that HotSpot can dump out machine code; by default it will just give you the raw bytes, which almost nobody can use to do something useful, even someone who has experience of working with machine code. There is a plug-in architecture where you can plug in a tool to do something else with it. Normally, you just print out the actual assembly code that you'd like to see, as you would in a debugger or something like that. For complicated licensing reasons that I don't fully understand, and I'm not a lawyer, so I won't try to explain them, it can't be bundled by default. I think it is built on a library that isn't compatible with the GPL the way it's used in HotSpot. It doesn't matter exactly why; the problem is that it means people won't distribute it normally. You have to go and find your own. You can go and download one from a dodgy website, or there are some reputable ones as well. Or you can try and build it yourself. It's a bit of an awkward piece of software to build; building parts of the JDK on their own isn't very fun. They're trying to improve this, actually. As well as using a tool like Seafoam, where the machine code gets written to a log and then you can use that tool offline to disassemble it, I think they're trying to plug in standard, permissively licensed disassemblers now. The situation should get better in the future. At the moment these are really awkward little warts on trying to look at assembly.

Beckwith: It's true. When we did the Windows on Arm port, of course, we had to have our own hsdis for Windows on Arm, and we used the LLVM compiler to do that. Now I think we're trying to get it out to OpenJDK so that it has a better licensing agreement and everything, and can be a part of OpenJDK. Let's see where we get with that.

Seaton: There's a question about native-image. Native-image is a compiler from Java code to native machine code, in the same way that a traditional C compiler runs. You give it class files, and it produces an executable, a standalone executable that includes everything you need to run it. The great thing about Graal is it actually does this by running the same compiler as the Graal JIT compiler, just slightly reconfigured, so it doesn't need any extra support, and then it writes the machine code out to disk. You can use Seafoam and the decompiler to look at how it's compiled that ahead-of-time code in exactly the same way, so you can see what code you're really going to run. I think native-image also produces some other logs in the same graph file format. I think it might give you some information about which classes call methods on which other classes, things like that, and you can use it to look at that as well. If you use Truffle, which is a system for building compilers automatically, you can use it to understand what Truffle is doing and why. Lots of other data structures in compilers are graphs. It's like a common point of communication and a common set of tools for understanding compilers, being able to look at things in these graph representations.

Beckwith: There is another question about how this will help with finding out errors at the time of compilation.

Seaton: It's pretty rare that there's an error from the JIT compiler. Remember, we're making a distinction here between the Java source code to class file compiler, that's javac, and we're not talking about that. We're talking about when, at runtime or ahead of time, it's compiled to machine code. It's extremely rare for the compiler to miscompile something. If it does, then, yes, Seafoam is a very good tool for investigating that. I think something's gone pretty wrong if an advanced application developer is trying to debug the JIT compiler, though. We could expand your definition of errors to include code compiled in a way you didn't like, if you were expecting the JIT compiler, or depending on the JIT compiler, to work in a certain way. We have people who, for example, build low latency applications, and they don't want any allocations. What they could do with Seafoam is look at all the graphs involved in their application, and programmatically detect if there were things they didn't like. You could actually use it to test that your program isn't allocating anything. You could do that for more subtle things as well.

Something we did at Shopify once is we were trying to add a new optimization for boxing. Boxing is where you have a capital-I Integer. We had some things being boxed and unboxed that we didn't think should be, and we wanted to argue to the Oracle team that they should implement a better optimization to get rid of them. Oracle said, we don't think it's that relevant, this probably doesn't appear in reality. What we did was we dumped out all the graphs for running our production application, and because Seafoam is a library, as well as a command line application, we wrote a little program to use the library to query how often this pattern of boxing and unboxing appeared. We could say, it appears in this percent of graphs, and this percent of the time it's being done unnecessarily, and things like that. You can use it to reason about your code as it has been compiled. We're thinking about using it in tests, just to test that something is compiled in the way we like, rather than manually checking it or monitoring performance. If you want to test it in CI, you can check the graph and assert that it has been compiled like you wanted.

Beckwith: Do you find it helpful to optimize the running time of the code?

Seaton: Yes, so not all code is just-in-time compiled. If it doesn't get just-in-time compiled, then, by definition, you're not interested in what the JIT compiler would do with it, because it hasn't been called enough to make it useful to do so. Something you can do is you can understand why code is being recompiled. Often, you'll see, say you have code which goes to one of two branches, and it says if x, then do this, if y then do this. If your program starts off by just calling x, then it'll only compile x into the generated code, and it'll leave y as like a cutoff part that says that we've never seen that happen so we won't compile that. Then you can see the second time it's compiled, if you start using y, it'll compile y and you can see which parts of your code have been compiled in and not. Say you write a method that has something that's designed to handle most cases, and then something that's designed to handle degenerate cases that you only encounter rarely. If you look at your graph, and you see the degenerate cases being used, then you think, my optimization isn't quite working, I'd like to not compile that.

In native-image, are there no additional optimizations or recompilations to machine code? Yes, native-image can't optimize quite as aggressively in all cases as the JIT compiler. That's because the JIT compiler can optimize your code as it's actually being used. It can observe the runtime values flowing through the program; that's called profiling. Native-image has to be a little bit more conservative, because it doesn't know what values are going to run through your program. A good example in a system I work with: if you have a function which adds together two numbers, and one of the numbers is always the same, say you have a function called add, and it's only ever called with 1 and then another number, it will turn that into an increment operation automatically, because it sees what the values actually are.
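
A sketch of that last example (the name is assumed, and the exact speculation machinery is simplified):

    static int add(int x, int y) {
        return x + y;
    }

    // If profiling only ever observes calls like add(1, n), a JIT can
    // speculate and compile the body as effectively an increment of n,
    // guarded by a deoptimization check in case a different x appears.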

How'd you install on Windows? It's a Ruby application, so it should work on Windows. If you'd like to try running on Windows, and it doesn't do what you'd like, then please do open an issue and I will fix it as quickly as I can. I'm trying to do a website version of it as well, so you can just run it online. We're looking at doing like an Electron version. It's just a graphical application, so it's a bit easier to use.

How would you affect the running machine code? That's where it gets tricky. The graph can tell you what may not be ideal or may be wrong in your mind, but how you fix that is then up to you. We do a lot of deep optimization of Java code in my job, because we're implementing another language, Ruby, on top of Java. We're trying to make that fast, so it's fast for everyone else. What we do is we look for things in the graph that we think shouldn't be there, that we don't want to be there. If we see a method call that we think should have been inlined, and we don't know why we're left with that call, then we'll go and examine why that call is still there. These graphs include a whole lot of debugging information, so you can query where something came from and why it's still there.

Beckwith: What are the other options with respect to the graal.Dump, you said one was the one that you used, but can Seafoam support other options?

Seaton: What you can get it to do, ultimately, is dump out the graph before and after every optimization. I don't know exactly how many optimizations there are in Graal; I'm guessing around 50 major phases. That turns out to be a lot of files very quickly. The number sets the verbosity: you may want to see major phases, not all of them, otherwise [inaudible 00:35:30] you get a huge number of files on your disk. You can configure how much verbosity there is. You can also get it to only print graphs for certain methods, so you can constrain what you see and get a lower volume of output. It also makes your application run slower, because it's doing a lot of IO to write out these graphs, things like that.

Beckwith: Why do you use Ruby on top of the JVM? Is it for performance or is it for the tools?

Seaton: It's mostly for performance, but also about tooling. GraalVM lets you run other languages on top of the JVM, and the JVM has absolutely world-beating functionality in terms of compilers, and garbage collectors, and stuff like that. We'd like to reuse that for Ruby at Shopify. That's what we work on: we're reimplementing Ruby on top of the JVM. It's similar to another project called JRuby, which is also Ruby on the JVM, but trying it in a different way. It's a polyglot thing.

Beckwith: In your experience, how often does Seafoam lead to refactoring of high level Java Ruby code versus recommending new JIT optimizations?

Seaton: It's almost always refactoring the Java code to make it more minimal. Java's optimizations work really well. They're almost always good enough to do what we want. It's just that sometimes you have to phrase things in Java in a slightly different way to persuade it to work, and maybe we can make the optimizations better. There are complicated rules in the Java language that the JIT compiler has to meet; the JIT compiler always has to be absolutely correct for your Java code as per the spec. Sometimes, it's just a case of slightly restructuring your code, a little bit of refactoring. It's important then to comment, because it means we end up with Java code where, on the face of it, you think, why would it be written like that? There's a reason: it pleases the JIT compiler. Now I say that, it doesn't sound great, actually; maybe we should make the JIT compiler optimizations better.

 


 

Recorded at:

Sep 04, 2022
