1. We're here at QCon London 2010. We're sitting with Kresten Krab Thorup. So, who are you?
I'm CTO of Trifork, which is a software company that puts on QCon in cooperation with InfoQ. I'm also very much a software developer at heart. I have a background in programming languages: implementation and language design. I worked at NeXT on the Objective-C compiler and runtime system, and I've worked on the GNU Compiler Collection in a number of different projects, on the Objective-C frontend and on GCJ, the GNU Compiler for Java, where I did a lot of work on the Palm Pilot back-end; I did the first port to the Palm Pilot fifteen years ago. So I've been engaged with compilers for many years. I've also been on an expert group on Java generics, and that was closely related to some of my PhD work, where I worked on parameterized classes and type systems in that space. So programming languages are close to my heart. For the last ten years I've been a kind of tech lead CTO at Trifork; we are now a company of a hundred and twenty people, so I'm constantly trend spotting and figuring out what we can do to best solve our customers' problems. I'm a jack of all trades in that space of programming languages, and I also do a lot of work spotting things we should have at our conferences. I'm very engaged in the program committee work, and that's an opportunity to talk about programming languages with people, even if I don't get to do it in my day job.
2. One thing you've recently done is port Erlang to the JVM. Why would you do that?
Well, at Trifork we've been quite a successful Java company, and maybe that is by and large to be attributed to the fact that we jumped on the Java bandwagon very early; you can jump on a new technology very easily when you're a small company. Today we're a much bigger company, and I think Java is starting to get to a point where it's the new legacy. There are a lot of new issues coming up in our programs, in the software solutions that we do, that are not a very good fit for Java. There is the whole multicore issue of writing parallel programs; even though concurrency was part of the core design for Java, I think that model for concurrency isn't quite right. And at the other end of the scale we're doing a lot of integration systems, where we integrate services, web services running across the net, and that has a very similar nature. Coordinating parallel programs and coordinating in an integration setting have the same kinds of problems of coordination, and I think that is going to be central to the next generation of programming languages.
So, I've been organizing these conferences for quite a number of years, and I started to meet all these Erlang people, and somehow they had this magical ability to reason intuitively about concurrency. I wanted to understand that; I wanted to get Erlang under my skin and really understand what Erlang's concurrency model is, and why it is that they can talk so naturally and intuitively about concurrency, because to me it was like they were talking black magic; I didn't understand it. So really that was the reason, and being a language implementer and language geek, there is no better way to learn a language than to implement it. So I set out to implement Erlang and the Erlang VM on Java, which is a platform I've come to know and love over the last ten to fifteen years. That's the reason: explore this field, figure out if the time is ripe to move on to the next plateau.
Java came with object-oriented thinking, and now I think we need to extend that paradigm with means to comprehend and talk about coordination and concurrency in a more fundamental way than just adding threads and locks to the language. So those are really the reasons why I jumped onto this. And then, I've been in this company helping build it for ten years, doing more and more administrative things; for a while I was managing the conference business, and I really needed a couple of months of serious hacking, of serious programming. It's wonderful to just dive in, head first, into a real hacking exercise: complete tunnel vision for two months. Last fall, like November and December, I spent a lot of time hacking on this, and that was really great.
3. And you have some significant results; you can already boot OTP and Erlang, I think.
Yes, just two weeks ago we reached the point that had been our milestone for a very long time, for the last four or five months: to be able to get to the boot prompt of OTP. That doesn't sound like much; it's essentially getting to the prompt where you have a read-eval-print loop, where you can enter Erlang code, evaluate it, and print the result. But to actually get to that point, we had to be able to load and compile some seventy Erlang modules, interpret and understand most of the instruction set of the BEAM virtual machine, and get the run-time infrastructure to run lightweight threads, recursion, all kinds of interesting problems that needed to be solved to get there. And then in the last few weeks we've got the compiler to run; that's another significant Erlang program. You can now run the Erlang compiler, which is itself written in Erlang, on Erjang, so it's a nice self-referential point.
Erjang itself is Java code, at this point close to forty thousand lines of it. Half of that is the compiler that compiles Erlang bytecode, called BEAM code, into Java bytecode, and the other half is the implementation of native functions; the equivalent of "native functions" in Erlang are called BIFs, for built-in functions. The compiler is reasonably complete; there are still a few instructions at this point that we haven't finished, and in the runtime system there are still lots of corners we have not implemented, so there's a lot of work to do.
Yes, that's one of the more interesting challenges. Obviously the problem here is that you need to be able to create arbitrarily many processes, so a process has to be on the order of magnitude of a data structure; you can't use a thread for every process. You have to have something where you can schedule processes onto threads. In Erjang, as it is right now, it actually runs best on a single core, because the scheduler I picked up somewhere else needs some rework. Processes are something that are scheduled onto threads, so if you run Erjang on a multicore, you typically have just a number of threads equivalent to the number of cores; that gets you the real concurrency, and the processes are scheduled onto them. The way we do that is by means of a third-party framework called Kilim. Kilim is a bytecode rewriter, so I just generate a "plain" Java program with some annotations in it, and then Kilim takes that bytecode and rewrites it into Java bytecode again, a pass-through loop, which is then able to suspend a process. That means that if you have a set of method invocations and you call suspend, what it will do is unroll the stack, and as it rolls back it will save the state of all the local variables; then you can reapply that at a later point in time, meaning it will jump on and make the same calls, reestablishing the local variables. I'm using a third-party library for that, even though I fixed several bugs in it and also made it, for my cases, at least twice as fast as the official Kilim release. So if you can use Kilim for something else, maybe you can check out my Kilim version that is on GitHub.
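The M:N idea he describes, many lightweight processes multiplexed onto a few carrier threads, can be sketched very roughly in plain Java. This is a toy illustration under my own naming (MiniScheduler, runAll); it uses an ordinary thread pool and explicit rescheduling in place of Kilim's bytecode-rewritten continuations:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of M:N scheduling: many "processes" (here just chains of
// steps) multiplexed onto a small fixed pool of carrier threads. This
// illustrates the idea only; it is not Kilim's or Erjang's actual machinery.
public class MiniScheduler {
    private final ExecutorService carriers;

    MiniScheduler(int threads) {
        carriers = Executors.newFixedThreadPool(threads);
    }

    // Run one step of a process, then "suspend" by handing the remainder
    // back to the pool, so thousands of processes can share a few threads.
    private void step(int remaining, CountDownLatch done) {
        if (remaining == 0) { done.countDown(); return; }
        carriers.execute(() -> step(remaining - 1, done));
    }

    // Spawn `procs` processes of `steps` steps each; true if all finish.
    static boolean runAll(int threads, int procs, int steps) throws InterruptedException {
        MiniScheduler s = new MiniScheduler(threads);
        CountDownLatch done = new CountDownLatch(procs);
        for (int i = 0; i < procs; i++) s.carriers.execute(() -> s.step(steps, done));
        boolean ok = done.await(30, TimeUnit.SECONDS);
        s.carriers.shutdown();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        // Far more processes than threads, as with Erlang processes on cores.
        System.out.println(runAll(4, 10_000, 10) ? "all done" : "timed out");
    }
}
```

The real trick Kilim adds on top of this shape is saving and restoring the Java stack and locals at a suspend point, so a process can pause in the middle of nested calls rather than only between steps.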
So that's one of the major challenges, actually creating lightweight threads, and one of the interesting findings is how surprisingly efficient that strategy is. You can imagine that actually doing a context switch, unrolling the stack and reapplying it, takes a lot of instructions. But as it happens, if you lay out your memory data structures so that you typically stay within the local cache in that sequence, it just becomes a lot of instructions that CPUs can do really well. Whereas if it were something where you need to allocate a lot of memory, that's where things get expensive, because that needs synchronization, access to main memory, and all kinds of stuff. This is one of the things that really blew my mind: how many instructions you can execute as long as you are careful not to touch memory which is not in the cache. Amazing. That's one challenge. Another big challenge is encoding tail-recursion. This is something that has been discussed several times on the JVM languages mailing list at Google, and basically I'm using a trampoline-style encoding for that, which is very similar to what Per Bothner's Scheme implementation Kawa does, but with better locality. So again, making sure that you only touch memory in the cache, doing tail-recursion in the general case isn't that expensive.
Yes, it uses the Java stack, but essentially whenever you want to call a function, what I do is just call that function, and the return value can be either the real return value or a marker that says a tail-recursive call happened. Then I enter a loop: as long as I get back that marker, I access some thread-local state to figure out what the next step in the tail-recursion is. So essentially it's turning the tail-recursion into a loop, but that means every function invoke is a sequence of a call and a little loop; at the Erlang level, a semantic function invocation is a call and then a little loop, and that generates a lot of code. That's kind of the thing: it's amazing how many instructions you can execute as long as you don't miss the cache.
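The call-plus-loop trampoline he outlines might look something like this minimal Java sketch. The names here (TAIL_MARKER, NEXT, invoke) are illustrative stand-ins, not Erjang's actual classes:

```java
import java.util.function.Supplier;

// Hypothetical sketch of a trampoline encoding for tail calls on the JVM.
// A tail call returns a special marker; the caller loops until a real
// value comes back, so the Java stack never grows.
public class Trampoline {
    // Marker object signalling "a tail call is pending".
    static final Object TAIL_MARKER = new Object();
    // Thread-local slot holding the next step of the pending tail call.
    static final ThreadLocal<Supplier<Object>> NEXT = new ThreadLocal<>();

    // Tail-recursive countdown: instead of calling itself directly,
    // it schedules the next step and returns the marker.
    static Object count(long n) {
        if (n == 0) return "done";
        NEXT.set(() -> count(n - 1));  // record the tail call
        return TAIL_MARKER;            // tell the caller to loop
    }

    // Every semantic function invocation is a call followed by a loop.
    static Object invoke(Supplier<Object> f) {
        Object r = f.get();
        while (r == TAIL_MARKER) {
            r = NEXT.get().get();      // run the next step, one frame deep
        }
        return r;
    }

    public static void main(String[] args) {
        // Deep enough to overflow the stack if this were plain recursion.
        System.out.println(invoke(() -> count(1_000_000)));  // prints "done"
    }
}
```

Because each tail call returns to invoke's loop before the next step runs, the stack stays one frame deep no matter how long the tail-call chain is.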
I haven't studied exactly what the machine code that eventually comes out of this looks like, but I would imagine that in many of these cases you can optimize the loop away: if the prediction is that the function being called is such and such, the JVM knows it will not return that special marker value, so it doesn't have to enter the loop and can probably eliminate it. As you design the way you generate code, you have to think like a JVM: imagine, if I were a JVM, how would I optimize things around here? That makes it very easy to push work onto the JVM. I think the JVM is a wonderful platform for generating code, because there are so many things you don't have to worry about. A static function call at the Java level is something that can very easily be inlined, so having each individual instruction call a static function is not an issue, because as soon as the JIT hits it, it will be able to inline all the static function calls. So many of the instructions of the BEAM virtual machine just turn into a static function call or a static method call, and if it makes sense the JIT will inline that; if it doesn't make sense, it won't. So there are a lot of things: tail-recursion, threading, which we've talked about, and another thing is pattern matching. That's actually very easy in this case, because the pattern matching is generated into the bytecode. The bytecode does this state exploration; there are instructions saying "Is this a tuple? Is it a two-tuple?", extracting the first element, is that a list, does it have a tail, and so on.
So, each of these decision points is an instruction; all the big pattern matching statements have become sequences of instructions that are optimized into those decision trees, and I just have to implement the individual instructions. One of the challenges is generating efficient code for that. One of the early things I did was to build type inference for the Erlang bytecode format, so I can know as much as possible about the types of local variables, and that allows me to generate code that doesn't need as much virtual dispatch or dynamic type checking, because if I know something is a tuple I can type it as such. Fitting into Java's particular type system has always been one of the challenges of JVM language implementations. So those are some of the major challenges, and it seems to work out pretty well. I think there are lots of opportunities for doing things better, for doing things faster, and lots of problems to be solved.
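The decision-tree compilation of a pattern match can be pictured as straight-line Java tests, one per BEAM-style decision point. ETuple and matchOkTuple below are hypothetical stand-ins for illustration, not Erjang's real term classes:

```java
// Sketch of how an Erlang match like {ok, X} -> X compiles down to a
// chain of explicit tests, one per decision point, mirroring BEAM
// instructions such as is_tuple / test_arity / get_tuple_element.
public class MatchSketch {
    // Hypothetical tuple term; real implementations use richer classes.
    static final class ETuple {
        final Object[] elems;
        ETuple(Object... e) { elems = e; }
    }

    // Equivalent of:  case Term of {ok, X} -> X; _ -> null end
    static Object matchOkTuple(Object term) {
        if (!(term instanceof ETuple)) return null;     // is_tuple
        ETuple t = (ETuple) term;
        if (t.elems.length != 2) return null;           // test_arity 2
        if (!"ok".equals(t.elems[0])) return null;      // is_eq_exact 'ok'
        return t.elems[1];                              // get_tuple_element 1
    }

    public static void main(String[] args) {
        System.out.println(matchOkTuple(new ETuple("ok", 42)));  // prints 42
        System.out.println(matchOkTuple("not a tuple"));         // prints null
    }
}
```

With the type inference he mentions, a later test in such a chain can skip rechecking facts already established, for example that the value is a tuple of arity two.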
Well, I think having real tail-recursion in the JVM is the most important one, because many of the other features are still very language specific. I mean, there's talk about invokedynamic, and it definitely fits a class of languages; that's fine, I don't have anything in particular against it. Once you cross that threshold where you have to add instructions, there is probably a whole list of good things. For Erlang it is tail-recursion, and then it would be very nice to have things that could help get access to local variables, and real lightweight threads, real lightweight processes.
I can easily imagine that there could be things that would make it more efficient. I haven't looked into VM extensions in that regard, but maybe some kind of continuation built straight into the language; I don't know, maybe heap-allocated activation records. Those would be quite expensive, but maybe still cheaper than what I'm doing now. Of the things that are public to talk about right now, I think tail-recursion is the most important, and I think we'll see lots of people starting to realize the power and the significance of tail-recursion. As for the other things, it would be very nice to have tools that would allow us to understand better what the JVM does.
7. ... how the JVM optimizes code...
How it optimizes its code, what the intuition of the JVM is. I have some picture inside my head of what I think it is, based on my experience with JVMs over the last many years, reading research papers and talking to the people who implement the VMs, so I know a lot of these techniques and I understand them. But when I was working on the C compiler at NeXT, I was often called in to debug some strange situation, and in that context I could immediately go into gdb and switch on instruction-level debugging; you can debug in gdb one instruction at a time, actually understand each individual instruction, and see what the compiler was generating for a given expression. That level of understanding of what's going on down there is kind of lost in these layers of abstraction. I know you can actually get the generated code out of HotSpot, but it would be nice to have some way to see that translation, to see those decisions as they happen and understand them in more detail. Right now I just have to go on my intuition about what a JVM would do in a given context.
When I look at some Java code, I imagine: if I were a JVM, I would probably guess that this type is the same as this, and because I made this type check, when I inline that I don't have to make a virtual dispatch and can just make a direct call, and so on. Thinking like a VM in that way is very unsupported; it's very difficult to understand what's going on inside there. I'd like to see tools in that direction; I think they would be of big value, but I guess not many people are interested in those kinds of tools, I don't know.
Yes; well, different JVM languages have different focuses. For some of them performance is very critical; it's not as critical for all of them. For some of them compilation time is very important. I think many of the implementations are using ASM, and I prefer ASM out of the ones that are there now. It's a very low-level API for generating bytecode, but it generates correct bytecode, which is one of the challenges. Typically I like to understand what the bits on the wire are, but at the same time ASM gives me tools to ensure that they're reasonable bits. So I use that. There is talk about having various shared infrastructures, but in the end, I think each individual language is still so different; there are so many differences from language to language that each language has a different strategy for code generation. In Erjang, all the functions are generated into static members of one class, so a module becomes a class with a lot of static members in it, and that's kind of the focal point. In JRuby it is very different, because every method has to be encoded into its own class, at least the last time I looked into the code. So the coarse-grained things are very different, and in Erlang, tail-recursion is the most important thing in the language.
There are very different strategies for code generation and for the run-time data model, and the differences in the run-time model are really what make these strategies so different. Some languages, like Clojure, which is also using ASM, have a run-time model that is very kind to existing Java applications, so you can mix Clojure lists and tuples with "plain" Java objects, and that means the generated code has to do dynamic checks everywhere. In Erjang I went the other way. This was actually a design decision I made pretty early on, and I was very much in doubt whether I should take that strategy of being Java friendly. Erjang has its own top class in its own type hierarchy and assumes everything is an EObject, not a java.lang.Object, which means that in Erlang code there can only be EObjects and subclasses of EObject. That's kind of hostile to existing Java code, but it means I can make assumptions about types that make the code run faster, I hope. I haven't really made the experiment of doing one or the other, so I still don't know if that design decision was the right one, but it definitely allows me to generate code that uses real Java types for dynamic dispatch and doesn't need to do instanceof and casts all the time.
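A closed term hierarchy rooted in its own base class, in the spirit of Erjang's EObject, might be sketched like this. The class names and methods below are illustrative, not Erjang's actual API:

```java
// Sketch of a term hierarchy rooted in its own base class rather than
// java.lang.Object. Call sites typed as EObject can use ordinary virtual
// dispatch instead of instanceof/cast chains against arbitrary objects.
public class TermHierarchy {
    static abstract class EObject {
        // Default behavior: adding non-numbers fails, as badarith would.
        EObject add(EObject other) {
            throw new UnsupportedOperationException("badarith");
        }
        abstract String print();
    }

    static final class EInteger extends EObject {
        final long value;
        EInteger(long v) { value = v; }
        @Override EObject add(EObject other) {
            if (other instanceof EInteger)
                return new EInteger(value + ((EInteger) other).value);
            return super.add(other);
        }
        @Override String print() { return Long.toString(value); }
    }

    static final class EAtom extends EObject {
        final String name;
        EAtom(String n) { name = n; }
        @Override String print() { return name; }
    }

    public static void main(String[] args) {
        EObject a = new EInteger(40), b = new EInteger(2);
        System.out.println(a.add(b).print());  // prints 42
    }
}
```

The trade-off he describes is visible here: nothing outside this hierarchy can flow through the generated code, which shuts out plain Java objects but lets every operation be a typed virtual call.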
9. So, if people want to contribute or just try out Erjang, where do they have to go?
Well, they can go to GitHub and look for Erjang; actually the domain erjang.org points to the wiki at GitHub right now. From there you can download the code, build it, and run the shell, and then basically, if you have some Erlang project, you can just try running it there and see what happens. As of this date we don't have network I/O implemented, so we don't have a network driver [Editor's note: network support has been implemented in recent versions of Erjang - for details see Kresten's blog: http://www.javalimit.com/ ], and that is probably a limitation for lots of existing Erlang projects, but hopefully in a month's time we will have a network driver. After that it shouldn't be totally wild to get something like CouchDB or ejabberd running on it, so that's interesting if people want to try that. Also, looking at the test suite: we're building a small test suite which works by having a program run in Erjang and the same program run on Erlang, and then comparing the results; all the basic operations, like the operators for comparison, equals, not equals, etc., applied to different values, making sure they agree. So extending that basic test suite for the core VM would really be a nice contribution.
Right now we haven't really tested anything significant on Windows, and we're two developers; we have one contributor, Eric. Obviously it's going to be interesting to see, as we move forward, whether we can pull more people onto the project. Now that we've reached the state where the shell runs, it's much easier for people to pick it up and try to run something. I'd be really interested in having people run some performance tests on it, on different Java virtual machines, and figure out what the bottlenecks are. It's very easy to take some Erlang performance tests and try to run them on there; most Erlang programs should really run, so if there's something that doesn't work, it could be a runtime bug, or a compiler bug, or whatever; look into it. That's how I got into open source projects myself: I picked up GCC and did something with it twenty years ago, bug reports or whatever. As with any open source project, it needs to get off the ground. Right now we have two developers, which is a hundred percent more than having just one, but we probably need three or four active contributors for it to be a sustainable system.
[Editor's note: Erjang is a fast moving project and a lot of work has been done on performance and other aspects between this interview and now - for current details see Kresten's blog: http://www.javalimit.com/ ]