In this episode, Ixchel Ruiz, Senior Software Developer at Karakun, and Gunnar Morling, Software Engineer at Decodable, sat down with podcast host Michael Redlich, Lead Editor of the Java topic at InfoQ, and discussed the recent publication of the InfoQ Java Trends Report. Topics covered included: the advantages of the Java six-month release cadence; Project Lilliput and compact object headers; nullability in Java; the impact of Python; and the One Billion Row Challenge.
Key Takeaways
- Java is here to stay despite those that have predicted its demise. Java may not necessarily be at the forefront of language evolution, but developers have, at times, come up with language constructs that the world has never seen before.
- Java has also adopted constructs which have been proven in other languages.
- The six-month cadence allows developers the flexibility to pull back a feature from a given JDK release, if necessary. In the past, it was unknown when the next JDK release would be available.
- Compact Object Headers, part of Project Lilliput, and nullability in Java are exciting new features that are evolving in the Java programming language.
- The popularity of Python, especially in the university environment and scientific fields, has had a negative impact on attracting young developers to the Java programming language.
- While Python seems to be a natural choice because of its great libraries, it will not provide the functionality that is required in an enterprise application.
- Gunnar Morling, creator of the One Billion Row Challenge, did not anticipate how popular this challenge would become. The intent was for the Java community to learn something new and inspire each other in a collective way.
Transcript
Michael Redlich: Hello everybody. This is Michael Redlich. We're here to record the podcast for the 2024 InfoQ Java Trends Report. I'm here with Ixchel Ruiz and Gunnar Morling. I'll allow them to introduce themselves and then we'll get started. So why don't you start, Ixchel.
Introductions [00:44]
Ixchel Ruiz: Thank you very much for having me here. I'm super thankful and honored to be in such great company. As you mentioned, my name is Ixchel. I am from Mexico originally. I live in Switzerland. I work for Karakun, which is a consulting company. I do a lot of things. I wear several hats. So for example, I am a conference speaker. I organize conferences, too, and Java User Groups, so I'm very active in the community. I see my job as helping and sharing knowledge in the developer community, mostly in the Java community.
Michael Redlich: Awesome! Gunnar.
Gunnar Morling: Hey there, I'm Gunnar. I'm originally from Germany. I am still in Germany, so I didn't move far. I am a software engineer. I work at a company called Decodable, and what we do there is build a platform, a fully managed platform, for stream processing based on Apache Flink and based on other real-time tools. So it allows you to do real-time ETL, stream processing, all those kinds of use cases. And we use Java for that. So that's one of the many good things about it.
Besides that, I'm also, well, I would say involved with, you could call it outreach advocacy, around different things. I mean, around Debezium, which is a project I led in my previous job at Red Hat. It's a tool for Change Data Capture. So I talk a lot about this. I talk about stream processing with Flink, for instance. And I also like to talk about Java, right?
One of the things I did, maybe some people have heard about this earlier this year, was a coding challenge, The One Billion Row Challenge. And maybe we'll have time, a few minutes, to talk about it. That was one of the ways I tried to inspire and help to share knowledge within the Java community.
Michael Redlich: Awesome! Thank you. And both of you are Java Champions, right? So that's awesome. And I too am a Java champion, which-
Gunnar Morling: Was awesome.
Michael Redlich: The way it was announced, the way I found out was quite interesting. I'm Michael Redlich. I've been writing for InfoQ since 2016. I'm currently the Lead Java Editor in that space and retired from ExxonMobil in June 2023. But I spend all my days writing for InfoQ. I run a Java User Group, the Garden State Java User Group, and we meet in New Jersey. By the way, I'm a native New Jerseyan! Let's see, what else do I do? I do some Developer Advocacy for Payara on a contract basis and some technical writing.
The 2024 InfoQ Java Trends Report [03:30]
Michael Redlich: So let me preview the report that's going to probably come out by the end of this month. This is the sixth year that we are hosting this. The report will have links to the previous five years. So from InfoQ editors, it's myself, Ben Evans and Bazlur Rahman. And then from the outside, we have Holly Cummins, Grace Jansen, Emily Jiang, Ivar Grimstad, and Andrea Peruffo. So they've all provided input for this. So look for that at some point. This is a great way for people from the Java community to provide input on what's happening in the Java space.
What’s New and Exciting That We Didn’t Expect [04:02]
One of the things we ask the contributors is "What is new and exciting that we didn't expect over the past year?" So, Ixchel, if you want to expand on that a little bit and what you think that you found new and exciting.
Ixchel Ruiz: Well, this is not new because it's the fourth preview [JEP 495, Simple Source Files and Instance Main Methods (Fourth Preview)] ... Well, let's step back. I'm talking about the next release, Java 24. That is going to be early next year. So that for me is the new new and I am looking forward to that. And you will say that it's going to be very silly, or I hope it's not. And it's the fourth preview on how to invoke Java in a less ceremonial way. So they have been trying to do the "Hello, World" in the most simplified expressful or meaningful way.
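[Editor's note: a minimal sketch of the "less ceremonial" entry point discussed here. The class name `HelloWorld` and the helper `greeting()` are illustrative assumptions, not from the conversation. The simplified form appears only as a comment because, as a preview feature in JDK 24, it needs `--enable-preview` to compile and run, and its details have changed between previews.]

```java
// Traditional form: runs on any Java version.
class HelloWorld {
    // Hypothetical helper, split out only so the text is easy to check.
    static String greeting() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}

// Simplified form as described by JEP 495 (preview in JDK 24).
// The entire source file would be just:
//
//     void main() {
//         println("Hello, World!");
//     }
```

The difference is purely ceremony: no explicit class, no `public static`, and no `String[] args` for a beginner to puzzle over.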
And as I said, it's the fourth preview, so we have been working on it since I think Java 17 [Java 21, JEP 445, Unnamed Classes and Instance Main Methods (Preview)], which was the first preview, and now Java 24 will have the fourth one. And for me, this is a reflection of the Java philosophy. I mean, Java, it's a language that I like, first of all, because of the ecosystem.
Second of all, it's because of this idea that we are here to stay, so we are trying to keep up with the past and we're also trying to keep up with the future. And in these kinds of situations, sometimes on some developers, some architects, some languages are crucified in the present. And I think Java has been solving a lot of issues without falling into this category. It's providing functionality and clarity and things that help developers in the today while not betraying its past.
So when Java is deciding about the features and it's deciding about where to move their view, it's not into what is new and what is exciting right now or what has happened the last year. Java has a very broad view because the past for Java and the ecosystem and the platform, it's 20 years. It's actually more, but let's say 20 years.
Gunnar Morling: Yes, 25.
Michael Redlich: It's actually coming up on 30 years next year!
Ixchel Ruiz: Yes, I know. I know. If you start thinking about some of the things that we're adding in Java 23 and some of the things that are going to be added hopefully in Java 24 are affecting things that are part of the core of Java. So how do you bring these new things into what effectively is a legacy system? It is a challenge, and the fact that we are not breaking things just because it is amazing. So I'm excited about this because it reflects the Java philosophy from my point of view. There are other more functional ones that I can speak about, but I always want to highlight the whys and the hows that are behind programming languages.
Michael Redlich: That's awesome. We could talk about JDK 24 in a bit, but Gunnar, what did you find new and exciting from last year that you didn't expect?
Gunnar Morling: So I think, and it ties into a little bit of what Ixchel was saying. So yes, Java has been around for a while, but I think it also is actually very successful in continuously reinventing it and renewing it and keeping it up to date. And arguably it does it in an opportunistic way. So I would say Java is not necessarily at the forefront of language development, and they come up with language constructs or things which the world has never seen elsewhere before.
So this doesn't happen that often, but what rather happens is we adopt things which are proven in other languages. I think that's a very good approach actually, because you see what flies well, what doesn't fly well.
And Java takes those concepts which have been proven to work out well. Well of course adjust them to the Java spirit and Java idioms and so on. But I think it is very successful to take all this input and reinvent itself. And I think one of the things I'm the most excited about is how all this helps to overcome maybe restrictions and limitations, which we had in the past.
And one of the things in particular could be garbage collection and GC pauses, right? Because this always used to be a problem. Yes, I just have long tail latencies with my application because at some point there's a stop-the-world garbage collection pause, and suddenly there's requests and they take a hundred milliseconds or 500 milliseconds or whatever to come back to the user.
And it doesn't make for very good user experience, obviously. And now, with things like ZGC, suddenly we have, and I shouldn't say suddenly because this has been an ongoing effort for many years, but now we are actually in a state where we have sub-millisecond pause times. And I don't know, I still feel it's kind of incredible that the language, and of course also the JVM, well, mostly the JVM, not that much the language, but the runtime, has come so far that we have this sort of amazing experience.
Similar thing with the upcoming compact object headers [JEP 450, Compact Object Headers (Experimental)], which will substantially reduce the memory consumption. So I think all those things, and there's many other things enable Java to, well, first of all stay relevant, but then also become relevant maybe in areas where Java was not considered, maybe the first choice back in the day.
Michael Redlich: You touched on a couple of things I'd like to expand on a little bit. So the one thing I put for the report, which you'll see is it wasn't necessarily exciting, but it was unexpected. The String Templates API that shouldn't have-
Gunnar Morling: We don't have any more for now.
Michael Redlich: Right, for now. And so my good friend, Barry Burd, who you may know, reached out to me one day. He goes, "What happened with String Templates?" I'm like, "I don't know," and I follow this all the time. I mainly follow the jdk-dev mailing list. I look at some of the other project ones [mailing lists], but I missed it. And it was back in June that there were some unintended consequences and Nicolai Parlog had a video on that.
So I know a lot of developers were really looking forward to this new feature, and I think it was going to be at a second or third preview, but to me that was something like, "Oh, what the heck happened?"
Gunnar Morling: Yes. I mean that's true, but I also think that actually goes to show the maturity of the process and the team behind it, right? Because maybe in that situation would be the easier way to just power through and try to wrap it up and finish it. But instead, I think you've got to be brave and you have to have some courage to say, "No, actually we found this doesn't work well, let's pull the plug for now and let's go back to the drawing board and re-envision it." I don't know.
I think that's really great to see. Even if it means we will get it maybe a year later, but on the other hand, we have been waiting for 30 years, so I guess that's fine.
Ixchel Ruiz: Exactly. But something really interesting for me is how many previews the JEPs get. For example, the Vector API, it's also I think in the seventh, eighth-
Gunnar Morling: Yes. Ninth incubator right now in [JDK] 24.
Ixchel Ruiz: Yes.
Michael Redlich: Oh, yes.
Ixchel Ruiz: Yes. In a way, we have several releases that show us something. Sometimes we backtrack a little bit. Sometimes we move forward faster. And this is something that makes me feel good about the Java releases because it provides you with this energy to try to move faster because the numbers start to pile up and you're like, "Oh my goodness, a lot of the old version is behind. Maybe I should push for going up into the Java that I am expert and I'm using."
But on the other side, it provides us with an opportunity to try things even though we know that maybe they are not going to stay like that. And that's a good thing because the teams get different feedback from different teams and we can tweak and change.
Advantages of the Six-Month Release Cadence [12:50]
Michael Redlich: Something you touched on, Gunnar, about the different features and having the courage to step back from a feature. So there was evidence of that with JEP 404, General [Generational] Shenandoah, that was originally supposed to be part of JDK 21, but what I read was that the folks who were working on that felt as if it wasn't 100%, something to that effect. So this is one of the few times that once the feature set is frozen, that it actually got removed from JDK 21.
And then you mentioned the ninth incubator, Vector API, and I remember it was kind of interesting. It went one through eight incubations, through eighth incubation, and then when the new JEP was created, it just said incubator. I'm like, "Wait a minute, how did that happen?" So they changed it. I may have reached out to Sharat [Chander] on that one. What's going on? So then they changed it to ninth incubation. But then the reason they're doing that is they're waiting for some of the features from Project Valhalla to be completed before they move forward with Vector API, which is part of Project Panama.
Gunnar Morling: Exactly, yes. So coming back to the Generational Shenandoah got pushed out by a couple of releases as you mentioned. I think this actually also demonstrates the willingness to do that. It demonstrates, I would say the advantages of the very frequent and strict release cadence because back in the day, maybe if you had a feature which you wanted to be, I don't know, in Java 5, let's say, and then for whatever reason it would be at risk for getting into Java 5, you wouldn't know how many years it would take until Java 6 would come out, right? Whereas now if you don't get it into Java 21, well, okay, I mean it's not great.
I guess you may be disappointed for a day or two, but then, okay, Java 22, Java 23, they will come out in six months, 12 months later. So it's not the end of the world if you are just moving out to the next release. And I think this is one of the advantages of this release model. The disadvantages, actually, I'm curious about you guys. I'm starting to get confused. So what's the current Java version? I mean, back in the day, yes, of course it's Java 8. It's Java 7, you would know it, right? You could be woken up in the middle of the night and you would know. Yes, we are on Java 8 right now. Where are we now, are we on 23 or what is it? Is it 24 already? Actually, I have to think about it.
Michael Redlich: So yes, no, it's definitely 23 that came out in September, and they're working on all the features for 24, which will come out in March.
Gunnar Morling: Right, right, right. Yes, I'm aware of it, but you got to actually think about it actively. I have to, anyways.
Compact Object Headers [15:35]
Michael Redlich: Yes, compact object headers. Ben Evans wrote a great news piece on that recently. I don't know if you wanted to talk about that. That's part of Project Lilliput, I think.
Gunnar Morling: Right. Yes, I mean it's a pity we haven't [don’t have] Ben here. He would be in a much better position to talk about it. But yes, so the idea is, well, every object which lives on the Java heap has an object header which defines, okay, which is the class, obviously, which the object is about or has. But then also the header contains information about locking, about garbage collection and so on. Now every object has that header. So if we are able to shave off some bytes from that, it's obviously going to be a huge improvement in terms of the overall size, which we need for the heap. And this is, in a nutshell, what this is about.
And again, Ben or maybe Roman [Kennke], who is the main person working on that, they would be in a better position to talk about how this actually works and what the magic is behind being able to shrink that header. But it's merged now to [JDK] 24 and it's going to be released as an experimental feature in [JDK] 24. As you said, it's going to be out in March.
And actually I'm really just waiting for the next preview build of [JDK] 24 because I really want to give it a try. I played with the Lilliput preview build a year ago or so. Definitely made some nice improvements. But yes, I really want to try it again, run, I don't know, a Quarkus application, Spring Boot application, do some measurements to see, "Okay, what is the improvement in terms of memory consumption?" And then also it actually could help to reduce GC overhead because it's just less space which needs to be collected and so on.
There could be locality effects in terms of CPU caches and so on. So I'm super excited and they're not even stopping there. So there's already work going on to even further reduce the size of object headers. I think that's why I'm so excited about it, because it just goes to show, yes, we can use Java also, maybe for things where a while back we felt it uses too much memory for a given use case. So I think those kinds of optimizations ... they are invaluable. It's great to see this in Java at this point in time.
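[Editor's note: a back-of-the-envelope sketch of why a few header bytes matter. It assumes the figures given in JEP 450 (headers of 96 bits, i.e. 12 bytes, with compressed class pointers today, shrinking to 64 bits, i.e. 8 bytes) and HotSpot's default 8-byte object alignment. The class `HeaderSavings`, the two-`int` object and the object count are hypothetical; real savings depend on object layout and should be measured.]

```java
class HeaderSavings {
    // Assumption: HotSpot's default object alignment of 8 bytes.
    static final int ALIGNMENT = 8;

    // Size an object occupies on the heap: header plus fields, rounded up to the alignment.
    static long alignedSize(int headerBytes, int fieldBytes) {
        long raw = headerBytes + fieldBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        int fieldBytes = 8;            // hypothetical object with two int fields
        long objects = 100_000_000L;   // hypothetical number of live objects

        long before = alignedSize(12, fieldBytes); // 12-byte header (assumed, per JEP 450)
        long after = alignedSize(8, fieldBytes);   // 8-byte compact header (assumed, per JEP 450)

        System.out.println("bytes per object: " + before + " -> " + after);
        System.out.println("bytes saved overall: " + (before - after) * objects);
    }
}
```

Because of alignment, the gain is not uniform: under these assumptions an object with a single `int` field occupies 16 bytes either way, while the two-`int` object drops from 24 to 16 bytes. That is one reason to measure a real application, as suggested above. JEP 450 describes enabling the feature with `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders`.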
Michael Redlich: Oh, absolutely. Ixchel, did you want to say a few words about that or did you ...?
Ixchel Ruiz: Well, no, for me it's a general remark, a reminder, well, probably the listeners know perfectly well, that the improvements in Java can happen in several ways. But the more interesting ones are features like new syntax that you can use, or the ones that are not so visible for the developers, where they will gain something without doing anything.
Gunnar Morling: Right. Just upgrading. Yes.
Ixchel Ruiz: Yes. The gains that they will see in terms of performance and in terms of memory footprint, it's going to be transparent. So magic will happen without them doing anything.
Gunnar Morling: Yes, I mean, the way I like to think about this from a high level view, that can be improvements in Java the language, as you say, language constructs like the method stuff which you mentioned and so on. There can be API improvements like the Vector API, and there can be improvements to the runtime.
And personally, I'm actually the most excited about runtime improvements. So this stuff, garbage collection, all those things, better observability, better JFR support. So that's what always excites me the most. Next, I'm excited about APIs. So like Vector API, I think it just enables us to do more things, which we couldn't really do in Java before. And I'm actually the least excited about language improvements. So Yes, I know it's nice, but usually I don't really care that much. But of course I still realize it's interesting to people and I guess other people would feel different about it.
Michael Redlich: Yes, it's a great topic, I think. And I think Project Lilliput was created last year or early this year, I forget which, but-
Gunnar Morling: Oh, I think it goes back longer than that actually.
Michael Redlich: That's okay. But Ben's news item on that just came out this past Wednesday, and he discusses where we are and what the concept is, and he had a diagram with how the bits are allocated with the compact [object headers]. So yes, folks listening out there, definitely check that out and we'll look forward to seeing more about that as we go.
Nullability in Java [20:00]
So another thing I wish Ben Evans [was here] talked about JSpecify, about nullability in Java and this new initiative, I think that's a combination of a few companies that came up with these annotations for nullability in Java. I think this is awesome.
Gunnar Morling: No, definitely. I mean, it'll hopefully reduce or maybe eliminate NullPointerExceptions, I think it's definitely [a] good improvement. I mean there is this XKCD comic where you have 14 standards and now you, "Oh, let's do one standard to rule them all." And you end up with the 15th standard, right? Because there have been initiatives like that before, nullability APIs before. But yes, fingers crossed this one is going to be the one to rule them all. It's not just yet another one besides 14 other ones.
Michael Redlich: Yes, that's right. Yes, I haven't had a chance to really wrap my mind around that whole concept just yet. I just know some of the basics of that, so hopefully I get a chance to experiment with that a bit.
Gunnar Morling: Yes, and I mean, I think this is really where it just goes to show Java has a certain age, right? Because adding this after the fact, it's really hard, maybe it feels like bolted on. Whereas I guess now if you were to start with Java today or with any language today, this always would be part of the type system, right? I mean, if you do a strongly and statically typed language anyways, so you would just, by definition, you wouldn't have nullability concerns like that. But yes, I think it's going to be an improvement to Java.
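[Editor's note: a minimal sketch of what annotation-based nullability looks like in source code. To stay self-contained it declares its own stand-in `@Nullable` annotation; with JSpecify on the classpath one would import `org.jspecify.annotations.Nullable` instead. The class `StationLookup` and its data are hypothetical. Note that the annotation by itself changes nothing at runtime: it documents intent so that static analysis tools can flag a missing null check.]

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.Map;

class StationLookup {
    // Stand-in for org.jspecify.annotations.Nullable (assumption: kept local so the sketch compiles alone).
    @Target(ElementType.TYPE_USE)
    @interface Nullable {}

    private final Map<String, String> countryByStation = Map.of("Hamburg", "DE", "Basel", "CH");

    // The return type says explicitly: callers may get null back.
    @Nullable String countryOf(String station) {
        return countryByStation.get(station);
    }

    // Unannotated return type: by convention here, never null.
    String countryOrUnknown(String station) {
        String country = countryOf(station);
        return country == null ? "unknown" : country; // the check a nullness checker would insist on
    }

    public static void main(String[] args) {
        StationLookup lookup = new StationLookup();
        System.out.println(lookup.countryOrUnknown("Hamburg"));
        System.out.println(lookup.countryOrUnknown("Atlantis"));
    }
}
```

This also illustrates the "bolted on" point made above: the information lives in annotations read by external tools, rather than in the type system that the compiler itself enforces.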
Michael Redlich: Oh, absolutely.
Ixchel Ruiz: We are still trying to attract new developers. So yes, for the people that have been working on the pain points of Java, yes, sometimes these new simplifications, for example, for the pattern types and primitives and things like that, feel like, "Huh." We even look at them with curiosity and a little bit of skepticism because we have been living without them for so long.
So suddenly they appear and we are like, "Oh, I didn't notice that I was suffering from this." The same thing with nullability. But the whole point is for us, it may feel a little bit strange, but the hope is that the new developers that are joining us in the Java world, they are adopting them without that sense of this is alien.
Gunnar Morling: Yes, no, I think that's a great point actually. I mean, yes, we tend to get stuck in our bubble and yes, we take so many things [for granted]. Of course it's like that, but actually, if you talk to a new person, to a person just setting out with their career in the space and you tell them, "Yes, just go to Maven Central and download your dependencies from there, and then you have everything in your class." They're like, "What? What are you even talking about? What do you mean Maven Central? What does all that mean?" So I 100% agree there are so many things which can and should be improved around that for sure.
Michael Redlich: There's so many things going on in the Java space. I personally think it's a great time to be in the Java community.
Gunnar Morling: Totally.
Python and its Impact on Attracting Young Developers to Java [23:08]
Michael Redlich: Not only for the awesome people, but the language itself. And then you had Java EE being donated to the Eclipse Foundation, you have Jakarta EE, it's open source, all these new APIs. And, Ixchel, you had mentioned about getting students or new developers involved and the implicit main.
Gunnar Morling: Oh, [JEP] 495, Simple Source Files and Instance Main Methods.
Ixchel Ruiz: Yes, simplifying the-
Michael Redlich: Yes, yes. Which has been renamed about four times, too. So this is a great thing, too. This is something that Brian Goetz, I think, initiated. Whenever I mention it, I always put a link to his white paper on that. And this is great because Oracle has the Java in Education initiative and the Garden State Java User Group, for myself personally, we are involved in that. We have our meetings at a university, we have a lot of students at our meetings, and some of the folks in our leadership team have gone out to high schools and given presentations there.
And there's other JUGs that are involved too, and other folks. I think this is a great thing for getting young developers involved because Python seems to be the new shiny object, right? Java's been around almost 30 years now. So why would you want to get involved in Java? And some of the things we've done, too, is have lightning talks by the students at the university where we have our meetings and we usually, we've done it twice now, we have three students, they have 15, 20 minutes each, and then we have a main presentation after that.
So I think this is hopefully with our part, along with everyone else, hopefully we'll get some more Java folks, young developers into the Java system.
Gunnar Morling: Yes. And I mean, I think an important angle on that also is again, like this API and library ecosystem around the language, right? Because Python, I would say arguably one of the, or the reason why it's so popular these days is because it has such a strong ecosystem of libraries in the AI/ML space, data science space. So people in those domains, well, Python is the natural choice because it has all those great libraries for that. I mean, Java, we have things like LangChain4j, and there's other things around it, but I feel that there is space for improvement.
So that's why I also think it's so important to also make sure we don't fall behind when it comes to those things. Because at the end of the day, people won't use the language because they like it or not like it. They will use it because they have a job to do and well, they will use the language which lets them do that job and the ecosystem which lets them do the job as good as possible, I think. Ixchel, you seem to have something to disagree with that.
Ixchel Ruiz: No, no, and this is your point of view. And this is the part that I like to tell people, yes, Python is really popular right now because of models and it was adopted by the scientific community fairly easily because it was so, again, repeating myself, so easy to jump into it. It didn't have a lot of the ceremony that Java requires. Having said that, Python is not going to give you the functionality that you require in an enterprise application.
Python is not going to support thousands of millions of data points or requests. It's not, because it was not designed with that in mind. And this is what I keep telling people, you will have to train, you may need to train your models in Python, because that's where you have all this functionality. But once the models are trained and you have to provide the services to your users, you will probably have to, or you will choose a programming language like Java because it's the one that is going to give you the performance.
It is the one that is going to give you the reliability and the stability for building applications that are there to last. Yes, Python seems more popular when you are training, when you are trying new things, but in the end you probably will return to ... [Java]
Michael Redlich: It's interesting. Just a quick story. So I worked in a research facility with Exxon and a lot of PhDs and everything was all fuel or oil research, the whole thing. And a lot of the PhD folks, the young ones, they came in having learned Python. And the IT folks, there was hardly any support for Java at our facility here in New Jersey. It was in Texas. So there was support for Python.
And then the company offered a six-day Python course back in 2018, which I took. You had mentioned before, Gunnar, about scientific data or libraries and the data frames and all that. And I was really impressed with that. So I can see why the younger folks did it. But then Ixchel, what you said as far as building an enterprise application with Python may not be so easy or almost impossible either way.
Gunnar Morling: By the way, I don't mean to make it like Python is bad or whatever. I mean, they all have their different pros and cons, right? I mean, we are one big tent for everybody.
Ixchel Ruiz: No, at this time in our lives, we cannot be monoglot, we cannot only use one language. So we're in a time where you must be polyglot, but that also gives you the possibility to pick and choose which one for what time, and what are the limitations and the boundaries that you have to take into consideration. Well, not in my case, I am not the one working in that particular organization, but I see it in the same way. I have a friend and he is a Java developer working a lot with Python nowadays because he is in the IT department of the weather organization here in Switzerland.
So again, it's the same experience. A lot of the PhDs are writing their models, part of their theses, and they bring these very interesting models into the organization to predict the weather better. But suddenly in the last months, it's their job to put it into production. And that is when he has to come and in a way teach them all about software development, best practices, and how do we put this into production? And it's this really interesting clash of the worlds. That's why I said there is a clear boundary in terms of "How do you put this into daily life?"
Gunnar Morling: And actually that touches a little bit on one project I'm also very excited about, which is GraalVM in general. And I mean, there's so many angles around it, native binaries, ahead-of-time compilation, but also there is this very strong polyglot angle around it. And actually there is GraalPy, I think that's what they call it. So the ability to run Python applications, including native binaries, because I mean, let's be honest, all or most of the exciting Python libraries, they are actually fronts for C libraries and C native code.
When this is actually able to be integrated into a Java application using GraalVM, I think this could be super interesting. Also, for this stream processing space I'm working in. So for Flink, the framework which we use, there is PyFlink. So this allows you to integrate Python into Flink jobs. And right now, this actually requires inter-process communication because, well, Flink runs on the Java Virtual Machine, and then if you have your Python user-defined functions, they would run on the Python VM. Now, if you can actually integrate all this onto the JVM using GraalVM, I don't know, this would be really cool, I think.
Michael Redlich: Yes. And just with the recent release of GraalVM for JDK 23, GraalPy and GraalWasm, WebAssembly, were elevated to, for lack of a better word, first-class citizens now. I think they graduated out of their experimental phase and now are more for production use. Yes, so that's-
Gunnar Morling: That's amazing. I can't wait to get my hands on that.
Michael Redlich: Yes, one of the things I wanted to do before I left Exxon was to provide a one-hour talk on GraalPy when it was in its infancy. But yes, I never had a chance to do that.
Gunnar Morling: So now you can go back as a very, very well-compensated consultant and do the talk for them.
The One Billion Row Challenge [31:55]
Michael Redlich: Exactly. So I wanted to touch on The One Billion Row Challenge that you initiated back in January, and I thought that was... Yes, I write about it in the trends report.
Gunnar Morling: Oh wow, okay.
Michael Redlich: And you presented it at InfoQ Dev Munich, and then Karsten Silz, who's one of the editors in the Java space, wrote a news item on that. But yes, I thought that was a lot of fun. And they had 164 entries.
Gunnar Morling: Right, yes.
Michael Redlich: So tell us more about that.
Gunnar Morling: Right. Yes. So how much time do we have?
Michael Redlich: Not long, we can get up.
Gunnar Morling: Do we have two hours left? No, I'll try to keep it brief. So yes, what was it? Well, I mean it's pretty much in the name. The task was to process a file with one billion rows as quickly as possible. And the file, it was measurements, random measurements of temperature values keyed by weather stations. So you would have, I don't know, New Jersey, 20 degrees Celsius, whatever that's in Fahrenheit, I don't know, Hamburg 15 degrees, Basel minus 10 degrees, I don't know. And you would have like one billion rows of those measurements and you needed to calculate the min, the max and the average temperature value per station. So that was the task.
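[Editor's note: a deliberately simple sketch of the task as described, computing min, mean and max per station. It assumes each row has the form `station;temperature`; the class names `BaselineAggregator` and `Stats` and the three sample rows are illustrative only. This is a readable baseline, not an optimized entry, and it reads from an in-memory list rather than a one-billion-row file.]

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BaselineAggregator {
    // Running aggregate for one weather station.
    static class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }
    }

    // Assumed row format: "<station>;<temperature>", one measurement per line.
    static Map<String, Stats> summarize(Iterable<String> lines) {
        Map<String, Stats> result = new TreeMap<>(); // sorted by station name
        for (String line : lines) {
            int separator = line.indexOf(';');
            String station = line.substring(0, separator);
            double value = Double.parseDouble(line.substring(separator + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Hamburg;15.0", "Basel;-10.0", "Hamburg;5.0");
        summarize(sample).forEach((station, s) ->
            System.out.printf("%s=%.1f/%.1f/%.1f%n", station, s.min, s.mean(), s.max));
    }
}
```

Every row here allocates two substrings and boxes nothing else, which is fine for a sketch but exactly the kind of per-row cost that becomes the thing to optimize at a billion rows.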
And I created this as a challenge for the Java community. I published it on January 1st, and then it ran for the entire month of January. So people had one month to work on it. And yes, it kind of went viral, I guess. So people really took it and went with it and put lots of effort into optimizing that. And it was a huge amount of work, which I created for myself because I didn't really think many people would participate. So I didn't prepare any automation for that. So whenever somebody sent me their pull request, "Hey, this is my new and improved implementation," I actually had to go to my machine, which I had rented in the cloud, run it and run it five times because I wanted to eliminate outliers, take the outcome of the program, put it into the leaderboard.
And so it was pretty much my job during January. So big shout out to my employer for letting me do that because it really came [as] a bit of a surprise for all of us. But yes, so I mean maybe why did I do this challenge? I should say that. Well, it comes a little bit back to what we spoke about before.
There are so many new APIs and new features and new improvements in Java. I mean, I don't know about you, but I actually have a hard time keeping up with all the new things because every six months we have this huge laundry list of new stuff. And so I felt this would be actually a nice vehicle to, well, put some of those things to work and see what they can do. So things like garbage collection: it turns out we can disable garbage collection because we can write this program in a way that we just don't need garbage collection, because we don't really allocate at runtime after the startup phase. So that was an interesting thing to learn. There is this Vector API, which we spoke about.
So this sort of SIMD instructions, single instruction, multiple data, this sort of problem 100% benefits from that. And back before the Vector API, you just really couldn't use those CPU instructions, whereas now actually you can benefit from them. So again, just a nice way to learn about this API and put it into production. Native memory, quite a few other APIs, which people tried out, different distributions, of course, GraalVM native binaries. Because I mean, when I started this, I had my first implementation and it was implemented in, I guess, a naive way.
I just wanted to show people this is the program and this is what it's supposed to do. So this ran in five minutes, give or take, whereas people's implementations, on my evaluation machine, which had 32 cores, 64 threads with hyper-threading, would process this file with one billion rows, with a size of 30 gigabytes, in 300 milliseconds. So I know this was really incredible. And of course, if you get down to such a short amount of time, 300 milliseconds, well then the JVM startup is a significant part of that.
So if you are able to bootstrap your application in four milliseconds, rather than a hundred milliseconds, this is significant if you are at that range of values, and this is why most people actually used the GraalVM native binaries. Other people used GraalVM as a JIT compiler, because you can also use GraalVM as a JIT compiler on the JVM. And in this case, the workload benefited from that. So people did it.
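[Editor's note: To make the "single instruction, multiple data" idea mentioned above concrete, here is a minimal sketch of a related, older trick, SWAR (SIMD within a register), rather than the Vector API itself. The Vector API lives in the incubator module `jdk.incubator.vector` and needs `--add-modules jdk.incubator.vector`, so plain `long` arithmetic is used here to keep the example runnable everywhere. The `station;temperature` line format and the names `SwarSketch`, `loadWord` and `firstSemicolon` are assumptions made for this example.]

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class SwarSketch {

    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' repeated in all 8 bytes
    private static final long ONES       = 0x0101010101010101L;
    private static final long HIGH_BITS  = 0x8080808080808080L;

    // Reads 8 bytes starting at offset; the byte at 'offset' ends up in the lowest bits.
    static long loadWord(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }

    // Returns the index (0..7) of the first ';' among the 8 bytes of 'word', or -1 if none.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;               // bytes equal to ';' become 0x00
        long hit = (x - ONES) & ~x & HIGH_BITS;   // flags zero bytes; the lowest flag is exact
        return hit == 0 ? -1 : Long.numberOfTrailingZeros(hit) >>> 3;
    }

    public static void main(String[] args) {
        byte[] line = "Hamburg;15.0\n".getBytes(StandardCharsets.US_ASCII);
        // Eight bytes are compared against ';' with a handful of arithmetic operations.
        System.out.println(firstSemicolon(loadWord(line, 0))); // prints 7
    }
}
```

[The point of the sketch is only the shape of the technique: one operation inspects several bytes at once instead of looping over them one by one. With the Vector API, the same comparison can be expressed over wider vectors and compiled to the CPU's SIMD instructions.]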
So tons of learnings, and this was what I wanted to get out of this, so I wanted to learn something new and see what's doable, but I also wanted for others to learn something new and inspire each other and do this in a collective way, which is why this was actually set up in a way that people could collaborate and take inspiration from each other and see, "Oh, I see the person over there. They do this amazing trick, so let me also employ it." And, of course, I shouldn't take Ixchel’s implementation and just say, "Hey, that's mine, right?" That makes sense. But if I see you do something cool and I can add this to my implementation, this was totally fine so people could learn together.
So there are many more things to say about it. But this was, I would say, in, I don't know, five minutes or whatever, the gist of it. And yes, there is the talk which I did at the InfoQ Dev Summit, and there's the recording, so people can learn more by going to the talk and watching it.
Michael Redlich: Awesome.
Ixchel Ruiz: Just to mention, and you should prod Gunnar later on, it was not only made in Java, there were other programming languages with very, very interesting solutions, too. There was sometimes very creative usage of the languages. Sometimes there were very extreme ways of forcing the programming languages. And that also provides us with some ideas of…you can do marvelous things with programming languages and you can force them sometimes even beyond their normal use. And one of the things that I loved about the Java solutions is that it remained.
Yes, of course we can disable the garbage collector. We could do some stuff that you were like, "I shouldn't do this at home with my own projects." It felt like cheating sometimes. But the point is it shows us all the capabilities and also how in a way, some of the solutions stayed true to the syntax and the features of the Java programming language. We didn't need to cheat that much.
Gunnar Morling: Right. Yes. And people actually tried to cheat and take advantage of everything you can do or what's possible. So one of the things was, so Thomas Wuerthinger, he is the lead of GraalVM at Oracle, and he seems to have nothing to do because he spent lots of time on his implementation of the challenge. And I mean he also won, I should say that, in the end. And he also made other people at Oracle spend much time on the challenge. So it was very interesting to see, and by the way, I should say one of them, a person from the community, actually now got hired into the compiler team working on the GraalVM compiler because they showed their talent in the challenge. So this was really cool to see.
But what I wanted to say is [that] what Thomas figured out at some point is that unmapping the file from memory would actually take a significant amount of time, because most of the implementations at the top of the leaderboard would memory map the file in off-heap memory. So it's like they can manage it by themselves. And again, there is this new Foreign Function & Memory API, which people could leverage for that. And now, what he realized: so the program was done, we have created the result, it has been printed on standard out, and then the program continues to run for 500 milliseconds or so, half a second, before the process completes. And as he found out, this was because, well, it just takes so long to unmap that file from memory.
And now you could say, "Okay, that's the run time of that program, so those 500 milliseconds, they should be part of your final result time." But he said, and he made the case, "Well, we have printed out the result and this is what is interesting to people and this is where we should stop measuring. And now if we can find a way that we don't have to account for this time to unmap the file, how about that?" And then he found a way, "Well, if we spawn up a separate process and we move the costs and time for unmapping the file to that other process, well then our first process is finished and it has shown the result and we have taken those 500 milliseconds out of the equation." So he came up with that and I was like, "Man, this feels like cheating, but is this all right or it's not all right?"
But then I felt, "Okay, what's the spirit of that?" And yes, I felt the spirit is, "Okay, I want to have the result. And okay, this is actually a valid solution. And actually also the Chrome browser does similar things in certain situations." So it was one way to speed up the result. So I allowed it. So we said, "Yes, okay, this is legit or valid." And then, of course, other people adopted this. So again, coming back to the learning and inspiring experience, this is how all this went all the time, taking advantage of whatever they could.
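[Editor's note: The memory mapping described above can be sketched as follows. This example deliberately uses the long-standing `FileChannel.map` overload that returns a `MappedByteBuffer`, not the Foreign Function & Memory API mentioned in the episode, so that it runs on older JDKs too. A single `MappedByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes, so a file as large as the challenge input would need several mappings or the newer API. The class name `MmapSketch`, the method `countLines` and the sample file contents are assumptions made for this example.]

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {

    // Counts newline bytes by reading the file through a read-only memory mapping.
    static long countLines(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size == 0) {
                return 0; // nothing to map
            }
            // The file contents are paged in by the operating system on access,
            // outside of the Java heap.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long lines = 0;
            while (buffer.hasRemaining()) {
                if (buffer.get() == '\n') {
                    lines++;
                }
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("measurements", ".txt");
        file.toFile().deleteOnExit();
        Files.writeString(file, "Hamburg;15.0\nBasel;-10.0\n");
        System.out.println(countLines(file)); // prints 2
    }
}
```

[Note that `MappedByteBuffer` has no explicit unmap method: the mapping stays valid until the buffer is garbage-collected or the process exits. The cost of releasing a very large mapping at the end of the run is the delay discussed above.]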
Michael Redlich: I think we need to do more of this. Like you said, you didn't expect the response that you did when you created this. And I think it's just a sense of community.
Gunnar Morling: Absolutely, yes.
Michael Redlich: It's maybe a hundred billion row challenge or something like that.
Gunnar Morling: So people keep asking for that. So yes, what is going to be the challenge next year? And actually one person said to me, "So next year [in] January, I will take a week off so I can fully concentrate on my implementation of the challenge." And I was, "Look man, I'm not even sure whether I want to do it again next year or not." But by now, actually I have decided I won't do it next year because actually I'm still recovering from doing it this year.
I still have to heal. And also frankly speaking, I don't really have the good idea. I think it should be something really new and cool, which is equally exciting to people, not just same same, but different. So maybe I will do it after two years in [20]26, but I won't do it again next year. So that's my current thinking.
Final Thoughts [42:49]
Michael Redlich: All right, why don't we go around the table and have some final thoughts on the Java space, what people are talking about, what's down the road in the Java or Jakarta EE [space] or stuff like that.
Ixchel Ruiz: Well, the question is what is new right now or what will be new?
Michael Redlich: Yes, I guess final thoughts on anything really in the Java space that you'd like to see or you'd like to be a participant in it or anything like that?
Ixchel Ruiz: Okay. There are some features that I'm already using from Java 23. I like to be an early adopter, so I have seen some improvements in the performance, so I'm real excited about that. As I said, some of the features, you can use new APIs and then you feel like you are programming in a modern world. But even if you are not programming in a modern world and you still want to upgrade or update your release version to Java version, your JVM or JDK version, it's worth it. What I'm excited about [in] the future is the next release, Java 24 because it's one of the largest.
So I think even if it's not an LTS, it's going to bring a lot of attention into the whole concept of the release process and their release cadence. And that is always important because as I said in my mind, there is no doubt that Java, it's one of the best languages to pick. I would still pick it as my first programming language. As I said before, it's not about just learning one programming language. It's not feasible at this moment, but it's one of the most solid decisions that you can make. So that is exciting for me because we are trying to reduce the complexity for the newcomers.
Even if they are seasoned developers or people already with experience in other areas or other industries [who] are coming to programming languages or computer science, Java, it's going to be a good opportunity for them without a lot of the issues that we had 20 plus years ago when we first adopted it. Other things that I like about Java and the different versions is that we backport important things, for example, for security and things like that. And we can talk about the importance of doing that because for many, many reasons you can upgrade or not.
But my encouragement to people would be to upgrade as soon as possible, try it. Sometimes it is worth it just for the improvements in the performance that you will gain. Even if you don't adopt the new syntax or the new APIs, it is still worth the try. And we always need feedback. The whole point of having JEPs is they are not written in stone. So if you try them and some of them go against what you expected to happen, this is the time to go back and talk to the people behind them.
Michael Redlich: Gunnar, final thoughts?
Gunnar Morling: So I think the overarching theme I am really excited about is that Java becomes a viable option for many things, as I mentioned, where you may not have considered Java that much in the past. And so just coming back to this Vector API, I think that's a great example because if you wanted to use those SIMD instructions before the Vector API, you actually would have to use a language like Rust or C, which is just a little bit closer to the metal. And then you were able to, well, specify, "Okay, those are the instructions on a CPU-specific level, which I would like to use." And then you could do it. But of course this really meant, okay, you were programming for this specific CPU architecture or even version, like Intel or ARM, and the specific instruction set extensions.
And in Java, well, we didn't really have the opportunity to do that. The compiler sometimes would auto-vectorize code to take advantage of those instructions, but it wasn't something which we could consciously do. Whereas now with the Vector API we can, because this API gets compiled down into those SIMD instructions, and I feel suddenly actually we are in a better spot than maybe in other languages because, well, this happens automatically and transparently for us, right? So I can write vectorized code and then the JVM will make sure this gets compiled into the corresponding instructions on ARM or on Intel or whatever other CPU may support those instructions.
Or it would just be executed in a scalar way if there is no SIMD support. So I think this is a continuing theme again, also the sub-millisecond pause times, again, something which we just couldn't do. And I think there's more stuff also coming up with Valhalla, the compact object headers and so on. So Java really becomes an interesting option for more and more problems where we didn't really consider it maybe before.
And touching on what you said, Ixchel, this is I think a great point. Yes, upgrading, it's like a vital thing. And what I would recommend is, because sometimes people may have a hard time arguing why they should upgrade, and they hear, "What do we gain? You want to use those fancy APIs, but should we really take the risk for that? Let's not do it."
And I think the right way to argue for it is always maybe to take a look at those JVM improvements. Because if we can make the case, yes, we have better tail latencies, we have better throughput, so maybe we need fewer CPUs, fewer machines to run the same workload, then this actually goes to save money. And I feel like whenever you talk to people who pay the check at the end of the day, they like to hear those sorts of arguments, right? And so I feel like this can be a very good avenue to talk about those things.
And then by virtue of doing that, you will also get to benefit from those APIs, which you desire to use. Bottom line is I think it's a very exciting time to be a Java developer. And I don't know personally, I cannot wait to see what's coming, 24, and then also of course all the subsequent versions. We're looking forward to it.
Michael Redlich: And something you touched upon, Ixchel, is [to] talk about JDK 25 is going to be the next LTS and look at 24 with 21 JEPs and counting, I fully expect Structured Concurrency to be part of JDK 24. It'll be a fourth preview. So maybe, and this is just speculation on my part, maybe a lot of these preview features might be final in JDK 25, but that's just my theory. But for me, with my weekly news roundup that I do for InfoQ, I follow on the whole weekly updates on the JEPs themselves among other things.
And then, since I'm involved in Jakarta EE, I'm a committer on Jakarta Data and Jakarta NoSQL, and I also just joined the Jakarta EE Future Directions group. We just got started off with meetings, so there's a lot going to be happening. I'd like to see some AI development, whether it's in the MicroProfile space or in Jakarta EE. I think that might be exciting down the road, but we'll see what happens.
Gunnar Morling: Yes. By the way, there is one thing I need to say because I know my good friend, Nicolai Parlog, will yell at us. So he will say, "Java 25 is not an LTS release because LTS just isn't a notion of OpenJDK itself. There are just random build providers, which happen to align on that version to provide LTS, but technically that's a very important difference." So I just want to say that to make Nicolai happy.
Michael Redlich: Yes. That reminds me, I think I've heard him say that in one of his podcasts or video screencasts or whatever. Yes.
Gunnar Morling: We fight over that all the time. Just like last week we met in the Netherlands at J-Fall. We decided to discuss this over beer because I look at it a little bit differently. So it's an ongoing fight between us on the topic. You should have him on the podcast and he can give you his perspective.
Michael Redlich: He presented at the Garden State JUG just as we were starting to get back to in-person meetings, but he presented remotely, so I'd love to meet him and Adam Bien, for example.
Gunnar Morling: Oh, yes. Yes.
Michael Redlich: Yes. So thank you very much, both of you, for taking an hour out of your day or a little bit more, especially now that it's what…almost nine o'clock at night by you. But yes, no, this was great. This was an absolute joy to have a conversation with you on this.
Gunnar Morling: Likewise.
Michael Redlich: And I hope to see you in person at an upcoming conference one of these days. And also to our InfoQ readers, if you are out at QCon San Francisco, QCon London, one of the other QCon events, please stop and say hello and thank you very much.
Gunnar Morling: Thank you for having us, this was fun.
Ixchel Ruiz: Thank you.
Mentioned:
- Java 24 to Reduce Object Header Size and Save Memory (Ben Evans)
- Null-Restricted and Nullable Types for Java (Ben Evans)
- JSpecify 1.0.0 and Nullability in Java (Ben Evans)
- Project Lilliput
- The One Billion Row Challenge
- JEP 404: Generational Shenandoah (Experimental)
- JEP 450: Compact Object Headers (Experimental)
- JEP 454: Foreign Function & Memory API
- JEP 495: Simple Source Files and Instance Main Methods (Fourth Preview)
- JEP 445: Unnamed Classes and Instance Main Methods (Preview)
- JEP 489: Vector API (Ninth Incubator)
- JEP 465: String Templates (Third Preview)