Paul King on the Groovy Ecosystem


1. Paul, can you tell us a little bit about how the Groovy language has changed over the last few years?

Yes. I guess Groovy is being used in more and more projects recently, so there has been some exciting evolution of the language itself, but also quite a large number of frameworks and projects based on Groovy have been sprouting up. In the language itself, probably the biggest thing that happened recently is the introduction of what is called compile-time metaprogramming, also known as AST macros or AST transformations. That is where the compiler can start playing with the abstract syntax tree that it builds up for the language. You can actually now write transformations that take the tree and transform it in some other useful way at compile time. So there are some very useful things you can do in terms of flexible language structures, writing DSLs and so on when you have that capability.

So that augments some of the built-in runtime metaprogramming features that have always been in Groovy. That is a really cool thing that happened over the last couple of years in the language itself, and it has actually been an enabler for some of the other frameworks, like Spock, which is a testing framework, and Easyb, another testing framework of the same sort of nature; some of those frameworks leverage that capability, so that's been a key thing. There have been more frameworks as well, which is kind of an exciting thing. Grails has been around for a few years now and is sort of the premier web framework in the Groovy space, and we've now got Griffon: what Grails is for web applications, Griffon is for desktop applications, so Swing applications.

So that is a new framework. We've got GPars in the concurrency space, so now for all your concurrency and parallelism needs there is a nice library that handles those things, and Gradle is one of the very exciting build frameworks around now. I have just come out of a talk on Gradle. There are a lot of exciting things happening there. In my customer space, the enterprise space, one of the big pain points is builds, so that is really good news. Another pain point is testing, and that is where Spock and Easyb and some of the other testing frameworks around can help solve a lot of pain as well. So there are exciting things on a number of fronts. And probably the other thing that has also happened is the maturing of the tool support around Groovy.

It's not perfect yet, there are still a few things I'd like to see improved, but we are now getting very good IDE support. Eclipse, IntelliJ and NetBeans have all got pretty good support. When I am in IntelliJ, even though Groovy is a dynamic language, it's doing full static type inference. So the same smarts that it would be applying to Java code or Scala code, it's applying to my Groovy code, but instead of giving something a red squiggly it turns it yellow and says: "This looks a bit suspicious. You are trying to assign a date to a number. Are you sure you want to be doing that?" If you have got some weird and hoopy metaprogramming in play it might be exactly what you want to do, but a lot of the time it's catching all the kinds of errors that static typing would give you, right there as you type. So that is really good.

And augmenting that there are other tools as well. CodeNarc is a sort of Checkstyle for Groovy; in a dynamic language you can't do quite as much static analysis, but it will give you warnings, the same kind of things that Checkstyle or PMD would give you, and it's very easy to write your own rules. So you can start enforcing code style. If you are looking at doing all the software engineering practices of test-driven development, making sure you've got very good code style and making sure you don't have much duplication, there are tools like Simian that have got very good Groovy support these days. A lot of the things that you'd like to have weren't around when the dynamic languages initially came out, so you would kind of throw a lot away to get some of the bells and whistles that are in dynamic languages; a lot of those things are now in place.

So there is actually a pretty good story emerging in terms of not only being productive, which the dynamic languages give you, but also producing high quality outcomes; applying all your professional practices is quite easy to do now in Groovy.

   

2. Groovy was one of the first languages that was implemented on the JVM, apart from Java itself, and as the number of languages has grown on the JVM it seems that there is a lot of communication, a lot of sharing between those languages and also the JVM itself has been updated to be more friendly for languages. How has this helped the Groovy language?

I guess there are a number of different things we could talk about there. There are now a number of healthy languages on the JVM; there's a difference in popularity, but we've got Jython, JRuby, Scala, Groovy and so on, and Clojure is another recent newcomer. The fact that there's a bunch of them there, and that they have got various strengths, is actually pushing the industry to accept that maybe a polyglot system, where you do make use of some of these new languages, is quite an acceptable way for software engineering to develop. In the early days of Groovy there was big resistance from very conservative Java shops: "Oh, we can't be using Groovy, we'll wait until closures are in Java; that shouldn't take very long, should be in Java 1.6.2, which is only a few months away, surely".

So there was a big resistance to using the new languages, but now each of those languages has different features and, depending on my problem domain, people aren't so scared anymore: when I have a need that one of those other languages meets, I can quite easily go out and use it. And you can see that coming into the frameworks, so Grails and Griffon and Gradle, some of the frameworks I was mentioning before, have got very good polyglot support. So if you want a controller in Scala, go ahead and do that. If you want to leverage the Clojure persistent data structure libraries, by all means go and do that in your Groovy code; there is a lot to be gained in that polyglot world as well.

And I guess some of the things that you are alluding to are things like invokedynamic and other things that are happening down at the JVM level. We are going to start seeing the difference they are making in some of the other languages on the JVM, so in some of the emerging versions of JRuby, and there is some Perl on the JVM and some other things that are making heavy use of those facilities. Groovy is only just starting to make use of those kinds of facilities; it helps us with some things, but not as many, because of the way we do things. But as those things mature and become available on people's platforms you'll see more and more of that start to emerge, and Groovy will be leveraging those features as well.

It will make some things that take us quite a bit of work to do now a lot less work, and in some cases we will get some good performance improvements, but there will be other things where we don't gain a lot and we'll still have to do some stuff ourselves. But that is OK; it is all a good story that the JVM itself is evolving.

   

3. Of the languages that you mentioned which run on the JVM, Groovy seems to have the most Java-like syntax. Do you think that Groovy is starting to serve, in effect, as a gateway drug for other languages on the JVM?

I guess it could do. Of all the alternative languages on the JVM, Groovy has always targeted itself as the one closest to Java. It wants to keep those ties with Java as close as possible. That has both pros and cons, and I guess one of the pros is that if you are a Java programmer you are a Groovy programmer: cut and paste, and 99.9% of code is just going to run out of the box. And then you can slowly add Groovy idioms if you'd like. And I guess once you start leveraging that capability, you find there are things I can't do in Java: "Hey look, if I go over to Groovy I can have closures and domain-specific language support and concurrency support and all these other things today; I don't have to wait for Java 8 or 9 or whenever we are going to get some of those things".

Once you say: "Wow, look at the world that’s opened up to me" then it’s much less of a jump to say: "Oh, I’ve also got Scala and Clojure and JRuby and all these other options as well". So in some sense it does open up the doors to other things as well and it’s part of the acceptance of those alternatives on the JVM, I think Groovy has helped pave the way for that but it’s certainly not the only thing that is changing people’s minds about these other languages.

   

4. What types of problems, challenges, solutions is Groovy best suited to?

It’s easy to be flippant about it and in most scenarios that I get presented with I can see a Groovy way to solve many problems. So it’s easier for me sometimes to answer when's Groovy not suitable, maybe that's a much smaller list. I guess there are strengths and weaknesses of all the different languages, so if you look at Scala or Clojure they are kind of opinionated about how you might go and do concurrency or parallelism, so if you are going to be doing lots and lots of work of that nature then maybe it might be easier to do that in Scala or Clojure. They will, in some sense direct you to develop your code in such a way that those things will be easy, whereas Groovy doesn’t have a sole purpose of supporting just concurrency or parallelism.

So it offers you a menu of options, and you might start using Groovy and adopt some legacy practices from your Java heritage, which then make using parallelism and concurrency not as nice as you'd like, and Groovy won't stop you from doing that. But on the other hand, if you are familiar with how you do concurrency in Java you can take that on board, take it into Groovy and still do things in a very similar way, while libraries like GPars give you a nice little sort of concurrency DSL, if you like, to make that easy. Then if you want, in Groovy you can start adopting some of the more concurrent and parallel idioms, so you can start using actors if you wish, data flow if you wish; there is a menu of options that GPars gives you.

Software transactional memory is another option which currently isn’t in the release of GPars but might be in there in the future. So you’ve got a menu of options and I guess if you are comparing that to Scala or Clojure, they will give you a much more limited set that are purpose built for concurrency, but Groovy is going to give you more options and in some sense there is a burden there on the developer to know which of those options to use. So you’d want to get some guidance as to "when should I use actors?", it's not a given that you want to use actors all the time there will be a number of different options available to you and sometimes actors will be good, sometimes data flow will be good and sometimes you will be forced to go down to the lower level stuff and use some of the primitives that Java gives you, as well and so there’s places for all that.

So where does Groovy fit in? When I go in to customers, I tend to look at the context in which I am going to be using any language or toolset: the skill set of the developers, their background, are they Java developers, have they ever done any functional programming, have they seen other languages before, what tools are they used to, have they only done stuff in a particular IDE and so on. Depending on the context, sometimes I can see some really good ways in which Groovy can help you; other times I might say that even though I can see lots of ways Groovy could help you, maybe we should just stick to Java, or maybe we should look at some of the other JVM alternatives as well.

So I think that has partially answered your question. Groovy is very general purpose and there are a lot of scenarios in which you can use it. Its downsides are that it's not tailored to just do concurrency or just do parallelism, and it's not purpose-built just for speed, so there are times when something you do in Groovy might be a fair bit slower than it would typically be in a language like Java with static typing. Now if you are smart, you can speed Groovy up and usually it's not a big issue. I don't have a single customer where the speed of Groovy has been an issue. I can always tune and tweak things to the point where the productivity gains of Groovy far outweigh any remaining performance difference.

There is quite a widespread scope for where Groovy might fit, but depending on the context, if you’ve got a niche area then there might be alternatives that are better.

   

5. You had mentioned the idea of concurrency with relation to Clojure and Scala. How does Groovy help you to write concurrent code?

There are a lot of answers to that question. There are a lot of bits and pieces. If you think of all the different ways that you might try to write concurrent code, then between Groovy itself and the GPars library, which is the incumbent concurrency and parallelism library, or the most favored one, I guess, in the Groovy world, those two give you a whole menu of options. If you are doing things concurrently, there are a number of different approaches you can take to solving things. Having shared state in a concurrent world can be an issue. So there are things that Groovy does to make it easier to use the libraries that are built into Java in a concurrent way: you can take a collection and make it immutable and, for a certain class of problems, that makes them easier to solve.
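To make the immutable-collection point concrete, here is a minimal sketch in plain Java, not Groovy's own syntax or any GPars API. The class and method names (`ImmutableViewSketch`, `freeze`, `rejectsWrites`) are invented for illustration. It assumes that a defensive copy wrapped in an unmodifiable view is an acceptable stand-in for "make a collection immutable".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Groovy or GPars.
class ImmutableViewSketch {
    // Copy first, then wrap: later changes to the source cannot leak into the snapshot.
    static List<String> freeze(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    // Returns true if the list refuses an attempted write.
    static boolean rejectsWrites(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Grails", "Griffon"));
        List<String> frozen = freeze(names);
        names.add("Gradle");                        // the snapshot does not see this
        System.out.println(frozen.size());          // prints 2
        System.out.println(rejectsWrites(frozen));  // prints true
    }
}
```

A frozen list of this kind can be handed to several threads without locking, as long as the elements themselves are not mutated.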

Other kinds of problems might be best suited using something like actors, where you are doing message passing and GPars will give you a solution for that. There is a very nice style of solving problems called "data flow" which, back when I was doing my PhD, was a really hot topic and then it sort of disappeared and we found out that even though the concept was really good we found it very hard to actually solve things. It’s now making a reemergence and the GPars library provides a very nice data flow solution and a lot of the kind of programming that we see in traditional Java or C# is very imperative. The How, not the What. Whereas things like logic programming or data flow programming are very much the What not the How.

So it's "I want these two numbers to be added together", not "Please add this number to this number". For certain kinds of problems data flow is a very interesting and nice way of doing things; it removes a whole lot of issues with race conditions and deadlock and livelock and a bunch of other things that you might face. Some of those can be totally removed using data flow, well, maybe not totally removed, but it is statically detectable when you have got a deadlock. To give you an example, if I have a data flow expression that says "X becomes Y" and another that says "Y becomes X", there is obviously a cyclic dependency there and it can never be solved, so that's a deadlock.
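The "what, not how" style can be approximated in plain Java with `CompletableFuture`, which arrived in Java 8, after this interview. The sketch below is an analogy under that assumption, not the GPars data flow API. The class name `DataflowSketch` and the method `sum` are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration class; uses only the Java standard library.
class DataflowSketch {
    // Declares z = x + y. The sum is computed once both inputs have been bound,
    // whichever thread binds them and in whatever order.
    static CompletableFuture<Integer> sum(CompletableFuture<Integer> x,
                                          CompletableFuture<Integer> y) {
        return x.thenCombine(y, Integer::sum);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> x = new CompletableFuture<>();
        CompletableFuture<Integer> y = new CompletableFuture<>();
        CompletableFuture<Integer> z = sum(x, y);

        // Bind the inputs from separate threads; no explicit locks or ordering.
        new Thread(() -> y.complete(32)).start();
        new Thread(() -> x.complete(10)).start();

        System.out.println(z.join()); // waits for both inputs, then prints 42
    }
}
```

The code states the relationship between the values and leaves the scheduling to the library.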

But the thing is I can detect through static analysis that that is a deadlock, and if you are solving problems with data flow then you can write your unit tests or use your static analysis tools and they will tell you up front that, yes, there is a problem with that code. That kind of thing is much harder otherwise: you could certainly derive theorems about the code and try to come up with a proof that there is no deadlock or there are no issues, but it is very difficult and it takes a lot of work, whereas data flow style languages are amenable to that kind of thing. And I guess, in the whole data flow area, there is another class of very similar sorts of languages, known as communicating sequential processes.
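One way to picture the static check: treat each data flow variable as a node, each "waits on" relationship as an edge, and look for a cycle. The sketch below is a generic depth-first search in plain Java. It is an assumption about how such a check could work, not a description of what GPars or any particular analysis tool actually does, and `DataflowCycleCheck` is a made-up name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class: cycle detection over declared dependencies.
class DataflowCycleCheck {
    // deps maps each variable to the variables it waits on.
    static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String v : deps.keySet()) {
            if (visit(v, deps, done, onPath)) return true;
        }
        return false;
    }

    private static boolean visit(String v, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (done.contains(v)) return false;   // already fully explored
        if (!onPath.add(v)) return true;      // reached a variable still on the current path
        for (String d : deps.getOrDefault(v, Collections.emptyList())) {
            if (visit(d, deps, done, onPath)) return true;
        }
        onPath.remove(v);
        done.add(v);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("X", Arrays.asList("Y"));    // X becomes Y
        deps.put("Y", Arrays.asList("X"));    // Y becomes X
        System.out.println(hasCycle(deps));   // prints true: neither can ever be bound
    }
}
```

Because the dependencies are declared rather than buried in lock acquisition order, the check runs without executing the program.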

So CSP: Tony Hoare, again many years ago, came up with a nice way of expressing how processes can talk to one another, and there is a CSP implementation embedded in GPars. In some ways it might look similar to actors or to data flow, but it has some unique properties of its own which, for a certain class of problems, make it an ideal way to solve them, again taking a lot of the burden off the programmer: "I've developed my own threading model, I've got my own locks, how do I prove that I've got consistency in the way I've put things together? I go and run my program, everything runs fine, and then I put it under high load and it's not working anymore; what is going on?" Things like CSP allow me, for a certain class of problems, to design my systems in such a way that the system is helping me come up with things that are not going to fail.

So there is a whole menu of options available between Groovy itself and the GPars library that allow me to do a whole lot of things that I want to do for concurrency and parallelism. There is the @Immutable transformation built into Groovy, there is the ability to make collections immutable, and there is the ability to use metaprogramming to intercept any attempts to change a collection, for instance. Say I am inheriting a Java library whose designers didn't cater for concurrency, they didn't make things immutable, but I am stuck with it because it's been in the system since the year dot and they don't want to change that part of the system; with a bit of metaprogramming I can set up a little interceptor, if you like, that catches anyone trying to change that particular thing.

So it's very much like aspects or something, where I can say: any attempts in this part of the code to go and change those structures, let's not let that happen any more. And that can be very useful as a poor man's way of getting into this concurrency world while bringing some of the legacy classes with you. So there is really a whole suite of options available depending on what the particular needs are. It's a very interesting area and there is going to be more coming down the pipeline. Software transactional memory is another one that, again for a certain class of problems, is an ideal way to express how you want to handle the "state information", if you like. It's all well and good to say "Ban state", but depending on the level of abstraction you are at, state might be a very good way to represent a particular system and to implement it.
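The interceptor idea has a rough plain-Java counterpart in `java.lang.reflect.Proxy`. The sketch below is only an analogy: Groovy metaprogramming hooks method dispatch differently, and this version guards a `List` interface rather than an arbitrary legacy class. The name `MutationGuardSketch` and the list of blocked method names are assumptions made for the example.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class: blocks mutating calls on a wrapped List.
class MutationGuardSketch {
    // Assumed set of method names treated as writes. Iterators obtained from the
    // list are not guarded here, so this is a sketch rather than a complete guard.
    private static final Set<String> MUTATORS = new HashSet<>(Arrays.asList(
            "add", "addAll", "remove", "removeAll", "retainAll",
            "clear", "set", "sort", "replaceAll", "removeIf"));

    @SuppressWarnings("unchecked")
    static <T> List<T> guard(List<T> target) {
        return (List<T>) Proxy.newProxyInstance(
                MutationGuardSketch.class.getClassLoader(),
                new Class<?>[] { List.class },
                (proxy, method, args) -> {
                    if (MUTATORS.contains(method.getName())) {
                        throw new UnsupportedOperationException("blocked: " + method.getName());
                    }
                    try {
                        return method.invoke(target, args); // reads pass through unchanged
                    } catch (InvocationTargetException e) {
                        throw e.getCause();                 // rethrow the original exception
                    }
                });
    }

    public static void main(String[] args) {
        List<String> legacy = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> guarded = guard(legacy);
        System.out.println(guarded.get(0));  // prints a
        try {
            guarded.add("c");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // prints blocked: add
        }
    }
}
```

The legacy list itself is untouched; only code that goes through the guarded reference loses the ability to change it.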

Things like software transactional memory allow you to express your system in such a way that you keep your idea of state, but in a way that allows you to evolve that state over time and get consistent snapshots of what is happening in the system at a given point in time. That is not built into GPars at the moment; there are some Java libraries, and Clojure has got some libraries, that are helping us come up with good solutions in that space, and in the future we're likely to see that kind of thing built into GPars as well. It's a very exciting time, so this should be good.

   

6. As software developers we've gotten pretty good at reasoning about how a sequential program works: it does A, then it does B, then it does C. But one of the great challenges with parallel computing is that it can be very difficult for a software developer to figure out just what is going on in that system and to understand how it's really working. How can we have a more intuitive approach to parallel programming, so that we as developers can effectively reason about what is actually going to happen based on the code we've written? How do we make that more intuitive?

I presume by "sometimes difficult" you mean downright impossible. I remember in the early days of doing multithreading, back in C++ land, you would go and write your program and it would almost work, but there would be something quite wrong. So you would throw it into the debugger, which of course knew nothing about concurrent threads, and all of a sudden everything just stopped working altogether. You went from "It's almost working apart from one case" to "Nothing's working at all", because the debuggers in the early days didn't even know about multiple threads, and just by interrupting the program you basically threw everything off. So we've come a long way since then; debuggers are now smart enough to know about multiple threads and so on.

Maybe the multiple threads idea was not the best approach to start with, but even in the early days we kind of looked at alternative approaches. So if I express anything in a data flow language or a logic programming language, some of the 4GLs, they were all attempts to try to work out what is the best way to express this problem domain because quite frankly it’s a very hard problem domain to think about. And I think we're still in very early days even though we’ve had the same discussions 20 years ago and 10 years ago and so on, I still think we are quite in our early days of actually trying to solve this.

And we don't yet have the GoF book on parallel programming patterns, where you can say: "For this class of problems here is the obvious way to solve it, and it's going to be intuitive." So we are kind of still exploring all our options, and in some sense that is what GPars is doing; it's giving you a menu. In Groovy I can say "for each item in this collection, do something", so GPars says one of the ways I'll let you solve that is to do each of these things in parallel. Sometimes that solves my problem very elegantly; other times all hell will break loose because the thing that I am doing in parallel hasn't been designed to be thread safe. So there is a certain class of problems that can be solved really trivially, and the solution can look just like my sequential program.
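The "parallel each" case can be illustrated with Java's parallel streams (Java 8 and later), used here as a stand-in for the GPars equivalent rather than a rendering of its API. The class name `ParallelEachSketch` and the method `squareAll` are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; uses only the Java standard library.
class ParallelEachSketch {
    // Safe to parallelise: each element is transformed independently and
    // no shared mutable state is touched.
    static List<Integer> squareAll(List<Integer> numbers) {
        return numbers.parallelStream()
                .map(n -> n * n)
                .collect(Collectors.toList()); // keeps the original element order
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3, 4))); // prints [1, 4, 9, 16]
        // By contrast, appending to a shared ArrayList from inside forEach on a
        // parallel stream is a data race: ArrayList is not thread safe.
    }
}
```

The parallel version differs from the sequential one by a single call, which is the trivial case described above. The closing comment points at the failure mode where the per-element work was never designed for concurrent use.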

And at the moment there are other classes of problems that are not like that at all. We've actually got a whole lot of legacy issues too: you dive into GUI programming and there is a certain thread with a special privileged role inside the GUI where everything must happen, and you have to make sure you switch to the right thread to do things and so on. So we're a long way from these things being simple, but I think the best approach is to start trying to look at the context in which different kinds of solutions work really well. And that is what I like about the GPars framework at the moment: it's giving me the actors solution, which for a certain class of problems turns out to be a really good way to solve things.

But for many of the problems that I face it's actually not even the predominant best way to solve things, in my experience. So maybe there are 20-25% of algorithms that work really nicely using actors. There are others that work really nicely with the parallel each that I was talking about before. There are others that work out best using data flow, communicating sequential processes or software transactional memory. It depends on the circumstances: do I need transactional behavior, do I actually have things running in multiple processes, across multiple machines, what's the story? Depending on the context we are still looking at the best way to solve things.

So for me the best thing we can do is look at classes of problems, try to solve them in a number of different ways, note that in this context this solution works out really well, and try to capture that knowledge in something like the GoF book; that is the best thing we can do at this point in time. And maybe if there is a similar interview in a year or two's time, we'll be able to say: "Now we know this kind of thing works out really well with software transactional memory, this one is best for actors, this one is best for data flow." And those sorts of things will be common knowledge and common practice, but that is not where we are today, and I think these are exciting times, actually, looking at all that.

   

7. You mentioned that in the field of parallel and concurrent development we are still kind of figuring things out, but while we are in the process of learning, the number of cores in our machines seems to be growing rapidly. When does this start to become a real problem if we don't have parallel and concurrent structures down pat and intuitive for developers?

I guess there are a lot of people saying: "Look at Moore's Law, look at the processors; by this time next year, if we don't have this solved properly, we are sunk". Just a few weeks ago at JavaOne I was speaking to someone who, 20-30 years ago, was trying to solve a particular problem on a processor that was a little slow. The fastest computer available in the system was the processor in their printer, so they actually sent the computation off to the PostScript processor in the printer, got it to calculate the answer and send it back to their computer. So this isn't a new problem; it's all about leveraging all the things that are out there. So I am not in the group that says that this stuff is happening so quickly that we are not going to be able to cope.

But we do know it's inevitable; we have to start solving this problem. We can get by for a long time. Apple's approach to solving this problem is not to allow concurrent stuff: even though the hardware is quite capable of doing multithreading, they say we'll impose some restrictions on the way we handle these sorts of things. Those are going to affect the user a little bit, but they allow us to develop things a lot more simply. And we'll see over time that those sorts of restrictions will be removed, and we'll come up with better ways to present information to users that allow a whole lot of things to be happening in parallel. It's a problem we've had for a long period of time, and even though Moore's Law is making it something that we do need to start tackling without delay, I don't see it as something that is going to be the end of the IT world as we know it.

We will find ways to solve this and a lot of the knowledge we are gaining is helping us do that, but it will be a slow and steady growth as I see it. So I think what we will see is that we’ll still have compromise solutions like on the current iPhones, we’ll get given a limited computer environment to work in. If you look at a Java EE container it provides programmers with a simplistic model. We’ll pretend even though we’ve got a concurrent appserver here, we’ll pretend that every bit of code you write is running in a single thread and behind the scenes we will work out how to do everything concurrently. So there are a lot of things that we can do to enable ourselves not to have to deal with concurrency directly all the time, but we will have to work ahead and deal with concurrency as well.

And so we are going to see a whole range of different solutions to this problem, and as we evolve our programming languages, our toolkits and tools, our parallel patterns and so on, we will be able to handle the parallelism side of things as well.

   

8. Thank you very much.

Thank you.

Dec 27, 2010
