Rich Hickey on Protocols and Clojure 1.3


1. Clojure has released version 1.2, is going to 1.3. What have you learned since the release of Clojure 1.0? Are there some big lessons that you’ve learned, any things that you want to do differently?

I don’t think there are any big lessons; I think we’re just moving forward on adding things for completeness that we sort of knew about and then refining things that we already had. We’re always trying to improve performance, and 1.2 of course had a big addition, which was a different polymorphism system with protocols. I think Clojure probably didn’t go out with that because I wanted more time to think about it, but I think what we ended up with was pretty flexible.

   

2. You mentioned protocols, which are, I guess, a new tool for polymorphism that sits between functions and multi-methods. Is that correct?

Right. Depending on your perspective, they are similar either to Common Lisp generic functions or maybe to a dynamic flavor of type classes. They have the advantage that both of those systems have, in that you can add polymorphic behavior to things that already exist without changing them, so there is nothing that relies on a type-intrusive inheritance kind of thing.

   

3. It’s essentially solving the expression problem.

If you know what that is, it’s a way to do that. Framing it as the expression problem makes it harder for people to understand. People understand that it can be difficult to extend systems that require you to change your types. So when you have something that allows you to write new extensions, new functions that behave differently depending on the types they're passed without touching those types - that’s something that allows for independent extension. You can make new functions work with old data, and new data work with old functions - that’s the point.

   

4. In a way it’s different than for instance Ruby’s open classes, where for an open class you modify the class. Protocols are different. Can you scope them in some way?

Yes. Protocols are functions and in Clojure all functions are in namespaces. So if you had some abstraction that needed the function foo and I had an independent abstraction where foo also made sense, they would be in different namespaces and they wouldn’t conflict with each other, whereas when you try to use classes as namespaces, you have conflicts when people choose the same names for different semantics.

   

5. Can you use protocols to put a new function on String for instance but only in your namespace?

It’s not like you are putting it on String. You are saying "There is a function whose behavior I’d like to be different when it’s passed Strings vs. when it’s passed collections or something like that." That kind of polymorphism isn’t in the classes any more, it’s in the functions. Once that’s the case and functions are in separate namespaces, you have an open system that allows for mutual independent extension.
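As a minimal sketch of what that looks like in code (the namespace `example.describe`, the protocol `Describable` and the function `describe` are hypothetical names chosen for illustration, not anything from Clojure itself), a protocol function can be given different behavior for Strings and for collections without touching either type:

```clojure
;; Hypothetical namespace: the protocol function `describe` lives here,
;; not on String or on any collection class.
(ns example.describe)

;; A protocol is a named set of functions that dispatch on the type
;; of their first argument.
(defprotocol Describable
  (describe [x] "Return a short description of x."))

;; Extend the protocol to types that already exist, without modifying them.
(extend-protocol Describable
  String
  (describe [s] (str "string of length " (count s)))
  java.util.Collection
  (describe [c] (str "collection of " (count c) " items")))

(describe "abc")    ;=> "string of length 3"
(describe [1 2 3])  ;=> "collection of 3 items"
```

Another namespace could define its own `describe` with different semantics, and the two would not collide, because callers refer to each one through its namespace.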

   

6. One difference with protocols is that it's single dispatch vs. multi dispatch.

Right. That’s a compromise essentially so that: a) it can be really fast and b) there can be a relationship between it and what the hosts provide. Because generally Clojure is hosted on systems that provide single dispatch. By limiting it to single dispatch, which is the 99% case, you get that ability to have a mapping which is important for bridging, so you are able to say "I’ve defined a protocol, but if you are in Java and you need to extend it from there, you can use the inheritance mechanism you are used to, because there is a corresponding interface."

Nothing about having protocols precludes or denies the value of multi-methods which are still in Clojure and are still valuable for those situations where you have something more involved, you want to dispatch on something other than the type or use multi-dispatch.
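For contrast, here is a small sketch of a multimethod that dispatches on something other than the type, and on more than one argument. The function `collide` and the `:kind` key are hypothetical, chosen only to illustrate the idea:

```clojure
;; Hypothetical example: dispatch on the :kind of BOTH arguments,
;; which are plain maps rather than distinct types.
(defmulti collide
  (fn [a b] [(:kind a) (:kind b)]))

(defmethod collide [:asteroid :ship] [a b] "ship destroyed")
(defmethod collide [:ship :asteroid] [a b] "ship destroyed")
(defmethod collide [:ship :ship]     [a b] "both ships damaged")
;; Fallback for any pair without a more specific method.
(defmethod collide :default          [a b] "nothing happens")

(collide {:kind :asteroid} {:kind :ship})      ;=> "ship destroyed"
(collide {:kind :asteroid} {:kind :asteroid})  ;=> "nothing happens"
```

The dispatch function here is arbitrary code, which is what makes multimethods more general, and typically slower, than the single type-based dispatch of protocols.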

   

7. You can also hook into the JVM’s invokevirtual, the infrastructure that makes that fast.

That’s part of why the single dispatch is fast. Single dispatch is generally easier to make fast because it has less to do.

   

8. Just needs a lookup table, essentially or an inline cache.

Like you said, if the host has got a single dispatch mechanism and you can map to it, then you can just take a free ride on HotSpot.

   

9. You are talking generically about host. The CLR port, is that still around? Is that being developed?

It’s still around. The development has probably slowed a little bit. It’s tracking Clojure, but it does not yet have the protocol stuff.

   

10. I think I saw that one reason for adding protocols was the Clojure in Clojure project. Is that right? Is there a relationship to that?

It’s sort of a recognition that out of the gate, as I said, Clojure didn’t come with an abstraction mechanism like protocols. The reason was I didn’t want to invent a new object system. I wanted to think about what the right solution was. I also wanted to encourage people to program to generic data structures, which has happened in the Clojure ecosystem with tremendous benefits, because library interoperability is really great, and the ability to suck data in from anywhere and use the generic suite of library functions is there. But Clojure is heavily built on top of abstractions, so when you look at how Clojure was built, and what Clojure the language was delivering, you realize that it wasn’t yet delivering everything it needed to be built.

As a thought exercise, if nothing else, saying it would be nice if the language had everything required to implement itself, it was missing this abstraction system, which protocols now provide in a way that I’m satisfied with. Getting around to redoing Clojure in that system is something that will take some time. I think the most pressing thing right now is to get the compiler rewritten in Clojure, because it’s currently written in Java and it’s sort of a monolithic transformation, and a lot of tooling would like to step in the middle of that or at least get some of the results from the analysis phase of compilation. That will probably happen sooner rather than later.

   

11. The ratio of Java to Clojure code in Clojure 1.2 is still the same as in 1.0; have you changed other [Clojure sub]systems to Clojure?

It’s mostly about newness, so usually there is not a tremendous benefit to redoing something that’s already working. For instance we did introduce some new collections of primitives, they're written with the new deftype stuff, not in Java. The things that were written in Java are still in Java and they get maintained there until they get transferred. But most of Clojure’s runtime library, other than the collection classes is written in Clojure already, so we continue to do that. I don't think we’re writing a lot of new Java that’s not maintenance or minor tweaks to the compiler.

   

12. You mentioned these new special collection classes that use primitives, are they written for performance or to use less space?

They can use less space right off the bat; connecting all of the pieces together for performance is something that’s still yet to come. Obviously they were written and delivered in 1.2, before the new primitive support, which is in 1.3. That is the second step of probably 3 or 4 required to eventually end up with a system where you can take a higher order function that knows how to manipulate primitives, run it over a collection of primitives, and end up with the same thing you’d have gotten from writing a primitive loop in Java, except it’s over a collection, not over arrays, which are the only things in Java proper that give you primitives.

   

13. Since Clojure 1.0 you mentioned you’ve been doing a lot of work on performance, and I guess scalability, for instance the introduction of transients. I think transients are a performance feature. Transients were introduced in 1.1 and they are essentially a way to manipulate persistent data structures, is that right?

You can never manipulate a persistent data structure, but there is a constant time way to go from a persistent data structure to a transient data structure. Then if you use a transient data structure linearly essentially, not presuming that it’s modified in place, you can change it more efficiently than you could a persistent [data structure]. Then, when you’re done with that, you can get a persistent data structure out of it, again in constant time.

   

14. Essentially you’re making it transient between the ticks, changing it and then at the next tick it’s a value again.

Yes. Except that the notion of transient is important because to talk about changing it means that there is an "it" before and after the change, but you have to use the transient linearly, which means you can never refer to the value prior to any changing function. You can only consume the results of the functions you call on it; you can’t keep the old one around. So it sits in between "change in place" and the "you can't change at all" of functional transformation. When you use it, it feels much more like functional transformation. It just happens to be more efficient than creating new values every time.
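The linear usage described here can be sketched with the core functions `transient`, `conj!` and `persistent!`. The function name `squares` is hypothetical and only serves the illustration:

```clojure
;; Hypothetical helper: build a vector of the first n squares.
(defn squares [n]
  (loop [i 0
         acc (transient [])]        ; persistent -> transient, constant time
    (if (< i n)
      ;; Linear use: only the RESULT of conj! is carried forward;
      ;; the previous `acc` is never referred to again.
      (recur (inc i) (conj! acc (* i i)))
      (persistent! acc))))          ; transient -> persistent, constant time

(squares 5) ;=> [0 1 4 9 16]
```

The shape of the code is the same as the purely functional version with `conj`, which is why it reads like functional transformation even though the intermediate steps can reuse memory.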

   

16. This was added in 1.1, but you now have new ideas for expanding transient and using the transient concepts more generically.

Transients do two things right now: they do the data structure part of the job, which is the constant time conversion to and from persistent data structures and the efficient transient changing functions, if you will, because I don’t think we have the right vocabulary for that yet. And they implement policy, so because you have this requirement that you use them linearly, currently transients self-enforce that they are used in a single thread. What I wanted to do is actually separate the policy part out from the data structure part. When you separate the policy part out, you look for a good place to put it and you end up with another reference type, like the existing reference types that will manage a transient internally. Right now I’m calling them Pods.

   

17. Clojure Pods are an abstraction of transients or reference types?

No. Clojure Pods are just another flavor of the time model, another one of the reference types, but they’ll automate the process: you put a value in a pod, and when it’s needed it will turn into a transient. The kinds of functions you send into a pod are functions of transients, and when you look at the value of a pod, they’ll also automate that conversion from transient back to value. In that way, the user of pods doesn’t actually touch transients any more. Transients are an implementation detail of the pods, and the pods can enforce linearity with different policies.

There might be pods with single-threaded policy, but there will also be pods that have multi-threaded policies where they can support multiple threads trying to produce the next value cooperatively and they’ll internally use locks to make sure that works correctly.

   

18. An interesting example you gave me at the talk was that pods enforce the order of locking. Is that right? I mean if you have multiple locks.

Right. You don’t want to think about the locks, but the idea is, like STM, pods will let you have a single unit of work that impacts more than one reference. And because they use locks under the hood, obviously they have to take care of lock acquisition order, which they will, but you don’t have to think about that as a user.

   

20. That’s always good. We prefer for Rich to provide the abstraction. Another thing you are working on currently for the stream that’s going to become 1.3 is the numerical work, the work on fixing BigIntegers - what’s that about?

There are a bunch of things in 1.3 around primitives and better support for primitives. There are a few pieces to it; the first is that right now Clojure only supports objects as arguments and return values from functions, and that’s to give a uniform system, and it supports fast primitive operations inside the scope of a function. You can write a loop and have that be fast. But when that grows to a certain size, let’s say "This piece of code is getting too large; I want to take some part of this numerically intensive code and break it out", now you are suffering boxing overhead because you are using a function.

What is added to 1.3 is the ability to write functions that take and return longs and doubles in addition to objects. That removes the penalty from breaking something out into a function. That’s the first thing. The second thing is a unification of the semantics of arithmetic between boxed numbers and primitives and that basically is a move away from auto-promotion to checked primitive arithmetic. It’s the native primitive operation on the hardware, plus an overflow check which throws an exception and that’s very fast.
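A short sketch of both points, written against the syntax that shipped in released Clojure 1.3 and later (the interview predates that release, so details may have differed at the time; `add-longs` and `overflowed?` are hypothetical names):

```clojure
;; A function that takes and returns primitive longs, declared with
;; ^long type hints, so calling it need not box its arguments or result.
(defn add-longs ^long [^long a ^long b]
  (+ a b))

(add-longs 2 3) ;=> 5

;; Checked primitive arithmetic: overflow throws instead of auto-promoting.
(def overflowed?
  (try
    (+ Long/MAX_VALUE 1)
    false
    (catch ArithmeticException _ true)))

overflowed? ;=> true

;; Auto-promotion remains available as an explicit opt-in via +'.
(+' Long/MAX_VALUE 1) ;=> 9223372036854775808N
```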

   

21. We’re looking forward to Clojure getting faster and seeing Clojure in Clojure at some point. Finally, you recently joined a company to do commercial backing, commercial development for Clojure? Is that right? Or what is that about?

Right. We formed Clojure/Core, it’s a joint endeavor between myself and Relevance and it acts as a consulting arm of Relevance and we’re providing support and mostly, right now, mentorship, training and consulting services to companies that are trying to adopt Clojure. You can come to us and get help, we can come to you and give you some advice, either looking at existing projects or helping projects get started.

   

22. So you could hire Rich Hickey to come to my company?

No, actually you can’t hire Rich Hickey to go anywhere. But you can hire Clojure/Core and I work with the team who will show up and I’m behind their efforts.

   

23. You are there improving Clojure.

I’m also there as the expert advice. You can get questions answered through the team which eventually may hit me because I’m the only one who can answer them.

Dec 20, 2010
