1. We’re here at GOTOCon Aarhus 2014, I am sitting here with David Nolen. David, who are you?
I’m an engineer at Cognitect, I work on various things there around Clojure, ClojureScript and Datomic.
Transit is a data format. As with many things, we build these things for ourselves: we actually experience a problem and we want to solve that problem, so Transit is directly tied to things we are working on and that we need. But as far as the story for people on the outside: we wanted to be able to reach a very large number of programming languages, and that forced us to look at JSON as basically the transport, so Transit piggybacks on JSON as an encoding. It’s great that there are fast JSON parsers; that means we can use the same format when we speak to the browser. People often say, “Why not a binary format?” Binary formats don’t perform when you are talking to browsers, they really don’t, so JSON is the only game in town if you want to get between servers as well as out to the browser clients. But the problem with JSON, even though JSON is nice, expressive, and human readable, is that when we left XML behind because it was too verbose, we lost one very powerful property of XML, which is that it is extensible: you can extend XML with your own data types. We recovered that with Transit. With Transit you can add extensions, so you can encode data on the server, send it to the client, and get back the real thing; you don’t have to manually parse this stuff out of JSON yourself.
3. So, you say it piggybacks on JSON, what does it mean?
We write out a very strange-looking form of JSON; it looks very different, and through a lot of testing, the way that we write the JSON performs extremely well. We have a couple of tricks. Number one, we only write JavaScript arrays, and on many JavaScript engines arrays are actually parsed faster than objects. The other trick is a caching mechanism which allows the payload to be compressed, but it has an extra feature: when you cache information and then reach a cache code, we will actually read that object out of the cache, so parsing Transit is often more memory efficient as well. It’s not just a smaller payload and not just faster to parse; it’s also more memory efficient, because we reuse objects that we already encountered. A classic case in JSON: I run a query and I get back a huge array, and I have an object with 50 fields, and then another object with the same 50 fields, over and over again; that’s just an incredible amount of redundancy. In Transit, the first time we encounter a key, like in a JSON object, we write it; the second time we see it, we replace that string with a cache code. That allows us to compress it, and it’s a large reason why Transit performs so well.
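The key-caching idea can be sketched in plain JavaScript. This is an illustrative toy, not Transit’s actual wire format or cache-code scheme: the first occurrence of a key is written in full and assigned a short code, and later occurrences are replaced by the code.

```javascript
// Toy sketch of Transit-style key caching (NOT the real Transit wire format).
// Repeated map keys are replaced by short codes like "^0" on write and
// expanded again on read, shrinking highly repetitive payloads.
function encodeRows(rows) {
  const cache = new Map(); // key -> cache code
  let next = 0;
  return rows.map(row =>
    Object.fromEntries(
      Object.entries(row).map(([k, v]) => {
        if (cache.has(k)) return [cache.get(k), v]; // seen before: emit code
        cache.set(k, "^" + next++);
        return [k, v]; // first time: emit the key itself
      })
    )
  );
}

function decodeRows(rows) {
  const cache = []; // cache index -> key, filled in order of first occurrence
  return rows.map(row =>
    Object.fromEntries(
      Object.entries(row).map(([k, v]) => {
        if (k.startsWith("^")) return [cache[Number(k.slice(1))], v];
        cache.push(k);
        return [k, v];
      })
    )
  );
}
```

A query result with the same 50 field names repeated per row only pays for each name once; the decoder also gets to reuse the already-interned key strings, which is the memory-efficiency point made above.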
4. How does it relate to something like MessagePack?
There is a binary version of Transit that piggybacks on MessagePack. MessagePack is like JSON for binary, so that’s great if you are going server to server and you want a binary encoding, because you get even better compression of the payload. So Transit actually piggybacks on two things, JSON and MessagePack, and you use whichever one is most adequate for your situation.
Werner: So MessagePack is binary so if you have a binary parser ...
That’s right. You can emit MessagePack instead of JSON. We have implementations in Python, Ruby, JavaScript, Clojure, and ClojureScript. In the Java implementation you can actually choose to write either MessagePack or JSON. In JavaScript you can’t pick right now; eventually you may be able to, when MessagePack performance improves in the browser, but currently you are limited to JSON there.
In every language, Transit ships with standard forms of things you cannot encode in JSON. JSON only supports numbers, Booleans, strings, null, arrays and objects; I think that’s it, there are only six data types. Transit does all of those, but it does more: it does arbitrary-precision floats and integers, and it does dates, so you can encode dates and no longer have to coordinate which date format to use here and there; you never think about that. We encode sets, so you can actually pass a set from the server to the client and you’ll get back the right thing, and you can even compose these. We actually added to the JavaScript implementation a whole bunch of data structures that are very close to what’s coming in ECMAScript 6. ECMAScript 6 has specified sets and maps, and they are not available yet, but we wrote sets and maps for JavaScript that are very close to the ES6 spec, so you will get back a real set in JavaScript if somebody sends you a set.
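As a toy illustration of how extra types can ride on plain JSON (the tag strings here are invented for the sketch, not Transit’s real syntax), a date or a set becomes a tagged pair on the wire and is restored to the real thing on read:

```javascript
// Toy tagged-value encoding for types JSON lacks (illustrative only;
// the "~#" tags here are made up, not Transit's actual encoding).
function toWire(value) {
  if (value instanceof Date) return ["~#date", value.getTime()];
  if (value instanceof Set) return ["~#set", [...value].map(toWire)];
  return value; // numbers, strings, booleans, null pass straight through
}

function fromWire(value) {
  if (Array.isArray(value) && value[0] === "~#date") return new Date(value[1]);
  if (Array.isArray(value) && value[0] === "~#set")
    return new Set(value[1].map(fromWire));
  return value;
}
```

Because the tagged form is plain JSON, it survives `JSON.stringify`/`JSON.parse` and any fast built-in JSON parser, which is exactly the piggybacking described above.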
But the cool thing is, OK, we give you a broader set of types out of the box, but say you decide to build an image-editing application and you want to encode colors. You have a color type in your server-side language and a color type in your client, and normally what you have to do is coordinate between the frontend guy and the backend guy and make sure everybody has the right thing. In Transit you can have handlers on the server side and handlers on the client, and you are just going to get back the color object that you want: you can encode a color object and get back a color object. Again, it’s about removing coordination and ad hoc code, because people want to add more types without coordinating between the teams so much, and Transit allows you to do this in a much simpler way.
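The handler idea can be sketched like this (a toy registry, not transit-js’s real handler API; the `Color` type and registry shapes are invented for illustration): each side registers a tag plus write/read functions, and the encoder and decoder consult the registry, so neither side hand-parses JSON.

```javascript
// Toy extension mechanism (a sketch, not transit-js's actual API).
class Color {
  constructor(r, g, b) { this.r = r; this.g = g; this.b = b; }
}

// Server side registers how to write a Color; client side how to read one.
const writeHandlers = [{ type: Color, tag: "color", write: c => [c.r, c.g, c.b] }];
const readHandlers = { color: ([r, g, b]) => new Color(r, g, b) };

function encode(v) {
  for (const h of writeHandlers)
    if (v instanceof h.type) return { tag: h.tag, rep: h.write(v) };
  return v;
}

function decode(v) {
  return v && v.tag && readHandlers[v.tag] ? readHandlers[v.tag](v.rep) : v;
}
```

Sending `new Color(255, 0, 0)` through `encode`, JSON, and `decode` yields a real `Color` on the other side, with no ad hoc parsing code in between.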
6. The only thing you need to ship are the extensions on both sides, right?
That’s right. But there are many cases when you have some data type on the server that is only relevant for a particular client; even so, there is a really cool trick in Transit where, even if the client doesn’t have the handler, it doesn’t cause any error and we don’t corrupt the data, so we actually support round-tripping. If you go from Java to Python to JavaScript for some reason, which happens, there are systems that look like this, and there is some type that JavaScript doesn’t understand yet needs to send back to the server, it will go all the way through the entire system and it won’t be touched.
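The round-tripping trick can be sketched as follows (illustrative only, not Transit’s actual API): when a reader has no handler for a tag, the value is kept as an opaque tagged record instead of raising an error, and the writer emits that record back unchanged.

```javascript
// Sketch of round-tripping an unknown tagged value (toy, not Transit's API).
function read(v, handlers) {
  if (v && v.tag !== undefined) {
    const h = handlers[v.tag];
    // No handler? Preserve the tag and representation instead of failing.
    return h ? h(v.rep) : { unknownTag: v.tag, rep: v.rep };
  }
  return v;
}

function write(v) {
  // An opaque value is written back exactly as it arrived.
  if (v && v.unknownTag !== undefined) return { tag: v.unknownTag, rep: v.rep };
  return v;
}
```

So a client that has never heard of some server-side type can still receive it, hold onto it, and send it back intact.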
Transducers were informed by the design of reducers; they were a big a-ha moment for Rich Hickey, who came up with the idea. But they were created because of what we were seeing: if there was a criticism of reducers, it was that in the standard library in Clojure and ClojureScript you had a function called map, but in the reducers namespace you had a different, more efficient version of map, and in core.async, which is sort of our CSP-style async coordination abstraction and is very popular, we also had a map. So we had three versions of map; even though they logically do the same thing, they were not reconciled. Transducers are a reconciliation. Now we can actually have the same mapping operation in all cases: we can use the same map on collections as we use on an async channel. It’s very powerful because, again, abstractions are now reusable; you are not coordinating three different versions of the same idea. But there are also huge performance benefits, very big ones in many cases, especially for ClojureScript.
On the server side you often have very powerful machines and very powerful garbage collectors, so you are not quite so concerned in the server context; you have a lot of power. But a lot of our targets are mobile phones, and mobile phones are still 10 to 20 times slower than a desktop or a laptop, very underpowered, and many times the GC is just not as good in these contexts. For async programming, transducers really reduce GC pressure; they involve a lot less allocation, so it’s just a really good story all around. They are actually bigger than most people realize; it’s a very significant change for Clojure and ClojureScript, and it’s probably the biggest overall performance boost in a very long time.
8. Is this a rewrite of the Clojure standard library?
It’s not a rewrite, because rewrites are very dangerous; you don’t want to break backwards compatibility. So what we did was a very small, very clever trick: we had all these functions, and these functions had an arity to spare, so we introduced a new arity in map, in filter, and in all these other sequence-like operations, and with this new arity, instead of returning a collection or a sequence, they return a transducer. A transducer is really a special kind of function that you can compose into these transformation pipelines.
9. So transducers are an extension?
Yes, they are really not a change to the language, because again that’s not something that we want to do; we don’t want to break backwards compatibility. They are an extension. We just added an extra arity to all these functions, and that means people can adopt them over time, gradually changing their code from the old way of doing things to the new way.
10. You mentioned these transformation pipelines, what do they look like in practice?
So this new arity returns what is called a transducer, a function with a very specific signature; it would probably take too long to explain in such a short amount of time, but I highly recommend reading Rich Hickey’s posts and looking at the source code, definitely if you are a Clojure programmer. I think there are a lot of people now porting it, so as new implementations in new languages come out, look at how it’s been done in your language and you will see that conceptually they are pretty simple. Basically, a transducer itself is just a function, so you can use ordinary function composition to compose them. For example, I can compose map and filter: in this arity you pass only one argument, the function you want to use, so I take map with inc, perhaps because I want to increment integers, and filter with even?, because maybe I want to keep even numbers; composing those produces a single transducer which does both operations, incrementing and filtering, in one function.
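In JavaScript terms (an illustrative sketch of the concept, not Clojure’s actual implementation), a transducer is a function that takes a reducing step function and returns a new one, which is why plain function composition works on them:

```javascript
// Illustrative transducers in JavaScript (a sketch of the concept,
// not Clojure's implementation).
// A transducer takes a step function (acc, x) -> acc and returns a new one.
const map = f => step => (acc, x) => step(acc, f(x));
const filter = pred => step => (acc, x) => (pred(x) ? step(acc, x) : acc);
const comp = (f, g) => x => f(g(x));

const inc = x => x + 1;
const isEven = x => x % 2 === 0;

// One transducer that both increments and keeps only even results.
const xform = comp(map(inc), filter(isEven));

// Drive it with an ordinary reduce; a channel or an imperative loop
// could reuse the very same xform.
const push = (acc, x) => { acc.push(x); return acc; };
const result = [1, 2, 3, 4, 5].reduce(xform(push), []);
// result is [2, 4, 6]
```

Note the ordering: with transducers, `comp` applies the leftmost transformation to each input first (here, increment, then test for evenness), which matches how Clojure’s `comp` behaves for transducers.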
11. How is that related to stream fusion or other things?
It’s related to stream fusion. Stream fusion comes from Haskell, and there are other concerns there that really don’t affect us, but the fusing part that we are able to do is, again, that when you compose transducers you get back a single transducer that does all the operations of the individual ones together. And it’s really cool, because it means, for example, that I can take all these beautiful functional operations, compose them, get a single transducer, and then use that transducer inside an imperative for loop. That’s how we actually get the performance: under the hood we often use high-performance iteration, some sort of loop with a transducer, to get performance way better than what most people are doing.
12. Excellent. So transducers are in a future version of Clojure?
They are already available; I mean, they are available in an alpha version, and people love them so much that they are already using them, because they are conceptually very simple and the implementations are relatively stable. You can use them in alpha releases of Clojure 1.7, and they are already available in ClojureScript, so if you are using ClojureScript you can use transducers today. And again, people are very excited; we already shipped a version of core.async that supports transducers, so today you can write ClojureScript programs with transducers and use core.async with transducers.
Werner: Excellent. I think you are the maintainer of ClojureScript.
Yes, I am one of the lead developers.
13. What is the uptake of ClojureScript ?
It’s changed dramatically; the story around ClojureScript has changed dramatically in the past year. I think we will probably talk about this in a bit, but the arrival of Om made a very strong story for using ClojureScript on the frontend, so there has been a ton of adoption of ClojureScript, even among Clojure programmers, and then, excitingly, interest from outside. There is a very important build tool in the Clojure community called Leiningen, and they do a survey every year where they ask, “Are you using ClojureScript?” Maybe two years ago it was a third or a quarter, but the last time they did the survey, 50% of all Leiningen users used ClojureScript, so it’s now considered a critical part of the Clojure stack; it’s no longer a weird thing. There are a lot of reasons for this: it’s become more stable; we have very good source mapping, which is how you can debug ClojureScript instead of JavaScript in the browser; and we have a lot of contributors, I think there have been 85 over the three years, so it’s very healthy and constantly improving.
14. You already brought up Om, what is the story of Om?
It’s kind of a funny story, because Om was an idea I had because in ClojureScript we’ve had high-performance persistent data structures for a long time. I’ve demoed them at conferences around the world: this stuff isn’t slow. I show that with modern JavaScript engines, persistent data structures can be very, very fast and in some cases, it’s very counterintuitive, they can outperform mutable data structures.
So this has been true for at least a year with the new JavaScript engines that are out there. But really the big change was React. React is a Facebook library for doing UI development and building reusable components. It actually came out in the summer of 2013 and people didn’t look at it; I also shot it down, because when they released it they sort of emphasized the reusable components, but they had this weird thing called JSX where you could mix XML syntax into your JavaScript, and of course the JavaScript community was like, “This is a horrible idea”, because people remember these types of things from the past. So it was largely ignored, and it was not completely the JavaScript developers’ fault, because the engineers around React were not explaining the value proposition. The value proposition was not the HTML integration; it was that the React rendering model is just a smarter model. It allows for global optimizations in a way that currently no other JavaScript framework really provides, because it treats the DOM as if it were a GPU to send instructions to. Without the user having to do any work, it does optimizations around batching operations on the DOM, because actually touching the DOM is very inefficient: if you mutate the DOM it triggers repaints, reflows, recalculations, and these are all performance bottlenecks. It’s better to bring all your mutations into one batch and apply them to the DOM at once.
And that’s really the clever thing that React does. When you build a React component you are actually constructing a virtual DOM, not the real DOM; then, as the state of your application changes, React computes an entirely new virtual DOM, takes the two virtual DOMs, and diffs them, like diffing two versions of a source file. It computes the minimal set of changes the real DOM needs and applies them on your behalf, so you don’t ever need to touch the DOM yourself.
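A vastly simplified diff over two virtual trees might look like this (a toy sketch, nothing like React’s real reconciler, which also handles insertions, deletions, and keyed reordering; here both trees are assumed to have the same shape and only text changes):

```javascript
// Toy virtual-DOM diff: compare two trees and collect the minimal set of
// text changes to later apply to the real DOM in one batch.
// Assumes both trees have the same shape (illustrative simplification).
function diff(oldNode, newNode, path = [], patches = []) {
  if (oldNode === newNode) return patches; // same reference: skip whole subtree
  if (oldNode.text !== newNode.text) patches.push({ path, text: newNode.text });
  (oldNode.children || []).forEach((child, i) =>
    diff(child, (newNode.children || [])[i], path.concat(i), patches));
  return patches;
}
```

The collected patches, not the comparison itself, are what ever touch the real DOM, and they can be applied together as one batch.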
Werner: Until you apply it.
React does it for you, you don’t actually do it. React is sort of in control.
Werner: So Om comes in there with a crucial idea.
Yes. The idea around Om came once I realized the algorithms in React are really awesome; it wasn’t just this weird HTML-in-JavaScript thing. I actually got into a Twitter conversation with Jordan, the developer who originally came up with the design, and we had a back and forth. He said, “Persistent data structures are great”, and I thought, “A JavaScript developer thinks persistent data structures are great?”, so I asked, “Are you using persistent data structures at Facebook?” He said, “Yes, we actually built a comments component, a comments module, and it was a little bit slow, and I had to change one line of code to add persistent data structures and we got an order of magnitude performance boost.” As soon as I picked his brain about exactly how that was done, the hook that was needed turned out to be quite simple, I started building Om right after that. Om just takes what he had optimized inside his application and makes immutable data structures the central way that you communicate with React, so you can organize your whole program around immutability. That’s really the big idea.
The one method that is really important to understand in React is called shouldComponentUpdate. Whenever the state of the application changes, shouldComponentUpdate takes the old state of your component and the new state, and you return a boolean value, true or false. If you return false, React will not consider that part of the DOM; it won’t even look at whatever that component represents, it’s ignored. If you return true, it will descend into that component and figure out exactly what in the DOM needs to update.
So it turns out that persistent data structures are a huge win. Why? This is about memory: if I have the old state and the new state and they are equal under JavaScript’s triple-equals operator, they are actually the same location in memory, and if they are the same location in memory we know we can ignore it. And even if something did change, say we are looking at an immutable vector with three things in it: all we have to do as we traverse it is ask, “Is this pointer the same? Yes, ignore that one. Is this pointer the same? No, then we actually descend into it.” So you can do diffs very efficiently, simply because deep equality is efficient on persistent data structures, which is not true for mutable data structures. In JavaScript, when people pass values to a React component they pass arrays or objects, and because those are mutable, React has to walk the whole thing; it has to do a deep walk and check everything. Immutable data structures allow us to do usually very shallow walks, so that’s a performance win.
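The whole argument collapses into one line of code. This is a sketch of the idea, not Om’s or React’s actual source; the `value` prop name is arbitrary:

```javascript
// Sketch: with immutable values, shouldComponentUpdate reduces to a
// reference check. Unchanged means "same object in memory", so === is
// both correct and O(1); mutable props would force a deep walk instead.
function shouldComponentUpdate(prevProps, nextProps) {
  return prevProps.value !== nextProps.value;
}

const state = { items: [1, 2, 3] };       // stand-in for a persistent value
const sameState = state;                  // "no change": the same reference
const newState = { items: [1, 2, 3, 4] }; // any change yields a new object
```

With `sameState` the component is skipped entirely; with `newState` React descends and diffs, which is exactly the shallow-versus-deep walk trade-off described above.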
Werner: Basically, it’s the difference between walking a tree and a pointer comparison.
That’s exactly right.
Om is a little bit too radical for most people; unless you are already a Clojure programmer, I think people are not going to assess Om, though people are interested in it. And actually it’s been amazing to see that every new JavaScript library around React seems to be copying something from Om, which is really good; I mean, I did Om as a proof of concept that I wanted to share with the world. I was stunned by the amount of traffic my blog post got, and that blog post was really pivotal in changing the broader opinion about React; it was through that post that people started shifting, reassessing React, taking the time to really see how it works and trying it out. I lurk on the React IRC and I communicate with the devs a lot, and it’s been amazing, because you see a JavaScript person saying, “I read this Om post and, well, ClojureScript is weird, but where can I get these immutable data structures?” There is so much excitement about immutable data structures, and JavaScript is the most mutable language there is, so it’s been great; Om is like the gateway drug to immutability, and it’s been cool to see this evolve over time. It’s actually caused Facebook to come out with their own data structure library, which is persistent immutable data structures.
Werner: In JavaScript?
In JavaScript, yes.
Werner: A JavaScript API. I can agree; I definitely only heard about React because of you, basically. I heard about it, saw this silly XML, thought it was pointless, and then David came along and told us all no, there is value there. So, David, thank you for this interview and thank you for giving us the gift of React.
You are welcome.