Cloudant, the company behind CouchDB, has just released a Java View Server for CouchDB. This means that Map-Reduce jobs can now be written not only in Erlang and interpreted languages such as JavaScript or Python, but also in JVM-based languages. The approach will be discussed at the CouchDB community meeting this week. Currently it can only be used on Cloudant's hosted BigCouch service.
Two main advantages are cited. The first is the large number of Java libraries available for all kinds of functionality that could be relevant in Map-Reduce tasks. The second is the greater reliability attributed to static typing (though that remains to be proven).
A performance comparison would be interesting, but no benchmark has been performed so far. Performance is expected to be lower than that of native Erlang views (Java and Erlang can be mixed within a view). There is some overhead due to JSON serialization and deserialization by the org.json library.
To use the Java-based Map-Reduce views, implement a simple JavaView interface that offers callbacks for map, reduce and rereduce. The following design document, for example, defines a simple view that aggregates word counts in configured JSON fields.
{
  "_id": "_design/splittext",
  "language": "java",
  "views": {
    "title": {
      "map": "{\"classname\":\"com.cloudant.javaviews.SplitText\",\"configure\":\"title\"}",
      "reduce": "com.cloudant.javaviews.SplitText"
    }
  }
}
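The article does not show the JavaView interface itself, so the following is only a rough sketch of what a word-count view with map, reduce and rereduce callbacks might look like. The interface SketchView, the class SplitTextSketch and all method signatures are assumptions made for illustration; the actual Cloudant interface may differ considerably.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the JavaView interface; the real signatures are not
// given in the article and may differ.
interface SketchView {
    void configure(String config);                                // e.g. "title"
    List<Map.Entry<String, Long>> map(Map<String, Object> doc);   // emitted (key, value) rows
    long reduce(List<Long> values);                               // combine mapped values
    long rereduce(List<Long> partials);                           // combine earlier reductions
}

// Hypothetical word-count view: emits (word, 1) for every word in the configured fields.
class SplitTextSketch implements SketchView {
    private List<String> fields = new ArrayList<>();

    @Override
    public void configure(String config) {
        // Assumption: the "configure" value is a comma-separated list of field names.
        fields = Arrays.asList(config.split(","));
    }

    @Override
    public List<Map.Entry<String, Long>> map(Map<String, Object> doc) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        for (String field : fields) {
            Object value = doc.get(field.trim());
            if (!(value instanceof String)) {
                continue; // skip missing or non-text fields
            }
            // Split on runs of non-word characters (ASCII-only with the default regex flags).
            for (String word : ((String) value).toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    rows.add(new AbstractMap.SimpleEntry<>(word, 1L));
                }
            }
        }
        return rows;
    }

    @Override
    public long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    @Override
    public long rereduce(List<Long> partials) {
        // For a plain count, combining partial sums is the same operation as reduce.
        return reduce(partials);
    }
}
```

In this sketch rereduce simply delegates to reduce, which works for sums and counts; an aggregate such as an average would need to carry extra state between the two steps.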
InfoQ spoke with David Hardtke, the Director of Search at Cloudant, who is responsible for this project.
InfoQ: CouchDB runs on Erlang. How does this interact with JVM code? What were the implementation challenges?
David: The Java View Server, like all CouchDB view servers (except native Erlang), runs as an external process. There is a well-defined protocol for communication between CouchDB and view servers.
Normally, communication occurs via standard I/O, but we actually use the OtpErlang Java-Erlang library for performance reasons (it allows for multiple threads).
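To make the standard I/O variant of the protocol more concrete: in stock CouchDB, a view server reads one JSON array per line from standard input (for example ["reset"], ["add_fun", ...] or ["map_doc", ...]) and writes one JSON reply per line to standard output. The sketch below shows only that dispatch loop. It is not Cloudant's implementation, which uses OtpErlang instead; the class name StdioViewServerSketch is hypothetical, the map step emits no rows, and the error reply is illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical, heavily simplified view-server loop for the line-based stdio protocol.
class StdioViewServerSketch {
    private int registeredFunctions = 0;

    // Extracts the command name from a line such as ["map_doc", {...}].
    // Crude string slicing keeps the sketch dependency-free; a real server
    // would use a JSON parser such as org.json.
    static String commandName(String line) {
        int start = line.indexOf('"');
        int end = line.indexOf('"', start + 1);
        if (start < 0 || end < 0) {
            return "";
        }
        return line.substring(start + 1, end);
    }

    // Returns the single-line reply for one protocol line.
    String respond(String line) {
        switch (commandName(line)) {
            case "reset":
                registeredFunctions = 0;   // forget all registered map functions
                return "true";
            case "add_fun":
                registeredFunctions++;     // a real server would load the function here
                return "true";
            case "map_doc": {
                // The reply holds one list of [key, value] rows per registered map
                // function; this sketch emits nothing, so every list is empty.
                StringBuilder reply = new StringBuilder("[");
                for (int i = 0; i < registeredFunctions; i++) {
                    if (i > 0) {
                        reply.append(',');
                    }
                    reply.append("[]");
                }
                return reply.append(']').toString();
            }
            default:
                // Illustrative error reply; check the protocol documentation for the exact form.
                return "{\"error\":\"unknown_command\",\"reason\":\"not handled in this sketch\"}";
        }
    }

    public static void main(String[] args) throws IOException {
        StdioViewServerSketch server = new StdioViewServerSketch();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(server.respond(line));
            System.out.flush(); // CouchDB waits for each reply before sending the next command
        }
    }
}
```

Because every request blocks on a single stdin/stdout pair, a loop like this handles one command at a time per process, which fits David's remark that the OtpErlang route was chosen to allow multiple threads.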
InfoQ: Any limitations on what code / libraries can be used in this context?
David: The main challenge was security, both at a System level and from a user data level. We are running this on a shared cluster. We use dynamic class loading to load user libraries. The class loader has a fairly tight security manager in place that restricts malicious calls. There is no FS access and a limited set of System calls allowed.
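As a rough illustration of the dynamic class loading David mentions, the sketch below loads a class by name from a set of user-supplied JAR URLs and instantiates it through its no-argument constructor. The class ViewLoaderSketch and its method are hypothetical, and the security manager restrictions he describes are deliberately left out; this is not Cloudant's loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper: loads a user-supplied view class by its fully qualified name.
class ViewLoaderSketch {

    static Object instantiate(URL[] userJars, String className) {
        // The parent is this class's own loader, so interfaces shared between the
        // server and user code resolve to the same Class objects.
        // The loader is intentionally left open: closing it would prevent the loaded
        // class from lazily loading further classes out of the same JARs.
        URLClassLoader loader =
                new URLClassLoader(userJars, ViewLoaderSketch.class.getClassLoader());
        try {
            Class<?> cls = Class.forName(className, true, loader);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing no-arg constructor and constructor failures.
            throw new IllegalArgumentException("Cannot load view class " + className, e);
        }
    }
}
```

A call such as ViewLoaderSketch.instantiate(jars, "com.cloudant.javaviews.SplitText") would then correspond to the "classname" entry in the design document, assuming the class is present in one of the given JARs.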
The current architecture of the view server is quite simple: it just uses Java threading, driven by calls from the Erlang-based CouchDB instances. If the Java server fails, it is simply shut down and restarted. Interesting alternatives for such a server would be the Scala-based Akka framework or Jetty's non-blocking requests. The Java View Server runs on any JVM.
Great potential lies in using Java.next languages such as Clojure, Scala or Groovy (among others) for this kind of work, as they are much more concise and powerful than Java in expressing such tasks. According to David, a Clojure-based view server is being developed by another party.
To evaluate the new Java View Server, you can use a free account available from Cloudant's site. Detailed instructions can be found in the couchjava GitHub repository.