Last week, InfoQ reported that Groovy 2.3 has a much faster JSON parser than previous versions. While preparing the article, we emailed Tatu Saloranta, creator of the Jackson JSON processor, to ask what he thought of Rick Hightower's claim that Groovy and Boon provide the fastest JSON parser for the JVM.
InfoQ: Do you feel these benchmarks are accurate?
Tatu Saloranta: At a very low level, I think the test methodology is solid. JMH is a good framework, and with proper iteration counts etc., results are repeatable.
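For context, JMH benchmarks are plain annotated Java classes in which warmup and measurement iterations, forks, and result consumption are declared explicitly. The sketch below shows only the general shape; the class name, payload, and placeholder parser call are invented for illustration and are not taken from the benchmarks under discussion.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)        // let the JIT settle before measuring
@Measurement(iterations = 10)  // enough iterations for repeatable numbers
@Fork(2)                       // fresh JVMs to smooth out run-to-run variance
public class JsonParseBenchmark {

    private final String json = "{\"name\":\"value\"}"; // placeholder input

    @Benchmark
    public void parse(Blackhole bh) {
        // parseWithLibraryUnderTest is a stand-in for the JSON parser being measured;
        // consuming the result via the Blackhole keeps the JIT from eliding the work.
        bh.consume(parseWithLibraryUnderTest(json));
    }

    private Object parseWithLibraryUnderTest(String input) {
        return input; // placeholder only
    }
}
```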
I think it is possible that Boon and Groovy are even faster than Jackson for some or many tests, but I do have doubts about the most extreme claims, and specifically about cherry-picking particular tests and/or usage patterns.
My concerns come down to three main things; they all sort of fall under the next question.
Also, just to be clear -- the tests I have looked at are the ones available on GitHub. I think there are many derivatives, so some of my comments may be less applicable to those.
InfoQ: Do you think these benchmarks are testing real-world behavior?
TS: Real-world behavior, and real-world usage. I think they represent only a small slice of possible usage, and their emphasis has tended to highlight "good cases", to put it bluntly. My three specific concerns are:
- Input source. The most commonly cited tests start with Java Strings. Strings are rarely the actual input source, because they are JVM constructs -- all external input arrives as byte streams. Strings are used in unit tests, or when a framework (or platform; maybe Groovy does this?) only exposes Strings. The same goes for writing. This matters mostly because of two things: (a) Jackson heavily optimizes the byte-stream case, since it is the bread and butter of REST services and file storage; and (b) Boon has very aggressive optimizations for dealing with Strings, especially the use of sun.misc.Unsafe to access and modify the underlying char[] that the String class wraps. So choosing a source that is a minority use case, but where Boon does have a clear edge (it is faster with Strings, there's no denying that), seems suspicious (see the sketch after this list).
- Processing/access style: "untyped" -- processing Lists of Maps instead of POJOs. This part is less suspicious, but it seems odd to me not to mention that the tests read and write only Lists-of-Maps and not real POJOs. All modern JVM REST frameworks focus on POJOs, although they also allow "untyped" access. Different users have different preferences, so I think it is legitimate to test either, or both, but this should be documented.
- Lazy construction, with tests that do not access or verify data. Boon has quite a few optimizations geared toward lazy processing of input. This can be useful for cases where only a small subset of the data is accessed. But the problem here is that the performance tests do not access the data at all -- in fact, the parser could return any Object and the test would not notice. So I feel the tests just happen to work in a way that gives an optimal boost to lazy processing; and because of this, they do not represent the performance one would actually get.
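To make these distinctions concrete, here is a minimal Jackson-only sketch contrasting the two input sources and the two binding styles, and touching the parsed data so that lazily built structures cannot skip the work. The User class and sample payload are invented for illustration; only the ObjectMapper calls are standard Jackson API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ParsingStyles {

    // Hypothetical POJO standing in for whatever the payload actually maps to.
    public static class User {
        public String name;
        public int age;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String json = "{\"name\":\"example\",\"age\":1}";

        // (1) String input, "untyped" binding: the combination most of the cited tests use.
        Map<String, Object> untyped =
                mapper.readValue(json, new TypeReference<Map<String, Object>>() {});

        // (2) Byte-stream input, POJO binding: closer to what REST services actually do,
        //     and the path Jackson optimizes most heavily.
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            User user = mapper.readValue(in, User.class);
            // Access the data, so the measurement includes materializing it.
            System.out.println(user.name + " / " + untyped.get("age"));
        }
    }
}
```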
Perhaps I should rephrase all of the above to say that the tests do not seem to start from actual, valid usage patterns -- at best it feels artificial. They only read and write JSON, but make no use of it. I understand that this makes sense from one point of view -- trying not to add the overhead of manipulation -- but unfortunately, due to different trade-offs, it skews the results. So when a user does, say, JAX-RS-style REST handling, where all JSON data gets bound to a POJO from an InputStream and the reverse direction goes from another POJO into an OutputStream, the performance experienced is very different from what the benchmark would suggest.
On the other hand, if the idea is to use "untyped" Objects, the code should at least do some form of traversal -- and, if the same object is to be used for round-tripping, some modification too, as in the sketch below.
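A round trip of the kind he describes -- reading "untyped" data from a byte stream, traversing and modifying it, then writing it back out -- might look like the following minimal sketch; the payload and field names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UntypedRoundTrip {

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        byte[] input = "[{\"id\":1,\"active\":false},{\"id\":2,\"active\":false}]"
                .getBytes(StandardCharsets.UTF_8);

        // Read "untyped" Lists-of-Maps from a byte stream, as a service would.
        List<Map<String, Object>> items = mapper.readValue(
                new ByteArrayInputStream(input),
                new TypeReference<List<Map<String, Object>>>() {});

        // Traverse and modify, forcing any lazily built structures to materialize.
        for (Map<String, Object> item : items) {
            item.put("active", true);
        }

        // Write the modified result back to a stream, completing the round trip.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        mapper.writeValue(out, items);
        System.out.println(out.toString("UTF-8"));
    }
}
```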
In the case of Boon, what happens is that the use of overlays (indexing the raw input so that data can be extracted later), along with lazy construction of Maps, hides the actual overhead that would be experienced. And if Strings are used as the source/target, the encoding/decoding overhead (which varies between Jackson and Boon -- Jackson targets this step heavily) further reduces Jackson's relative end-to-end efficiency.
InfoQ: Do you plan on making Jackson faster in the future or is it "fast enough"?
TS: At this point I can address small things, but I do not have major plans to focus on performance. I hope to address some of the findings (the benchmarks have been useful!) to lower the overhead when reading from String sources; and the Jackson Afterburner module already has some of these aggressive optimizations. But these will most likely be incremental improvements.
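For reference, the Afterburner module he mentions is a drop-in addition to a standard ObjectMapper; the sketch below shows a minimal setup, with the POJO and payload invented for illustration.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

public class AfterburnerSetup {

    // Hypothetical POJO used only to demonstrate data binding.
    public static class Point {
        public int x;
        public int y;
    }

    public static void main(String[] args) throws Exception {
        // Afterburner generates bytecode for property access instead of relying
        // on reflection, which speeds up ordinary data binding.
        ObjectMapper mapper = new ObjectMapper()
                .registerModule(new AfterburnerModule());

        // readValue()/writeValue() calls on this mapper now use the optimized accessors.
        Point p = mapper.readValue("{\"x\":1,\"y\":2}", Point.class);
        System.out.println(p.x + "," + p.y);
    }
}
```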
Performance has not been the number one goal since the earliest 1.x releases; and while I do want to keep overhead low, there are more important things to focus on: ease of use, support for other formats (XML, CSV, CBOR, Smile), conventions, modular data-type handling libraries (Joda, Guava), and so forth.
I guess it is fair to say I feel it is close enough to "fast enough", in the right ballpark.
InfoQ: Thanks for your candid responses!
TS: No problem -- thank you for digging into this. I think Boon for JSON is a useful thing overall; and specifically, it is great that Groovy gets modern, high-performance JSON support. But I do hope that comparisons are apples to apples, and that claims are in line with the supporting evidence. :)