In this interview we talk with Manik Surtani (MS below), the lead of the JBoss Cache and Infinispan projects.
InfoQ: In a nutshell, what is Infinispan?
MS: Infinispan's an open source data grid platform. It exposes a simple data structure - a Cache - in which you can store objects. While Infinispan can be run in local mode, its real value is in distributed mode, where caches cluster together and expose a large memory heap. This is more powerful than simple replication in that it distributes a fixed number of replicas of each entry - providing resilience to server failure - as well as scalability, since the work done to store each entry stays fixed regardless of cluster size.
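As a rough illustration of why a fixed number of replicas keeps the per-entry work constant as the cluster grows, consider the sketch below. It is not Infinispan's implementation: the class `OwnerLocator`, the method `ownersFor`, and the simple modulo placement are all assumptions made for illustration (a real grid would use a consistent hash so that few keys move when nodes join or leave).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Infinispan's real hashing code.
class OwnerLocator {
    // Picks numOwners consecutive nodes, starting at the node the key hashes to.
    // The loop runs numOwners times no matter how many nodes the cluster has.
    static List<String> ownersFor(Object key, List<String> nodes, int numOwners) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("cluster has no nodes");
        }
        int copies = Math.min(numOwners, nodes.size());
        int start = Math.floorMod(key.hashCode(), nodes.size());
        List<String> owners = new ArrayList<>(copies);
        for (int i = 0; i < copies; i++) {
            owners.add(nodes.get((start + i) % nodes.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> small = List.of("n0", "n1", "n2");
        List<String> large = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            large.add("n" + i);
        }
        // Two owners per entry in both clusters, so a write contacts two nodes either way.
        System.out.println(ownersFor("user:42", small, 2));
        System.out.println(ownersFor("user:42", large, 2));
    }
}
```

With simple replication, by contrast, every node would hold every entry, so each write would touch all 3 nodes in the small cluster and all 100 in the large one.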
InfoQ: What does this offer developers?
MS: An easy mechanism to address a very large memory heap. If you have distribution tuned to maintain 1 backup copy of each entry (2 copies in total), run a 100-node cluster, and allocate each node a 2GB heap, you can collectively address 100GB from any instance in the grid. And this is all in-memory, and very fast. And Infinispan is JTA compliant, so it plays nicely with ongoing transactions. We also have a powerful new asynchronous API, which gives you all of the guarantees of synchronous network calls along with the parallelism and scalability of asynchronous ones. For example,
Future<V> f = cache.putAsync(k, v);
allows your thread to block - by calling f.get() - to ensure the network calls succeed, or to carry on by ignoring f altogether. More importantly, your thread can go do something else, i.e., be useful, and then come back later and check whether the network call succeeded by calling f.get(). Think of it as what NIO is to traditional, blocking IO.
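The calling pattern can be sketched with the standard library alone. The `AsyncMap` class below is a hypothetical stand-in for a clustered cache, not Infinispan code: only the method name `putAsync` comes from the interview, and the "network call" is simulated by a task on a background thread.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;

// Hypothetical stand-in for a clustered cache; not Infinispan's implementation.
class AsyncMap<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    // Starts the write on a background thread and returns immediately.
    // The future completes with the previous value (or null), like Map.put.
    CompletableFuture<V> putAsync(K key, V value) {
        return CompletableFuture.supplyAsync(() -> store.put(key, value));
    }

    V get(K key) {
        return store.get(key);
    }

    public static void main(String[] args) throws Exception {
        AsyncMap<String, String> cache = new AsyncMap<>();
        Future<String> f = cache.putAsync("k", "v");
        // ... the calling thread is free to do other useful work here ...
        f.get(); // block only when the outcome matters; throws if the write failed
        System.out.println(cache.get("k")); // prints "v"
    }
}
```

A caller that does not care about the outcome can simply drop the returned future, which corresponds to the "ignore f altogether" case.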
InfoQ: What about persistence?
MS: Infinispan exposes a CacheStore interface, and several high-performance implementations - including JDBC CacheStores, filesystem-based CacheStores, Amazon S3 CacheStores, etc. CacheStores can be used for “warm starts”, to ensure data in the grid survives complete grid restarts, or simply to overflow to disk if you really do run out of memory.
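The restart-survival idea can be sketched as a write-through cache in front of a backing store. Everything named here is hypothetical: `SimpleStore`, `MapStore`, and `StoreBackedCache` are illustrative names, Infinispan's real CacheStore interface differs, and the in-memory `MapStore` merely stands in for a JDBC or filesystem store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical write-through cache; single-threaded for brevity.
class StoreBackedCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final SimpleStore<K, V> store;

    StoreBackedCache(SimpleStore<K, V> store) {
        this.store = store;
    }

    // Write-through: every update goes to memory and to the store.
    void put(K key, V value) {
        memory.put(key, value);
        store.store(key, value);
    }

    // On a miss, fall back to the store and keep the loaded value in memory.
    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = store.load(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        SimpleStore<String, String> store = new MapStore<>();
        StoreBackedCache<String, String> first = new StoreBackedCache<>(store);
        first.put("k", "v");
        // A fresh cache simulates a restart: memory is empty, the store is not.
        StoreBackedCache<String, String> restarted = new StoreBackedCache<>(store);
        System.out.println(restarted.get("k")); // prints "v"
    }
}

// Hypothetical persistence hook; not the real CacheStore interface.
interface SimpleStore<K, V> {
    void store(K key, V value);
    V load(K key);
}

// Stand-in for a durable store: here it only outlives the cache object.
class MapStore<K, V> implements SimpleStore<K, V> {
    private final Map<K, V> persisted = new HashMap<>();
    public void store(K key, V value) { persisted.put(key, value); }
    public V load(K key) { return persisted.get(key); }
}
```

Overflow to disk would follow the same shape, with entries evicted from `memory` under pressure while remaining loadable from the store.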
InfoQ: Why do you think this is different to other efforts?
MS: Most open source offerings are far more limited in scope - either to smaller clusters, not offering data distribution, or not offering a complete platform. And there is the obvious difference between Infinispan and proprietary offerings. Also, as far as I am aware, the asynchronous API is unique.
InfoQ: And what are your motivations behind the project?
MS: I've been the project lead for JBoss Cache for a few years now. During that time I have seen a lot of demand for an open source data grid platform. The complaints I have always had have been that the commercial ones are too expensive and not, well, open source. And that the open source offerings have always fallen short - whether in API, usability, performance, stability, or scalability - JBoss Cache included. Hence the efforts in building a spiritual successor to JBoss Cache, but with a much wider scope and greater goals.
InfoQ: Is this something that will only be useable within other JBoss projects, or can I use it elsewhere?
MS: All you need is a Java 5 compatible JVM. And being LGPL licensed, it is business and OEM friendly.
InfoQ: What is a data grid? How does it differ from a cloud, if at all?
MS: From my experience, clouds tend to refer to the on-demand provisioning of computing resources. This would include storage, processors, operating systems, and memory. The current fashion is to use virtualization for this. Data grids are more of a service: a uniform sea of memory spanning several servers. Typically, data grids would be deployed on top of a cloud.
InfoQ: So when's the right time to use a grid? And the wrong times?
MS: Any time you find that a database is becoming an unbearable bottleneck - and it usually becomes one pretty quickly as you scale out - use a data grid. :-) Data grids scale very well. In addition, if you use a compute grid to process tasks in parallel, you usually want a data grid superimposed as well, to provision the state for the compute grid to work off. I have seen data grids used for message passing, though, and this is a definite no-no. It can put a lot of unnecessary pressure on the nodes that keys get mapped to. If you need to use a distributed tool for message passing, use JMS. That's what JMS is optimized for.
InfoQ: How does Infinispan relate to JSR-107, and JBoss Cache?
MS: Infinispan's Cache interface tracks the ongoing developments in JSR-107 and is, as such, compliant with the current snapshot of the specification. Infinispan implements all optional parts of JSR-107, including JTA compliance and clustering. Infinispan bears no relationship to JBoss Cache - except in some design features and perhaps a few reusable classes that were copied over. Fundamentally, though, Infinispan is all-new.
InfoQ: So does Infinispan need to run in a cluster?
MS: No. It is a perfectly viable and very high-performance local-mode cache as well. We've implemented state-of-the-art concurrent container algorithms as our core, with minimal use of mutexes such as locks and synchronized blocks. Infinispan performs very well on multi-CPU and multi-core servers under high concurrency. The eviction algorithms are designed to perform well under high concurrency as well.
InfoQ: What else is new and cool on Infinispan's roadmap?
MS: There is a lot of cool stuff coming up, in addition to what I have mentioned above. People should follow the Infinispan roadmap on the project page for more details, but I'll mention two features I think are most exciting.
- We have an NIO-based server module on the roadmap. This will speak 2 protocols - a memcached-compliant one, and a custom binary one. The first protocol will allow any existing memcached client - in any language or platform - to work with Infinispan, widening Infinispan's appeal beyond just Java. The second binary protocol will contain additional information such as server cluster topology and consistent hash function, to allow for “smart clients” which could handle load balancing and failover. We'd provide a Java client for this, and I expect to see more clients come up for other platforms.
- We also have a powerful Query API on the roadmap. Cached state can optionally be indexed, allowing the entire grid to be searched. This would typically happen in parallel, as each node receives the query, performs it on its locally cached state, and returns results. Yes, it does look like Map/Reduce. :-)
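The parallel query described in the second item can be sketched as a scatter-gather over per-node state. This is a hypothetical illustration, not Infinispan's Query API: `GridQuery`, `queryLocal`, and `queryGrid` are made-up names, each "node" is just an in-process map, and a plain predicate takes the place of an index lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of a grid-wide query; not Infinispan's Query API.
class GridQuery {
    // "Map" step: one node searches only its locally cached values.
    static <V> List<V> queryLocal(Map<String, V> localState, Predicate<V> filter) {
        return localState.values().stream().filter(filter).collect(Collectors.toList());
    }

    // Runs the query on every node in parallel, then merges ("reduces") the results.
    static <V> List<V> queryGrid(List<Map<String, V>> nodes, Predicate<V> filter) {
        List<CompletableFuture<List<V>>> pending = new ArrayList<>();
        for (Map<String, V> node : nodes) {
            pending.add(CompletableFuture.supplyAsync(() -> queryLocal(node, filter)));
        }
        List<V> merged = new ArrayList<>();
        for (CompletableFuture<List<V>> f : pending) {
            merged.addAll(f.join()); // wait for each node's partial result
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> nodes = List.of(
                Map.of("a", 1, "b", 20),
                Map.of("c", 30, "d", 4));
        System.out.println(queryGrid(nodes, v -> v >= 10)); // prints [20, 30]
    }
}
```

Each node scans only its own share of the data, so the query time depends on the largest local data set rather than on the size of the whole grid.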
InfoQ: Thanks for taking the time to talk with us, Manik. More information can be found on the Infinispan project page.