JGroups Implementation of Memcached Supports Failover and JMX

Memcached is a distributed memory object caching system used in dynamic web applications to alleviate database load. It speeds up dynamic, database-driven websites by caching data and objects in memory, reducing the number of times the database must be read. Memcached is based on a hashmap storing key/value pairs. The daemon is written in C, but clients can be written in any language and talk to the daemon via the memcached protocol. However, memcached does not provide any redundancy (e.g. via replication of its hashmap entries); when a server S is stopped or crashes, all key/value pairs hosted by S are lost.
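
For readers unfamiliar with that protocol, the exchange with a memcached daemon is plain text over TCP. The snippet below is a minimal illustrative sketch (independent of the JGroups work) that stores one value and reads it back, assuming a stock memcached daemon is listening on localhost:11211.

import java.io.*;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class MemcachedProtocolSketch {
    public static void main(String[] args) throws IOException {
        // Assumes a stock memcached daemon is running on localhost:11211
        try (Socket sock = new Socket("localhost", 11211)) {
            BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(sock.getOutputStream(), StandardCharsets.US_ASCII));
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.US_ASCII));

            String value = "hello";
            // set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
            out.write("set greeting 0 60 " + value.length() + "\r\n" + value + "\r\n");
            out.flush();
            System.out.println(in.readLine());  // expected: STORED

            // get <key>\r\n  ->  VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
            out.write("get greeting\r\n");
            out.flush();
            System.out.println(in.readLine());  // VALUE greeting 0 5
            System.out.println(in.readLine());  // hello
            System.out.println(in.readLine());  // END
        }
    }
}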

Bela Ban, JGroups and Clustering Team Lead at JBoss, recently wrote a JGroups-based implementation of memcached which allows Java clients to access the cache directly. The implementation is written completely in Java and has a few advantages over the original memcached framework:

  • Java clients and PartitionedHashMap (org.jgroups.blocks.PartitionedHashMap) can run in the same address space and therefore don't need to use the memcached protocol to communicate. This allows servlets to access the cache directly without serialization overhead.
  • All PartitionedHashMap processes know about each other, and can make decisions as to what to do when a cluster membership change occurs. For example, a server to be stopped can migrate all of the keys it manages to the next server. With memcached, the entries hosted by a server S are lost when S goes down.
  • When a cluster membership change occurs (e.g. a new server S is started), all servers check whether any entries they host should actually be hosted by S and, if so, move those entries to S. This has the advantage that entries don't have to be re-read from the DB (for example) and re-inserted into the cache, as in memcached's case; the cache rebalances itself automatically.
  • PartitionedHashMap has a level 1 cache (L1 cache). This allows for caching of data near to where it is really needed. For example, if we have servers A, B, C, D and E and a client adds a (to be heavily accessed) news article to C, then memcached would always redirect every single request for the article to C. So, a client accessing D would always trigger a GET request from D to C and then return the article. JGroups caches the article in D's L1 cache on the first access, so all other clients accessing the article from D would get the cached article, and we can avoid a round trip to C. Note that each entry has an expiration time, which will cause the entry to be removed from the L1 cache on expiry, and the next access would have to fetch it again from C and place it in D's L1 cache. The expiration time is defined by the submitter of the article.
  • Since the RPCs for GETs, SETs and REMOVEs use JGroups as transport, the type of transport and the quality of service can be controlled and customized through the underlying XML file defining the transport. For example, we could add compression, or decide to encrypt all RPC traffic. It also allows for use of either UDP (IP multicasting and/or UDP datagrams) or TCP.
  • The components of the implementation can be assembled or replaced at will: the connector (org.jgroups.blocks.MemcachedConnector), which parses the memcached protocol and invokes requests on the cache; PartitionedHashMap, which represents the memcached implementation itself; the server (org.jgroups.demos.MemcachedServer); and the L1 and L2 caches (org.jgroups.blocks.Cache). This makes it simple to customize the JGroups memcached implementation, for example by plugging in a different MemcachedConnector that parses a binary protocol (requiring matching client code, of course); see the sketch after this list.
  • All management information and operations are exposed via JMX.
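
As a rough sketch of that assembly, the code below wires a PartitionedHashMap to a MemcachedConnector, broadly the way org.jgroups.demos.MemcachedServer does. The constructor arguments and lifecycle calls shown here are assumptions made for illustration; the authoritative wiring is in MemcachedServer itself.

import java.net.InetAddress;

import org.jgroups.blocks.MemcachedConnector;
import org.jgroups.blocks.PartitionedHashMap;

public class EmbeddedMemcachedSketch {
    public static void main(String[] args) throws Exception {
        // Cluster-wide L2 cache; the JGroups XML stack and cluster name are assumed arguments
        PartitionedHashMap<String, byte[]> cache =
                new PartitionedHashMap<String, byte[]>("udp.xml", "memcached-cluster");
        cache.start();

        // Connector that speaks the memcached text protocol to non-Java clients;
        // bind address, port and backing cache are assumed constructor arguments
        MemcachedConnector connector =
                new MemcachedConnector(InetAddress.getByName("0.0.0.0"), 11211, cache);
        connector.start();

        // Java code running in the same JVM can bypass the protocol entirely and
        // call cache.put()/get()/remove() directly.
    }
}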

The main class to start the JGroups memcached implementation is org.jgroups.demos.MemcachedServer. It creates an L1 cache (if configured), an L2 cache (the default hashmap that stores all entries), and a MemcachedConnector. The API is very simple and includes the following caching methods:

  • public void put(K key, V val): puts a key / value pair into the cache with the default caching time
  • public void put(K key, V val, long caching_time): same as above, but here we can explicitly define the caching time. 0 means cache forever, -1 means don't cache and any positive value is the number of milliseconds to cache the entry
  • public V get(K key): gets a value for key K
  • public void remove(K key): removes a key / value pair from the cache (L2 and L1, if enabled)
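
A minimal usage sketch of this API follows. The put/get/remove calls are the methods listed above; the constructor arguments (a JGroups configuration file and a cluster name) and the start()/stop() lifecycle calls are assumptions made for illustration.

import org.jgroups.blocks.PartitionedHashMap;

public class CacheApiSketch {
    public static void main(String[] args) throws Exception {
        // Assumed constructor: JGroups transport configuration and cluster name
        PartitionedHashMap<String, String> cache =
                new PartitionedHashMap<String, String>("udp.xml", "demo-cluster");
        cache.start();

        cache.put("news:42", "article body");           // default caching time
        cache.put("news:43", "breaking story", 60000L); // cache for 60 seconds
        cache.put("news:44", "one-off value", -1L);     // -1 means don't cache at all

        String article = cache.get("news:42");          // may be answered from the local L1 cache
        System.out.println(article);

        cache.remove("news:43");                        // removed from L2 and L1 (if enabled)
        cache.stop();
    }
}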

InfoQ spoke with Bela Ban about the motivation behind the JGroups implementation of memcached. He said that it allows the team to experiment with a distributed cache and see how the various caching strategies fit into JBoss Clustering. He also explained how this new memcached implementation compares with the JBossCache caching framework:

We see caching as a continuum between distributing data across multiple nodes (hosts) in a cluster (without redundancy) and fully replicating data (total replication of every data item to every cluster node). Between distribution and total replication, we have buddy replication, which replicates data to a few selected backup nodes. This can be compared to RAID, where RAID 0 has no redundancy (distribution), RAID 0+1 has full redundancy and RAID 5 has partial redundancy.

Currently, the PartitionedHashMap in JGroups provides distribution, and JBossCache provides total replication and partial replication (with Buddy Replication). The idea is to let the user define K *per data item* they place into the cluster: K=0 means distribution (but if a node which hosts one or more stripes crashes, that data is gone), K=X where X < N corresponds to RAID 5, and K=N is total replication.

The memcached implementation in JGroups is a first step to experiment with K=0, which is pure data distribution without redundancy. This will eventually make it into JBossCache.

Where does the memcached implementation fit within the JBoss Application Server modules?

It will be part of the Clustering subsystem, provided by JBossCache. Note that our implementation is really written with "Java" clients in mind, so we don't have to use that terribly inefficient memcached protocol, with the marshalling/unmarshalling/copying overhead.

Talking about the typical use cases for the JGroups implementation of memcached, Bela said:

The server side code (e.g. servlets) running in a JBoss or Tomcat cluster, which accesses a DB and needs a cache to speed up things and remove a DB bottleneck. The other use case is similar, but instead of accessing a DB, access is to the file system. For example, an HTML page caching server (Squid comes to mind).
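
To make the first use case concrete, the hypothetical servlet below shows the usual cache-aside pattern against a PartitionedHashMap shared in the same JVM. The loadFromDatabase() helper and the field wiring are illustrative only, not part of the JGroups API.

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.jgroups.blocks.PartitionedHashMap;

public class CachedLookupServlet extends HttpServlet {

    // Shared cache, assumed to be created and started during application startup
    private PartitionedHashMap<String, String> cache;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String key = "customer:" + req.getParameter("id");

        String value = cache.get(key);      // fast path: answered from the L1/L2 cache
        if (value == null) {
            value = loadFromDatabase(key);  // slow path: the expensive DB (or file system) read
            cache.put(key, value, 300000L); // keep it cached for five minutes
        }
        resp.getWriter().write(value);
    }

    // Illustrative stand-in for the real database access
    private String loadFromDatabase(String key) {
        return "row for " + key;
    }
}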

Are there any plans to introduce memcached into the JBoss Application Server in the future?

Absolutely. The Data Partitioning feature will allow users to configure caching according to their needs. So having something like a distributed cache is not a new feature in itself, but a matter of JBossCache configuration. The cool thing is that this can be dynamic, so developers can decide which redundancy features (none=distribution, full=total replication or partial) they want per data item they put into JBossCache.

Regarding the future direction of the project in terms of new features, Bela listed the things that are on the to-do list:

  • Provide an eviction policy based on number of bytes rather than number of elements in the cache.
  • Store elements received from a remote server as byte[] buffers rather than objects. On first access, unmarshal the byte buffer into the object. This is used in JBoss' HTTP session replication code and has served us well: no hit on performance due to unneeded unmarshalling.
  • Implement the full memcached protocol: currently I only provide GET, GET-MULTI, SET and DELETE. The other stuff (APPEND, PREPEND, CAS) is easy to implement, but I've not done it (yet) because the main use case is for Java clients to reside in the same JVM as our memcached implementation, and therefore the memcached protocol is not needed.
  • Provide a better implementation of consistent hashing.
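
Consistent hashing is what maps each key to the cluster node that owns it. The generic sketch below (illustrative only, not the JGroups code) shows the common ring-with-virtual-nodes approach that such an implementation typically refines, so that adding or removing a node only moves the keys that hashed to that node.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Generic consistent-hash ring with virtual nodes; illustrative only.
public class ConsistentHashSketch {

    private final SortedMap<Long, String> ring = new TreeMap<Long, String>();
    private final int virtualNodes;

    public ConsistentHashSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node); // several points per node smooth the distribution
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // The owner of a key is the first node clockwise from the key's position on the ring
    public String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                    | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashSketch ch = new ConsistentHashSketch(100);
        ch.addNode("A");
        ch.addNode("B");
        ch.addNode("C");
        System.out.println("'news:42' is owned by " + ch.nodeFor("news:42"));

        ch.removeNode("B"); // only the keys that hashed to B move to another node
        System.out.println("after B leaves, 'news:42' is owned by " + ch.nodeFor("news:42"));
    }
}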

The JGroups implementation of memcached and its library dependencies can be downloaded from the project's SourceForge website. Below is the command to launch the program:

java -jar memcached-jgroups.jar

Bela is looking for feedback from the community. He said that this is an experimental feature but will become a supported feature of JBossCache, and that community input will have a great influence on its direction.
