Introduction
Applications commonly need to store various pieces of information, and most of the time they do so in relational databases. While databases do a great job with regular data types, they are not very efficient when dealing with binary data such as images or documents. File systems can be used as an alternative, and while they offer better performance, they provide neither a query language for searching information nor a notion of relationship or transaction.
Allowing third parties access to the stored data (a typical requirement that arises as the application grows) is, in many cases, a lengthy and complex process which cannot happen overnight. The internal structure of the storage can easily affect the API architecture and the way information is retrieved and traversed.
What is JSR-170?
Fortunately, JSR-170, also known as Java Content Repository (JCR), tries to address these problems (and many others) in an implementation-independent way; that is, the API is the same regardless of the underlying resource (e.g. a database or a local or virtual file system). Sitting on top of the data storage, JCR offers content services such as granular access control, versioning, content events, full-text search and filtering. JSR-170 has an impressive expert group behind it, led by Day Software and including Content Management System (CMS) vendors like Vignette, Hummingbird Ltd. and Stellent as well as the usual Java-driven solution providers like BEA Systems, IBM and Oracle, so the specification is likely to become the de facto standard for content management and document storage.
Finalized in June 2005 after almost two and a half years of work, the API contains around 50 classes (mainly interfaces and exceptions) under the javax.jcr package. The initial 1.0 version of the reference implementation (Jackrabbit) was released in early 2006.
JSR-170 Overview
Java Content Repository is based on the notion of a repository which (besides the usual meaning of being a place where things may be put for safekeeping) offers several features for working with data. The repository stores information in a 'tree structure' made up of nodes and properties as shown in the diagram on the right. The circles represent nodes while squares represent properties. A node has one and only one parent, any number of children (sub-nodes) and any number of properties. A property has one and only one parent (which is a node), has no children, and consists of a name and one or more values which can be of type Boolean, Date, Double, Long, String or Stream. Only properties store information; nodes exist to create a 'path' inside the tree. To some degree this tree resembles the structure of a file system, with nodes being directories and properties the actual files.
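The tree model described above can be illustrated with a small, self-contained sketch. SketchNode is a hypothetical class invented for this illustration and is not part of javax.jcr; it only mirrors the rules stated in the text (one parent per node, named children, properties holding one or more values) and ignores details of the real API such as typed values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a repository node (not the javax.jcr.Node interface).
class SketchNode {
    final String name;
    final SketchNode parent; // null only for the root node
    final Map<String, SketchNode> children = new LinkedHashMap<>();
    final Map<String, List<Object>> properties = new LinkedHashMap<>();

    SketchNode(String name, SketchNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Adds a child node; the child has exactly one parent (this node).
    SketchNode addNode(String childName) {
        SketchNode child = new SketchNode(childName, this);
        children.put(childName, child);
        return child;
    }

    // A property is a name plus one or more values; it has no children.
    void setProperty(String propertyName, Object... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("a property needs at least one value");
        }
        properties.put(propertyName, new ArrayList<>(Arrays.asList(values)));
    }

    // Builds the absolute path by walking up the parent chain.
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.getPath();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("", null);
        SketchNode report = root.addNode("docs").addNode("report");
        report.setProperty("title", "Q1");
        System.out.println(report.getPath()); // prints /docs/report
    }
}
```

As in a file system, the node names only build the path, while the data itself ("Q1" here) lives in a property attached to the last node.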
Repository functionality is divided into several 'compliance' levels, each providing a specific set of features:
- Level 1
Level 1 is mandatory for all implementations and provides read access to the repository, which in short means:
  - read access to nodes and properties
  - read access to property values
  - export to XML/SAX
  - query service with XPath syntax
- Level 2
Level 2 provides write functionality:
  - adding and removal of nodes and properties
  - write access to property values
  - import from XML/SAX
Note that the specification does not require JCR implementations to comply with level 2 and beyond, so it is perfectly valid to work with a repository that is read-only.
- Optional
The 'optional' level contains some advanced features which are not required from a read-write repository but are the ones that really add value to JSR-170. This level includes (among others):
  - Transactions - which make it possible for the repository to be enlisted as a resource alongside JMS or JDBC resources.
  - Versioning - which allows the repository to record different states of its nodes that can be retrieved later on. The specification covers this topic at length; it is possible to build a CVS clone with a JSR-170 backend.
  - Events - also known as observation, which allows the client to be notified of any activity happening inside the repository.
  - Locking - which makes it possible to freeze parts of the tree, effectively rendering sub-trees read-only.
Reviewing the API
When dealing with JSR-170 it is recommended to work with the interfaces from the javax.jcr package so that switching JCR implementations does not require any code changes.
The central class of this API is Session, which represents a connection between the client and the repository and is defined by the workspace name it is active on and the credentials supplied. Session contains methods for both reading (level 1) and writing (level 2); using functionality not supported by the underlying repository will throw an exception.
The package also contains interfaces that define the units that make up the repository: Workspace, Credentials, Node, Property, Item (the super-interface of Node and Property) and Value. Queries are handled through the javax.jcr.query package and node type definitions through javax.jcr.nodetype, while the rest of the packages address the optional level functionality: javax.jcr.version, javax.jcr.observation and javax.jcr.lock. One interesting package is javax.jcr.util, which contains an implementation of ItemVisitor, an interface that follows the Visitor pattern from the famous Design Patterns book by the GoF (Gang of Four).
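To make the Visitor idea concrete, here is a minimal, self-contained sketch of how one visitor can treat nodes and properties differently while walking a tree. All the Mini* types and CollectingVisitor are hypothetical names invented for this illustration; the real javax.jcr interfaces follow the same accept/visit shape but also declare RepositoryException, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the Item / Node / Property / ItemVisitor relationship.
interface MiniVisitor {
    void visit(MiniNode node);
    void visit(MiniProperty property);
}

interface MiniItem {
    String getName();
    void accept(MiniVisitor visitor);
}

class MiniProperty implements MiniItem {
    private final String name;
    MiniProperty(String name) { this.name = name; }
    public String getName() { return name; }
    public void accept(MiniVisitor visitor) { visitor.visit(this); }
}

class MiniNode implements MiniItem {
    private final String name;
    final List<MiniItem> items = new ArrayList<>(); // child nodes and properties
    MiniNode(String name) { this.name = name; }
    public String getName() { return name; }
    public void accept(MiniVisitor visitor) { visitor.visit(this); }
}

// Depth-first traversal that records every item it meets.
class CollectingVisitor implements MiniVisitor {
    final List<String> seen = new ArrayList<>();

    public void visit(MiniNode node) {
        seen.add("node:" + node.getName());
        for (MiniItem item : node.items) {
            item.accept(this); // double dispatch selects the matching visit() overload
        }
    }

    public void visit(MiniProperty property) {
        seen.add("property:" + property.getName());
    }
}
```

Calling accept on the root with a CollectingVisitor walks the whole sub-tree without a single instanceof check, which is the main attraction of the pattern for tree-shaped content.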
JSR-170 implementations
Google and SourceForge list pages of JSR-170 implementation projects, but most of them are in alpha stage without any releases. Below is a list of projects that can be freely downloaded and have been used by the author:
- Jackrabbit
is the reference implementation for JSR-170. It is an Apache Software Foundation project and provides level 1, level 2 and optional functionality. At the time of writing the project has passed the incubation process and has an official public release considered stable enough for production usage. Moreover, Jackrabbit is used as the foundation for the commercial product of Day Software, the leader of JSR-170. Besides implementing all the features defined by JSR-170, Jackrabbit adds extra functionality (like SessionListeners or custom node type registration) and has an interesting suite of contributed projects which include a JCA connector, a taglib, a WebDAV interface, a virtual file system and a JDBC backend. Jackrabbit is licensed under Apache 2.0.
- eXo JCR
is part of the eXo platform and contains all the mandatory features required by the specification and several optional ones. The last release (1.0RC7) was published on 22nd June 2006 and is based on the final draft 2 of the specification. eXo JCR supports JDBC-compliant databases such as MySQL, DB2 or HSQL (which is the default) as backend storage and is dual licensed (GPL and commercial). A final release date for the project is unknown.
- Jeceira
is a relatively new project compared to Jackrabbit and eXo JCR. At the time of writing it implements some requirements from both Level 1 and Level 2 and only Observation from the optional level. Unfortunately, the project is in an incomplete state and has not had a new release in the last 9 months. However, it was used by Magnolia, a popular Java-based CMS, next to Jackrabbit as a JSR-170 implementation. The date of the final 1.0 release, which is planned to contain all levels, is unknown at present. Jeceira is licensed under Apache 2.0 and uses the HSQL database as its storage engine.
JCR module
Part of Spring Modules, the JCR module's main objective is to simplify development with the JSR-170 API in a similar manner to that of the ORM package from the main Spring distribution. Features include:
- JcrTemplate, which allows the execution of a JcrCallback and handles exceptions (transforming checked JCR exceptions into unchecked Spring DAO exceptions). The template implements most of the methods from the JCR Session and can easily be used as a replacement. Moreover, the template is aware of thread-bound sessions which can be used across several methods, functionality that is very useful when using a transactional repository.
- RepositoryFactoryBean, which configures, starts and stops the repository instances. As JSR-170 doesn't address the way the repository should be configured, implementations vary in this regard. The support contains predefined FactoryBeans for Jackrabbit and Jeceira and an abstract base class which can easily support other repositories.
- SessionFactory, which unifies the Repository, Credentials and Workspace interfaces and allows automatic registration of listeners and custom namespaces.
- Spring declarative transactional support for repositories that implement the (optional) transactional feature.
- OpenSessionInView interceptor and filter which allow the usage of the same session per thread across different components. Along with JcrTemplate, the retrieval, closure and management of the JCR session is externalized and totally transparent to the caller.
For this article, the reference implementation (Jackrabbit) will be used; however, since the JCR module relies on the javax.jcr interfaces, changing the implementation is just a matter of configuration. Let's see step by step how to use Java Content Repository on top of Jackrabbit and how Spring Modules can help.
Repository and SessionFactory configuration
<bean id="repository" class="org.springmodules.jcr.jackrabbit.RepositoryFactoryBean">
    <!-- normal factory beans params -->
    <property name="configuration" value="classpath:jackrabbit-repo.xml"/>
    <property name="homeDir" value="./tmp/repo"/>
</bean>
The JCR support offers the RepositoryFactoryBean class for configuring Jackrabbit, which requires a Jackrabbit configuration file and a home directory. Note that RepositoryFactoryBean is useful if you are working with local file systems; for server environments, where the repository is likely to be registered inside JNDI, you can use the JndiObjectFactoryBean helper class (part of the Spring distribution) to retrieve it:
<bean id="repository" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="java:comp/env/jcr/myRepository"/>
</bean>
or using Spring 2.0 schema namespaces:
<jee:jndi-lookup id="repository" jndi-name="java:comp/env/jcr/myRepository"/>
To simplify working with the JCR, the module adds the SessionFactory interface:
public interface SessionFactory {
    public Session getSession() throws RepositoryException;
    public SessionHolder getSessionHolder(Session session);
}
SessionFactory hides the authentication details inside the implementation so that, once configured, sessions with the same credentials can be easily retrieved. To take advantage of implementation-specific features (not covered by the spec), the interface allows the retrieval of a SessionHolder, a JCR module specific class which is used for transaction and session management. The default, generic implementation works with every JCR implementation but does not support optional or customized features; specialized holders add them (for example, JackrabbitSessionHolder supports Jackrabbit's transaction infrastructure). The JCR module provides an easy and transparent way of discovering SessionHolder implementations (which I will discuss in more detail later on), making it easy to plug in support for other JSR-170 compliant libraries.
The default implementation of SessionFactory is JcrSessionFactory which requires a repository to work against, and the credentials.
<!-- SessionFactory -->
<bean id="jcrSessionFactory" class="org.springmodules.jcr.JcrSessionFactory">
    <property name="repository" ref="repository"/>
    <property name="credentials">
        <bean class="javax.jcr.SimpleCredentials">
            <constructor-arg index="0" value="bogus"/>
            <!-- create the credentials using a bean factory -->
            <constructor-arg index="1">
                <bean factory-bean="password" factory-method="toCharArray"/>
            </constructor-arg>
        </bean>
    </property>
</bean>
<!-- create the password to return it as a char[] -->
<bean id="password" class="java.lang.String">
    <constructor-arg index="0" value="pass"/>
</bean>
The bean declaration is straightforward; the only 'catch' is the password supplied to the SimpleCredentials constructor: it only accepts a char array, so as a workaround a Spring factory-method declaration is used to call toCharArray() on a String bean.
JcrTemplate
JcrTemplate is one of the core classes of the JCR module as it offers a convenient way to work with a JCR session, freeing the caller from having to deal with opening and closing the session, transaction rollback (if transactions are offered by the underlying repository) and exception handling, among other features:
<bean id="jcrTemplate" class="org.springmodules.jcr.JcrTemplate">
    <property name="sessionFactory" ref="jcrSessionFactory"/>
    <property name="allowCreate" value="true"/>
</bean>
Again the template definition is simple and resembles other template classes from the Spring framework like HibernateTemplate.
Example
Now that the repository is configured, let's "springify" one of the examples from Jackrabbit's wiki page:
public Node importFile(final Node folderNode, final File file, final String mimeType,
        final String encoding) {
    return (Node) execute(new JcrCallback() {
        /**
         * @see org.springmodules.jcr.JcrCallback#doInJcr(javax.jcr.Session)
         */
        public Object doInJcr(Session session) throws
                RepositoryException, IOException {
            JcrConstants jcrConstants = new JcrConstants(session);
            // create the file node - see section 6.7.22.6 of the spec
            Node fileNode = folderNode.addNode(file.getName(),
                    jcrConstants.getNT_FILE());
            // create the mandatory child node - jcr:content
            Node resNode = fileNode.addNode(jcrConstants.getJCR_CONTENT(),
                    jcrConstants.getNT_RESOURCE());
            resNode.setProperty(jcrConstants.getJCR_MIMETYPE(), mimeType);
            resNode.setProperty(jcrConstants.getJCR_ENCODING(), encoding);
            resNode.setProperty(jcrConstants.getJCR_DATA(), new FileInputStream(file));
            Calendar lastModified = Calendar.getInstance();
            lastModified.setTimeInMillis(file.lastModified());
            resNode.setProperty(jcrConstants.getJCR_LASTMODIFIED(), lastModified);
            session.save();
            return resNode;
        }
    });
}
The main difference is that the code is wrapped inside a JCR template, which frees us from having to use try/catch blocks (due to the IO and Repository checked exceptions) and from handling the session (and transaction, if there is one) clean-up. It's worth mentioning that hard-coded strings like "jcr:data" are resolved through the JcrConstants utility class, which is aware of namespace prefix changes and offers a clean way of dealing with JCR constants. As you can see, I just made the example more robust with minimal impact on the actual business code.
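The template/callback idea behind this example can be reduced to a few lines. The sketch below is not the JcrTemplate source: FakeSession, SketchCallback, SketchAccessException and SketchTemplate are hypothetical names invented for illustration, and the real template obtains its session from a SessionFactory and may reuse a thread-bound one rather than always creating and closing its own.

```java
import java.io.IOException;

// Hypothetical session stand-in that only remembers whether it was closed.
class FakeSession {
    boolean closed;
    void close() { closed = true; }
}

// The callback may throw a checked exception, like doInJcr in the example above.
interface SketchCallback<T> {
    T doInSession(FakeSession session) throws IOException;
}

// Unchecked exception playing the role of Spring's DAO exception hierarchy.
class SketchAccessException extends RuntimeException {
    SketchAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SketchTemplate {
    FakeSession lastSession; // exposed only so the sketch can be inspected

    <T> T execute(SketchCallback<T> action) {
        FakeSession session = new FakeSession();
        lastSession = session;
        try {
            return action.doInSession(session);
        } catch (IOException ex) {
            // translate the checked exception into an unchecked one
            throw new SketchAccessException("callback failed", ex);
        } finally {
            session.close(); // clean-up happens whether or not the callback failed
        }
    }
}
```

The caller supplies only the business logic, for example template.execute(session -> "ok"), and never sees the try/catch/finally plumbing.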
Transaction support
One of the strengths of the JCR module is the ability to use the Spring transaction infrastructure (both declaratively and programmatically) with Java Content Repository. JSR-170 treats transactional support as an optional feature and does not impose a standard way of exposing the transactional hooks, so each implementation can choose a different method. At the time of writing, only Jackrabbit is known to support transactions (in most of its operations), and it does so by exposing a javax.transaction.XAResource for each JCR session. The JCR module offers a LocalTransactionManager which can be used for local transactions:
<bean id="jcrTransactionManager" class="org.springmodules.jcr.jackrabbit.LocalTransactionManager">
    <property name="sessionFactory" ref="jcrSessionFactory"/>
</bean>
For declarative transaction demarcation, I use the standard Spring classes along with the transaction manager bean declared above:
<!-- transaction proxy for Jcr services/facades -->
<bean id="txProxyTemplate" abstract="true" class="org.springframework.transaction.interceptor.TransactionProxyFactoryBean">
    <property name="proxyTargetClass">
        <value>true</value>
    </property>
    <property name="transactionManager" ref="jcrTransactionManager"/>
    <property name="transactionAttributes">
        <props>
            <prop key="save*">PROPAGATION_REQUIRED</prop>
            <prop key="*">PROPAGATION_REQUIRED, readOnly</prop>
        </props>
    </property>
</bean>
<bean id="jcrService" parent="txProxyTemplate">
    <property name="target">
        <bean class="org.springmodules.examples.jcr.JcrService">
            <property name="template" ref="jcrTemplate"/>
        </bean>
    </property>
</bean>
If a JTA manager is required, a simple yet elegant solution is to use the JCA connector from the Jackrabbit contrib package. You don't necessarily need an application server for it, as you can use a pluggable JCA container like Jencks. Configuring the JCA connector is outside the scope of this article; however, you can find an example using Jencks inside the JCR module samples.
TransactionAwareRepository
For applications where plain JCR code is required, the JCR module allows the transparent use of a transaction-driven session with code that uses the JCR API directly. One can use TransactionAwareRepository, which takes a JcrSessionFactory as a parameter, so that any session requested through Repository.login() with the parameters defined on the JcrSessionFactory will be the thread-bound session if one is found. Note that if transactions are used, the JCR sessions are transactional; if not, you have to manually set the allowNonTxRepository property to true, as in the configuration below, otherwise an exception will be thrown:
<bean id="transactionRepository" class="org.springmodules.jcr.TransactionAwareRepository">
    <property name="allowNonTxRepository" value="true"/>
    <property name="targetFactory" ref="jcrSessionFactory"/>
</bean>
The transactionRepository bean can be used as a plain JCR Repository without any awareness of the underlying mechanism or the thread-bound session, be it transactional or not (closing the session will commit the transaction if there is one).
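The "return the thread-bound session if one is found" behaviour can be modelled roughly with a ThreadLocal. This is only a sketch of the idea under simplifying assumptions, not the TransactionAwareRepository implementation: BoundSession and ThreadBoundSessions are hypothetical names, binding is done by hand here (in the module the transaction infrastructure would do it), and the allowNonTx flag merely imitates the role the text gives to allowNonTxRepository.

```java
// Hypothetical session stand-in carrying a label for inspection.
class BoundSession {
    final String owner;
    BoundSession(String owner) { this.owner = owner; }
}

class ThreadBoundSessions {
    // One slot per thread; a transaction manager would fill it when a transaction starts.
    private static final ThreadLocal<BoundSession> CURRENT = new ThreadLocal<>();
    private final boolean allowNonTx;

    ThreadBoundSessions(boolean allowNonTx) {
        this.allowNonTx = allowNonTx;
    }

    static void bind(BoundSession session) { CURRENT.set(session); }

    static void unbind() { CURRENT.remove(); }

    // Plays the role of login(): prefer the session already bound to this thread.
    BoundSession login() {
        BoundSession bound = CURRENT.get();
        if (bound != null) {
            return bound;
        }
        if (!allowNonTx) {
            throw new IllegalStateException(
                    "no thread-bound session and non-transactional use is not allowed");
        }
        return new BoundSession("standalone"); // plain session outside any transaction
    }
}
```

Code that calls login() looks the same in both situations, which is what lets plain JCR code take part in a transaction it knows nothing about.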
Optional Features Support Detection
In order to maximize code reuse but still allow pluggable optional features like transaction support for different JCR implementations, the JCR module uses the SessionHolder interface (which was already mentioned) along with the SessionHolderProvider and SessionHolderProviderManager interfaces. Normally, users do not have to interact with them as they are internal to the framework; however, they represent the main extension points for the JCR module.
The SessionHolder class is used internally by various components, mainly the transaction manager, to handle the session, while SessionHolderProvider and SessionHolderProviderManager handle the way the session holders are created and how the providers are used, respectively. By default ServiceSessionHolderProviderManager is used, which discovers features automatically through the JDK 1.3 Service Provider mechanism. The manager searches the classpath for META-INF/services/org.springmodules.jcr.SessionHolderProvider entries, which contain the fully qualified name of the SessionHolderProvider implementation. The Jackrabbit support is configured this way: the JCR module distribution jar contains a META-INF/services file with only one line:
org.springmodules.jcr.jackrabbit.support.JackRabbitSessionHolderProvider
The default SessionHolderProviderManager is used internally by the JcrSessionFactory, so at factory startup any custom implementations can be picked up and used with the appropriate repository. However, one can easily switch to a different discovery strategy by setting the SessionHolderProviderManager on the JcrSessionFactory. An alternative to the default service discovery is the ListSessionHolderProviderManager, which accepts a list of providers, allowing custom providers (for testing, for example) to be easily used.
<bean id="listProviderManager" class="org.springmodules.jcr.support.ListSessionHolderProviderManager">
    <property name="providers">
        <list>
            <bean class="org.mycompany.jcr.CustomHolderProvider"/>
            <bean class="org.springmodules.jcr.jackrabbit.support.JackRabbitSessionHolderProvider"/>
            <bean class="org.springmodules.jcr.support.GenericHolderProvider"/>
        </list>
    </property>
</bean>
<bean id="jcrSessionFactory" class="org.springmodules.jcr.JcrSessionFactory">
    ...
    <property name="sessionHolderProviderManager" ref="listProviderManager"/>
</bean>
Note that there should be one provider per repository; if the list contains multiple providers that work for the same repository, the order becomes important as the first one matched will be used.
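The "first one matched" rule is easy to see in a small sketch. The types below (HolderProviderSketch, NamedProvider, FirstMatchProviderManager) are hypothetical and only model the ordering behaviour described above; they are not the Spring Modules classes, and matching on a repository name string is an assumption made for the illustration.

```java
import java.util.List;

// Hypothetical provider contract: can this provider serve the given repository?
interface HolderProviderSketch {
    boolean acceptsRepository(String repositoryName);
    String describe();
}

// Simple provider that accepts one repository name, or everything when the name is null.
class NamedProvider implements HolderProviderSketch {
    private final String label;
    private final String acceptedRepository; // null means "accept any repository"

    NamedProvider(String label, String acceptedRepository) {
        this.label = label;
        this.acceptedRepository = acceptedRepository;
    }

    public boolean acceptsRepository(String repositoryName) {
        return acceptedRepository == null || acceptedRepository.equals(repositoryName);
    }

    public String describe() { return label; }
}

class FirstMatchProviderManager {
    private final List<HolderProviderSketch> providers;

    FirstMatchProviderManager(List<HolderProviderSketch> providers) {
        this.providers = providers;
    }

    // Walks the list in order and returns the first provider that accepts the repository.
    HolderProviderSketch providerFor(String repositoryName) {
        for (HolderProviderSketch provider : providers) {
            if (provider.acceptsRepository(repositoryName)) {
                return provider;
            }
        }
        throw new IllegalArgumentException("no provider for " + repositoryName);
    }
}
```

With a custom provider listed before a stock one for the same repository, the custom provider wins; a catch-all generic provider therefore belongs at the end of the list.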
Future of Java Content Repository
Even though JSR-170 was finalized in June 2005, the work on Java Content Repository hasn't ended: JSR-283, the official successor, will focus on enhancements like federation, remoting, client/server protocol mappings and extensions to content modeling capabilities, just to name a few. There are also ideas and initiatives outside the JSR: a binding/mapping framework which can transform Java classes into a JCR tree and back (an ORM backed by the Java Content Repository instead of a database), a WebDAV server built on top of JCR (see the Jackrabbit contribution package) and others. JSR-170 connectors have appeared for various products such as Alfresco, BEA Portal Server and IBM Domino.
As for the JCR module, the road map includes security integration with Acegi for several implementations, support for Spring 2.0 namespace schemas (which will reduce the configuration XML) and integration with other JCR implementations. Clearly, the future of JCR looks bright.