Motivation
In many applications, there is a user requirement to search or look up domain entities. It is either required as an entry point into the application or as a mechanism for filling in forms. Typically, this is solved either by navigation (presenting the domain hierarchically so the user can locate and choose an item) or by a search form (presenting the user with a form containing a number of fields they can search on).
The reality is that both approaches are sub-optimal from a usability perspective. The navigational approach quickly becomes slow and cumbersome when there are large numbers of entities. Also, the user usually knows exactly what entity they are looking for, yet they are forced to navigate the hierarchy to find it. The search-form approach is also limited by the number of fields that can be searched on. There is a trade-off between being able to search on enough fields and the complexity of the search form itself.
The answer to this, from a usability perspective, is to provide a single Google-style search box where the user can enter terms that match on any field of the entities they are searching for, and is presented with results that match those terms. It can be an auto-complete, Google Suggest-style entry field in a form, or a regular search with tabular results, but the essence of the solution is that the UI is simple, the user enters whatever criteria they choose, and the search does all the hard work. The only question now is how to implement this search functionality.
When faced with the task of implementing a traditional, multi-field search form, most applications turn to SQL. The search fields typically match columns that the SQL query will use in LIKE clauses. However, due to the complexity of the SQL required to match on so many fields and, potentially, the size of the text in those fields, the performance of this implementation is usually very poor. The second problem is that there is no ranking of the search results: hits are returned not according to how relevant they are to the search query, but simply according to whether they match the query or not. Thirdly, there is no support for highlighting the search terms that matched in the search results.
Very quickly most applications realize that what is required here is a search engine. All fields of an entity can be indexed as if they were a single document, and then regular text searches can be performed to retrieve the matching entities. One of the well-known open-source search engines out there is Lucene. Lucene is a wonderful search engine, used by many applications successfully. It provides a low-level search engine API, with the ability to index data using Lucene data structures (Document/Field), and the ability to search on them using a query API or a search engine query. It is available in a number of languages including Java, C# and C++.
If we analyze a typical web application, it usually has a very common structure and characteristics. Usually, the application works with a backend relational database. It has a domain model representing the major entities within the system, and uses an ORM framework to map the domain model to the database. Most times it uses a service layer framework to manage transactions, coordination and sometimes business logic, and a web framework. The question now is how to integrate Lucene into such an application.
When trying to integrate Lucene into an application, once over the relatively small hurdle of getting the first spike working, one quickly runs into a number of challenges. The first problem is indexing the application data. Before long, quite a lot of boilerplate code is devoted to mapping the application domain model into Lucene data structures and getting it back out again. The Lucene Document, Lucene's prime data structure, is a flat, Map-like data structure which only contains strings, so a significant amount of code is devoted to "marshalling" and "unmarshalling" the domain objects in and out of it. Another problem is the lack of transaction support in Lucene, which makes saving the domain model into both the database and the search engine problematic. There are also several well-known practices and patterns that should be implemented when using Lucene: caching and invalidating searchers, creating aggregate "all" properties to support the Google-style search, having identifiable Documents for proper update semantics, and more.
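To make the boilerplate concrete, here is a minimal sketch of hand-written marshalling, assuming a plain Map<String, String> as a stand-in for a flat, string-only Lucene Document. The AuthorRecord and ManualMarshalling classes are hypothetical names invented for this illustration; no Lucene API is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain class, used only for this illustration.
class AuthorRecord {
    final long id;
    final String firstName;
    final String lastName;

    AuthorRecord(long id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Hypothetical helper: the map stands in for a flat, string-only document.
class ManualMarshalling {

    // Flatten the object into string fields and add an aggregate "all" field.
    static Map<String, String> marshal(AuthorRecord author) {
        Map<String, String> doc = new LinkedHashMap<>();
        // The identifier is what makes a later update of the same entry possible.
        doc.put("id", Long.toString(author.id));
        doc.put("firstName", author.firstName);
        doc.put("lastName", author.lastName);
        // Aggregate field so that a single search box can match on any property.
        doc.put("all", author.firstName + " " + author.lastName);
        return doc;
    }

    // Rebuild the object from the flat fields, converting strings back to types.
    static AuthorRecord unmarshal(Map<String, String> doc) {
        return new AuthorRecord(
                Long.parseLong(doc.get("id")),
                doc.get("firstName"),
                doc.get("lastName"));
    }

    public static void main(String[] args) {
        Map<String, String> doc = marshal(new AuthorRecord(1L, "jack", "london"));
        System.out.println(doc);
        System.out.println(unmarshal(doc).lastName);
    }
}
```

Every new field or nested object adds another pair of lines to both methods, which is the kind of repetitive code a declarative mapping aims to remove.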
Introducing Compass
The aim of Compass is to simplify the integration of search functionality into any application. Compass is built on top of Lucene using a well defined search engine abstraction. Compass extends core Lucene and adds transaction support and fast updates, as well as the ability to store the index in the database. Also, most importantly, it does not try to hide Lucene's features - all of Lucene's functionality is available through Compass.
Compass Core API
Compass provides a simple and familiar API. It is familiar because it mimics (where applicable) current ORM tool APIs in order to lower the Compass learning curve. The Compass API revolves around several main interfaces:
- CompassConfiguration: Used to configure Compass based on a set of settings, a configuration file, and mapping definitions. It is then used to create a Compass instance.
- Compass: A thread-safe instance used to open Compass sessions for single-threaded usage. It also provides some search engine index-level operations.
- CompassSession: The main interface for performing search engine operations like save, delete, find, and load. Very lightweight and not thread safe.
- CompassTransaction: An interface for controlling Compass transactions. Its usage is not required in transactionally managed environments (like Spring/JTA).
Here is a simple example of using the API:
// Compass is configured and created on an application scope
CompassConfiguration conf =
    new CompassConfiguration().setConnection("/tmp/index").addClass(Author.class);
Compass compass = conf.buildCompass();

// A request scope operation
CompassSession session = compass.openSession();
CompassTransaction tx = null;
try {
    tx = session.beginTransaction();
    // ...
    session.save(author);
    CompassHits hits = session.find("jack london");
    Author a = (Author) hits.data(0);
    Resource r = hits.resource(0);
    // ...
    tx.commit();
} catch (CompassException ce) {
    if (tx != null) tx.rollback();
} finally {
    session.close();
}
To simplify the transaction management code, Compass provides several options. The first is CompassTemplate, which uses the popular template design pattern to abstract away the transaction management code. The second option applies when working in a transactionally managed environment, where Compass integrates with transaction managers such as JTA and Spring Transactions and runs within an already running transaction. In such a case, CompassSession can be used through a proxy that automatically joins the transaction when a session operation is performed. The proxy can be created programmatically or through Spring IoC (@CompassContext injection support in Spring 2).
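The template idea can be illustrated independently of Compass. The following is a minimal sketch of the pattern only, not of CompassTemplate itself: DemoSession and DemoTemplate are hypothetical classes invented here, and the session merely records which calls were made so the ordering is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a session with a transaction; not a Compass class.
class DemoSession {
    final List<String> log = new ArrayList<>();

    void begin()        { log.add("begin"); }
    void commit()       { log.add("commit"); }
    void rollback()     { log.add("rollback"); }
    void close()        { log.add("close"); }
    void save(Object o) { log.add("save " + o); }
}

// Hypothetical template: owns the begin/commit/rollback/close sequence
// so that callers only supply the work to run inside the transaction.
class DemoTemplate {

    static <T> T execute(DemoSession session, Function<DemoSession, T> callback) {
        session.begin();
        try {
            T result = callback.apply(session);
            session.commit();
            return result;
        } catch (RuntimeException e) {
            // Undo the work and let the caller see the original failure.
            session.rollback();
            throw e;
        } finally {
            // The session is closed whether the callback succeeded or not.
            session.close();
        }
    }

    public static void main(String[] args) {
        DemoSession session = new DemoSession();
        String result = execute(session, s -> {
            s.save("author");
            return "done";
        });
        System.out.println(result + " " + session.log);
    }
}
```

The caller's code shrinks to the callback body, while the try/catch/finally sequence shown in the earlier example lives in one place.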
Compass supports atomic transaction operations and integrates with different transaction management strategies including: Local transaction management, JTA Sync and XA for JTA integration, and Spring Synchronization integration.
Compass configuration is based on key/value settings. Compass can be configured using programmatic configuration, XML DTD-based configuration (defining mappings and settings), and an expressive XML schema-based configuration. The XML schema-based configuration can also be used with the new schema-based configuration support in Spring 2.
Search Engine Mappings
One of Compass's main features is declarative mapping from the application model into the search engine. The Compass search engine domain model comprises Resource (a Lucene Document) and Property (a Lucene Field). These are abstract data objects used to index searchable content.
RSEM
The first mapping is RSEM (Resource/Search Engine Mapping). This is a low-level mapping from the Compass Resource and Property search engine abstractions (which map to Lucene's Document and Field) to the search engine. Here is an example of an RSEM definition for an Author resource:
<resource alias="author">
    <resource-id name="id"/>
    <resource-property name="firstName"/>
    <resource-property name="lastName" store="yes" index="tokenized"/>
    <resource-property name="birthdate" converter="mydate"/>
</resource>
Here, we define a Resource mapped against the author alias. The Resource mapping has an id associated with the resource, and several additional properties. Defining properties is optional, though doing so makes it possible to declaratively control the characteristics of each property, including associating it with a converter. The following code fills an author resource with data and indexes it:
Resource r = session.createResource("author");
r.addProperty("id", "1")
    .addProperty("firstName", "jack")
    .addProperty("lastName", "london")
    .addProperty("birthdate", new Date());
session.save(r);
Some of Compass's features are exposed in the above code fragment. The first is that, because a Resource is identifiable, Compass will update the Resource if it already exists in the index. The second is the ability to declaratively assign a converter to a property, choosing from many of Compass's built-in converters. Here is the Compass configuration for the above code (including the mydate converter configuration):
<compass-core-config xmlns="http://www.opensymphony.com/compass/schema/core-config"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opensymphony.com/compass/schema/core-config
        http://www.opensymphony.com/compass/schema/compass-core-config.xsd">
    <compass name="default">
        <connection>
            <file path="index" />
        </connection>
        <converters>
            <converter name="mydate" type="org.compass.core.converter.basic.DateConverter">
                <setting name="format" value="yyyy-MM-dd" />
            </converter>
        </converters>
        <mappings>
            <resource location="sample/author.cpm.xml" />
        </mappings>
    </compass>
</compass-core-config>
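Because a search engine stores text, a converter such as mydate has to turn a Date into a string when indexing and back into a Date when loading. The following is a rough sketch of that round trip using the yyyy-MM-dd format from the configuration. DateTextConverter is a hypothetical class written for this illustration and does not reproduce Compass's DateConverter; fixing the time zone to UTC is an assumption made so the output is predictable.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not the Compass DateConverter.
class DateTextConverter {
    private final String pattern;

    DateTextConverter(String pattern) {
        this.pattern = pattern;
    }

    // SimpleDateFormat is not thread safe, so a fresh instance is built per call.
    private SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: index dates in UTC
        format.setLenient(false);                        // reject malformed input
        return format;
    }

    // Date -> text, as needed when writing a property to the index.
    String toText(Date date) {
        return newFormat().format(date);
    }

    // Text -> Date, as needed when reading a property back.
    Date fromText(String text) {
        try {
            return newFormat().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a " + pattern + " date: " + text, e);
        }
    }

    public static void main(String[] args) {
        DateTextConverter converter = new DateTextConverter("yyyy-MM-dd");
        String text = converter.toText(new Date(0L));
        System.out.println(text);
        System.out.println(converter.fromText(text).getTime());
    }
}
```

One reason a format like yyyy-MM-dd suits an index is that its strings sort lexicographically in chronological order, so plain string comparison agrees with date comparison.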
OSEM
The second mapping supported is OSEM (Object/Search Engine Mapping). It maps the application's object domain model into the search engine. The following is an example of the Author class, with OSEM definitions using annotations:
@Searchable
public class Author {
    @SearchableId
    private Long id;
    @SearchableComponent
    private Name name;
    @SearchableReference
    private List<Book> books;
    @SearchableProperty(format = "yyyy-MM-dd")
    private Date birthdate;
}
// ...
@Searchable
public class Name {
    @SearchableProperty
    private String firstName;
    @SearchableProperty
    private String lastName;
}
OSEM supports marshalling and unmarshalling an object hierarchy into a Resource. When saving the Author object, Compass will marshall it into a Resource, with the Name class marshalled into the same Resource that represents the Author (thanks to the component mapping), and a reference to each book in the author's list of books (which are stored in other Resources). The resulting Resource will then be saved/indexed into the search engine.
Compass provides a very flexible mechanism for mapping the domain model into the search engine, and the above sample is just a simple example of it. OSEM also supports custom converters, multiple meta-data definitions (each mapping to a Resource Property) per class property, analyzers, 'all' field participation, and more.
Here is an example of how the author class can be used:
// ...
Author author = new Author(1, new Name("jack", "london"), new Date());
session.save(author);
// ...
author = (Author) session.load(Author.class, 1);
XSEM
The last search engine mapping supported in Compass is XSEM (XML/Search Engine Mapping). This mapping maps XML data structures into the search engine directly, based on XML mapping definitions (driven by XPath). XSEM goes through the same marshalling and unmarshalling process to and from Resources. Compass introduces an XML wrapper object called XmlObject, which has different implementations (dom4j, W3C Document) and also allows XPath expressions to be evaluated. Take the following XML data structure:
<xml-fragment>
    <author id="1">
        <firstName>Jack</firstName>
        <lastName>London</lastName>
    </author>
</xml-fragment>
And here is a possible XSEM definition:
<compass-core-mapping>
    <xml-object alias="author" xpath="/xml-fragment/author">
        <xml-id name="id" xpath="@id" />
        <xml-property xpath="firstName" />
        <xml-property xpath="lastName" />
        <xml-content name="content" />
    </xml-object>
</compass-core-mapping>
The mapping uses XPath expressions to map from the XML data structure into the search engine. The xml-content mapping stores the XML structure as is within the search engine, so it can be used when loading/searching the data. Compass supports several XML DOM libraries (for the xml-content mapping), including JSE 5 and dom4j (SAX and XPP), and a custom implementation can easily be plugged in. Here is an example of how it can be used:
Reader reader = // construct an xml reader over raw xml content
AliasedXmlObject xmlObj = new RawAliasedXmlObject("author", reader);
session.save(xmlObj);
// ...
Resource resource = session.loadResource("author", 1);
// since we have xml-content, we can do the following as well
XmlObject loadedXmlObj = (XmlObject) session.load("author", 1);
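To see what the XPath expressions in the XSEM definition select, here is a small sketch that evaluates them against the sample XML using only the JDK's javax.xml.xpath API. It illustrates the XPath evaluation alone and involves no Compass classes; XPathMappingSketch and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper that applies the mapping's XPath expressions by hand.
class XPathMappingSketch {

    static final String XML =
            "<xml-fragment>"
            + "<author id=\"1\">"
            + "<firstName>Jack</firstName>"
            + "<lastName>London</lastName>"
            + "</author>"
            + "</xml-fragment>";

    // Selects the node addressed by the xml-object mapping's xpath attribute.
    static Node selectAuthor(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            return (Node) xpath.evaluate("/xml-fragment/author", source, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    // Evaluates a relative expression (such as "@id") against the selected node.
    static String value(Node context, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Node author = selectAuthor(XML);
        System.out.println(value(author, "@id"));
        System.out.println(value(author, "firstName"));
        System.out.println(value(author, "lastName"));
    }
}
```

The xml-id and xml-property expressions are relative to the node chosen by the xml-object expression, which is why "@id" and "firstName" are evaluated against the author node rather than the document root.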
Compass Gps
Compass Gps is a module within Compass that aims to integrate Compass with different data sources. The most popular data source integration is with ORM tools: Compass supports JPA, Hibernate, OJB, JDO and iBatis.
If we take Hibernate as an example, Compass introduces two main operations: indexing and mirroring. Indexing automatically indexes the database content using both the Hibernate mappings and the Compass mappings: objects that have both mappings are fetched from the database using Hibernate and saved into the search engine. Mirroring automatically mirrors operations performed through the Hibernate API into the search engine by registering event listeners with Hibernate. This keeps the index up to date with any changes made to the database through the Hibernate API. The following shows how to use the Compass Gps Hibernate integration:
SessionFactory sessionFactory = // Hibernate Session Factory
Compass compass = // set up a Compass instance
CompassGps gps = new SingleCompassGps(compass);
CompassGpsDevice device = new Hibernate3GpsDevice("hibernate", sessionFactory);
gps.addDevice(device);
// start the gps, so that any changes made through the Hibernate API
// are mirrored to the search engine
gps.start();
// ....
// this will cause the database to be indexed
gps.index();
// this will cause Hibernate to store the author in the database
// and also index the author object through Compass
hibernateSess.save(new Author(1, new Name("jack", "london"), new Date()));
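The difference between the two operations can be sketched without either library. In the following illustration, DemoStore stands in for the ORM session, DemoIndex for the search engine, and SaveListener for an ORM event listener. All three are hypothetical and far simpler than the real Hibernate event API or the Compass Gps devices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener contract; the real Hibernate event API is richer.
interface SaveListener {
    void onSave(String id, String text);
}

// Hypothetical stand-in for the ORM session and its database.
class DemoStore {
    final Map<String, String> rows = new LinkedHashMap<>();
    private final List<SaveListener> listeners = new ArrayList<>();

    void register(SaveListener listener) {
        listeners.add(listener);
    }

    void save(String id, String text) {
        rows.put(id, text);
        // Notify listeners so that they can react to the change.
        for (SaveListener listener : listeners) {
            listener.onSave(id, text);
        }
    }
}

// Hypothetical stand-in for the search engine index.
class DemoIndex implements SaveListener {
    final Map<String, String> docs = new LinkedHashMap<>();

    // Mirroring: each save on the store is applied to the index as it happens.
    @Override
    public void onSave(String id, String text) {
        docs.put(id, text);
    }

    // Indexing: rebuild the whole index from the current store content.
    void indexAll(DemoStore store) {
        docs.clear();
        docs.putAll(store.rows);
    }

    public static void main(String[] args) {
        DemoStore store = new DemoStore();
        DemoIndex index = new DemoIndex();
        store.save("1", "jack london");     // saved before mirroring is active
        index.indexAll(store);              // picked up by the bulk index
        store.register(index);
        store.save("2", "herman melville"); // picked up by mirroring
        System.out.println(index.docs.keySet());
    }
}
```

Indexing covers data that existed before the listener was registered, while mirroring keeps the index current afterwards, which is why the two operations complement each other.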
Summary
This article is a brief introduction to Compass and its main features, and it covers only the basics of how to use Compass (most notably, it leaves out Compass's extensive integration module for Spring). Compass also takes care of many of the mundane details and small nuances of using a search engine, and has extensive configuration support. The main goal of Compass, as stated before, is to simplify the integration of search into any type of application, and this brief article covers the basics of how that can be done.
About the Author
Shay is the founder of the Compass open source project, a unique solution for adding search capabilities to any application model. He started out working on mission-critical real-time C/C++ systems, later moving to Java (and never looked back). Within the Java world, Shay has worked on a proprietary implementation of a distributed rule engine (RETE) server, typical Java-based web projects, and messaging-based projects within the financial industry. Currently, Shay is a System Architect at GigaSpaces.