Lucene Oracle Integration Looks to Surpass Oracle Text

New work on the enhancement LUCENE-724 (http://issues.apache.org/jira/browse/LUCENE-724) by Marcelo Ochoa allows better integration of Lucene as a domain index within the Oracle database. The updates, which were commissioned by Lending Club (www.lendingclub.com), allow greater flexibility than Oracle Text, and the result of the work is open source that anyone can use.

Why put Lucene in the database? Marcelo Ochoa answers this question with:

Oracle includes a full-featured enterprise engine named Oracle Text, written in C and fully integrated into the Oracle engine, but:

you cannot control which functionality will be included in the next release

and you cannot easily customize it for your needs

The OJVMDirectory, a Lucene integration running inside the Oracle JVM, will allow developers to:

Index both structured and unstructured data
Create Lucene "synthetic documents" that are composed of fields populated from diverse tables, which eliminates the need for complex joins at query time
Query the Lucene index directly from Java
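
The "synthetic document" idea can be illustrated without any Lucene or Oracle dependency. The following is only a minimal sketch in plain Java: the class name, the `buildDocument` helper, the `base.`/`related.` field prefixes and the sample rows are all hypothetical and are not taken from the OJVMDirectory code. It shows one row from a base table and one row from a related table being flattened into a single set of named fields, which is what removes the need for a join at query time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the OJVMDirectory code base.
public class SyntheticDocumentSketch {

    // Flattens a row from a base table and a row from a related table into
    // one field map. Field names are prefixed with their source so that
    // columns sharing a name (such as "id") do not overwrite each other.
    public static Map<String, String> buildDocument(Map<String, String> baseRow,
                                                    Map<String, String> relatedRow) {
        Map<String, String> doc = new LinkedHashMap<>();
        baseRow.forEach((column, value) -> doc.put("base." + column, value));
        relatedRow.forEach((column, value) -> doc.put("related." + column, value));
        return doc;
    }

    public static void main(String[] args) {
        // Sample rows; table and column names are made up for the example.
        Map<String, String> email = new LinkedHashMap<>();
        email.put("id", "42");
        email.put("subject", "Lucene in the database");

        Map<String, String> sender = new LinkedHashMap<>();
        sender.put("id", "7");
        sender.put("name", "Example Sender");

        System.out.println(buildDocument(email, sender));
    }
}
```

In a real index each entry of the map would become one field of the indexed document, so a single query can match on columns that originally lived in different tables.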

The features of the new release include:

Synchronized with the latest Lucene production release, 2.2.0

In-memory storage based on a Vector implementation has been replaced by direct BLOB I/O, reducing memory usage for large indexes

Support for user data stores, which means you are no longer limited to one column per index (a limit imposed by the Data Cartridge API on 10g); you can now index multiple columns of the base table as well as columns of related tables joined together

User data stores can be customized by the user: by writing a simple Java class, users can control which columns are indexed, the padding used, or any other processing prior to the document-adding step

There is a DefaultUserDataStore, which takes all columns of the query and builds a Lucene Document with a Field representing each database column; for example, these fields are automatically padded if they hold NUMBER data or rounded if they hold DATE data

The lcontains() SQL operator supports the full Lucene QueryParser syntax, providing access to all indexed columns

Support for the DOMAIN_INDEX_SORT and FIRST_ROWS hints: if you want rows ordered by the lscore() operator (ascending or descending), the optimizer hint makes Oracle assume that the Lucene Domain Index returns rowids in the proper order, avoiding an inline view to sort them

Automatic index synchronization using AQ callbacks

The Lucene Domain Index creates an extra table named IndexName$T and an Oracle AQ queue named IndexName$Q, with its storage table IndexName$QT, in the user's schema, so you can alter the storage preferences if you want

The ojvm project is in SourceForge.net CVS, so anybody can get it and collaborate ;)

Tested against 10gR2 and 11g databases
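
The padding and rounding mentioned for the default data store matter because Lucene compares indexed terms as strings. The sketch below is not the actual DefaultUserDataStore code; the class, the method names, the fixed width and the choice to truncate dates to the day are all assumptions made for illustration. It shows why a zero-padded number sorts correctly as text and how a date can be reduced to a coarser, sortable form.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; not the DefaultUserDataStore implementation.
public class FieldEncodingSketch {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Left-pads a non-negative number with zeros to a fixed width so that
    // string comparison agrees with numeric comparison ("009" < "010").
    public static String padNumber(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String padded = String.format("%0" + width + "d", value);
        if (padded.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        return padded;
    }

    // Reduces a timestamp to day resolution (an assumed granularity) and
    // renders it as yyyyMMdd, which also sorts correctly as a string.
    public static String roundDate(LocalDateTime value) {
        return value.truncatedTo(ChronoUnit.DAYS).format(DAY);
    }

    public static void main(String[] args) {
        System.out.println(padNumber(42, 10));
        System.out.println(roundDate(LocalDateTime.of(2007, 8, 15, 13, 45)));
    }
}
```

Without padding, the string "10" would sort before "9", so range queries over numeric columns would return the wrong rows.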

Examples and information can be found at the dbprism/ojvm project, from which the code can also be downloaded.
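
As a rough idea of what a query against such an index might look like from Java, the sketch below only assembles the SQL text. The table and column names (emails, bodytext, subject) are hypothetical, and the argument layout of lcontains() and lscore() is an assumption modeled on the CONTAINS/SCORE pattern of Oracle Text; the project's own examples are the authoritative reference for the real signatures.

```java
// Hypothetical illustration only; operator signatures are assumed, not confirmed.
public class LuceneQuerySketch {

    // Builds a statement that asks the domain index for rows already sorted
    // by score, so no inline view is needed for the ORDER BY.
    // The "?" placeholder is meant to be bound to a Lucene QueryParser
    // expression such as: subject:lucene AND bodytext:security
    public static String topHitsSql(String table, String indexedColumn, String displayColumn) {
        return "SELECT /*+ DOMAIN_INDEX_SORT */ lscore(1), " + displayColumn
             + " FROM " + table
             + " WHERE lcontains(" + indexedColumn + ", ?, 1) > 0"
             + " ORDER BY lscore(1) DESC";
    }

    public static void main(String[] args) {
        System.out.println(topHitsSql("emails", "bodytext", "subject"));
    }
}
```

The resulting string could be passed to a JDBC PreparedStatement, with the Lucene query expression supplied as the bind value.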

More details on the integration can be found at Marcelo's blog.
