MongoDB Introduction
Today's NoSQL landscape includes a number of capable contenders tackling big data problems in many different ways. One of these contenders is MongoDB, a document-oriented, schema-less storage solution that uses JSON-style documents to represent, query and modify data.
MongoDB is well documented, easy to install and set up, and just as easy to scale. It supports familiar concepts like replication, sharding, indexing and map/reduce. The MongoDB open source community is very large and active. MongoDB boasts many large and high-traffic production deployments including Disney, Craigslist, Foursquare, GitHub and SourceForge. MongoDB is an open source project created and maintained by 10gen.com, a company founded by former DoubleClick execs. In addition to the superb community support (in which 10gen participates), 10gen offers commercial support.
MongoDB and NoSQL: Pitfalls and Strengths
MongoDB has the advantage of being a very approachable NoSQL solution. When I first delved into the NoSQL database world I sampled a number of Java-based solutions and found myself spending a lot of time figuring out what column families were, what Hadoop's relationship to HBase is, and what exactly a ZooKeeper is. I eventually figured it all out, and found that offerings like Cassandra and HBase are very solid and proven solutions to the NoSQL conundrum. Still, MongoDB was easier to grasp, with fewer concepts to overcome before I could start writing code.
Like any software, MongoDB is not without its flaws. During my time spent with MongoDB I've come across a few things that I would consider "Gotchas":
- Don't treat it like an RDBMS. This may seem obvious, but MongoDB makes it so easy to create and execute complex queries that you may find yourself going overboard and running into performance issues when trying to use it for real-time queries (as I did).
- MongoDB's indexes are B-trees. If you aren't familiar with what a B-tree is, you should probably look it up. The order of the fields in a compound index matters: a query can use the index efficiently only when its criteria cover the leading fields of the index, in the order in which the index was created.
- Design your indexes carefully. This ties into the B-tree bullet point above. My first few indexes contained many fields from the document, you know - "just in case" I needed to query on them. Don't make the same mistake. One of my indexes on a pretty small collection (~10 million records) grew to over 17GB in size, larger than the collection itself. You probably don't want to index an array field if it's going to contain hundreds or thousands of entries.
- MongoDB takes a very interesting approach to addressing NoSQL; it uses BSON for storage, JSON for representation, and JavaScript for administration and Map/Reduce. As a result odd little issues like this one (broken equals operator for NumberLong) are bound to crop up now and again while MongoDB catches up in age to the more popular big data solutions.
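One way to reason about the index bullets above: a compound B-tree index is sorted by its first field, then by its second, and so on, so a query benefits only from the leading run of index fields that it actually constrains. The sketch below models that rule of thumb in plain Java. It is a simplification for illustration, not MongoDB's query planner, and the class name, method name and field names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IndexPrefixSketch {

    /**
     * Returns how many leading fields of a compound index are constrained
     * by the query. Zero means the query cannot use the index as a prefix match.
     */
    static int usablePrefixLength(List<String> indexFields, Set<String> queryFields) {
        int matched = 0;
        for (String field : indexFields) {
            if (!queryFields.contains(field)) {
                break; // the first gap ends the usable prefix
            }
            matched++;
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical compound index on (author.lastName, title, isbn)
        List<String> index = Arrays.asList("author.lastName", "title", "isbn");

        // The order of the criteria in the query itself is irrelevant in this model
        Set<String> byNameAndTitle = new HashSet<>(Arrays.asList("title", "author.lastName"));
        System.out.println(usablePrefixLength(index, byNameAndTitle)); // prints 2

        // Skipping the leading field leaves no usable prefix
        Set<String> byIsbnOnly = new HashSet<>(Arrays.asList("isbn"));
        System.out.println(usablePrefixLength(index, byIsbnOnly)); // prints 0
    }
}
```

If a query pattern keeps returning 0 here, that is a hint that it needs its own index, or that the existing index lists its fields in the wrong order for that query.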
MongoDB, Console and Drivers
Administration of MongoDB is typically done using a JavaScript console client, which makes complex tasks like data migration and manipulation a breeze because they can be scripted entirely in JavaScript. In this article, we will show examples of using this console. There are also many production-quality MongoDB clients available today, which the MongoDB community refers to as drivers. Typically there is one driver per programming language, and all of the popular programming languages are covered, as well as some of the not-so-popular ones. This article shows how to use the Java driver for MongoDB and compares it to using an ORM library (MJORM).
Introducing MJORM: an ORM solution for MongoDB
Among the many interesting problems that the recent trend in NoSQL data stores has brought to the lives of application programmers is Object Relational Mapping. Object Relational Mapping (ORM) refers to the mapping of persisted data, traditionally stored in an RDBMS, to objects used by the application. This makes working with the data more fluid and natural to the language that the application is written in.
MongoDB's document-oriented architecture lends itself very well to ORM, as the documents that it stores are essentially objects themselves. Unfortunately there aren't many Java ORM libraries available for MongoDB, but there are a few, like morphia (a type-safe Java library for MongoDB) and spring-data (the MongoDB implementation of the Spring Data umbrella project).
These ORM libraries make heavy use of annotations, something that is not an option for me for a number of reasons, the most important being the portability of the annotated objects across many projects. This led me to start the mongo-Java-orm or "MJORM" (pronounced me-yorm) project: a Java ORM for MongoDB. MJORM is MIT licensed and available as a Google Code project. The project is built with Maven, and the Maven artifact repository is currently hosted by the Google Code Subversion server. As of this writing MJORM's latest stable release is 0.15, and it is being used by a few projects in production environments.
Getting started with MJORM
Add the MJORM library to your project
Maven users will first add the MJORM Maven repository to their pom.xml file to make the MJORM artifacts available to their projects:
<repository>
    <id>mjorm-webdav-maven-repo</id>
    <name>mjorm maven repository</name>
    <url>http://mongo-Java-orm.googlecode.com/svn/maven/repo/</url>
    <layout>default</layout>
</repository>
And then the dependency itself:
<dependency>
    <groupId>com.googlecode</groupId>
    <artifactId>mongo-Java-orm</artifactId>
    <version>0.15</version>
</dependency>
This will enable you to import and use the MJORM classes in your application. If you're not using Maven, you will need to download the MJORM library manually, along with the dependencies listed in the MJORM pom.xml.
Create your POJOs
Now that the dependencies are in place it's time to start writing code. We'll start with our Java POJOs:
class Author {
    private String firstName;
    private String lastName;
    // ... setters and getters ...
}

class Book {
    private String id;
    private String isbn;
    private String title;
    private String description;
    private Author author;
    // ... setters and getters ...
}
What we've described with this object model is that authors have a first name and a last name, while books have an id, an ISBN number, a title, a description and an author.
You may have noticed that the book's id property is a String. This is to accommodate MongoDB's ObjectId type, which is a 12-byte binary value represented as a hex string. While MongoDB requires that every document in a collection have a unique id, it doesn't require that the id be of type ObjectId. Currently MJORM only supports ids of type ObjectId and represents them as Strings.
You also may have noticed that the Author object doesn't have an id. This is because it will be a sub document of the Book document and is therefore not required to have an id. Remember, MongoDB only requires ids on the root level documents within a collection.
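To make the "12-byte value represented as a hex string" point concrete, here is a minimal sketch that validates such a string and decodes it using only the JDK. The class and method names are hypothetical and this is not the 10gen driver's ObjectId class. It also relies on one detail not covered in this article: the first four bytes of an ObjectId are a big-endian creation timestamp in seconds since the Unix epoch.

```java
import java.time.Instant;

final class ObjectIdHex {

    /** A 12-byte id is exactly 24 hexadecimal characters. */
    static boolean isValid(String hex) {
        return hex != null && hex.matches("[0-9a-fA-F]{24}");
    }

    /** Decodes the hex string into its 12 raw bytes. */
    static byte[] toBytes(String hex) {
        if (!isValid(hex)) {
            throw new IllegalArgumentException("not a 24-character hex string: " + hex);
        }
        byte[] bytes = new byte[12];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    /** Reads the leading four bytes as an unsigned big-endian number of seconds. */
    static long timestampSeconds(String hex) {
        byte[] b = toBytes(hex);
        return ((b[0] & 0xFFL) << 24) | ((b[1] & 0xFFL) << 16)
                | ((b[2] & 0xFFL) << 8) | (b[3] & 0xFFL);
    }

    public static void main(String[] args) {
        String id = "4f96309f762dd76ece5a9595"; // the id used later in this article
        System.out.println(toBytes(id).length); // prints 12
        System.out.println(Instant.ofEpochSecond(timestampSeconds(id)));
    }
}
```

A check like isValid is a cheap guard to run on an id String before handing it to a DAO, since a malformed id can never match a document.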
Create XML mapping files
The next step is creating the XML mapping files that MJORM will use to map MongoDB documents to these objects. We'll create one mapping file per object for this demonstration, but it is perfectly reasonable to put all of your mappings into a single XML file or separate them as you see fit.
Here's Author.mjorm.xml:
<?xml version="1.0"?>
<descriptors>
    <object class="Author">
        <property name="firstName" />
        <property name="lastName" />
    </object>
</descriptors>
And Book.mjorm.xml:
<?xml version="1.0"?>
<descriptors>
    <object class="Book">
        <property name="id" id="true" auto="true" />
        <property name="isbn" />
        <property name="title" />
        <property name="description" />
        <property name="author" />
    </object>
</descriptors>
The mapping files are fairly self-explanatory. The descriptors element is the root element and must be present in every mapping file. Beneath it are object elements that define each class that is being mapped to a MongoDB document. Each object element then contains property elements that describe all of the properties on the POJO and how they map to properties on the MongoDB document. A property must contain a name attribute at a bare minimum; this is the name of the property on the POJO and, by default, the name of the property on the MongoDB document. Optionally, a column attribute can be added to specify an alternate property name on the MongoDB document.
A property element with the id attribute is considered to be the unique identifier for the object. An object may only contain one property element with an id attribute. The auto attribute tells MJORM that it should auto-generate a value for this property when persisting it.
Head over to the MJORM project website on Google Code for a more detailed description of the XML mapping file.
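To give a feel for what a descriptor-driven mapper does with the name and column attributes, here is a minimal sketch in plain Java. It is not MJORM's implementation: the class MappingSketch, its toDocument method and the "surname" key are hypothetical, the descriptor is modelled as a simple map from POJO property name to document key, and a java.util.Map stands in for the MongoDB document.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

final class MappingSketch {

    /** Stand-in for the Author POJO from earlier in the article. */
    static final class Author {
        private final String firstName;
        private final String lastName;

        Author(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /**
     * Copies each mapped property of the POJO into a document-like map.
     * Keys of propertyToKey play the role of "name", values the role of "column".
     */
    static Map<String, Object> toDocument(Object pojo, Map<String, String> propertyToKey) {
        Map<String, Object> document = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> entry : propertyToKey.entrySet()) {
                Field field = pojo.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                document.put(entry.getValue(), field.get(pojo));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("mapping refers to an unknown property", e);
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("firstName", "firstName"); // no column given: same name on both sides
        mapping.put("lastName", "surname");    // column-style override of the document key

        System.out.println(toDocument(new Author("Brian", "Dilley"), mapping));
        // prints {firstName=Brian, surname=Dilley}
    }
}
```

Keeping the mapping outside the class, as the XML files do, is what lets the same POJO be reused across projects without carrying persistence annotations.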
Putting it all together
Now that we've created our data model and our mapping files to tell MJORM how to marshal and un-marshal our POJOs in and out of MongoDB, we can start with the fun stuff. First we must open our connection to MongoDB:
Mongo mongo = new Mongo( new MongoURI("mongodb://localhost/mjormIsFun")); // 10gen driver
The Mongo object comes from the Java driver written by the guys over at 10gen. This example opens a connection to a local MongoDB instance and uses the mjormIsFun database. Next we create our MJORM ObjectMapper. Currently the only implementation of the ObjectMapper interface available in MJORM is the XmlDescriptorObjectMapper, which uses the XML schema described above, although future versions of MJORM may include support for annotations or other configuration mechanisms.
XmlDescriptorObjectMapper objectMapper = new XmlDescriptorObjectMapper();
objectMapper.addXmlObjectDescriptor(new File("Book.mjorm.xml"));
objectMapper.addXmlObjectDescriptor(new File("Author.mjorm.xml"));
We've created our XmlDescriptorObjectMapper and added our mapping files to it. Next we create an instance of the MongoDao object provided by MJORM:
DB db = mongo.getDB("mjormIsFun"); // 10gen driver
MongoDao dao = new MongoDaoImpl(db, objectMapper);
What we've done first is get an instance of the 10gen driver's DB object. After that we create our MongoDao, providing it the DB object and the ObjectMapper that we created earlier. We're ready to start persisting data, so let's create a Book and save it to MongoDB:
Book book = new Book();
book.setIsbn("1594743061");
book.setTitle("MongoDB is fun");
book.setDescription("...");

book = dao.createObject("books", book);
System.out.println(book.getId()); // 4f96309f762dd76ece5a9595
First we created the Book object and populated it. After that we called the createObject method on the MongoDao, passing it the collection name "books" and our Book object. MJORM then turns the Book into a DBObject (the underlying object type that 10gen's Java driver uses) using the XML mapping files that we created earlier and persists the new document into our "books" collection. Then MJORM returns your instance of the Book object, but now with its id property populated. It is important to note that by default MongoDB doesn't require that you create databases or collections before using them; it creates them when needed, which can sometimes lead to confusion. A look at this new Book in the MongoDB console may look similar to this:
> db.books.find({_id:ObjectId("4f96309f762dd76ece5a9595")}).pretty()
{
    "_id": ObjectId("4f96309f762dd76ece5a9595"),
    "isbn": "1594743061",
    "title": "MongoDB is fun",
    "description": "..."
}
Let's take a look at what that createObject call would look like if we were not using MJORM and were instead using 10gen's Java driver directly:
Book book = new Book();
book.setIsbn("1594743061");
book.setTitle("MongoDB is fun");
book.setDescription("...");

DBObject bookObj = BasicDBObjectBuilder.start()
    .add("isbn", book.getIsbn())
    .add("title", book.getTitle())
    .add("description", book.getDescription())
    .get();

// 'db' is our DB object from earlier
DBCollection col = db.getCollection("books");
col.insert(bookObj);

ObjectId id = ObjectId.class.cast(bookObj.get("_id"));
System.out.println(id.toStringMongod()); // 4f96309f762dd76ece5a9595
We can now query for the object:
Book book = dao.readObject("books", "4f96309f762dd76ece5a9595", Book.class);
System.out.println(book.getTitle()); // "MongoDB is fun"
The readObject method reads a document by its id from the given collection, turns it into the appropriate class (again, using our mapping files from earlier) and returns it.
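The "turns it into the appropriate class" step can be pictured as the reverse of marshalling: create an empty instance, then copy each mapped key from the document onto the matching field. The sketch below does this with JDK reflection. It is an illustration under stated assumptions rather than MJORM's actual code: UnmarshalSketch and fromDocument are hypothetical names, a java.util.Map stands in for the document, and the mapping is a plain map from POJO property name to document key.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

final class UnmarshalSketch {

    /** Stand-in for part of the Book POJO from earlier in the article. */
    static final class Book {
        String isbn;
        String title;
        String description;
    }

    /**
     * Builds a new instance of the given type and fills in every mapped
     * property whose key is present in the document. Missing keys are
     * skipped, so the corresponding fields keep their default (null).
     */
    static <T> T fromDocument(Map<String, Object> document, Class<T> type,
                              Map<String, String> propertyToKey) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> entry : propertyToKey.entrySet()) {
                if (!document.containsKey(entry.getValue())) {
                    continue; // schema-less: a document may omit any property
                }
                Field field = type.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(target, document.get(entry.getValue()));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot map document to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> document = new HashMap<>();
        document.put("isbn", "1594743061");
        document.put("title", "MongoDB is fun");

        Map<String, String> mapping = new HashMap<>();
        mapping.put("isbn", "isbn");
        mapping.put("title", "title");
        mapping.put("description", "description");

        Book book = fromDocument(document, Book.class, mapping);
        System.out.println(book.title);       // prints MongoDB is fun
        System.out.println(book.description); // prints null
    }
}
```

Skipping absent keys instead of failing is the design choice that matters here, because nothing forces every document in a collection to carry every mapped property.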
An astute reader will have noticed that our Book doesn't have an Author, yet it was still persisted. That is due to MongoDB's schema-less nature. We can't require that a document in a collection contain any properties (other than the _id property), so creating a Book without an Author is perfectly OK by MongoDB. Let's add an Author to our book and update it:
Author author = new Author();
author.setFirstName("Brian");
author.setLastName("Dilley");

book.setAuthor(author);
dao.updateObject("books", "4f96309f762dd76ece5a9595", book);
And now our Book contains an Author and it is persisted in MongoDB. Now let's have a look at our Book in the MongoDB console:
> db.books.find({_id:ObjectId("4f96309f762dd76ece5a9595")}).pretty()
{
    "_id": ObjectId("4f96309f762dd76ece5a9595"),
    "isbn": "1594743061",
    "title": "MongoDB is fun",
    "description": "...",
    "author": {
        "firstName": "Brian",
        "lastName": "Dilley"
    }
}
As you can see, our persisted Book now contains an author. Here's the same thing again without MJORM:
Author author = new Author();
author.setFirstName("Brian");
author.setLastName("Dilley");

book.setAuthor(author);

DBObject bookObj = BasicDBObjectBuilder.start()
    .add("isbn", book.getIsbn())
    .add("title", book.getTitle())
    .add("description", book.getDescription())
    .push("author")
        .add("firstName", author.getFirstName())
        .add("lastName", author.getLastName())
    .pop()
    .get();

DBCollection col = db.getCollection("books");
// The freshly built bookObj has no _id, so the existing document is
// selected using the id stored on the Book itself.
col.update(new BasicDBObject("_id", new ObjectId(book.getId())), bookObj);
An in-depth description of all the methods provided by the MongoDao is beyond the scope of this article. Anyone interested in using MJORM in their own projects is urged to take a look at the documentation provided by the MJORM project or the MongoDao interface that it provides.
Conclusion
Hopefully this article has sparked some interest in MongoDB and MJORM. MongoDB is an excellent NoSQL data store with a huge number of awesome features and is sure to be around for a very long time. If you end up using it in a Java project then you may also consider using the MJORM library for your ORM needs, and if so, any feature requests, bug reports, documentation or patches to the source code would be greatly appreciated!
Author Bio
Brian Dilley is an experienced senior engineer and team leader with over thirteen years of experience who specializes in Java, Java EE, the Spring Framework, and Linux internals and administration. Brian has a lot of experience in ground level (0+ employee) internet startup companies, getting them to market, and building and maintaining their products. He is an expert in IaaS, cloud, PHP, and Linux administration, from procurement through installation and configuration of production and corporate hardware and software infrastructure, including load balancing, database, web, etc. You can follow Brian on Twitter.