Shortly after the 1.4 release of MongoDB (from "humongous") on March 25th, its creator Dwight Merriman (former CEO/CTO of DoubleClick) announced that 10gen, the company behind the open-source document database, would offer commercial training and support for the product.
InfoQ took this opportunity to talk to Merriman about MongoDB, its features, applicability and place in the community of NoSQL databases. His answers are quoted in the appropriate sections of this article.
Introduction to MongoDB
MongoDB is a scalable, high-performance next-generation database. Data in MongoDB is stored as documents, which allow complex relationships to be represented within a single data object. Documents can be composed of individual fields of primitive types, "embedded documents", or arrays of documents.
This flexibility allows a developer to model a large subset of problems in a manageable and flexible way without resorting to splitting data up into different tables. In cases where data is not optimally modeled as a single document, MongoDB has the concept of a "DBRef", which is a pointer from a field in a document to another document.
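The DBRef idea described above can be illustrated with a small sketch in plain JavaScript. It assumes the `{ $ref, $id }` field convention for the pointer; the in-memory `collections` object and the `resolveRef` helper are hypothetical stand-ins for illustration, not part of MongoDB or any driver.

```javascript
// Hypothetical in-memory stand-in for two collections (not a driver API).
const collections = {
  authors: [{ _id: 1, name: "Jane" }],
  blogposts: [
    // The author field holds a DBRef-style pointer instead of an embedded document.
    { _id: 100, title: "My First Post", author: { $ref: "authors", $id: 1 } },
  ],
};

// Resolve a pointer by looking up the target collection and matching on _id.
// Returns null when the collection or the document does not exist.
function resolveRef(ref) {
  const target = collections[ref.$ref] || [];
  return target.find((doc) => doc._id === ref.$id) || null;
}

const post = collections.blogposts[0];
const author = resolveRef(post.author);
console.log(author.name); // Jane
```

The trade-off is an extra lookup per reference, which is why embedding is usually preferred when the data naturally belongs to a single document.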
Retrieving and querying data from a MongoDB database is flexible: documents can be dynamically queried based on the main document, any field within the document, any embedded document, or any document contained within an array. For addressing embedded documents, a dot-style notation is used.
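To make the dot-style notation concrete, here is a minimal sketch of how a dotted path such as "comments.by" could be resolved against a nested document. This is plain JavaScript written for illustration; `valuesAtPath` and `matches` are hypothetical helpers and do not reflect how the MongoDB server implements queries.

```javascript
// Collect every value reachable from `doc` along a dotted path.
// Arrays are traversed element by element, so "comments.by" reaches
// the `by` field of each embedded comment.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  return current;
}

// A document matches when any value at the path (or any element of an
// array found there) is strictly equal to the expected value.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path).flat().includes(expected);
}

const post = {
  title: "My First Post",
  author: { name: "Jane", id: 1 },
  comments: [{ by: "Abe", text: "First" }, { by: "Ada", text: "Good post" }],
};

console.log(matches(post, "author.name", "Jane")); // true
console.log(matches(post, "comments.by", "Ada")); // true
```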
Features
Written in C++, MongoDB features:
- Document-oriented storage (the power and flexibility of JSON-like data schemas)
- Inner objects, embedded arrays, geospatial information
- Dynamic queries
- Full index support, including secondary indexes
- Query profiling
- Fast, in-place updates
- Efficient storage of binary data, including large objects (e.g. photos and videos)
- Replication and fail-over support
- Auto-sharding for cloud-level scalability (alpha)
- MapReduce for complex aggregation
- Commercial Support, Training, and Consulting
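The MapReduce entry in the list above refers to a general aggregation pattern: a map step emits key/value pairs per document, and a reduce step folds all values emitted for the same key. The following is a minimal in-memory sketch of that pattern in plain JavaScript. The `mapReduce` helper and its signature are assumptions made for illustration and differ from MongoDB's own command, which runs server-side.

```javascript
// Generic in-memory map/reduce over an array of documents.
// `map` calls emit(key, value); `reduce` folds all values emitted per key.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

const posts = [
  { title: "My First Post", comments: [{ by: "Abe" }, { by: "Ada" }] },
  { title: "Second Post", comments: [{ by: "Ada" }] },
];

// Count comments per commenter across all posts.
const counts = mapReduce(
  posts,
  (doc, emit) => doc.comments.forEach((c) => emit(c.by, 1)),
  (key, values) => values.reduce((sum, n) => sum + n, 0)
);
console.log(counts); // { Abe: 1, Ada: 2 }
```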
Origin and Intent
On the goal of MongoDB, their blog states:
MongoDB was never designed nor intended to be a niche database for a small subset of problems, but a new type of database, that solves lots of real world problems for a large subset of the developer community.
The focus of the MongoDB project is to combine the best traits of the non-relational model, including high scalability, performance, and ease of development, with important features common in traditional databases that are useful in primary operational data stores.
MongoDB wasn't designed in a lab. We built MongoDB from our own experiences building large scale, high availability, robust systems.
MongoDB was first released to the public 16 months ago, on Nov 2nd 2009. The philosophy behind it is that, although transactional semantics are reduced in favor of scalability and performance, a more full-featured approach than a pure key-value store is needed for general adoption and widespread usage.
Relation to DDD
The document paradigm is an interesting approach for persisting complex object structures. It fits especially well with the aggregates proposed by Domain Driven Design (DDD), where only the root entity can be linked to from other entities and the dependent entities and values are accessible only through the root. A MongoDB-based Repository could be a simple way to provide persistence in projects based on DDD. A related observation is that business domains often speak of documents when referring to business entities, so using a document as the internal representation may be a better fit than other data structures or the objects themselves.
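As a rough sketch of the repository idea, the following plain JavaScript stores each aggregate as a single document keyed by the id of its root entity. The `OrderRepository` class, the order shape and the `Map` standing in for a collection are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical repository: each aggregate is stored as ONE document keyed by
// the root entity's id. A Map stands in for a MongoDB collection.
class OrderRepository {
  constructor() {
    this.documents = new Map();
  }

  // Persist the whole aggregate (root plus dependent line items) together.
  // The JSON round trip takes a deep copy, like serializing to a document.
  save(order) {
    this.documents.set(order.id, JSON.parse(JSON.stringify(order)));
  }

  // Load by root id only; dependent entities are reached through the root.
  findById(id) {
    const doc = this.documents.get(id);
    return doc ? JSON.parse(JSON.stringify(doc)) : null;
  }
}

const repo = new OrderRepository();
repo.save({ id: "order-1", customer: "Jane", lines: [{ sku: "A1", qty: 2 }] });
const loaded = repo.findById("order-1");
console.log(loaded.lines[0].qty); // 2
```

Because the repository exposes no way to load a line item on its own, the aggregate boundary is enforced by the storage layout rather than by convention.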
Even with schema-less document databases, data modelling remains important. Several aspects of relationships have to be considered carefully before creating documents; otherwise the result can be data duplication, poor performance and other issues.
Example and Tutorials
For example, a blog post with its main article, comments, and votes on comments would be split into multiple tables in a relational database. In MongoDB, a blog post could be represented as a single document, with the comments and votes contained as arrays of documents within the main post document. This approach makes data more manageable, and reduces the necessity for 'JOIN's that impede performance and horizontal scalability in traditional relational databases.
> db.blogposts.save({ title : "My First Post", author : { name : "Jane", id : 1 }, comments : [{ by : "Abe", text : "First" }, { by : "Ada", text : "Good post" }] })
> db.blogposts.find({ "author.name" : "Jane" })
> db.blogposts.findOne({ title : "My First Post", "author.name" : "Jane", comments : [{ by : "Abe", text : "First" }, { by : "Ada", text : "Good post" }] })
> db.blogposts.find({ "comments.by" : "Ada" })
> db.blogposts.ensureIndex({ "comments.by" : 1 });
You can try this example directly in the interactive MongoDB web console shell which also embeds the online tutorial.
Alex Popescu, the CTO of InfoQ, runs the myNoSQL site, which offers news, reviews and comparisons of NoSQL data stores (including MongoDB); see for instance his take on production notes.
Teach Me To Code published a three-part screencast introducing various aspects of MongoDB.
Pivotal Labs provides an introductory presentation by 10gen's Michael Dirolf in video and audio versions. A fairly complete overview of MongoDB, presented by Kyle Banker, is also available on SlideShare.
Installation and Integration
The database is published under the GNU AGPL v3.0 license; the drivers from mongodb.org are licensed under the Apache License v2.0. Its C++ source code is available from GitHub and can be built on any operating system.
It can also be installed as a binary package for Linux, Mac OS X, Windows and Solaris.
MongoDB itself runs as the mongod daemon process, the core database server, which is then accessed by the various drivers. Sharding support and database routing are provided by the mongos service.
There are integration efforts to support MongoDB in almost every programming language. Drivers are available for C, C++, C# & .NET, ColdFusion, Erlang, Factor, Java, JavaScript, PHP, Python, Ruby, Perl and many more.
MongoDB is also supported in other frameworks, like the "blueprints"-connector libraries of gremlin, the graph database library.
It was integrated by Debasish Ghosh as one of the available persistence modules of the scalable actors framework Akka.
Operations and Scalability
Operationally, MongoDB can be run in two modes depending on the needs of the application. The first is 'single master' mode, where there is a single master server for all writes. Reads can be performed off of this database, or can be done from any number of read slaves for read scalability (usage scenario: SourceForge).
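The single-master arrangement can be pictured as a simple routing rule: all writes go to one server, while reads may be spread over the slaves. The sketch below is plain JavaScript with hypothetical names (`SingleMasterRouter`, string server labels). The round-robin read policy is an assumption for illustration; the article does not say how reads are distributed.

```javascript
// Hypothetical router: server names are plain strings, not real connections.
class SingleMasterRouter {
  constructor(master, slaves) {
    this.master = master;
    this.slaves = slaves;
    this.nextSlave = 0;
  }

  // Every write goes to the single master.
  serverForWrite() {
    return this.master;
  }

  // Reads rotate over the slaves; with no slaves they fall back to the master.
  serverForRead() {
    if (this.slaves.length === 0) return this.master;
    const server = this.slaves[this.nextSlave];
    this.nextSlave = (this.nextSlave + 1) % this.slaves.length;
    return server;
  }
}

const router = new SingleMasterRouter("master", ["slave-1", "slave-2"]);
console.log(router.serverForWrite()); // master
console.log(router.serverForRead()); // slave-1
console.log(router.serverForRead()); // slave-2
```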
For applications where the volume of data or frequency of writes is too high to handle on a single master, MongoDB's auto-sharding mode (currently in alpha) can be used. In this mode, writes are automatically distributed among any number of 'shards' (a shard is simply a group of one or more MongoDB servers), each of which takes responsibility for writes and reads of portions of the dataset.
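To illustrate how writes might be distributed among shards, the sketch below routes each document to a shard based on the value of a chosen key, with every shard owning one key range. It is plain JavaScript; the `chunks` table, the shard names, the range boundaries and the `shardFor` helper are hypothetical, and range-based routing is assumed here for illustration rather than taken from the article.

```javascript
// Hypothetical chunk table: each shard owns keys in the range [min, max).
const chunks = [
  { min: "", max: "h", shard: "shard-a" },
  { min: "h", max: "p", shard: "shard-b" },
  { min: "p", max: "\uffff", shard: "shard-c" },
];

// Pick the shard whose key range contains the document's shard key value.
function shardFor(doc, shardKey) {
  const value = doc[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  if (!chunk) throw new Error(`no shard owns key ${value}`);
  return chunk.shard;
}

console.log(shardFor({ author: "abe" }, "author")); // shard-a
console.log(shardFor({ author: "jane" }, "author")); // shard-b
console.log(shardFor({ author: "zed" }, "author")); // shard-c
```

Reads that include the key can be sent to a single shard in the same way, while queries without it would have to consult every shard.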
In either case, MongoDB takes a 'strong consistency' approach (you would consider MongoDB a C-P system in the CAP theorem). High availability is achieved by replicating data to multiple MongoDB nodes, any of which can take the responsibility as the master in a shard at a point in time - and MongoDB handles this failover automatically. This approach allows you to have strongly consistent characteristics, which are important for a number of use cases, while still maintaining a very high level of write availability.
The MongoDB site contains an Admin Center that covers operational requirements such as:
- Admin UIs
- Hosting Center
- Import Export Tools
- Monitoring and Diagnostics, DBA Operations from the Shell
- Database Profiler
- Sharding, Replication
- Production Notes
- Security and Authentication
- Architecture and Components
- Backups, Troubleshooting, Durability and Repair
Documentation, Support and Training
The MongoDB Documentation is available on the mongodb.org wiki (also as PDF) under a Creative Commons License.
10gen has designed MongoDB to solve real-world problems for a large subset of the application development community. In that light, we see (and as evidenced by customer deployments) MongoDB as the approach to data storage for a large proportion of database-backed applications.
Today, 10gen provides support, consulting and training for clients who use MongoDB in their production applications. In the near future, cloud-based services (such as hosted MongoDB services), as well as advanced management tools for large MongoDB clusters, will be available from 10gen.
Current Usage
Since version 1.3, MongoDB has been heavily used in production systems. Well-known adopters of the data store are:
- Boxed Ice
- SourceForge
- Justin.tv
- GitHub
- The Business Insider
- Disqus
Of course, there are many more use cases for the document store.
Future Development
The MongoDB team's vision for the data store is very broad. They consider the current 1.4 release to contain about half of the intended features; the rest, which they plan to work on over the next year, include:
- better replication: real time, replica sets, more options for data durability
- production ready sharding
- more features for working with embedded documents
- fleshing out more atomic update operators
- single server durability
- full text search