Some time ago, when MongoDB 2.6 was released, Kelly Stirman, Director of Products at MongoDB, answered our questions regarding that release. Now, with MongoDB 3.0 announced for March and MongoDB 3.0 RC-8 already available, it’s time to see what the new release brings to NoSQL users, again as answered by Kelly Stirman.
What is the benefit for a MongoDB “DBA” and the application developer from the new pluggable storage engine API?
With users building increasingly complex data-driven apps, there is no
longer a “one size fits all” database storage technology capable of
powering every type of application built by the business. Modern
applications need to support a variety of workloads with different
access patterns and price/performance profiles – from low latency,
in-memory read and write applications, to real-time analytics, to
highly compressed “active” archives.

The new pluggable storage engine API allows MongoDB to be extended
with new capabilities through new storage engines, including optimal
use of specific hardware architectures. This approach significantly
reduces developer and DBA complexity compared to running multiple
databases. Users can leverage the same MongoDB query language, data
model, scaling, security and operational tooling across different
applications, each powered by different pluggable MongoDB storage
engines.

Uniquely in the industry, multiple storage engines can co-exist
within a single replica set. Therefore, the same data can be used to
serve applications with different workloads using different storage
engines. Furthermore, this design allows migrations between engines,
including upgrades to WiredTiger, to be performed with no downtime for
applications.
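As a rough sketch of how this looks operationally (the replica set name, port and dbpath below are invented for illustration, and the startup option shown is the one exposed in the 3.0 release candidates, so verify it against your build), the engine is chosen per node at startup and can be confirmed from the mongo shell:

```javascript
// Start one replica set member on WiredTiger while the others remain on MMAPv1
// (run from an operating system shell; names, port and dbpath are illustrative):
//   mongod --replSet rs0 --storageEngine wiredTiger --dbpath /data/wt --port 27018
//
// A rolling migration then amounts to: add the WiredTiger member, wait for its
// initial sync, and retire the MMAPv1 members one by one.

// From the mongo shell, confirm which engine the connected node is running:
var status = db.serverStatus();
print("storage engine: " + status.storageEngine.name); // e.g. "wiredTiger" or "mmapv1"
```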
The WiredTiger storage engine can now provide document-level locking. What is the average performance gain that users may expect when using it?
We expect users to experience 7x-10x performance gains for write
intensive workloads. For many applications, WiredTiger will provide
significant benefits in the areas of lower storage costs, greater
hardware utilization, and more predictable performance, especially by
reducing query latency at the 95th and 99th percentiles.
NoSQL database design often leads to rich and deeply nested documents. Are there plans to facilitate such design patterns with more fine-grained locking within each document?
The document data model provides enormous flexibility in modeling data
for different use cases. We believe that document-level concurrency
provides the right balance of minimal overhead and fine-grained
control. It is analogous to row-level locking in relational databases.
Keep in mind the overhead for locks in MongoDB is extremely low – a
few microseconds. We do not expect users to observe significant lock
wait times in their systems. However, as hardware and use cases evolve
we will continue to work closely with the community to evaluate future
capabilities in this area.
What is the storage gain users can expect from data compression?
Up to 80%, dependent on the compression library and the type of data
being compressed. In addition to reduced storage space, compression
enables much higher storage I/O scalability as fewer bits are read and
written to disk.

By introducing compression, operations teams get higher performance
per node and reduced storage costs. Teams have the flexibility to
configure specific compression algorithms for collections, indexes and
the journal, choosing between:
Snappy (the default library for documents and the journal), providing a good balance between high compression ratio – typically around 70%, depending on document data types – and low CPU overhead.
zlib, providing higher document and journal compression ratios for storage-intensive applications, at the expense of extra CPU overhead.
Prefix compression for indexes, reducing the in-memory footprint of index storage by around 50% (workload dependent), freeing up more of the working set for frequently accessed documents.
Administrators can modify the default compression settings for all
collections and indexes. Compression is also configurable on a
per-collection and per-index basis during collection and index
creation.
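Here is a minimal mongo shell sketch of those per-collection and per-index settings; the collection and field names are invented, and the configString keys are standard WiredTiger configuration options, so verify them against the 3.0 documentation for your build:

```javascript
// Collection whose documents are compressed with zlib rather than the default
// snappy – a reasonable trade for storage-heavy, less latency-sensitive data:
db.createCollection("eventArchive", {
  storageEngine: {
    wiredTiger: { configString: "block_compressor=zlib" }
  }
});

// Indexes get prefix compression by default; it can be turned off per index
// when CPU overhead matters more than the in-memory footprint:
db.eventArchive.createIndex(
  { customerId: 1, ts: -1 },
  { storageEngine: { wiredTiger: { configString: "prefix_compression=false" } } }
);
```

As noted above, administrators can also change the server-wide defaults for collections, indexes and the journal at mongod startup; the per-collection and per-index options shown here override those defaults.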
MongoDB recently updated its MMS offering. How does MMS compare to MongoDB DBaaS providers?
With MongoDB managed by MMS, users get to control more and have access
to a richer feature set, but have to manage more (i.e., the underlying
hardware infrastructure, availability SLAs, backup schedules).

With MongoDBaaS from a partner such as MongoLab, you don’t have to
manage as much, but you get less control (i.e. over hardware
selection, cluster design, scaling, performance optimization,
deployment locations).

Whether running MongoDB with MMS or running MongoDB provisioned by a
DBaaS partner, you still get to build and run your apps on the fastest
growing database ecosystem on the planet!
In recent years there has been a trend for SQL databases to become friendlier to NoSQL workloads. With Postgres first adding JSON support and more recently rich operations on JSON values, how does MongoDB fare against SQL databases that are moving into the NoSQL space?
While this move is a powerful endorsement of MongoDB’s flexible
document model, it is more of a band-aid that falls short of meeting
the market needs that have made MongoDB the fastest-growing database
on the planet.

MongoDB is the only post-relational database, coupling the best
qualities of relational databases (expressive query language,
secondary indexing and strong consistency) with the flexibility and
scalability of NoSQL.

Relational databases provide no standardized approach to sharding, and
are therefore unable to scale beyond a single server. To scale out,
users must typically manually shard their database at the application
level, which adds significant development complexity and inhibits the
ability to elastically scale the database as workloads evolve.

Here are some questions to consider if you’re evaluating a relational
database for its JSON support. If you answer ‘yes’ to any of these
questions, the relational database may not be able to meet your
requirements:
Do you need the ability to query, index and manipulate JSON data, including data embedded in sub-documents and arrays? (See the brief example after this list.)
Do you need to scale (e.g., automatically partition your JSON data across multiple nodes)?
Do you need to deploy a geographically distributed environment?
Do you need native high availability (i.e., replication, automated failover), rather than bolting on external clustering frameworks?
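For context on the first question, here is a small mongo shell sketch of querying and indexing data nested in sub-documents and arrays; the collection and field names are purely illustrative:

```javascript
// A document with an embedded sub-document and an array of sub-documents:
db.orders.insert({
  customer: { name: "Ada", city: "London" },
  items: [ { sku: "A1", qty: 2 }, { sku: "B7", qty: 1 } ]
});

// Query a field inside the embedded array using dot notation:
db.orders.find({ "items.sku": "A1" });

// Match several conditions against the same array element:
db.orders.find({ items: { $elemMatch: { sku: "A1", qty: { $gte: 2 } } } });

// Index the nested field so these queries can be served from an index:
db.orders.createIndex({ "items.sku": 1 });
```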