
RavenDB 5 Improves Distributed Time-Series, Document Compression, and Indexing


RavenDB, a NoSQL document database with multi-document ACID transactions, adds smart document compression, distributed time-series support, and enhanced indexing in its version 5 release.

The compression of documents is rarely about single large documents. As explained by Oren Eini, CEO of RavenDB:

Documents in RavenDB can be of arbitrary size. Technically, they are limited to 2 GB in size, but if you get anywhere near that, you have other issues. The worst case I have seen is a 700+ MB file, but RavenDB will issue warnings if you have documents that exceed the 5 MB range. This is mostly because the cost of sending those MB range documents back and forth. RavenDB itself is doing quite fine with those documents. However, most documents tend to be much smaller. The typical data size of documents is in the order of few to low dozens of KBs.

Instead, the challenges typically relate to compressing values across many documents inside RavenDB. Because RavenDB has no schema, every document repeats the same JSON structure, and users complained about that overhead and about running into storage issues with large numbers of rarely touched documents.

Other databases, such as PostgreSQL and MySQL, have mechanisms for value compression. RavenDB 5 uses Zstd for fast and efficient compression. Eini explains:

If we can train the algorithm on the documents, we can get great benefits from removing redundancies across documents. What ends up happening is that as you write documents into a compressed collection, RavenDB watches your data and learn how to best compress it. The more you write, the more information RavenDB has to find the optimal dictionary to compress your data. This way, we are able to individually compress and decompress documents, while still retaining great compression rates.
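
To illustrate why a shared dictionary helps with many small, similar documents, here is a minimal Python sketch. It is not RavenDB's implementation: it swaps in zlib's preset-dictionary feature from the Python standard library in place of Zstd's trained dictionaries, and the document shape, `make_order`, and `build_dictionary` are hypothetical names invented for the example.

```python
import json
import zlib


def encode(doc):
    """Serialize a document to UTF-8 JSON bytes."""
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def make_order(i):
    # Hypothetical document shape, invented for this sketch.
    return {
        "Id": f"orders/{i}",
        "Customer": {"Name": f"Customer {i}", "Country": "Israel"},
        "Status": "Shipped",
        "Lines": [{"Product": "products/1", "Quantity": i}],
    }


def build_dictionary(samples, max_size=32768):
    # Naive stand-in for dictionary training: concatenate earlier documents
    # and keep the most recent bytes (zlib's window is at most 32 KB).
    return b"".join(samples)[-max_size:]


def compress_with_dict(data, zdict):
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(data, zdict):
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(data) + decompressor.flush()


# "Train" on documents already written to the collection.
samples = [encode(make_order(i)) for i in range(50)]
zdict = build_dictionary(samples)

# Compress one new document on its own, with and without the dictionary.
new_doc = encode(make_order(999))
plain = zlib.compress(new_doc, 9)
with_dict = compress_with_dict(new_doc, zdict)

print(len(new_doc), len(plain), len(with_dict))
```

Each document is still compressed and decompressed individually; the dictionary only supplies the structure that the documents have in common, which is where the savings on small documents come from.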

RavenDB 5 also introduces time-series support, that is, handling of data points whose values are ordered by time. Time series are integrated into the RavenDB document model and distributed environment: each time series extends a specific document, which preserves context and keeps operations simple. Time-series data is stored separately from the document it extends, so rapidly changing values can be modified without changing the document itself. Distributed clients and nodes can modify time series concurrently, and their modifications are merged without conflicts.
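
One way to picture a conflict-free merge of concurrent time-series writes is the sketch below. The merge rule is an assumption made for illustration, not RavenDB's documented algorithm: entries are keyed by timestamp, and when two nodes hold different values for the same timestamp, the larger value tuple wins. `merge_series` and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def merge_series(local, remote):
    """Merge two replicas of a time series.

    Each replica maps a timestamp to a tuple of values. Assumed rule:
    for a shared timestamp, keep the larger tuple. Because the rule is
    deterministic and order-independent, every node converges to the
    same result no matter which side it merges first.
    """
    merged = dict(local)
    for timestamp, values in remote.items():
        if timestamp not in merged or values > merged[timestamp]:
            merged[timestamp] = values
    return merged


def at(minute):
    # Helper for building sample timestamps (hypothetical data).
    return datetime(2020, 7, 1, 12, minute, tzinfo=timezone.utc)


node_a = {at(0): (71.0,), at(1): (72.0,)}
node_b = {at(1): (72.5,), at(2): (73.0,)}

merged = merge_series(node_a, node_b)
print(merged)
```

Merging in either direction gives the same series, which is the property that lets nodes accept writes independently and reconcile later.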

Time-series support in RavenDB 5 includes new APIs and GUI management, transactional guarantees, and efficient querying and aggregation against large datasets.
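
As a rough picture of what aggregation over a time series means, the following Python sketch groups data points into hourly buckets and computes summary statistics for each. It does not use the RavenDB client API or query language; `aggregate_by_hour` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_hour(entries):
    """Group (timestamp, value) pairs by hour and summarize each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in entries:
        # Truncate the timestamp to the start of its hour.
        bucket_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[bucket_start].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


readings = [
    (datetime(2020, 7, 1, 10, 5), 1.0),
    (datetime(2020, 7, 1, 10, 30), 3.0),
    (datetime(2020, 7, 1, 11, 15), 5.0),
]

hourly = aggregate_by_hour(readings)
for start, stats in hourly.items():
    print(start, stats)
```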

Indexing in RavenDB 5 adds static indexes for time series and distributed counters, and supports the use of compare-exchange values within indexes.

The RavenDB client API adds refinements for attachments, bulk insertion, compare-exchange, load balancing, patching, subscriptions, and serialization.

Further details on all RavenDB 5 improvements and changes are available in the RavenDB 5 changelog. Eini also demonstrates key features of RavenDB 5 in an accompanying video.

RavenDB is available on-premises or as a cloud service on AWS and Azure. RavenDB provides open-source clients for many environments, including Node.js, Python, Java, Ruby, C++, and more.

The RavenDB client is open-source software available under the MIT license for communicating with the RavenDB application. All other RavenDB usage occurs under the AGPLv3 license. RavenDB commercial licenses, including a free option, are available for those who do not wish to follow the terms of the AGPLv3 license.

Contributions are welcome via the RavenDB contribution guidelines, which include a code of conduct.
