Last year, the developers behind Neo4j improved usability, making it pleasant and easy to work with connected data. This year the team has focused on making it easier to get data into the graph, as well as on improving many aspects of performance, including write scalability, memory efficiency for massive graphs, and the ability to handle highly concurrent workloads across large clusters of Neo4j servers. During the last six months they have also seen the size of their contributor community double.
At Dataweek 2014, Neo4j's CEO Emil Eifrem busted the popular myth that graphs are useful only for social networks by sharing customer use cases in network management, routing, and identity and access management. As his talk illustrated, these classic graph problems exist in enterprises across a number of verticals, such as finance, telecom and healthcare. We briefly caught up with Emil after his talk to get a deeper perspective on the underlying technology and architecture since our last update.
Implicit indexing through a new primitive termed labels was one of the primary changes, which my colleague Charles Humble covered in an interview with Ian Robinson. The implementation of the associated semantic constraints is a work in progress.
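As a minimal Cypher sketch (the :Person label, the name property and the value 'Alice' are purely illustrative), labels give Cypher a way to narrow a lookup to one subset of the graph, and a schema index on a label/property pair turns that lookup into an index seek:

    // Tag a node with a label so queries can scan only that subset of the graph.
    CREATE (p:Person {name: 'Alice'});

    // A schema index on the label/property pair makes the lookup below an
    // index seek rather than a scan over all :Person nodes.
    CREATE INDEX ON :Person(name);

    MATCH (p:Person {name: 'Alice'})
    RETURN p;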
Another interesting development in version 2.1 was the partitioning of relationships by type and direction, which enables efficient graph traversal and searching by pruning away uninteresting relationships. This is particularly valuable for dense nodes, where a dense node can be defined as a node with an order-of-magnitude imbalance between incoming and outgoing relationships. For example, a celebrity on Twitter will have far more incoming follower relationships than outgoing following relationships.
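A sketch of the kind of query that benefits, where the :Person label and the FOLLOWS relationship type are assumptions for illustration: because the pattern fixes both the relationship type and its direction, the traversal only needs to walk the incoming FOLLOWS relationships of the dense node and can ignore everything else attached to it.

    // Count the followers of a heavily followed account by walking only
    // incoming FOLLOWS relationships; other relationship chains are skipped.
    MATCH (fan:Person)-[:FOLLOWS]->(celebrity:Person {name: 'Famous'})
    RETURN count(fan) AS followers;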
For visual thinkers, Neo4j ships with a browser-based visualizer. It provides a spring layout of the data, along with controls to style the visualization using Graph Style Sheets (GRASS) and to interact with the data using Cypher.
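For instance, a query like the one below (the labels and relationship type are again hypothetical) returns a small subgraph that the browser renders as an interactive, spring-laid-out picture whose colours and captions can then be tweaked with GRASS:

    // Fetch a sample of 50 paths for visualization in the Neo4j browser.
    MATCH p = (a:Person)-[:FOLLOWS]->(b:Person)
    RETURN p
    LIMIT 50;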
For data engineers, the built-in ETL capability introduced in version 2.1 for migrating data from traditional relational stores into Neo4j is a blessing.
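Assuming the relational tables have first been exported to CSV, the LOAD CSV clause added in 2.1 can stream them straight into the graph; the file name and columns below are hypothetical:

    // Stream a CSV export of a relational 'people' table into the graph,
    // creating or updating one :Person node per row.
    LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
    MERGE (p:Person {id: row.id})
    SET p.name = row.name;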
The next two releases are primarily focused on performance improvements, such as:
- improving memory efficiency for clustered applications with high levels of concurrency over massive graphs, through off-heap caching of nodes and relationships
- order-of-magnitude improvements in write throughput, by piggybacking writes in micro-batches
- further improvements to write efficiency, by eliminating internal dependencies on JTA for keeping the graph and indexes in sync
- a new and much faster query optimizer for Neo4j's Cypher query language
- eliminating the upper bound on the total number of nodes, currently in the order of tens of billions, by increasing the size of some internal pointers
- a new version of the Spring Data Neo4j driver optimized for client-server (rather than embedded) use, with order-of-magnitude gains expected over previous remote access methods for Spring