In early March, the Relevance team around Rich Hickey and Stuart Halloway announced a new database platform they have been working on since 2010.
Datomic leverages recent developments in distributed computing, especially storage as a service, the availability of arbitrary numbers of application servers, and the requirement to scale reads more than writes. Other important aspects of Datomic are:
- separation of read and write concerns
- strong transactional guarantees on writes
- the notion of immutable, append-only databases
- database snapshots in time as queryable values (see the snapshot sketch after the Datalog example below)
- time (transactions) as part of the core data structure: a Datom is a fact of (Entity, Attribute, Value, Transaction)
- Datalog as a logic-based, structural query language that allows complex queries, including inferred joins
Datalog example (shipping costs exceeding the product price):

```clojure
[:find ?customer ?product
 :where
 [?customer :shipAddress ?addr]
 [?addr :zip ?zip]
 [?product :product/weight ?weight]
 [?product :product/price ?price]
 ;; function call into application code
 [(Shipping/estimate ?zip ?weight) ?shipCost]
 [(<= ?price ?shipCost)]]
```
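Because a database snapshot is an ordinary value, the same query can run against the current state or a past one. The following is a minimal sketch using the peer API's as-of, reusing the :product/price attribute from the example above; the connection URI and the date are illustrative, not from the announcement.

```clojure
(require '[datomic.api :as d])

;; illustrative URI; assumes the database already exists
(def conn (d/connect "datomic:mem://example"))

(def db (d/db conn))                          ; immutable snapshot, a plain value
(def db-then (d/as-of db #inst "2012-03-01")) ; the same database as of a past date

;; the identical query runs against either value
(d/q '[:find ?product ?price
       :where [?product :product/price ?price]]
     db-then)
```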
The Datomic architecture consists of these integral building blocks:
- a distributed, fast storage service (AWS DynamoDB on SSD)
- a single transactor service, responsible solely for serializing writes into a consistent data stream
- a peer library that is part of the application and handles querying and index/data fetching (a minimal usage sketch follows this list)
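As a rough sketch of how these pieces fit together from the application's point of view, the following uses the peer API with the in-memory storage; the URI and the transacted doc string are made up for illustration.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://hello")   ; in-memory storage, for development
(d/create-database uri)
(def conn (d/connect uri))

;; the write goes through the (single) transactor, which serializes it
@(d/transact conn [{:db/id  (d/tempid :db.part/user)
                    :db/doc "first entity"}])

;; the query executes inside the peer, against an immutable database value
(d/q '[:find ?e
       :where [?e :db/doc "first entity"]]
     (d/db conn))
```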
Rich Hickey also discusses Datomic in an upcoming interview with InfoQ's Werner Schuster, and he spoke about it in his keynote at Clojure/West.
Since the announcement, which was accompanied by video presentations on the Datomic architecture and Datalog, there have been a number of interesting discussions.
Sergio Bossa and Daniel Spiewak questioned some of Datomic's design decisions.
One concern is the limited write throughput and the choice of a single transactor, which forms a single point of failure and the main bottleneck.
Another is the decision to move massive amounts of data to the code (applications executing queries) instead of moving code to the data, as many other approaches (like map-reduce) currently do.
Rich Hickey answered these concerns on Alexandru Popescu's and Michael Fogus' blogs.
He pointed out that the transactor can be built as a highly available component, and that it is possible to create multiple, parallel "sharded" Datomic databases which can be cross-queried. He also outlined that Datomic's sweet spot is not extremely high write throughput but the scaling of reads, rich querying, and a consistent transactional system.
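Cross-querying works because Datalog queries accept multiple database values as sources. A hedged sketch, assuming two already-populated shards and a hypothetical :customer/email attribute:

```clojure
(require '[datomic.api :as d])

;; two independent ("sharded") databases, assumed to exist and hold data
(def db-eu (d/db (d/connect "datomic:mem://shard-eu")))
(def db-us (d/db (d/connect "datomic:mem://shard-us")))

;; one query joins across both sources on the email value
(d/q '[:find ?email
       :in $eu $us
       :where
       [$eu ?c1 :customer/email ?email]
       [$us ?c2 :customer/email ?email]]
     db-eu db-us)
```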
The answer regarding moving data to the application discusses the strain on current database servers, which have to take care of too many concerns: querying, writing, sharding, optimizing, logging, monitoring and many more. Datomic tries to separate these concerns. Applications are much easier to scale out than database servers. They can also take on different query and use-case responsibilities, catering to different query characteristics and data needs (hot dataset).
Another interesting point was that Datomic can be seen as a globally distributed index. The index in storage is updated regularly. Additionally, index deltas are constantly computed in the transactor and in each application, and are virtually merged with the main index. The cached, immutable index segments allow the query engine to retrieve the targeted pieces of the database value directly, without transporting large parts of the database to the client.
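To make the idea concrete, here is a toy illustration in plain Clojure (not Datomic internals): queries can operate on a view that merges a large, immutable index segment with a small delta of recent datoms, so the segment itself never has to be rewritten or shipped in full.

```clojure
;; toy datoms as [entity attribute value] tuples
(def index-segment (sorted-set [1 :product/price 10]
                               [2 :product/price 20]))

;; recent, not-yet-indexed datoms derived from the transaction stream
(def recent-delta (sorted-set [3 :product/price 30]))

;; the "virtually merged" view a query engine could work against;
;; the original segment stays immutable and cacheable
(def merged-view (into index-segment recent-delta))
```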
The current Datomic offering covers:
- the peer library, which also comes with an in-memory implementation of the transactor and storage service (for development)
- a VirtualBox appliance that contains a transactor instance and a persistent data storage service (for testing and small apps)
- a public, commercial offering on AWS with a free tier (1,000 hours) using Amazon's DynamoDB