Former technical leaders from Twitter and Couchbase have created FaunaDB, a new general-purpose database.
Evan Weaver, former Architect and Director of Infrastructure at Twitter, and Matt Freels, former Technical Lead of database teams at Twitter, have joined forces with Chris Anderson, co-founder of Couchbase, to lead the creation of a new “adaptive operational database” that they wished they had while working at Twitter. The result is FaunaDB, an object-oriented, relational, distributed database that promises to scale linearly.
FaunaDB is a CP system in CAP terms: it is built to provide consistency and partition tolerance. It can run across multiple datacenters and tolerates the failure of a minority of them without service interruption. It scales both horizontally and vertically, and it runs on a laptop, a single server, or multiple servers, locally or in the cloud, including in virtualized or containerized settings.
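To make the "minority of datacenters" claim concrete, here is a minimal sketch of the arithmetic, assuming availability requires a strict majority of datacenters to stay reachable. The function names are hypothetical and the majority rule is an assumption drawn from the word "minority", not a description of FaunaDB's actual replication protocol.

```python
def max_tolerated_failures(datacenters: int) -> int:
    """Largest number of failed datacenters that still leaves a strict majority alive.

    Assumption: the cluster stays available only while more than half of the
    datacenters are reachable.
    """
    if datacenters < 1:
        raise ValueError("a cluster needs at least one datacenter")
    return (datacenters - 1) // 2


def is_available(datacenters: int, failed: int) -> bool:
    """True if the surviving datacenters still form a strict majority."""
    if not 0 <= failed <= datacenters:
        raise ValueError("failed must be between 0 and the number of datacenters")
    return failed <= max_tolerated_failures(datacenters)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, "datacenters tolerate", max_tolerated_failures(n), "failure(s)")
```

Under this assumption, three datacenters survive the loss of one and five survive the loss of two, while an even count adds no extra tolerance over the odd count below it.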
Like Datomic, FaunaDB keeps every version of a piece of data: a write does not overwrite the existing value but creates a new version. This is especially useful for auditing data and for tracing how it has evolved over time.
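The idea of never overwriting data can be illustrated with a small in-memory sketch. This is not FaunaDB's API or storage engine; `VersionedStore` and its methods are hypothetical names, and timestamps are plain integers supplied by the caller.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Append-only key/value store: a write adds a version, it never overwrites."""

    def __init__(self):
        # For each key, two parallel lists: ascending timestamps and their values.
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, value, ts):
        """Record a new version of `key` at time `ts`."""
        times = self._times[key]
        if times and ts <= times[-1]:
            raise ValueError("timestamps must be strictly increasing per key")
        times.append(ts)
        self._values[key].append(value)

    def read(self, key, at=None):
        """Return the value as of time `at` (the latest if `at` is None)."""
        times = self._times.get(key, [])
        if not times:
            return None
        if at is None:
            return self._values[key][-1]
        # Index of the first version written strictly after `at`.
        i = bisect.bisect_right(times, at)
        return self._values[key][i - 1] if i else None

    def history(self, key):
        """Return every (timestamp, value) pair ever written for `key`."""
        return list(zip(self._times.get(key, []), self._values.get(key, [])))


if __name__ == "__main__":
    store = VersionedStore()
    store.write("user:1", {"plan": "free"}, ts=1)
    store.write("user:1", {"plan": "pro"}, ts=5)
    print(store.read("user:1", at=3))  # the version that was current at time 3
    print(store.history("user:1"))     # full audit trail
```

Because earlier versions are retained, an audit can ask what a record looked like at any past point in time instead of seeing only its current state.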
From a data modeling perspective, FaunaDB attempts to be everything to everyone: relational (not SQL, but supporting joins, foreign keys, and indexes), document, graph, and object-oriented. We asked Weaver a few questions to get more insight into this new database:
InfoQ: How do you define FaunaDB?
Evan Weaver: FaunaDB is a transactional, temporal, geographically distributed, strongly consistent, secure, multi-tenant, QoS-managed operational database. It's implemented on the JVM for portability, and it's relational, but not SQL. Instead, it's queried via type-safe embedded DSLs, like LINQ. FaunaDB is a return to the general-purpose database model, but built for the cloud instead of the mainframes of the 80s.
InfoQ: How is FaunaDB different from other database services such as Amazon DynamoDB or Google Firebase?
Weaver: DynamoDB and Firebase are not general purpose. DynamoDB is a key/value database with some extensions, and Firebase is a hierarchical database--a model I haven't seen since MUMPS. Neither of them are geo-replicated, and they both lock you into a single cloud vendor forever with no on-premises or multi-cloud options.
InfoQ: I understand FaunaDB can be replicated across datacenters. Is this a real-time backup procedure, or can users access instances hosted in different datacenters at the same time, choosing the ones closest to them to reduce latency?
Weaver: The latter. Users are automatically routed to the closest datacenter, but their data is available in real time everywhere. Currently, our cloud spans AWS and Google Cloud Platform. Later this year you will be able to select exactly which regions you want your data to be in for data sovereignty purposes.
FaunaDB can be run on-premises or in the cloud. It is also offered as a managed service that requires no operational involvement from the user; the service currently runs on AWS and GCP, with the prospect of making it available on Azure soon.
FaunaDB was written in Scala and Java and runs on the JVM on multiple operating systems, including Linux, Windows, and OS X. There are drivers for several languages (Scala, Java, Java/Android, JavaScript, C#, Python, Ruby, Go, and Swift), but the database can also be accessed directly through an HTTP API.