
Introducing Reladomo - Enterprise Open Source Java ORM, Batteries Included!


Key Takeaways

  • Reladomo is an enterprise-grade Java ORM developed at Goldman Sachs and released as an open source project in 2016.
  • Reladomo provides a number of unique and interesting features, like a strongly typed query language, sharding, temporal support, real testability, and high-performance caching.
  • Reladomo is an opinionated framework based on a set of core values that guided its development.
  • Examples in the article illustrate Reladomo's usability and programmability.

Way back in 2004 we faced a difficult challenge. We needed a way to abstract out the database interaction in our Java applications, and our requirements didn't fit any existing framework. The application had the following needs that were outside the customary solutions:

  • The data was highly sharded. There were over 100 databases with the same schema, but different data.
  • The data was bitemporal (we'll explain this in part 2 of this article, stay tuned!).
  • The queries against the data were not necessarily static; some had to be created dynamically from a complex set of user inputs.
  • The data model was reasonably complex - several hundred tables.

We began Reladomo development in 2004. The first production deployment was later that year and we've been releasing regularly ever since. In the intervening years, Reladomo has become widely adopted at Goldman Sachs, and the applications that use it have guided the major new features we've added. It is now used in multiple ledgers, middle office trade processing, balance sheet processing, and dozens of other applications. Goldman Sachs released Reladomo (short for relational domain objects) as an open source project under the Apache 2.0 license in 2016.

Why build another ORM?

Quite simply, our core requirements were not met by existing solutions, and traditional ORMs had issues that needed to be addressed.

We decided to eliminate code-level boilerplate and vestigial constructs. In Reladomo there are no connections to acquire, close, leak or flush. There is no Session. There is no EntityManager. There is no LazyInitializationException. APIs are provided in two fundamental ways: on the domain objects themselves, and in a highly enhanced form via the strongly typed List implementations.

The Reladomo query language is another critical point for us. String-based query languages were a poor fit for our applications, and for object-oriented code in general. Concatenating strings to form dynamic queries does not work well for anything but the most trivial queries, and maintaining such string-concatenated queries is an exercise in frustration.

Sharding is another area where we needed complete native support. The sharding in Reladomo is very flexible and can deal with the same primary key value appearing in different shards, referring to different objects. The shard query syntax naturally fits in the query language.
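Here is a minimal sketch of what a sharded query can look like; the deskId source (shard) attribute and the other field names are assumptions for illustration, not part of the sample model:

Operation op = TradeFinder.deskId().eq("NYDESK1");   // picks the shard (hypothetical source attribute)
op = op.and(TradeFinder.productId().eq(12345));      // ordinary filter, composed as usual
TradeList nyTrades = TradeFinder.findMany(op);

The shard restriction is just another Operation, so it composes with the rest of the query.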

Temporal (uni- and bitemporal) support, which helps database designers record and reason about temporal information as described by Richard Snodgrass in his book Developing Time-Oriented Database Applications in SQL, is a truly unique feature of Reladomo. It's applicable in many places, from all manner of accounting systems, to reference data, to any place that requires full reproducibility. Even a simple application, such as a project collaboration tool, can benefit from a uni-temporal representation, enabling a user interface to act like a time machine and show how things have changed.

Real testability was high on the list of things to get right, and we decided early on that the only way to do that was to eat our own cooking: the large majority of Reladomo tests are written using Reladomo's own test utilities! We have a pragmatic view of testing. We want tests to add long-term value. Reladomo tests are easy to set up and enable execution of all production code against an in-memory database, allowing for continuous integration tests. These tests help developers understand the interactions with the database, without ever having to configure a development environment with an installed database.
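As a rough sketch of what such a test setup can look like (the class and method names here are our recollection of the test utilities and should be treated as assumptions; the file names are hypothetical):

MithraTestResource testResource = new MithraTestResource("TestReladomoRuntimeConfig.xml");
ConnectionManagerForTests connectionManager =
    ConnectionManagerForTests.getInstanceForDbName("test_db");
testResource.createSingleDatabase(connectionManager);                    // in-memory database
testResource.addTestDataToDatabase("test_data.txt", connectionManager);  // load test rows
testResource.setUp();

// ... exercise production code against the in-memory database ...

testResource.tearDown();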

Finally, we didn't want to compromise on performance. One of the most important, technically sophisticated parts of Reladomo is its cache. It's a keyless, multi-index, transactional object cache. An object is cached as an object and its data is guaranteed to occupy a single memory reference. The object cache is augmented by a query cache that references the same objects. The query cache is smart - it will not return stale results. The cache works correctly when multiple JVMs write to the same data using Reladomo. It can be configured as an on-demand (partial) cache or as a full cache loaded at startup. For the right kind of data and application, the objects can even be stored off-heap for large scale caching with replication. We have caches that exceed 200GB running in production.
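A minimal sketch of how that choice is expressed in the Reladomo runtime XML configuration; the package and connection manager class names are hypothetical:

<MithraRuntime>
    <ConnectionManager className="com.example.MyConnectionManager">
        <!-- on-demand (partial) cache -->
        <MithraObjectConfiguration className="com.example.domain.Person" cacheType="partial"/>
        <!-- full cache, loaded at startup -->
        <MithraObjectConfiguration className="com.example.domain.Balance" cacheType="full"/>
    </ConnectionManager>
</MithraRuntime>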

Principled Development

Reladomo is positioned as a framework, not a library. A framework goes beyond what a library provides by being prescriptive and opinionated about which coding patterns fit well and which don't. Reladomo also generates code and the generated API is expected to be used liberally throughout the rest of your code. It is therefore imperative that the framework code and the application code have a unified outlook on what works well and what doesn't.

We define our core values so our potential users can decide if Reladomo is right for them:

  • Target code that is meant to run in production for many years or even decades.
  • Don't repeat yourself.
  • Make code changes easy.
  • Write code in a domain based object oriented way.
  • Do not compromise correctness and consistency.

These core values and their consequences are explained in detail in our Philosophy and Vision document.

Usability and Programmability

We'll use a couple of small domain models to demonstrate some of Reladomo’s features. First, a non-temporal model about pets:

Second, a model for a textbook ledger:

In this model, an account trades securities (product), and the product has any number of identifiers (known as synonyms). The accumulated balances are kept in the Balance object. Balance can represent any number of accumulated values about the account, such as quantity, taxable income, interest, etc. You can see the code for these models on GitHub.

As we will soon see, this is an example of a bitemporal model. For now, we will ignore the temporal bits; they don't get in the way.

The model is defined by creating a Reladomo object definition for each conceptual object and using those definitions to generate classes. We expect the generated domain classes to serve as your real business domain objects. After the initial generation, the concrete classes in your domain are never overwritten; their abstract super-classes are regenerated every time the model or the Reladomo version changes. You can - and should - add methods to these concrete classes and check them into your version control system.
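A Reladomo object definition is an XML file per class. Here is a minimal sketch of what a Person definition could look like; the package name, column names and lengths are assumptions for illustration:

<MithraObject objectType="transactional">
    <PackageName>com.example.domain</PackageName>
    <ClassName>Person</ClassName>
    <DefaultTable>PERSON</DefaultTable>

    <Attribute name="personId"  javaType="int"    columnName="PERSON_ID"  primaryKey="true"/>
    <Attribute name="firstName" javaType="String" columnName="FIRST_NAME" maxLength="64"/>
    <Attribute name="lastName"  javaType="String" columnName="LAST_NAME"  maxLength="64"/>
</MithraObject>

From a definition like this, the code generator produces the abstract super-classes and the Finder, plus the concrete classes that you own and add methods to.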

The majority of the APIs provided by Reladomo are on the generated classes: PetFinder, PetAbstract and PetListAbstract in our pet example. The Pet class (via its generated PetAbstract superclass) has the normal get/set methods and a few other methods for persistence. The really interesting parts of the API are on the Finder and the List.

As the name suggests, the class-specific Finder (e.g. PersonFinder) is used to find things. Here is a simple example:

Person john = PersonFinder.findOne(PersonFinder.personId().eq(8));

Notice that there is no connection or session to obtain and close. The object retrieved is a valid reference in all contexts. You're free to pass it to different threads and have it participate in a transactional unit of work. findOne throws an exception if more than one object is returned.

Let's break down expressions like this one. PersonFinder.firstName() is an Attribute. It's typed (it's a StringAttribute): you can call firstName().eq("John"), but not firstName().eq(8) or firstName().eq(someDate). It also has special methods that you don't find on other types of attributes, for example:

PersonFinder.firstName().toLowerCase().startsWith("j")

Methods like toLowerCase(), startsWith() and many others are not available on say, an IntegerAttribute, which has its own set of specialized methods.

All of this creates two important usability points: First, your IDE can help you write correct code. Second, when you make a change to your model, the compiler will find all the places that need a change.

An attribute has methods on it that create an Operation, such as eq(), greaterThan(), etc. Operations in Reladomo are used to retrieve objects, via Finder.findOne or Finder.findMany. Operation implementations are immutable. They can be combined with and() and or():

Operation op = PersonFinder.firstName().eq("John");
op = op.and(PersonFinder.lastName().endsWith("e"));
PersonList johns = PersonFinder.findMany(op);

Applications that perform a lot of IO tend to load data in bulk. That can mean use of in-clauses. If we construct this operation:

Set<String> lastNames = ... // a large set, say 10K elements
PersonList largeList =
    PersonFinder.findMany(PersonFinder.lastName().in(lastNames));

Behind the scenes, Reladomo analyzes your Operation and generates the corresponding SQL. What SQL would be generated for a large in-clause? In Reladomo, the answer is: "it depends". Reladomo can choose to issue multiple in-clause statements or to use a temp table join, depending on the target database. The choice is transparent to the user. Reladomo's implementation will return the correct results efficiently based on the operation and the database. The developer doesn't have to make choices that will invariably be wrong if the configuration changes, or write complex code to deal with the variability. Batteries are included!

Primary keys

Primary keys in Reladomo are any combination of an object's attributes. There is no need to define a key class or treat these attributes any differently. Our philosophy is that composite keys are very natural in all models and there should be no barrier to using them. In our simple Trade model, the ProductSynonym class has a natural composite key:

<Attribute name="productId" 
    javaType="int" 
    columnName="PRODUCT_ID" 
    primaryKey="true"/>
<Attribute name="synonymType" 
    javaType="String" 
    columnName="SYNONYM_TYPE" 
    primaryKey="true"/>

Of course, synthetic keys are useful in some scenarios. We support synthetic key generation using a table-based, high-performance approach. The synthetic keys are generated in batches, asynchronously and on demand.
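A sketch of how a generated key can be declared in the model, based on our recollection of the SimulatedSequence strategy (the factory class name and tuning values here are hypothetical):

<Attribute name="orderId" javaType="int" columnName="ORDER_ID" primaryKey="true"
    primaryKeyGeneratorStrategy="SimulatedSequence">
    <SimulatedSequence sequenceName="Order"
        sequenceObjectFactoryName="com.example.domain.ObjectSequenceObjectFactory"
        hasSourceAttribute="false"
        batchSize="10"
        initialValue="1"
        incrementSize="1"/>
</Attribute>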

Relationships

Relationships between classes are defined in the model:

<Relationship name="pets" 
    relatedObject="Pet"
    cardinality="one-to-many" 
    relatedIsDependent="true" 
    reverseRelationshipName="owner">
   this.personId = Pet.personId
</Relationship>

Defining a relationship provides three read capabilities (the first two are shown in the short example after this list):

  • A get method on the object, and possibly a get method on the related object if the relationship is marked bidirectional via the reverseRelationshipName attribute, for example, person.getPets().
  • Navigation for the relationship on the finder, for example, PersonFinder.pets().
  • The ability to deep fetch the relationship on a per-query basis.
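Here is a minimal sketch of the first two capabilities in application code; the petName attribute on Pet is an assumption for illustration:

Person person = PersonFinder.findOne(PersonFinder.personId().eq(8));
PetList pets = person.getPets();                 // get method generated from the relationship
Person backToOwner = pets.get(0).getOwner();     // reverse relationship ("owner")

// Navigation on the finder: find every person who owns a pet named "Rex"
PersonList rexOwners =
    PersonFinder.findMany(PersonFinder.pets().petName().eq("Rex"));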

Deep fetching is the ability to retrieve related objects in an efficient way, to avoid the well-known N+1 query problem. If we retrieve some person objects, we can ask for their pet objects to be loaded efficiently.

PersonList people = ...
people.deepFetch(PersonFinder.pets());

Or a more interesting example:

TradeList trades = ...
trades.deepFetch(TradeFinder.account()); // Fetch accounts for these trades
trades.deepFetch(TradeFinder.product()
                    .cusipSynonym()); // Fetch the products and the 
          // products’ CUSIP synonym (a type of identifier) for these trades
trades.deepFetch(TradeFinder.product()
                    .synonymByType("ISN")); // Also fetch the products’ ISN 
                                            // synonym (another identifier).

Any part of the reachable graph can be specified. Note how this is not implemented as part of the model. The model has no notion of "eager" or "lazy". It's the particular piece of code that specifies this concern. It's therefore not possible for a change to the model to drastically change the IO and performance of existing code, which in turn makes the model more agile.

Relationships can be used when creating an Operation:

Operation op = TradeFinder
                  .account()
                  .location()
                  .eq("NY"); // Find all trades 
                             // belonging to NY accounts.
op = op.and(TradeFinder.product()
                  .productName()
                  .in(productNames)); // … and whose product name 
                                      // is included in the supplied list
TradeList trades2 = TradeFinder.findMany(op);

Relationships are implemented without actual references in Reladomo. That makes adding a relationship cost-free in terms of memory and IO.

Relationships in Reladomo are quite flexible. Consider a textbook example of a Product object that has many different types of synonyms (e.g. CUSIP, Ticker, etc). We've defined this example in our Trade model. The traditional one-to-many relationship from Product to ProductSynonym is hardly ever useful:

<Relationship name="synonyms" 
    relatedObject="ProductSynonym" 
    cardinality="one-to-many">
   this.productId = ProductSynonym.productId
</Relationship>

The reason is that it's quite rare to want all of a product's synonyms to be returned in your query. Two types of advanced relationships make this common example a lot more usable. A relationship with constant expressions allows important business concepts to be represented in the model. For example, if we want to access the product's CUSIP synonym by name, we add this relationship:

<Relationship name="cusipSynonym" 
    relatedObject="ProductSynonym" 
    cardinality="one-to-one">
   this.productId = ProductSynonym.productId and
   ProductSynonym.synonymType = "CUS"
</Relationship>

Notice how we already used this cusipSynonym relationship in the deepFetch and query examples above. That has three benefits: First, we don't have to repeat "CUS" all over the code. Second, we don't pay the IO cost of retrieving all synonyms if all we want is the CUSIP. Third, the query is much more readable and more idiomatic to write.
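The second type is the parameterized relationship, which is what the synonymByType("ISN") call in the deep fetch example above relies on. A sketch of how such a relationship might be declared, assuming Reladomo's parameters attribute and {placeholder} substitution syntax (treat the exact syntax as an assumption; the sample model on GitHub has the real definition):

<Relationship name="synonymByType"
    relatedObject="ProductSynonym"
    cardinality="one-to-one"
    parameters="String type">
   this.productId = ProductSynonym.productId and
   ProductSynonym.synonymType = {type}
</Relationship>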

Composability

One of the biggest problems with string based queries is that they are very hard to compose. By having a type safe, domain-based object oriented query language, we've taken composability to the next level. To illustrate this, let's look at an interesting example.

In our Trade model, both the Trade object and the Balance object have relationships to Account and Product. Let's say you have a GUI that allows retrieving Trades by filtering on Account and Product. A different window allows retrieval of Balances by filtering on Account and Product. Naturally, because we're dealing with the same entities, the filters are the same. With Reladomo, it's easy to share the code between the two. We’ve abstracted the Product and Account business logic into several GUI component classes which we then use:

public BalanceList retrieveBalances()
{
   Operation op = BalanceFinder.businessDate().eq(readUserDate());
   op = op.and(BalanceFinder.desk().in(readUserDesks()));

   Operation refDataOp = accountComponent.getUserOperation(
      BalanceFinder.account());

   refDataOp = refDataOp.and(
      productComponent.getUserOperation(BalanceFinder.product()));

   op = op.and(refDataOp);

   return BalanceFinder.findMany(op);
}

This issues the following SQL:

select t0.ACCT_ID,t0.PRODUCT_ID,t0.BALANCE_TYPE,t0.VALUE,t0.FROM_Z,
       t0.THRU_Z,t0.IN_Z,t0.OUT_Z
from   BALANCE t0
       inner join PRODUCT t1
               on t0.PRODUCT_ID = t1.PRODUCT_ID
       inner join PRODUCT_SYNONYM t2
               on t1.PRODUCT_ID = t2.PRODUCT_ID
       inner join ACCOUNT t3
               on t0.ACCT_ID = t3.ACCT_ID
where  t1.FROM_Z <= '2017-03-02 00:00:00.000'
       and t1.THRU_Z > '2017-03-02 00:00:00.000'
       and t1.OUT_Z = '9999-12-01 23:59:00.000'
       and t2.OUT_Z = '9999-12-01 23:59:00.000'
       and t2.FROM_Z <= '2017-03-02 00:00:00.000'
       and t2.THRU_Z > '2017-03-02 00:00:00.000'
       and t2.SYNONYM_TYPE = 'CUS'
       and t2.SYNONYM_VAL in ( 'ABC', 'XYZ' )
       and t1.MATURITY_DATE < '2020-01-01'
       and t3.FROM_Z <= '2017-03-02 00:00:00.000'
       and t3.THRU_Z > '2017-03-02 00:00:00.000'
       and t3.OUT_Z = '9999-12-01 23:59:00.000'
       and t3.CITY = 'NY'
       and t0.FROM_Z <= '2017-03-02 00:00:00.000'
       and t0.THRU_Z > '2017-03-02 00:00:00.000'
       and t0.OUT_Z = '9999-12-01 23:59:00.000' 

The ProductComponent and AccountComponent classes are fully reusable for Trade (see BalanceWindow and TradeWindow). But composability doesn't stop there. Let's imagine the business requirements changed and, for the Balance window only, the users wanted balances that fit the account filter or the product filter instead. With Reladomo, that would be a one line code change:

refDataOp = refDataOp.or(
      productComponent.getUserOperation(BalanceFinder.product()));

The issued SQL is now very different:

select t0.ACCT_ID,t0.PRODUCT_ID,t0.BALANCE_TYPE,t0.VALUE,t0.FROM_Z,
       t0.THRU_Z,t0.IN_Z,t0.OUT_Z
from   BALANCE t0
       left join ACCOUNT t1
              on t0.ACCT_ID = t1.ACCT_ID
                 and t1.OUT_Z = '9999-12-01 23:59:00.000'
                 and t1.FROM_Z <= '2017-03-02 00:00:00.000'
                 and t1.THRU_Z > '2017-03-02 00:00:00.000'
                 and t1.CITY = 'NY'
       left join PRODUCT t2
              on t0.PRODUCT_ID = t2.PRODUCT_ID
                 and t2.FROM_Z <= '2017-03-02 00:00:00.000'
                 and t2.THRU_Z > '2017-03-02 00:00:00.000'
                 and t2.OUT_Z = '9999-12-01 23:59:00.000'
                 and t2.MATURITY_DATE < '2020-01-01'
       left join PRODUCT_SYNONYM t3
              on t2.PRODUCT_ID = t3.PRODUCT_ID
                 and t3.OUT_Z = '9999-12-01 23:59:00.000'
                 and t3.FROM_Z <= '2017-03-02 00:00:00.000'
                 and t3.THRU_Z > '2017-03-02 00:00:00.000'
                 and t3.SYNONYM_TYPE = 'CUS'
                 and t3.SYNONYM_VAL in ( 'ABC', 'XYZ' )
where  ( ( t1.ACCT_ID is not null )
          or ( t2.PRODUCT_ID is not null
               and t3.PRODUCT_ID is not null ) )
       and t0.FROM_Z <= '2017-03-02 00:00:00.000'
       and t0.THRU_Z > '2017-03-02 00:00:00.000'
       and t0.OUT_Z = '9999-12-01 23:59:00.000' 

Notice the structural differences between this SQL and the previous one. The requirement changed from "and" to "or"; we changed the code from "and" to "or", and it works. Batteries included! If this were implemented with string-based queries, or with any query mechanism that exposes "joining", the requirement change from "and" to "or" would be much more involved.

CRUD and Unit of Work

Reladomo APIs for CRUD are on the object and list implementations. The object has methods like insert() and delete(), whereas the list has bulk methods. There is no "save" or "update" method. Setting a value on a persisted object will update the database. Most writes are expected to be performed in a transaction, which is implemented via the command pattern:

MithraManagerProvider.getMithraManager().executeTransactionalCommand(tx ->
{
   Person person = PersonFinder.findOne(PersonFinder.personId().eq(8));
   person.setFirstName("David");
   person.setLastName("Smith");
   return person;
});

This command issues the following SQL:

UPDATE PERSON
SET FIRST_NAME='David', LAST_NAME='Smith'
WHERE PERSON_ID=8
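Creating a new object follows the same pattern. A minimal sketch, assuming the key is assigned by the application rather than generated:

MithraManagerProvider.getMithraManager().executeTransactionalCommand(tx ->
{
   Person person = new Person();     // new, not yet persisted
   person.setPersonId(9);
   person.setFirstName("Jane");
   person.setLastName("Doe");
   person.insert();                  // the INSERT is issued as part of the transaction
   return person;
});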

Writes to the database are combined and batched, with the only constraint being correctness.

The PersonList object has many useful methods on it that provide collection-based APIs. For example, you can do:

Operation op = PersonFinder.firstName().eq("John");
op = op.and(PersonFinder.lastName().endsWith("e"));
PersonList johns = PersonFinder.findMany(op);
johns.deleteAll();

From all appearances you might think this first resolves the list and then deletes the person records one by one, but that is not the case. Instead, it'll issue this (transactional) query:

DELETE from PERSON
WHERE LAST_NAME like '%e' AND FIRST_NAME = 'John'

That's nice, but it's not the only type of bulk delete that a real production application requires. Consider a situation where the application needs to purge old data. The data is obviously no longer in use, and therefore has no requirement for a holistic transaction around the entire set. The data needs to be removed, probably in a background process, in a best effort way. For that, you can use:

johns.deleteAllInBatches(1000);

This issues different types of queries depending on the destination database:

MS-SQL:

delete top(1000) from PERSON 
where LAST_NAME like '%e' and FIRST_NAME = 'John'

PostgreSQL:

delete from PERSON 
where ctid  = any (array(select ctid 
                         from PERSON 
                         where LAST_NAME like '%e' 
                         and FIRST_NAME = 'John' 
                         limit 1000))

And it tries very hard to do its work, handling temporary failures, and returns when all is done. That's what we mean by "batteries included" - common patterns are baked in and easy to do.

Ease of integration

We've structured Reladomo to make it easy to integrate with your code.

First, Reladomo has very few dependencies. At runtime, there are only six jars (the main library jar and five shallow dependencies) on the classpath. For a full production deployment, you'll also need a JDBC driver, an SLF4J logging implementation, and your own code. This gives you great freedom to pull in whatever else you need without having to worry about jar conflicts.
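If you build with Maven, pulling in the runtime looks roughly like this; the coordinates are as we recall them from Maven Central, so treat them as an assumption and pick the current release version:

<dependency>
    <groupId>com.goldmansachs.reladomo</groupId>
    <artifactId>reladomo</artifactId>
    <version><!-- latest release --></version>
</dependency>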

Second, we are committed to providing backwards compatibility in Reladomo. You should be able to upgrade your version of Reladomo without breaking your code. If we plan a change that would cause a backward incompatible change, we'll ensure you'll have at least a year to switch to the new API.

Conclusion

While we value usability very highly ("batteries included!"), we recognize that there are many different use cases and trying to be everything to everyone does not work.

One of the problems that plague traditional ORMs is leaky abstractions. Our core values, when properly implemented, create a very compelling system that eschews these leaky abstractions. There is no native query or stored procedure support in Reladomo and that's not an accident. We try very hard not to write documentation that reads "feature X is supported if Y is supported by the underlying database".

There is a lot more to Reladomo that we haven't covered here. Feel free to visit us on GitHub, have a look at the documentation and the Katas (our set of tutorials for learning Reladomo). In the second part of this article (coming in June) we'll showcase some of the performance, testability, and enterprise features of Reladomo.

About the Author

Mohammad Rezaei is a Technology Fellow in the Platforms business unit at Goldman Sachs. Mohammad is the chief architect of Reladomo. He has extensive experience writing high performance Java code in a variety of environments, from partitioned/concurrent transactional systems to large memory systems that require lock free algorithms for maximum throughput. Mohammad holds a B.S. in Computer Science from University of Pennsylvania and a Physics PhD from Cornell University.
