Netflix recently introduced Hollow, a Java library and toolkit designed to efficiently cache datasets that are not characterized as “big data.” Such datasets may be metadata for e-commerce and search engines or, in the case of Netflix, metadata about movies and TV shows. Traditional approaches to handling such datasets rely on a datastore or on serialization, but these typically suffer from reliability and latency issues. Hollow’s getting started guide summarizes the core concepts and nomenclature:
Hollow manages datasets which are built by a single producer, and disseminated to one or many consumers for read-only access. A dataset changes over time. The timeline for a changing dataset can be broken down into discrete data states, each of which is a complete snapshot of the data at a particular point in time.
The producer and the consumers handle datasets via a state engine that is transitioned between data states. A producer uses a write state engine and a consumer uses a read state engine.
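To make this nomenclature concrete, the following is a minimal plain-Java sketch of a dataset timeline: a single producer publishes complete, immutable snapshots as numbered data states, and a consumer transitions its read-only view to the latest one. It does not use Hollow itself; the class and method names (DataStateTimeline, ReadOnlyConsumer, publish, refresh) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: hypothetical names, not part of Hollow's API.
final class DataStateTimeline {
    // version -> complete, immutable snapshot of the dataset at that version
    private final TreeMap<Long, List<String>> states = new TreeMap<>();
    private long latestVersion = 0;

    // Producer side: publish a complete snapshot as the next data state.
    synchronized long publish(List<String> records) {
        long version = ++latestVersion;
        states.put(version, Collections.unmodifiableList(new ArrayList<>(records)));
        return version;
    }

    synchronized long latestVersion() {
        return latestVersion;
    }

    synchronized List<String> stateAt(long version) {
        List<String> state = states.get(version);
        if (state == null) {
            throw new IllegalArgumentException("Unknown data state: " + version);
        }
        return state;
    }
}

final class ReadOnlyConsumer {
    private final DataStateTimeline timeline;
    private long currentVersion = 0;
    private List<String> currentState = Collections.emptyList();

    ReadOnlyConsumer(DataStateTimeline timeline) {
        this.timeline = timeline;
    }

    // Consumer side: transition the local view to the latest published data state.
    void refresh() {
        long latest = timeline.latestVersion();
        if (latest != currentVersion) {
            currentState = timeline.stateAt(latest);
            currentVersion = latest;
        }
    }

    long currentVersion() {
        return currentVersion;
    }

    List<String> records() {
        return currentState;
    }
}

final class TimelineDemo {
    public static void main(String[] args) {
        DataStateTimeline timeline = new DataStateTimeline();
        ReadOnlyConsumer consumer = new ReadOnlyConsumer(timeline);

        timeline.publish(Arrays.asList("The Matrix", "Goodfellas"));
        consumer.refresh();
        System.out.println(consumer.currentVersion() + " -> " + consumer.records());

        timeline.publish(Arrays.asList("The Matrix", "Goodfellas", "Inception"));
        consumer.refresh();
        System.out.println(consumer.currentVersion() + " -> " + consumer.records());
    }
}
```

In this sketch a consumer keeps reading its current data state until it explicitly refreshes, which mirrors the idea of a state engine being transitioned between data states rather than mutated in place.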
Hollow replaces Netflix’s previous in-memory dataset framework, Zeno. Datasets are now represented with a compact, fixed-length, strongly typed encoding of the data. This encoding minimizes a dataset’s footprint and the encoded records are “packed into reusable slabs of memory that are pooled on the JVM heap to avoid impacting GC behavior on busy servers.”
Getting Started
To get started with a Hollow example, consider the following POJO:
public class Movie {
    long id;
    String title;
    int releaseYear;

    public Movie(long id, String title, int releaseYear) {
        this.id = id;
        this.title = title;
        this.releaseYear = releaseYear;
    }
}
A simple dataset based on the POJO above can be populated as follows:
List<Movie> movies = Arrays.asList(
    new Movie(1, "The Matrix", 1999),
    new Movie(2, "Beasts of No Nation", 2015),
    new Movie(3, "Goodfellas", 1990),
    new Movie(4, "Inception", 2010)
);
Hollow translates this movies list into the new encoding layout. More details on this encoding can be found in the advanced topics section of the Hollow website.
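As a rough illustration of what a compact, fixed-length encoding can look like, the sketch below packs the four movie records into a long array, giving each field only as many bits as its largest value requires. This is a simplified, assumed layout and not Hollow's actual implementation: the class FixedLengthRecords, its methods, and the choice to store each title as an index into a separate list of strings are all hypothetical. Values are assumed to be non-negative and the input is assumed to be non-empty.

```java
// Illustrative only: a simplified bit-packed layout, not Hollow's real encoding.
final class FixedLengthRecords {
    private final int[] bitsPerField;
    private final int bitsPerRecord;
    private final long[] data;

    // records[i][f] is the non-negative value of field f in record i (non-empty input assumed).
    FixedLengthRecords(long[][] records) {
        int numFields = records[0].length;
        bitsPerField = new int[numFields];
        for (long[] record : records) {
            for (int f = 0; f < numFields; f++) {
                bitsPerField[f] = Math.max(bitsPerField[f], bitsRequired(record[f]));
            }
        }
        int total = 0;
        for (int bits : bitsPerField) {
            total += bits;
        }
        bitsPerRecord = total;
        data = new long[(int) (((long) records.length * bitsPerRecord + 63) / 64)];
        for (int ordinal = 0; ordinal < records.length; ordinal++) {
            long bit = (long) ordinal * bitsPerRecord;
            for (int f = 0; f < numFields; f++) {
                write(bit, bitsPerField[f], records[ordinal][f]);
                bit += bitsPerField[f];
            }
        }
    }

    // Smallest number of bits (at least 1) that can represent a non-negative value.
    static int bitsRequired(long value) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
    }

    int bitsPerRecord() {
        return bitsPerRecord;
    }

    int wordCount() {
        return data.length;
    }

    // Every record has the same width, so a field's bit offset is simple arithmetic.
    long get(int ordinal, int field) {
        long bit = (long) ordinal * bitsPerRecord;
        for (int f = 0; f < field; f++) {
            bit += bitsPerField[f];
        }
        return read(bit, bitsPerField[field]);
    }

    private void write(long bitOffset, int bits, long value) {
        int word = (int) (bitOffset >>> 6);
        int shift = (int) (bitOffset & 63);
        data[word] |= value << shift;
        if (shift + bits > 64) {
            // The field straddles two words: spill the high bits into the next one.
            data[word + 1] |= value >>> (64 - shift);
        }
    }

    private long read(long bitOffset, int bits) {
        int word = (int) (bitOffset >>> 6);
        int shift = (int) (bitOffset & 63);
        long value = data[word] >>> shift;
        if (shift + bits > 64) {
            value |= data[word + 1] << (64 - shift);
        }
        long mask = bits == 64 ? -1L : (1L << bits) - 1;
        return value & mask;
    }
}

final class MovieEncodingDemo {
    // Fields per record: id, index of the title in a separate string list, release year.
    static FixedLengthRecords encodeMovies() {
        return new FixedLengthRecords(new long[][] {
            {1, 0, 1999},
            {2, 1, 2015},
            {3, 2, 1990},
            {4, 3, 2010}
        });
    }

    public static void main(String[] args) {
        FixedLengthRecords movies = encodeMovies();
        System.out.println(movies.bitsPerRecord()); // 16 (3 + 2 + 11 bits)
        System.out.println(movies.wordCount());     // 1
        System.out.println(movies.get(3, 2));       // 2010
    }
}
```

Under these assumptions each movie record occupies 16 bits, so the numeric fields of all four records fit into a single 64-bit word, with the title strings stored separately.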
The Producer
The first instance of a producer publishes an initial data state of a dataset (movies in this example), and consumers are notified of where to find that dataset. Subsequent changes to the dataset are systematically published and communicated to consumers.
A producer uses a HollowWriteStateEngine as a handle to a dataset:
HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
A HollowObjectMapper populates a HollowWriteStateEngine:
HollowObjectMapper objectMapper = new HollowObjectMapper(writeEngine);
for (Movie movie : movies) {
    objectMapper.addObject(movie);
}
The HollowObjectMapper is thread-safe, so objects can also be added to it in parallel.
The producer writes the dataset (also known as a blob) to a defined output stream:
OutputStream os = new BufferedOutputStream(new FileOutputStream(snapshotFile));
HollowBlobWriter writer = new HollowBlobWriter(writeEngine);
writer.writeSnapshot(os);
Generating an API for Consumers
A HollowAPIGenerator generates the Java source files for a client API based on the data model. It must be run before writing the initial consumer source code:
HollowAPIGenerator codeGenerator = new HollowAPIGenerator(
    "MovieAPI",                                   // a name for the API
    "org.redlich.hollow.consumer.api.generated",  // the package for the generated API files
    writeEngine);                                 // the producer's write state engine
codeGenerator.generateFiles(apiCodeFolder);
The Consumer
Once the consumer is notified of a published dataset, the consumer uses a HollowReadStateEngine as a handle to a dataset:
HollowReadStateEngine readEngine = new HollowReadStateEngine();
A HollowBlobReader consumes a blob from the producer into a HollowReadStateEngine:
HollowBlobReader reader = new HollowBlobReader(readEngine);
InputStream is = new BufferedInputStream(new FileInputStream(snapshotFile));
reader.readSnapshot(is);
The data within the dataset can be accessed via the generated API:
MovieAPI movieAPI = new MovieAPI(readEngine); // the generated API wraps the read state engine
for (MovieHollow movie : movieAPI.getAllMovieHollow()) {
    System.out.println(movie._getId() + ", " +
        movie._getTitle()._getValue() + ", " +
        movie._getReleaseYear());
}
This prints the following output:
1, The Matrix, 1999
2, Beasts of No Nation, 2015
3, Goodfellas, 1990
4, Inception, 2010
The entire Hollow project can be found on GitHub.
InfoQ recently featured a detailed interview with Drew Koszewnik, senior software engineer at Netflix and lead contributor to Hollow, regarding Hollow’s specific implementation details.