
Netflix Introduces Hollow, a Java Library for Processing In-Memory Datasets

Jan 31, 2017 3 min read


Netflix recently introduced Hollow, a Java library and toolkit designed to efficiently cache datasets not characterized as “big data.” Such datasets may be metadata for e-commerce and search engines, or in the case of Netflix, metadata about movies and TV shows. Traditional solutions for processing such datasets include the use of a datastore or serialization, but these typically suffer from reliability and latency issues. Hollow’s getting started guide summarizes the core concepts and nomenclature:

Hollow manages datasets which are built by a single producer, and disseminated to one or many consumers for read-only access. A dataset changes over time. The timeline for a changing dataset can be broken down into discrete data states, each of which is a complete snapshot of the data at a particular point in time.

The producer and the consumers handle datasets via a state engine that is transitioned between data states. A producer uses a write state engine and a consumer uses a read state engine.
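To make the vocabulary concrete, the following is a minimal sketch of a timeline of data states, written with plain Java collections. None of these classes belong to Hollow: `DataState`, `SketchProducer` and `SketchConsumer` are hypothetical names chosen for illustration, and a real state engine does far more than swap a reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only: these classes are not part of Hollow.
final class DataState {
    final long version;          // identifies one point on the timeline
    final List<String> records;  // complete, read-only snapshot of the data

    DataState(long version, List<String> records) {
        this.version = version;
        // Defensive copy, so later changes by the producer cannot leak in.
        this.records = Collections.unmodifiableList(new ArrayList<>(records));
    }
}

// Stand-in for the single producer: each call yields a new complete state.
final class SketchProducer {
    private final TreeMap<Long, DataState> timeline = new TreeMap<>();
    private long nextVersion = 1;

    DataState publish(List<String> currentRecords) {
        DataState state = new DataState(nextVersion++, currentRecords);
        timeline.put(state.version, state);
        return state;
    }

    DataState latest() {
        if (timeline.isEmpty()) {
            throw new IllegalStateException("nothing has been published yet");
        }
        return timeline.lastEntry().getValue();
    }
}

// Stand-in for a consumer: holds one data state at a time, read-only.
final class SketchConsumer {
    private DataState current;

    void transitionTo(DataState state) {
        current = state;
    }

    long currentVersion() {
        return current.version;
    }

    List<String> records() {
        return current.records;
    }
}
```

In this sketch a consumer moves along the timeline by calling `transitionTo` with each newly published state, and earlier states remain intact because every snapshot is an independent, unmodifiable copy.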

Hollow replaces Netflix’s previous in-memory dataset framework, Zeno. Datasets are now represented with a compact, fixed-length, strongly typed encoding of the data. This encoding minimizes a dataset’s footprint and the encoded records are “packed into reusable slabs of memory that are pooled on the JVM heap to avoid impacting GC behavior on busy servers.”
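The idea of pooling reusable slabs can be sketched in a few lines. The class below is an assumption-laden illustration, not Hollow's memory recycler: `SlabPool` and its methods are hypothetical names, and the slabs are plain `long[]` arrays. It only shows why recycling arrays reduces allocation churn on the heap.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical sketch: not Hollow's actual memory recycler.
final class SlabPool {
    private final int slabSize;
    private final ArrayDeque<long[]> free = new ArrayDeque<>();

    SlabPool(int slabSize) {
        this.slabSize = slabSize;
    }

    // Hand out a recycled slab when one is available, otherwise allocate.
    synchronized long[] acquire() {
        long[] slab = free.pollFirst();
        return slab != null ? slab : new long[slabSize];
    }

    // Zero the slab and keep it for reuse instead of leaving it to the GC.
    synchronized void release(long[] slab) {
        if (slab.length != slabSize) {
            throw new IllegalArgumentException("wrong slab size");
        }
        Arrays.fill(slab, 0L);
        free.addFirst(slab);
    }

    synchronized int available() {
        return free.size();
    }
}
```

Because released slabs are handed back out by `acquire`, a long-running process that repeatedly loads new data states would allocate far fewer large arrays than one that discards them each time.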

Getting Started

To get started with a Hollow example, consider the following POJO:

    
public class Movie {
    long id;
    String title;
    int releaseYear;

    public Movie(long id, String title, int releaseYear) {
        this.id = id;
        this.title = title;
        this.releaseYear = releaseYear;
    }
}

A simple dataset based on the POJO above may be populated as follows:

    
List<Movie> movies = Arrays.asList(
    new Movie(1, "The Matrix", 1999),
    new Movie(2, "Beasts of No Nation", 2015),
    new Movie(3, "Goodfellas", 1990),
    new Movie(4, "Inception", 2010)
);

Hollow translates this movies list into the new encoding layout.

[Figure: encoding layout of the movies dataset]

More details on this encoding can be found in the advanced topics section of the Hollow website.
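As a rough illustration of what a compact, fixed-length encoding means, the sketch below packs every record into the same number of bits inside a `long[]`. It is loosely inspired by the description above and is not Hollow's actual layout: the class `FixedLengthRecords`, its field widths, and the idea of storing a title as a small ordinal are all assumptions made for this example.

```java
// Hypothetical sketch of fixed-length bit packing; not Hollow's real encoding.
final class FixedLengthRecords {
    private final int[] fieldBits;    // width of each field, in bits
    private final int[] fieldOffset;  // bit offset of each field within a record
    private final int bitsPerRecord;
    private final long[] data;

    FixedLengthRecords(int recordCount, int... fieldBits) {
        this.fieldBits = fieldBits.clone();
        this.fieldOffset = new int[fieldBits.length];
        int total = 0;
        for (int i = 0; i < fieldBits.length; i++) {
            if (fieldBits[i] < 1 || fieldBits[i] > 63) {
                throw new IllegalArgumentException("field width must be 1..63 bits");
            }
            fieldOffset[i] = total;
            total += fieldBits[i];
        }
        this.bitsPerRecord = total;
        long totalBits = (long) recordCount * total;
        this.data = new long[(int) ((totalBits + 63) / 64)];
    }

    int bitsPerRecord() {
        return bitsPerRecord;
    }

    int wordCount() {
        return data.length;
    }

    void set(int record, int field, long value) {
        int bits = fieldBits[field];
        long mask = (1L << bits) - 1;
        if ((value & ~mask) != 0) {
            throw new IllegalArgumentException("value does not fit in " + bits + " bits");
        }
        long bitPos = (long) record * bitsPerRecord + fieldOffset[field];
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        data[word] = (data[word] & ~(mask << shift)) | (value << shift);
        if (shift + bits > 64) {
            // The field straddles two words: store the high bits in the next one.
            int spill = 64 - shift;
            data[word + 1] = (data[word + 1] & ~(mask >>> spill)) | (value >>> spill);
        }
    }

    long get(int record, int field) {
        int bits = fieldBits[field];
        long mask = (1L << bits) - 1;
        long bitPos = (long) record * bitsPerRecord + fieldOffset[field];
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = data[word] >>> shift;
        if (shift + bits > 64) {
            value |= data[word + 1] << (64 - shift);
        }
        return value & mask;
    }
}
```

For the four movies above, `new FixedLengthRecords(4, 3, 2, 11)` would reserve 3 bits for the id, 2 bits for a title ordinal pointing into a separate list of strings, and 11 bits for the release year. That is 16 bits per record, so all four records fit in a single 64-bit word.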

The Producer

The first instance of a producer publishes an initial data state of a dataset (movies in this example) and consumers are notified on where to find that dataset. Subsequent changes to the dataset are systematically published and communicated to consumers.
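The notification step can be pictured as a small publish and subscribe mechanism. The sketch below is an assumption, not part of Hollow: `VersionAnnouncer` is a hypothetical class in which the producer announces the version of each new data state and every subscribed consumer is told which version to load next.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.LongConsumer;

// Hypothetical sketch of announcing data state versions; not the Hollow API.
final class VersionAnnouncer {
    private final List<LongConsumer> listeners = new CopyOnWriteArrayList<>();
    private long latest = -1; // -1 means nothing has been announced yet

    // A consumer registers here; a late subscriber is told the latest version at once.
    synchronized void subscribe(LongConsumer listener) {
        listeners.add(listener);
        if (latest >= 0) {
            listener.accept(latest);
        }
    }

    // The producer calls this after publishing a new data state.
    synchronized void announce(long version) {
        if (version <= latest) {
            throw new IllegalArgumentException("versions must increase");
        }
        latest = version;
        for (LongConsumer listener : listeners) {
            listener.accept(version);
        }
    }

    synchronized long latestVersion() {
        return latest;
    }
}
```

In this sketch listeners run synchronously on the announcing thread, which keeps the example short. A deployed system would more likely poll or use a messaging channel between processes.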

A producer uses a HollowWriteStateEngine as a handle to a dataset:

    
HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();

A HollowObjectMapper populates a HollowWriteStateEngine:

    
HollowObjectMapper objectMapper = new HollowObjectMapper(writeEngine);
for (Movie movie : movies) {
    objectMapper.addObject(movie);
}

The HollowObjectMapper is thread-safe, so objects can also be added in parallel.
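What adding objects in parallel might look like is sketched below without any Hollow dependency. `ThreadSafeSink` is a hypothetical stand-in for a thread-safe mapper, and `ParallelPopulate.populate` is an illustrative helper. Only the pattern of submitting `addObject` calls to an `ExecutorService` is the point here.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a thread-safe mapper; not the Hollow API.
final class ThreadSafeSink<T> {
    private final ConcurrentHashMap<T, Integer> ordinals = new ConcurrentHashMap<>();
    private final AtomicInteger nextOrdinal = new AtomicInteger();

    // Safe to call from many threads; equal records share one ordinal.
    int addObject(T record) {
        return ordinals.computeIfAbsent(record, r -> nextOrdinal.getAndIncrement());
    }

    int size() {
        return ordinals.size();
    }
}

final class ParallelPopulate {
    // Adds every record using a fixed pool of worker threads.
    static <T> ThreadSafeSink<T> populate(List<T> records, int threads) {
        ThreadSafeSink<T> sink = new ThreadSafeSink<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (T record : records) {
                pool.execute(() -> sink.addObject(record));
            }
        } finally {
            pool.shutdown(); // already submitted tasks still run to completion
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("population did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while populating", e);
        }
        return sink;
    }
}
```

The caller waits for the pool to drain before using the result, which mirrors the need to finish adding every object before a data state is written out.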

The producer writes the dataset (also known as a blob) to a defined output stream:

    
HollowBlobWriter writer = new HollowBlobWriter(writeEngine);
// try-with-resources flushes and closes the stream once the snapshot is written
try (OutputStream os = new BufferedOutputStream(new FileOutputStream(snapshotFile))) {
    writer.writeSnapshot(os);
}

Generating an API for Consumers

The HollowAPIGenerator generates the Java files for a client API based on the data model. It must be executed before writing the initial consumer source code:

    
HollowAPIGenerator codeGenerator = new HollowAPIGenerator(
    "MovieAPI", // a name for the API
    "org.redlich.hollow.consumer.api.generated", // the package for generated API files
    writeEngine); // the state engine holding the data model
codeGenerator.generateFiles(apiCodeFolder);

The Consumer

Once the consumer is notified of a published dataset, the consumer uses a HollowReadStateEngine as a handle to a dataset:

    
HollowReadStateEngine readEngine = new HollowReadStateEngine();

A HollowBlobReader consumes a blob from the producer into a HollowReadStateEngine:

    
HollowBlobReader reader = new HollowBlobReader(readEngine);
// try-with-resources closes the stream once the snapshot has been read
try (InputStream is = new BufferedInputStream(new FileInputStream(snapshotFile))) {
    reader.readSnapshot(is);
}
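The write and read halves mirror each other, which can be shown with a small round trip using only the JDK. The blob format below (a record count followed by id, title and release year) is invented for this sketch and has nothing to do with Hollow's real blob format. `MovieRecord` and `SnapshotSketch` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type for the sketch; mirrors the Movie POJO's fields.
final class MovieRecord {
    final long id;
    final String title;
    final int releaseYear;

    MovieRecord(long id, String title, int releaseYear) {
        this.id = id;
        this.title = title;
        this.releaseYear = releaseYear;
    }
}

// Invented snapshot format for illustration; not Hollow's blob format.
final class SnapshotSketch {
    // Producer side: serialize the complete dataset into one blob.
    static byte[] writeSnapshot(List<MovieRecord> movies) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(bytes))) {
            out.writeInt(movies.size());
            for (MovieRecord movie : movies) {
                out.writeLong(movie.id);
                out.writeUTF(movie.title);
                out.writeInt(movie.releaseYear);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return bytes.toByteArray();
    }

    // Consumer side: rebuild the dataset from the blob.
    static List<MovieRecord> readSnapshot(byte[] blob) {
        List<MovieRecord> movies = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(blob)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                movies.add(new MovieRecord(in.readLong(), in.readUTF(), in.readInt()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return movies;
    }
}
```

The sketch writes to a byte array so that it runs anywhere. Swapping in file streams would follow the same try-with-resources shape.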

The data within the dataset can be accessed via the generated API:

    
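// consumer: assumed to be the client-side handle that exposes the generated API (not defined above)
MovieAPI movieAPI = consumer.getAPI();
for (MovieHollow movie : movieAPI.getAllMovieHollow()) {
    System.out.println(movie._getId() + ", " +
        movie._getTitle()._getValue() + ", " +
        movie._getReleaseYear());
}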

This prints the following output:

    
1, The Matrix, 1999
2, Beasts of No Nation, 2015
3, Goodfellas, 1990
4, Inception, 2010

The entire Hollow project can be found on GitHub.

InfoQ recently featured a detailed interview with Drew Koszewnik, senior software engineer at Netflix and lead contributor to Hollow, regarding Hollow’s specific implementation details.
