Genie is a distributed, RESTful task orchestration engine for the data platform from Netflix. Genie has two primary use cases: creating and submitting custom data-processing task requests, and setting up local environments to develop and test new applications and tasks that will run on a Genie cluster.
Netflix announced that Genie 3 has several new features, including a redesign of the earlier task execution engine, security functionality, dependency caching, and API changes.
Earlier versions of the Genie engine had no leader election, so workers unnecessarily executed the same tasks. Now, cluster leadership is supported through ZooKeeper or through a manual configuration property set to a single node's IP address. The single run script that earlier versions used for all task runtimes grew too large to keep concerns safely separated, and it reduced the project maintainers' ability to isolate risk when introducing code changes as the project grew. The Genie 3 approach is to keep runtimes and their configuration modular, descriptive, and versioned with an improved data model.
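The manual leadership option can be pictured with a small sketch. It assumes a hypothetical per-node configuration object; the names `NodeConfig`, `static_leader_ip`, and `run_cluster_task` are illustrative and are not Genie's actual property keys or classes, and ZooKeeper-based election is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical per-node settings (not Genie's real property names)."""
    node_ip: str
    # IP address of the node designated as leader by manual configuration.
    static_leader_ip: Optional[str] = None


def is_leader(config: NodeConfig) -> bool:
    """Return True only on the node whose IP matches the configured leader."""
    if config.static_leader_ip is None:
        # No manual leader configured; a real system would fall back to
        # an election service here, which this sketch does not model.
        return False
    return config.node_ip == config.static_leader_ip


def run_cluster_task(config: NodeConfig, task: Callable[[], T]) -> Optional[T]:
    """Run a cluster-wide task on the leader only, so it is not duplicated."""
    if not is_leader(config):
        return None
    return task()
```

With this guard, every node can schedule the same cluster-wide task, but only the configured leader actually runs it.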
In Genie 3, tasks are composed of several abstractions that ensure scalability. Application runtimes and their executable commands are configurable via their APIs. Genie generates application run scripts for runtimes such as Spark, Hadoop, Pig, Hive, PrestoDB, and Sqoop independently of the specific runtime configuration or the data to process. For developers, the local-mode workflow generates run scripts for the various runtimes and also integrates with the underlying implementations' REPLs and stdout to support testing and development.
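To illustrate the idea of generating a run script from separate, modular descriptions, here is a minimal sketch. The `Application` and `Command` classes, their fields, and `render_run_script` are assumptions made for this example; they do not reproduce Genie's data model or its generated scripts.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Application:
    """Hypothetical runtime descriptor, e.g. a versioned Spark install."""
    name: str
    version: str
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class Command:
    """Hypothetical command descriptor that points at one Application."""
    name: str
    executable: str
    application: Application


def render_run_script(command: Command, job_args: List[str]) -> str:
    """Build a bash script from the descriptors plus per-job arguments."""
    app = command.application
    lines = [
        "#!/usr/bin/env bash",
        "set -euo pipefail",
        f"# runtime: {app.name} {app.version}",
    ]
    # Export the application's environment in a stable order.
    for key in sorted(app.env):
        lines.append(f"export {key}={shlex.quote(app.env[key])}")
    # Quote every argument so spaces and shell metacharacters are safe.
    args = " ".join(shlex.quote(arg) for arg in job_args)
    lines.append(f"exec {command.executable} {args}".rstrip())
    return "\n".join(lines) + "\n"
```

Because the runtime, the command, and the job arguments are separate inputs, changing one of them does not require editing a shared script, which is the separation of concerns the redesign aims for.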
The Genie 3 API components cover the scope of Genie's functionality. The Clusters API, Commands API, Applications API, and Jobs API provide the semantics required to operate Genie 3. The Clusters API is responsible for managing Genie's logical worker cluster and doesn't include the infrastructure itself. It helps manage metadata about the cluster, the baseline state of the worker cluster at setup, the Commands available to the cluster, and baseline package installations. In that respect it is much like Puppet or Chef, but it focuses on the worker-cluster-specific packages and configurations needed by Genie application commands at runtime. This allows for flexibility and shortens startup for applications that would otherwise have to download and install dependencies when they start.
Commands API semantics operate on the underlying Application: each Command must reference a specific Application, but can also carry its own startup and runtime configuration.
The Jobs API allows Genie users to schedule or execute sequences of Commands, and tracks data about the status of job execution. Genie 3 ships with JDBC drivers for MySQL, PostgreSQL, and HSQLDB for the configuration storage that supports Genie and its API.
Genie 3 also added OAuth2 and X.509-based public key certificate support.
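The relationships between Clusters, Commands, and Applications can be sketched as an in-memory registry. This is an illustration under stated assumptions: the `Registry` class and its methods are hypothetical and stand in for what the real REST resources and their backing store would do.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Registry:
    """Hypothetical stand-in for the Applications, Commands and Clusters resources."""
    applications: Set[str] = field(default_factory=set)
    commands: Dict[str, str] = field(default_factory=dict)       # command -> application
    clusters: Dict[str, Set[str]] = field(default_factory=dict)  # cluster -> commands

    def add_application(self, application: str) -> None:
        self.applications.add(application)

    def add_command(self, command: str, application: str) -> None:
        # A Command must reference an Application that is already registered.
        if application not in self.applications:
            raise ValueError(f"unknown application: {application}")
        self.commands[command] = application

    def add_cluster(self, cluster: str, commands: Iterable[str]) -> None:
        # A Cluster records which registered Commands it makes available.
        wanted = set(commands)
        unknown = wanted - set(self.commands)
        if unknown:
            raise ValueError(f"unknown commands: {sorted(unknown)}")
        self.clusters[cluster] = wanted

    def clusters_for(self, command: str) -> List[str]:
        """Return the names of clusters on which the command is available."""
        return sorted(name for name, cmds in self.clusters.items() if command in cmds)
```

Rejecting a Command whose Application is missing keeps the references consistent, so a job can later resolve a Command to its runtime without a lookup failing.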
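Running a sequence of commands while tracking job status might look like the following sketch. The `Job` class, the `JobStatus` values, and `run_job` are assumptions for illustration only; they are not Genie's Jobs API, and state is kept in memory rather than in a JDBC-backed store.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class JobStatus(Enum):
    """Hypothetical lifecycle states for this sketch."""
    INIT = "INIT"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class Job:
    name: str
    commands: List[List[str]]  # each command is an argv list
    status: JobStatus = JobStatus.INIT
    exit_codes: List[int] = field(default_factory=list)


def run_job(job: Job) -> Job:
    """Run the commands in order, stopping at the first non-zero exit code."""
    job.status = JobStatus.RUNNING
    for argv in job.commands:
        result = subprocess.run(argv, capture_output=True)
        job.exit_codes.append(result.returncode)
        if result.returncode != 0:
            job.status = JobStatus.FAILED
            return job
    job.status = JobStatus.SUCCEEDED
    return job


if __name__ == "__main__":
    # Use the current Python interpreter as a portable stand-in command.
    demo = run_job(Job("demo", [[sys.executable, "-c", "pass"]]))
    print(demo.status.value, demo.exit_codes)
```

Recording the exit code of each step alongside the overall status gives a caller enough information to see which command in the sequence failed.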