Tumblr have released Genesis, an open-source tool for data center automation that consolidates the process of discovering new machines, reporting hardware details to Tumblr’s Collins inventory management system, and providing a mechanism to perform hardware configuration.
Genesis consists of a stripped-down Linux image suitable for booting over PXE and a Ruby-based domain-specific language (DSL) for describing tasks to be executed on the host. Tasks are created using the Genesis DSL, which makes it easy to install packages and run commands. An example of a task provided in the Genesis GitHub repository is the TimedBurnin task, which stress-tests the system to increase the probability of detecting hardware errors before the system is put into production.
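The idea of a task as an ordered list of package installs and commands can be sketched in a few lines. The example below is written in Python purely for illustration and is not the Genesis DSL, which is Ruby. The names Task, install, run and plan are assumptions made for this sketch, and the package and command shown were chosen for illustration rather than taken from Genesis. The sketch only records steps; it does not install or execute anything.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A hypothetical task: a named, ordered list of steps."""
    name: str
    steps: list = field(default_factory=list)

    def install(self, *packages):
        # Record a package-installation step (nothing is installed here).
        self.steps.append(("install", list(packages)))
        return self

    def run(self, command):
        # Record a shell-command step (nothing is executed here).
        self.steps.append(("run", command))
        return self

    def plan(self):
        # Render the recorded steps as human-readable lines, in order.
        lines = []
        for kind, arg in self.steps:
            if kind == "install":
                lines.append("install " + " ".join(arg))
            else:
                lines.append("run " + arg)
        return lines


# Illustrative only: a burn-in style task, not a real Genesis task.
burnin = Task("ExampleBurnin").install("stress").run("stress --cpu 4 --timeout 60")
for line in burnin.plan():
    print(line)
```

Returning self from install and run lets steps be chained in the order they should execute, which is one common way to give a task description a DSL-like feel.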
The Genesis DSL also provides functionality to perform hardware configuration, such as altering BIOS settings and configuring RAID cards, before an operating system is provisioned onto the host. An example of a hardware configuration task is BiosConfigrR720, which configures the BIOS on Dell R720s to Tumblr’s exact specifications.
When a machine managed by Genesis boots, it is reported to Tumblr’s Collins Configuration Management Database (CMDB) application, which allows Tumblr to automate machine inventory management. The Collins GitHub project webpage states that this application is the source of truth and knowledge for Tumblr’s entire infrastructure. Everything about Tumblr production environments is stored and encoded in Collins, and this data is used to drive all of Tumblr’s data center automation.
The Collins application was created as a system to manage all of the physical servers, switches and racks in Tumblr production environments, and has evolved to also support inventory of hardware, IP addresses and software. Tumblr discovered that the Collins API and data provided an excellent mechanism to drive automation processes. Today Collins provides push button cluster deployment, drives configuration generation when hardware cluster topologies change, drives infrastructure updates when software configuration changes, and helps to manage software deploys.
In order to utilise Genesis to manage machines in a data center, several additional server dependencies are also required.
The Genesis GitHub project’s INSTALL.md provides further instructions and also includes the necessary server configuration options.
The project’s GitHub README.md describes the general workflow of a machine that is managed by Genesis. When a machine first boots, the DHCP server instructs the PXE firmware to chain-boot into iPXE. iPXE presents the user with a list of menu choices, fetched from a remote server. When the user makes a choice, the Genesis kernel and initrd are loaded from the file server, along with parameters on the kernel command line.
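Passing settings on the kernel command line means that the booted system has to read them back, which on Linux is normally done from /proc/cmdline. The following Python sketch shows one way such a string could be split into parameters. The parameter names mode and tasks_url and the URL are hypothetical and are not the parameters Genesis actually uses.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of parameters.

    Tokens of the form key=value map key to value; bare flags map to True.
    """
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


# Hypothetical command line; on a live system this would be the
# contents of /proc/cmdline.
example = "quiet mode=discovery tasks_url=http://files.example/tasks"
params = parse_cmdline(example)
print(params["mode"])
```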
Once the Genesis OS has loaded, the genesis-bootloader fetches and executes a Ruby script describing a second stage, in which the required base RPMs and Ruby gems are installed and the Genesis tasks are fetched from the remote server. Finally, the tasks are executed. The project’s GitHub README.md provides a real-world example:
Consider a brand new server that boots up. It makes a DHCP request and loads the iPXE menu. In this case, we know that we haven't seen this MAC address before, so it must be a new machine. We boot Genesis into discovery mode, where the tasks it runs are written to fetch all the hardware information we need and report it back to Collins. In our setup this includes information such as hard drives and their capacity and the number of CPUs, but also more detailed information such as service tags, which memory banks are in use, and even the names of the switch ports that all interfaces are connected to. We then follow this up with 48 hours of hardware stress-testing using the TimedBurnin task.
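The decision in that example, treating an unseen MAC address as a new machine, can be sketched as follows. This is a Python illustration under stated assumptions, not how Genesis or Collins implement it: the set of known addresses is an in-memory stand-in for an inventory lookup, the function names are invented for the sketch, and the "default" result is a placeholder because the example does not describe what happens for machines that are already known.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def choose_boot_mode(mac, known_macs):
    """Return 'discovery' for a MAC not seen before, else 'default'.

    known_macs is expected to hold already-normalized addresses.
    """
    if normalize_mac(mac) in known_macs:
        return "default"
    return "discovery"


# Hypothetical inventory of machines that have been seen before.
known_macs = {normalize_mac(m) for m in ["00:25:90:AA:BB:01", "00-25-90-aa-bb-02"]}

print(choose_boot_mode("00:25:90:aa:bb:03", known_macs))  # new machine
print(choose_boot_mode("00-25-90-AA-BB-02", known_macs))  # already known
```

Normalizing the address before the lookup avoids treating the same interface as a new machine just because the address was written with different case or separators.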
Genesis includes a virtual test environment based on VirtualBox to enable end-to-end testing of changes to the framework and of new tasks. More information about the test environment, along with configuration instructions, can be found in the README.md in the project’s GitHub ‘testenv’ sub-directory.
The Tumblr blog states that Genesis has replaced an older system that was a collection of shell scripts, and that the Ruby DSL has enabled a more flexible, easier to understand and easier to maintain system that more of their staff can use and extend.
Genesis is open-sourced under the Apache License, and is still in the early stages of development. Tumblr have requested that bug reports, feature requests and questions be raised via the project’s GitHub repository or the Genesis-users Google Group.