Orbitz Worldwide, a leading global online travel company, has open sourced two monitoring tools: Extremely Reusable Monitoring API (ERMA), and Graphite, a persistence and visualization component. ERMA is a homegrown Java API and library that several web applications at Orbitz use to capture monitoring statistics at run time. It is the source of the event streams that are processed to raise alerts when a service is down or running slower than its defined response time thresholds.
The scope of monitoring and managing the online travel agency web sites at Orbitz includes dozens of web applications, hundreds of VMs and thousands of Jini services. ERMA is based on three technologies: Complex Event Processing (CEP), the Java Management Extensions API (JMX) and Aspect Oriented Programming (AOP). Orbitz uses aspects to inject monitoring logic into the application code and CEP to handle monitoring events. CEP offers a high-throughput, low-latency solution for processing large amounts of monitoring data. The resulting system has improved manageability by reducing the mean time to resolution (MTTR) for customer-impacting events caused by software availability, reliability and performance issues. Orbitz now has access to data for more than one hundred thousand distinct event types at minimal development cost. The ability to handle such large volumes of data lets the company monitor services at a very fine-grained resolution as needed. The hardware cost of adding new monitoring applications is also minimal with this technology.
Doug Barth and Matthew O'Keefe gave a presentation at the JavaOne 2008 conference on Complex Event Processing at Orbitz, a case study of application-level monitoring in the Orbitz web sites. In the technical session, they demonstrated how the ERMA and Graphite tools are used in the applications at Orbitz, with samples of monitoring statistics and the visualization console. A multimedia recording of the presentation is available online with a free Sun Developer Network membership.
ERMA
ERMA is a Java-based instrumentation API that is as simple to use as a logging API, yet flexible enough through configuration to satisfy most requirements for logging, monitoring, analytics, and other event processing needs. It dynamically correlates events occurring on a single application thread. The resulting event patterns can also be joined across the distributed VMs servicing a user request, enabling efficient drill-down root cause analysis for errors and latency as well as bottom-up impact analysis.
The two main elements of ERMA are the Monitor and MonitorProcessor interfaces. Application code is instrumented using Monitor implementations. Developers can also use annotations such as @Monitored to mark the Java classes that need to be monitored. The @Monitored annotation can be declared at the type (class or interface) level or at the method level. Orbitz uses a mix of direct ERMA instrumentation and AOP-style instrumentation using Spring and AspectJ based configuration.
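To make the annotation-driven, AOP-style approach more concrete, here is a minimal sketch that uses only the JDK. It is not ERMA code: `MonitoredSketch`, `FareService` and `TimingProxy` are hypothetical names invented for illustration, and a `java.lang.reflect.Proxy` stands in for the Spring and AspectJ weaving that Orbitz actually uses.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an annotation such as @Monitored (not ERMA's own definition).
// It may be placed on a type or on a single method.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface MonitoredSketch {}

// Example service, marked at the type level so every method is monitored.
@MonitoredSketch
interface FareService {
    int quote(String route);
}

final class TimingProxy {
    // Collected measurements, e.g. "FareService.quote 12345ns" (illustration only).
    static final List<String> recorded = new ArrayList<>();

    private TimingProxy() {}

    // Wraps the target so that calls to annotated types or methods are timed.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean monitored = iface.isAnnotationPresent(MonitoredSketch.class)
                    || method.isAnnotationPresent(MonitoredSketch.class);
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the exception raised by the real method
            } finally {
                if (monitored) {
                    long elapsedNanos = System.nanoTime() - start;
                    recorded.add(iface.getSimpleName() + "." + method.getName()
                            + " " + elapsedNanos + "ns");
                }
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

Calling `TimingProxy.wrap(FareService.class, realService)` returns an object that behaves like the real service while recording one timing entry per call, so the business code itself contains no monitoring logic.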
MonitorProcessors are responsible for making use of the raw data produced by instrumented code. Orbitz has a number of MonitorProcessor implementations, including one that streams ERMA events from hundreds of VMs to a CEP engine in real time, where they are aggregated and processed with high throughput and low latency.
ERMA is available for download on its Launchpad site. The team has completed testing of the new ERMA 3.0 release. All of their internal wiki docs related to ERMA have been copied over to the new ERMA Wiki site.
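The division of labor between instrumented code and processors can be sketched as follows. This is an illustrative assumption, not ERMA's actual interfaces: `MonitorEvent`, `EventProcessor` and `ThresholdProcessor` are hypothetical names, and the processor here aggregates in memory rather than streaming to a CEP engine. It keeps a per-event count and mean latency and counts calls that exceed a response time threshold, which is the kind of condition that would raise an alert.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical raw event produced by instrumented code: a name and a latency.
final class MonitorEvent {
    final String name;
    final long latencyMillis;

    MonitorEvent(String name, long latencyMillis) {
        this.name = name;
        this.latencyMillis = latencyMillis;
    }
}

// Hypothetical consumer of raw events (the role a MonitorProcessor plays).
interface EventProcessor {
    void process(MonitorEvent event);
}

// Aggregates events per name and counts those slower than a threshold.
final class ThresholdProcessor implements EventProcessor {
    private final long thresholdMillis;
    // Per event name: [0] = count, [1] = total latency in ms, [2] = slow count.
    private final Map<String, long[]> stats = new HashMap<>();

    ThresholdProcessor(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    @Override
    public void process(MonitorEvent event) {
        long[] s = stats.computeIfAbsent(event.name, k -> new long[3]);
        s[0]++;
        s[1] += event.latencyMillis;
        if (event.latencyMillis > thresholdMillis) {
            s[2]++;
        }
    }

    long count(String name) {
        long[] s = stats.get(name);
        return s == null ? 0 : s[0];
    }

    double meanMillis(String name) {
        long[] s = stats.get(name);
        return (s == null || s[0] == 0) ? 0.0 : (double) s[1] / s[0];
    }

    long slowCount(String name) {
        long[] s = stats.get(name);
        return s == null ? 0 : s[2];
    }
}
```

Because the instrumented code only emits events, a different processor (for example one that forwards events over the network) could be swapped in without touching the application.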
ERMA uses the Apache Commons JEXL library to provide a simple configuration syntax for declaring which Monitors go to which MonitorProcessors. This allows the application to route the flow of information dynamically at run time. The JUnitPerf testing framework is used during development to make sure that adding monitoring logic to the application code introduces no significant latency.
The visualization part of the solution includes a Netcool/OMNIBus console displaying SNMP events, and Graphite, a persistence and visualization system developed in house. Orbitz used the JoeSNMP API from the OpenNMS project to develop an operator that delivers streams of data from the event processing application to the Network Operations Center via SNMP.
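The routing idea can be illustrated without JEXL. In the sketch below, plain `java.util.function.Predicate` objects take the place of the JEXL expressions, and a monitor is modeled as a simple attribute map. `MonitorRouter` and the attribute names are hypothetical and do not reflect ERMA's configuration format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Routes each monitor (modeled as an attribute map) to every processor
// whose condition matches. Routes can be added while the application runs.
final class MonitorRouter {
    private static final class Route {
        final Predicate<Map<String, Object>> condition;
        final Consumer<Map<String, Object>> processor;

        Route(Predicate<Map<String, Object>> condition,
              Consumer<Map<String, Object>> processor) {
            this.condition = condition;
            this.processor = processor;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    void addRoute(Predicate<Map<String, Object>> condition,
                  Consumer<Map<String, Object>> processor) {
        routes.add(new Route(condition, processor));
    }

    // Delivers the monitor to all matching processors and returns how many matched.
    int dispatch(Map<String, Object> monitor) {
        int delivered = 0;
        for (Route route : routes) {
            if (route.condition.test(monitor)) {
                route.processor.accept(monitor);
                delivered++;
            }
        }
        return delivered;
    }
}
```

A route such as `router.addRoute(m -> ((Long) m.get("latencyMillis")) > 1000L, alerts::add)` sends only slow calls to an alerting processor, while a catch-all route `m -> true` can feed a logging processor. An expression language lets the same conditions live in configuration instead of compiled code.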
Graphite
Graphite is a Python web application developed to provide scalable storage and visualization for numeric time-series data. It receives the output from the CEP engine, which consists of data aggregated for over 70,000 metrics. A Graphite portlet was developed to integrate with a monitoring portal based on the JBoss Portal framework. The portal presents tabular and graphical views of vital system statistics and RSS feeds for alarms. Users can subscribe to feeds for particular alarm severities and/or affected applications. The Graphite web application itself displays graphs of monitoring statistics, which are generated through RESTful URLs. Graphite data can be presented as line graphs, pie charts or raw CSV data. Graphite also provides a web-based command line interface, which power users can use to quickly create and share dashboards containing collections of related graphs.
Graphite can be downloaded from the Launchpad site. Chris Davis and the Graphite development team have also created a wiki with documentation on how to use the visualization tool. To send feedback to the development team, use the form on the site (user registration is required). The team receives an e-mail whenever a question is posted on the site.
The future of complex event processing at Orbitz includes event pattern monitoring capabilities, among them real-time click stream analysis with correlation capabilities. The company is developing a solution that will reduce the volume of alarms delivered to the operator by bundling customer-impacting event information with a root cause estimate derived from detected patterns of discrete events. As the business grows, it is imperative that the Operations team can manage the system in a scalable manner by relying on automated detection of actionable events. The future roadmap of ERMA includes integration with the open source ESP engine Esper.
For more InfoQ coverage of ESP, check out the InfoQ ESP section, which has covered other CEP products from IBM (WebSphere Business Events), BEA (WebLogic Event Server), and Esper.
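As an illustration of URL-driven graphs, the sketch below assembles a request URL in Java. It assumes the `/render` endpoint and the `target`, `from` and `format` parameters described in current Graphite documentation; the release discussed here may have used different parameters. The host name, metric name and `GraphiteUrls` class are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class GraphiteUrls {
    private GraphiteUrls() {}

    // Builds a render URL for one target (a metric path or function expression).
    // 'from' is a relative time such as "-1hours"; 'format' is e.g. "csv" or "png".
    static String renderUrl(String baseUrl, String target, String from, String format) {
        return baseUrl + "/render"
                + "?target=" + encode(target)
                + "&from=" + encode(from)
                + "&format=" + encode(format);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `GraphiteUrls.renderUrl("http://graphite.example.com", "prod.booking.latency.mean", "-1hours", "csv")` yields a link that can be bookmarked, shared, or embedded in a portal page, which is what makes this style of interface convenient for dashboards.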