Uber released an open source distributed JVM profiler, simply called JVM Profiler, in late June. The profiler provides a Java agent that collects metrics (for example, CPU, memory, and IO) and stack traces from Hadoop and Spark JVM processes in a distributed way. The Uber team built JVM Profiler to solve resource allocation issues they had with Apache Spark, the popular framework for large-scale data processing. Although the tool was built for Spark, it is applicable to any JVM-based service or application.
Uber wanted to correlate metrics across a large number of processes belonging to tens of thousands of applications running on thousands of machines. In their distributed environment, many Spark applications run on the same server, and each application has thousands of executors. Their existing tools could only monitor server-level metrics, not metrics for individual applications. They needed a solution that could collect metrics for each process and correlate them across all the processes of each application.
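The correlation requirement can be illustrated with a minimal sketch, assuming every executor tags its samples with an application ID, an executor ID, and a host name. The `AppCorrelation` and `MetricSample` classes and their field names are hypothetical, not JVM Profiler's data model; the sketch only shows why per-process tags make it possible to roll samples up per application (here, peak heap usage across all of an application's executors).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class AppCorrelation {
    // Peak heap usage per application, across all of its executors and hosts.
    static Map<String, Long> peakHeapByApp(List<MetricSample> samples) {
        return samples.stream().collect(Collectors.toMap(
                s -> s.appId,          // group key: the application
                s -> s.heapUsedBytes,  // value: this executor's heap usage
                Long::max,             // keep the larger value when keys collide
                TreeMap::new));        // sorted map for stable output
    }

    public static void main(String[] args) {
        List<MetricSample> samples = List.of(
                new MetricSample("app-1", "exec-1", "host-a", 3_000_000_000L),
                new MetricSample("app-1", "exec-2", "host-b", 4_500_000_000L),
                new MetricSample("app-2", "exec-1", "host-a", 1_200_000_000L));
        // prints {app-1=4500000000, app-2=1200000000}
        System.out.println(peakHeapByApp(samples));
    }
}

// Hypothetical per-process sample: each executor JVM tags its own metrics
// so that they can later be correlated across processes and machines.
final class MetricSample {
    final String appId;       // Spark application the process belongs to
    final String executorId;  // executor (JVM process) that produced the sample
    final String host;        // machine the executor runs on
    final long heapUsedBytes; // one example metric

    MetricSample(String appId, String executorId, String host, long heapUsedBytes) {
        this.appId = appId;
        this.executorId = executorId;
        this.host = host;
        this.heapUsedBytes = heapUsedBytes;
    }
}
```

Because the tags travel with every sample, the same grouping works no matter how many applications share a server.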
JVM Profiler is built around three key features that simplify collecting performance and resource usage metrics and publishing them to other systems (e.g. Apache Kafka) for further analysis:
- A Java agent: allows collecting metrics on JVM processes in a distributed way.
- Advanced profiling capabilities: allows tracing arbitrary methods and arguments without code changes. This makes it possible to identify slow method calls in Spark applications and frequently accessed ("hot") HDFS file paths.
- Data analytics reporting: reports metrics to Kafka topics and Apache Hive tables, making data analytics faster and easier.
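A Java agent is a jar whose `premain` method the JVM runs before the application's `main`, which is what lets a profiler attach to every executor without code changes. The following is a minimal sketch of that mechanism, assuming a hypothetical `SamplingAgent` class that samples heap and non-heap usage through the standard `java.lang.management` API on a background daemon thread. It is not JVM Profiler's implementation, and the interval argument format is an assumption.

```java
import java.lang.instrument.Instrumentation;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sampling agent sketch (NOT JVM Profiler's code).
// Packaged in a jar whose manifest contains "Premain-Class: SamplingAgent",
// it could be attached with: java -javaagent:sampler.jar=5000 -jar app.jar
// (the jar name and the "=5000" interval argument are hypothetical).
class SamplingAgent {
    static final long DEFAULT_INTERVAL_MS = 10_000L;

    // Called by the JVM before the application's main method.
    // Instrumentation is unused here; a method-level profiler would use it
    // to register class transformers.
    public static void premain(String agentArgs, Instrumentation inst) {
        long intervalMs = parseInterval(agentArgs);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "metrics-sampler");
            t.setDaemon(true); // never keep the JVM alive after main() returns
            return t;
        });
        scheduler.scheduleAtFixedRate(
                () -> System.err.println("[sampler] " + sample()),
                0, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Hypothetical argument format: a positive number of milliseconds.
    static long parseInterval(String agentArgs) {
        if (agentArgs == null || agentArgs.trim().isEmpty()) {
            return DEFAULT_INTERVAL_MS;
        }
        long ms = Long.parseLong(agentArgs.trim());
        return ms > 0 ? ms : DEFAULT_INTERVAL_MS; // scheduleAtFixedRate needs a positive period
    }

    // Reads heap and non-heap usage from the standard platform MXBean.
    static Map<String, Long> sample() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("heapUsed", heap.getUsed());
        metrics.put("heapCommitted", heap.getCommitted());
        metrics.put("nonHeapUsed", nonHeap.getUsed());
        metrics.put("nonHeapCommitted", nonHeap.getCommitted());
        return metrics;
    }

    // Convenience entry point to try the sampler without attaching it as an agent.
    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

For real use, the class would normally be declared `public` in its own `SamplingAgent.java` before packaging. A daemon thread is used so the sampler cannot prevent the profiled application from exiting.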
JVM Profiler has a simple and extensible design, which makes it easy to add other profiler implementations to collect more metrics, or to add your own custom reporter for publishing them.
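The extensibility described above comes down to two plug-in points: something that collects metrics and something that publishes them. Here is a minimal sketch, assuming hypothetical `Profiler` and `Reporter` interfaces (these are not JVM Profiler's actual interfaces, whose names and signatures may differ), with an in-memory reporter standing in for a Kafka reporter.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ProfilerRunner {
    // Runs every profiler once and hands each result to every reporter.
    static void runOnce(List<Profiler> profilers, List<Reporter> reporters) {
        for (Profiler p : profilers) {
            Map<String, Object> metrics = p.collect();
            for (Reporter r : reporters) {
                r.report(p.name(), metrics);
            }
        }
    }

    public static void main(String[] args) {
        InMemoryReporter reporter = new InMemoryReporter();
        runOnce(List.of(new ThreadCountProfiler()), List.of(reporter));
        reporter.lines.forEach(System.out::println);
    }
}

// Hypothetical extension point: gathers one round of metrics.
interface Profiler {
    String name();
    Map<String, Object> collect();
}

// Hypothetical extension point: publishes metrics somewhere.
interface Reporter {
    void report(String profilerName, Map<String, Object> metrics);
    default void close() {} // release resources such as a producer connection
}

// A custom reporter that keeps formatted metrics in memory instead of sending them to Kafka.
final class InMemoryReporter implements Reporter {
    final List<String> lines = new ArrayList<>();

    @Override
    public void report(String profilerName, Map<String, Object> metrics) {
        lines.add(profilerName + " " + metrics);
    }
}

// A custom profiler that reports the current number of live threads.
final class ThreadCountProfiler implements Profiler {
    @Override
    public String name() {
        return "ThreadCount";
    }

    @Override
    public Map<String, Object> collect() {
        return Map.of("threadCount", ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```

Keeping collection and publishing behind separate interfaces means a new metric source and a new destination can be added independently of each other.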
Out of the box, the Uber JVM Profiler supports the following features:
- Debug memory usage for all Spark application executors, including Java heap memory, non-heap memory, native memory (VmRSS, VmHWM), memory pools, and buffer pools (direct/mapped buffers).
- Debug CPU usage and garbage collection time for all Spark executors.
- Duration Profiling -- Debug arbitrary Java class methods (how many times they are called and how much time they take).
- Argument Profiling -- Debug arbitrary Java class method calls and trace their argument values.
- Stacktrace Profiling and flame graph generation to visualize CPU time spent by the Spark application.
- Debug IO metrics (disk read/write bytes for the application, CPU iowait for the machine).
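Several of the memory and GC metrics above are exposed by standard JDK and Linux interfaces. This sketch, assuming a Linux host for the native-memory part, sums GC counts and times from `GarbageCollectorMXBean`, reads buffer pool usage from `BufferPoolMXBean`, and parses the `VmRSS` and `VmHWM` fields of `/proc/self/status`. The `ExtraMetrics` class is hypothetical and does not claim to reproduce how JVM Profiler gathers these values; IO metrics, method-level profiling, and flame graphs require other techniques and are not covered.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

final class ExtraMetrics {

    // Total GC count and accumulated GC time (ms) across all collectors.
    static Map<String, Long> gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined; treat that as 0.
            count += Math.max(0L, gc.getCollectionCount());
            timeMs += Math.max(0L, gc.getCollectionTime());
        }
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("gcCount", count);
        m.put("gcTimeMs", timeMs);
        return m;
    }

    // Bytes used by each buffer pool (typically "direct" and "mapped").
    static Map<String, Long> bufferPoolUsage() {
        Map<String, Long> m = new LinkedHashMap<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            m.put(pool.getName() + "MemoryUsed", pool.getMemoryUsed());
        }
        return m;
    }

    // Extracts a kB-valued field such as VmRSS or VmHWM from /proc/<pid>/status text.
    // Returns -1 if the field is absent.
    static long procStatusKb(String statusText, String field) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith(field + ":")) {
                // Typical line: "VmRSS:\t  123456 kB"
                String value = line.substring(field.length() + 1).trim().split("\\s+")[0];
                return Long.parseLong(value);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(gcTotals());
        System.out.println(bufferPoolUsage());
        Path status = Paths.get("/proc/self/status"); // Linux only
        if (Files.exists(status)) {
            String text = String.join("\n", Files.readAllLines(status));
            System.out.println("VmRSS kB = " + procStatusKb(text, "VmRSS"));
            System.out.println("VmHWM kB = " + procStatusKb(text, "VmHWM"));
        }
    }
}
```

`VmRSS` is the process's current resident set size and `VmHWM` its peak, so sampling both per executor shows actual native memory use as well as the high-water mark that sizing decisions must accommodate.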
Uber's blog post on JVM Profiler has additional information on how to add a custom reporter, as well as how to use it to trace your own applications.
Uber used JVM Profiler on one of their largest Spark applications and was able to reduce the memory allocation for each executor by 2GB, going from 7GB to 5GB. They were able to save 2TB of memory for this application alone.
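As a rough check on these figures, assuming the 2TB saving comes entirely from the 2GB per-executor reduction (the article does not state the executor count), the numbers imply on the order of a thousand executors (1,024 with binary units) and a per-executor reduction of a little under 30%:

```latex
\frac{2\,\text{TB}}{2\,\text{GB per executor}} \approx 1000 \text{ executors},
\qquad
\frac{7\,\text{GB} - 5\,\text{GB}}{7\,\text{GB}} = \frac{2}{7} \approx 28.6\%
```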
JVM Profiler is on GitHub at https://github.com/uber-common/jvm-profiler. Pull requests are encouraged!