Key Takeaways
- Analyzing the performance of programs is important: open-source tools for profiling have you covered
- There are two major types of profilers: sampling and instrumenting profilers; understanding their differences will help you choose the right type
- There are three major open-source profilers with different pros and cons: a simple profiler (VisualVM), a hackable profiler with lots of features (async-profiler), and a built-in profiler which obtains lots of additional information (JMC)
- All of these profilers are sampling profilers that approximate the results, which makes them faster and less intrusive than others, but also requires support from the Java runtime
- Using profilers doesn’t come without risks and might sometimes cause performance degradations and rare crashes
I want to convey the foundational concepts and different types of Open Source Java profilers in this article. The article should allow you to choose the best-suited profiler for your needs and comprehend how these tools work in principle.
It is the accompanying post to my "Your Java Application Is Slow? Check Out These Open-Source Profilers" talk at QCon London 2023 in which I dive deeper into the topic and also cover the different profile viewers.
The aim of a profiler is to obtain information on the program execution so that a developer can see how much time each method took during a given period.
But how does a profiler do this? There are two ways to obtain a profile: instrumenting the program and sampling.
Instrumenting Profilers
One way to obtain a profile is to log the entering and exiting of every method that is interesting for the developer.
This instrumentation is what many developers already do when they want to know how long a specific part of their program took.
So the following method:
void methodA() {
// … // do the work
}
is modified to record the relevant information:
void methodA() {
long start = System.currentTimeMillis();
// … // do the work
long duration = System.currentTimeMillis() - start;
System.out.println("methodA took " + duration + "ms");
}
This modification is possible for basic time measurements. Still, it gives little information when measured methods are nested, as it’s also interesting to know the relationship between methods, e.g., how many seconds methodB() ran when called by methodA(). We, therefore, need to log every entry into and exit from the relevant methods. These logs are associated with a timestamp and the current thread.
The idea of an instrumenting profiler is to automate this code modification: it inserts calls to the logEntry() and logExit() methods into the bytecode of the profiled methods. These two methods are part of the profiler runtime library. This insertion is usually done at runtime, when the class is loaded, using an instrumentation agent. The profiler then modifies our methodA() from before to:
void methodA() {
logEntry("methodA");
// … // do the work
logExit("methodA");
}
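To make the idea more concrete, here is a minimal sketch of what the runtime side of logEntry() and logExit() could look like. It is not the implementation of any real profiler: the class name, the per-thread stack of timestamps, and the totalNanos() helper are assumptions made for illustration, and the calls are written by hand instead of being inserted into the bytecode by an agent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical profiler runtime plus a hand-instrumented program, in one class for brevity.
class InstrumentationSketch {
    // Entry timestamps of the methods currently executing on this thread.
    private static final ThreadLocal<Deque<Long>> ENTRY_TIMES =
            ThreadLocal.withInitial(ArrayDeque::new);
    // Inclusive time per method name in nanoseconds, summed over all threads.
    private static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    static void logEntry(String method) {
        ENTRY_TIMES.get().push(System.nanoTime());
    }

    static void logExit(String method) {
        long elapsed = System.nanoTime() - ENTRY_TIMES.get().pop();
        TOTAL_NANOS.computeIfAbsent(method, name -> new LongAdder()).add(elapsed);
    }

    static long totalNanos(String method) {
        LongAdder total = TOTAL_NANOS.get(method);
        return total == null ? 0L : total.sum();
    }

    static long methodA() {
        logEntry("methodA");
        try {
            return methodB() + 1;
        } finally {
            logExit("methodA"); // also runs when the method exits with an exception
        }
    }

    static long methodB() {
        logEntry("methodB");
        try {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        } finally {
            logExit("methodB");
        }
    }

    public static void main(String[] args) {
        methodA();
        System.out.println("methodA: " + totalNanos("methodA") + " ns (includes methodB)");
        System.out.println("methodB: " + totalNanos("methodB") + " ns");
    }
}
```

Even in this small sketch, every call pays for two timestamps, a thread-local lookup, and a map update, which hints at where the overhead of instrumentation comes from.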
Instrumenting profilers have the advantage that they work with all JVMs, as they can be implemented in pure Java. But they have the disadvantage that the inserted method calls incur a significant performance penalty and skew the results heavily. The popularity of purely instrumenting profilers has, therefore, faded in recent decades. Most modern profilers are sampling profilers.
Sampling Profilers
The other type of profiler is the sampling profiler, which takes samples from the execution of the profiled program. These profilers ask the JVM at regular intervals, typically every 10ms to 20ms, for the stack of the currently running program. The profiler can then use this information to approximate the profile. But this leads us to the major disadvantage: shorter-running methods might be invisible in the profile.
The main advantage of sampling profilers is that they profile the unmodified program with low overhead without skewing the results significantly.
Modern sampling profilers typically work by running the following in a loop every 10 to 20ms:
A sampling profiler obtains the list of currently available (Java) threads for every iteration. It then chooses a random subset of threads to sample. The size of this subset is usually between 5 and 8, as sampling too many threads in every iteration would increase the performance impact of running the profiler. Be aware of this fact when profiling an application with a large number of threads.
The profiler then sends a signal to every selected thread, which causes each of them to stop and call a signal handler. This signal handler obtains and stores the stack trace for its thread. All stack traces are collected and post-processed at the end of every iteration.
There are other ways to implement sampling profilers, but I’ve shown you the most widely used technique that offers the best precision.
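The loop described above can be approximated in plain Java. The following is only a sketch under stated assumptions: pure Java cannot send signals to threads, so it asks the JVM for stack traces through the standard ThreadMXBean API instead, which is one of the other, less precise ways to implement sampling. The class name, the stack depth limit, and the choice to count only the top frame are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SamplingSketch {
    static final int MAX_THREADS_PER_ITERATION = 8; // upper end of the subset size named above
    static final int MAX_STACK_DEPTH = 64;          // arbitrary limit chosen for this sketch

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** One iteration: pick a random subset of threads and count their top frames. */
    static void sampleOnce(Map<String, Integer> topFrameCounts) {
        List<Long> ids = new ArrayList<>();
        for (long id : THREADS.getAllThreadIds()) {
            if (id != Thread.currentThread().getId()) { // skip the sampler thread itself
                ids.add(id);
            }
        }
        Collections.shuffle(ids);
        int n = Math.min(MAX_THREADS_PER_ITERATION, ids.size());
        long[] selected = new long[n];
        for (int i = 0; i < n; i++) {
            selected[i] = ids.get(i);
        }
        for (ThreadInfo info : THREADS.getThreadInfo(selected, MAX_STACK_DEPTH)) {
            if (info == null) {
                continue; // the thread ended between listing and sampling
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length == 0) {
                continue; // no Java frames available for this thread
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            topFrameCounts.merge(top, 1, Integer::sum);
        }
    }

    /** Runs the sampling loop and returns how often each method was seen on top of a stack. */
    static Map<String, Integer> profile(int iterations, long intervalMillis) {
        Map<String, Integer> topFrameCounts = new HashMap<>();
        for (int i = 0; i < iterations; i++) {
            sampleOnce(topFrameCounts);
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return topFrameCounts;
    }

    /** Starts a daemon thread that burns CPU until interrupted, so there is something to sample. */
    static Thread startBusyWorker() {
        Thread worker = new Thread(SamplingSketch::busyWork, "busy-worker");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    static void busyWork() {
        double x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x += Math.sqrt(x + 1);
        }
    }

    public static void main(String[] args) {
        Thread worker = startBusyWorker();
        Map<String, Integer> counts = profile(100, 10); // 100 samples, 10ms apart
        worker.interrupt();
        counts.forEach((frame, count) -> System.out.println(count + "\t" + frame));
    }
}
```

A method that runs for much less than the 10ms interval will show up in only a few samples or in none at all, which is the blind spot of sampling mentioned earlier.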
Different Open Source Profilers
Three prominent open-source profilers currently exist: VisualVM, async-profiler, and JDK Flight Recorder (JFR). These profilers are in active development and usable for various applications. All of them are sampling profilers. VisualVM is the only profiler that also supports instrumentation profiling.
We can distinguish between "external" and "built-in" profilers: external profilers are not directly implemented into the JVM but use APIs to collect the stack traces for specific threads. Profilers using only APIs can target different JVM versions and vendors (like OpenJDK and OpenJ9) with the same profiler version.
The two most prominent external profilers are VisualVM and async-profiler; their main distinguishing element is the API they use. VisualVM uses the official Java Management Extensions (JMX) to obtain the stack traces of threads. Async-profiler, on the other hand, uses the unofficial AsyncGetCallTrace API. Both have advantages and disadvantages, but the JMX and related APIs are commonly considered safer and AsyncGetCallTrace more precise.
The single built-in profiler for the OpenJDK and GraalVM is the JDK Flight Recorder (JFR); it works roughly the same as async-profiler and is as precise but slightly more stable.
I’ll cover the different profilers and their history in the following section.
VisualVM
This tool is the stand-alone version of the NetBeans profiler. Starting with Oracle JDK 6 in 2006 and until JDK 8, every JDK included the Java VisualVM tool, which was open-sourced in 2008. The profiler was later renamed to VisualVM, and Oracle stopped including it with JDK 9. According to a recent JetBrains survey, VisualVM is the most used open-source profiler. You can obtain the profiler from the VisualVM download page.
Its usage is quite simple; just select the JVM that runs the program you want to profile in the GUI and trigger the profiling.
You can then directly view the profile in a simple tree visualization. There is also the possibility to start and stop the sample profiler from the command line using:
visualvm --start-cpu-sampler <pid>
visualvm --stop-sampler <pid>
VisualVM is a profiler with a simple UI that is easy to use, with the caveat of using less specific JVM APIs.
Async-Profiler
One of the most commonly used profilers is async-profiler, not least because it's embedded into many other tools like the IntelliJ Ultimate Profiler and Application Performance Monitors. You can download async-profiler from the project's GitHub page. It is not supported on Windows and consists of platform-specific binaries. I created the ap-loader project, which wraps all async-profiler binaries in a multi-platform binary, making embedding and using the profiler easier.
You can use async-profiler by using the many tools that embed it or directly using it as a native Java agent. Assuming you downloaded the platform-specific libasyncProfiler.so, you can profile your Java application by just adding the following options to your call of the Java binary:
java -agentpath:libasyncProfiler.so=start,event=cpu,file=flame.html,flamegraph …
This call will tell the async-profiler to produce a flame graph, a popular visualization.
You can also create JFR files with it:
java -agentpath:libasyncProfiler.so=start,event=cpu,file=profile.jfr,jfr …
This call allows you to view the profile in a multitude of viewers.
For the curious, here is a bit of the history of async-profiler:
In November 2002, Sun (later bought by Oracle) added the AsyncGetStackTrace API to the JDK, according to the JVM(TM) Tool Interface specification. The new API made obtaining precise stack traces from an external profiler possible. Sun introduced this API to add a full Java profiler to their Sun Development Studio. Then two months later, they removed the API for publicly unknown reasons. But the API remained in the JDK as AsyncGetCallTrace, and is there to this day, just not exported, so it is harder to use.
A few years later, people stumbled upon this API as a great way to implement profilers. The first public mention of AsyncGetCallTrace as a base for Java profilers is by Jeremy Manson in his 2007 blog post titled Profiling with JVMTI/JVMPI, SIGPROF and AsyncGetCallTrace. Since then, many open-source and closed-source profilers have started using it. Notable examples are YourKit, JProfiler, and honest-profiler. The development of async-profiler started in 2016; it is currently the dominant open-source profiler using AsyncGetCallTrace.
The problem with async-profiler is that it’s based on an unofficial internal API. This API is not well-tested in the official OpenJDK test suite and might break at any point. Although the wide usage of the API leads to a quasi-standardization, this is still a risk. To alleviate these risks, I am currently working on a JDK Enhancement Proposal that adds an official AsyncGetCallTrace version to the OpenJDK; see JEP Candidate 435.
The advantages of async-profiler are its many features (like heap sampling), its embeddability, its support for other JVMs like OpenJ9, and its small code base, which makes it easy to adapt. You can learn more about using async-profiler in the async-profiler README, the async-profiler wiki, and the Async-profiler - manual by use cases by Krzysztof Ślusarski.
JDK Flight Recorder (JFR)
JRockit first developed its runtime analyzer for internal use, but it also grew in popularity with application developers. Later, the features were integrated into the Oracle JDK after Oracle bought the developing company. Oracle eventually open-sourced the tool with JDK 11, and it has since been the built-in profiling tool for OpenJDK, with no support in other JVMs like OpenJ9.
It works comparably to async-profiler, with the main distinction that it uses the internal JVM APIs directly. The profiler is simple to use, either by adding the following options to your call of the Java binary:
# -XX:+DebugNonSafepoints improves precision
$ java \
-XX:+UnlockDiagnosticVMOptions \
-XX:+DebugNonSafepoints \
-XX:+FlightRecorder \
-XX:StartFlightRecording=filename=file.jfr \
arguments
Or by starting and stopping it using the JDK command tool, jcmd:
$ jcmd PID JFR.start
$ jcmd PID JFR.dump filename=file.jfr
$ jcmd PID JFR.stop
JFR captures many profiling events, from sampled stack traces to Garbage Collection and Class Loading statistics. See the JFR Events website for a list of all events. There is even the possibility to add custom events.
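As a small illustration of custom events, the sketch below defines an event with the public jdk.jfr API and commits it from application code. The event name, its fields, and the processOrder() method are hypothetical; only the jdk.jfr classes and annotations are part of the JDK.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;

class CustomEventSketch {
    @Name("example.OrderProcessed") // hypothetical event name
    @Label("Order Processed")
    static class OrderProcessedEvent extends Event {
        @Label("Order ID")
        long orderId;

        @Label("Item Count")
        int itemCount;
    }

    /** Does the (placeholder) work and reports whether an event was committed. */
    static boolean processOrder(long orderId, int itemCount) {
        OrderProcessedEvent event = new OrderProcessedEvent();
        event.begin();
        // ... do the work ...
        event.end();
        // Only fill in the fields when a running recording will actually keep the event.
        boolean recorded = event.shouldCommit();
        if (recorded) {
            event.orderId = orderId;
            event.itemCount = itemCount;
            event.commit();
        }
        return recorded;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("committed without a recording: " + processOrder(1L, 2));
        Recording recording = new Recording();
        recording.start();
        System.out.println("committed during a recording: " + processOrder(2L, 5));
        recording.stop();
        recording.close();
    }
}
```

Events committed this way end up in the same JFR file as the built-in events, so they can be inspected with the same viewers.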
You can learn more about this tool in blog posts like JDK Flight Recorder, The Programmatic Way by BellSoft.
The main advantage of JFR over async-profiler is that it is included in the OpenJDK on all platforms, even on Windows. JFR is also considered slightly more stable and records far more events and information. There is a GUI for JFR called JDK Mission Control, which allows you to profile JVMs and view the resulting JFR profiles.
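JFR files can also be read programmatically with the jdk.jfr.consumer API, which is handy for quick checks without a GUI. The following sketch records a workload into a temporary file and counts how often each method appears as the top frame of a jdk.ExecutionSample event. The class and method names, the 10ms sampling period, and the busy-loop workload are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

class JfrTopFramesSketch {
    /** Records the given workload into a temporary JFR file and returns its path. */
    static Path record(Runnable workload) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            recording.start();
            workload.run();
            recording.stop();
            Path file = Files.createTempFile("sketch", ".jfr");
            recording.dump(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts how often each method is the top frame of an execution sample. */
    static Map<String, Integer> topFrames(Path jfrFile) {
        Map<String, Integer> counts = new HashMap<>();
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!event.getEventType().getName().equals("jdk.ExecutionSample")) {
                    continue;
                }
                RecordedStackTrace stack = event.getStackTrace();
                if (stack == null || stack.getFrames().isEmpty()) {
                    continue;
                }
                RecordedFrame top = stack.getFrames().get(0);
                String name = top.getMethod().getType().getName() + "." + top.getMethod().getName();
                counts.merge(name, 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** CPU-bound placeholder workload that runs for roughly half a second. */
    static void busyWork() {
        long end = System.nanoTime() + Duration.ofMillis(500).toNanos();
        double x = 0;
        while (System.nanoTime() < end) {
            x += Math.sqrt(x + 1);
        }
    }

    public static void main(String[] args) {
        // Pass the path of an existing JFR file, or let the sketch record its own workload.
        Path file = args.length > 0 ? Path.of(args[0]) : record(JfrTopFramesSketch::busyWork);
        topFrames(file).forEach((frame, count) -> System.out.println(count + "\t" + frame));
    }
}
```

How many samples a short run produces depends on the JVM and the machine, so treat the counts as a rough indication rather than an exact measurement.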
Correctness and Stability
Please keep the following in mind when using profilers like the ones I’ve covered: they are just software themselves, interwoven with a reasonably large project, the OpenJDK (or OpenJ9, for that matter), and thus suffer from the same typical problems as the applications they are used to profile:
- Tests could be more plentiful, especially for the underlying API, which could be tested better; there is currently only one single test. (I’m working on it)
- Tests could be better: the existing test did not even fully test that the API worked for the small sample. It just checked the top frame, but missed that the returned trace was too short. I found this issue and fixed the test case.
- Lack of automated regression testing: A lack of tests also means that changes in seemingly unrelated parts of the enclosing project can adversely affect the profiling without anyone noticing.
Therefore, take the profiles generated by profilers with a grain of salt. There are several blog posts and talks covering the accuracy problems of profilers:
- Profilers are lying hobbitses
- How Inlined Code Makes For Confusing Profiles
- Why JVM modern profilers are still safepoint biased?
- Validating Java Profiling APIs
Furthermore, profiling your application might also cause your JVM to crash in rare instances. OpenJDK developers, like Jaroslav Bachorik and I, are working on fixing all stability problems as far as possible in the underlying profiling APIs. In practice, using one of the mentioned profilers is safe, and crashes are rare. If you encounter a problem, please contact the profiler developers or open a GitHub issue at the respective repository.
Conclusion
Modern sampling-based profilers for Java make it possible to investigate performance problems with open-source tools. You can choose between:
- a slightly imprecise but easy-to-use tool with a simple UI (VisualVM)
- a built-in tool with information on GC and more (JFR)
- a tool that has lots of options and can show information on C/C++ code (async-profiler)
Try them out so that you know which one to use when you encounter your next performance problem.