This article includes details on the background compilation recently introduced in V8, Chrome's JavaScript engine.
The latest browser from Google, Chrome Beta 33, includes an important change in the V8 JavaScript engine: the ability to run the optimizing compilation process on a background thread, keeping the main thread responsive and providing a performance boost. According to Yang Guo, a Google engineer working on V8, the engine performs two types of compilation:
To reduce the overall time spent compiling, V8 defers compilation of JavaScript functions until immediately before they are executed the first time. This compilation phase is fast but doesn’t focus on optimizing the code, just on getting it done quickly. In V8, pieces of code that are executed very often are compiled a second time by a specialized optimizing compiler [Crankshaft]. This second compilation pass makes use of many advanced optimization techniques, meaning it takes more time than the first pass but delivers much faster code.
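By way of illustration (our example, not from the article), a function like the one below starts out with the fast, non-optimizing compile and is later recompiled by Crankshaft once it becomes hot; running the snippet in V8's d8 shell with the --trace-opt flag logs when that happens:

```js
// Our illustrative example, not from the article.
// Run in V8's d8 shell as: d8 --trace-opt hot.js
// The first call to add() triggers the fast, non-optimizing
// compile; once the loop makes it hot, V8 recompiles it with
// the optimizing compiler, and --trace-opt logs a
// "marking ... for recompilation" style message when it does.
function add(a, b) {
  return a + b;
}

var sum = 0;
for (var i = 0; i < 1000000; i++) {
  sum += add(i, 1);
}
print(sum); // forces the result to be used
```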
By performing the optimizing compilation on a separate thread, the application is not only more responsive but also faster: by 27% on a Nexus 5 in the Mandreel test from the Octane 2.0 benchmark suite, according to Guo.
InfoQ performed some tests on Chrome 33 with concurrent recompilation enabled (--js-flags="--concurrent-recompilation") and disabled (--js-flags="--no-concurrent-recompilation"). Averaging the results of 5 consecutive runs of the Octane 2.0 benchmarks, restarting the browser between runs, we noticed the following performance improvements:
| Test | Improvement |
|------|-------------|
| Octane 2.0 (all 17 tests) | 7.12% |
| Mandreel | 18% |
| Box2DWeb | 32% |
| zlib | 11% |
The largest improvements were seen in the 2D and 3D physics engine tests, while the entire Octane suite showed an overall gain of about 7%.
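For readers who want to reproduce the comparison, the test setup boils down to the two invocations below (the name or path of the Chrome binary varies by platform):

```
# Concurrent recompilation enabled (the Chrome 33 default)
chrome --js-flags="--concurrent-recompilation"

# Concurrent recompilation disabled, for comparison
chrome --js-flags="--no-concurrent-recompilation"
```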
We asked Guo why background compilation was not introduced when Crankshaft was released in December 2010. Making clear that he is not speaking for Google and that he was not on the team at the time, Guo said that improvements are made based on actual need:
When Crankshaft was designed, latency was not much of an issue. JavaScript code had yet to reach sizes where compilation time became noticeable, so low latency was neither an issue nor a design goal for Crankshaft. IMO, introducing concurrency at that time would have made designing a then-fledgling optimizing compiler unnecessarily complicated, and as such would have been a premature optimization without any immediate benefit.
Clearly, that has changed over recent years. If you look at the newest version of the Octane benchmark suite, you'll notice that some parts are over 1MB in size. This is meant to reflect real-world applications that push JavaScript engines to their limits. The Mandreel benchmark consists of 4.8MB of minified code; in comparison, the unzipped Photoshop 1.0 source code is 4.4MB. Churning through that amount of code takes noticeable time, and it especially becomes a problem when, for example, animation rendering is expected to complete in the blink of an eye.
Without attempting to be exhaustive, Guo also told us about some of the challenges that had to be dealt with in order to implement background compilation in V8 (we illustrate one of the patterns he mentions with a sketch after the list):
- As every computer scientist can tell you, multithreading is hard to get right. Good test coverage is hard to get. Bugs may be hard or impossible to reproduce due to the inherent non-deterministic behavior. Having a good set of test cases, using invariants guarded by assertions, fuzz testing and last but not least Canary test coverage can give much confidence that it's correct. Kudos to the ThreadSanitizer team btw.
- With compilation blocking execution, we can be sure that the state of the JavaScript heap, including all its objects, stays the same before and after compilation. With concurrent compilation, this assumption no longer holds. That has some implications:
- V8 has a relocating GC, meaning that whenever GC kicks in, objects may be moved and references to them have to be updated. That could very well happen while a compile job is underway. If object references kept by the compile job are not updated, we end up with invalid memory accesses.
- Execution continues during concurrent compilation. That means that the state of the VM and object content and layout can change arbitrarily. Assumptions made upon those facts at the start of the compile job may no longer hold at the end. The code produced at the end may not even be valid. Running it would cause bugs and crashes. This has to be dealt with correctly.
- In fact, having the background thread access the heap at any time will very likely lead to race conditions. We avoid that by gathering all necessary information for the compile job upfront.
- Finding the correct time to kick off a compilation job in the background thread is tricky: there is just no way to foresee for sure whether investing time in optimizing a piece of code is worthwhile, and whether it should have been done earlier to reap the benefits. Formulating a heuristic solution to take care of that is even harder. A lot of fine tuning was necessary, and it is still work in progress.
- The life cycle of a piece of source code is already complicated, with it going through interconnected states, like being lazily parsed, compiled for the first time using the fast compiler, then optimized by the optimizing compiler, then maybe deoptimized (if assumptions made at compile time break later on), etc. With concurrent compilation, a couple of new states are added to this life cycle. Keeping track of all of them and ensuring that transitioning between them is bug-free and efficient is non-trivial. Unexpected corner cases may cause problems.
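To make one of the patterns above concrete, namely snapshotting inputs up front and re-validating assumptions before installing the result, here is a minimal JavaScript sketch at the application level. It uses a Web Worker as the background thread; all names and the toy "job" are ours, purely an analogy for V8's approach rather than its actual implementation:

```js
// Analogy sketch (ours, not V8 code): snapshot inputs up front,
// run the slow job on a background thread, then re-validate the
// assumptions on the main thread before accepting the result.
var state = { version: 1, numbers: [3, 1, 2] };

// 1. Gather everything the job needs up front, so the worker
//    never touches the live, mutable `state` object (this is
//    how race conditions are avoided).
var snapshot = { version: state.version, numbers: state.numbers.slice() };

// 2. The "background compiler": a worker built from a Blob so
//    the example is self-contained.
var workerSource =
  'onmessage = function (e) {' +
  '  var result = e.data.numbers.slice().sort();' + // the slow job
  '  postMessage({ version: e.data.version, result: result });' +
  '};';
var worker = new Worker(URL.createObjectURL(new Blob([workerSource])));

worker.onmessage = function (e) {
  // 3. Execution continued while the job ran, so the assumption
  //    recorded at the start may no longer hold; if it broke,
  //    discard the result, just as V8 must throw away code based
  //    on stale assumptions.
  if (e.data.version !== state.version) {
    return; // stale result
  }
  console.log('installing result:', e.data.result);
};

worker.postMessage(snapshot); // structured clone = an immutable copy
```

The version check at the end mirrors the behavior Guo describes: if the world changed while the job was running, the produced result is simply thrown away rather than installed.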
According to Guo, "V8 is under active development and being steadily improved", and that can be seen in the live performance chart maintained by the Dart team, where V8 jumped 30% on the DeltaBlue benchmark on February 11th, an improvement resulting from compiler optimizations unrelated to background compilation.