In this week's podcast, QCon chair Wesley Reisz talks to Tal Weiss, CEO of OverOps, recently re-branded from Takipi. The conversation covers how the OverOps product works, explores the difference between instrumentation and observability, and discusses bytecode manipulation approaches and common errors in Java-based applications.
A keen blogger, Weiss has been designing scalable, real-time Java and C++ applications for the past 15 years. He was co-founder and CEO at VisualTao, which was acquired by Autodesk in 2009, and also worked as a software architect at IAI Space Industries, focusing on distributed, real-time satellite tracking and control systems.
Key Takeaways
- OverOps uses a mixture of machine code instrumentation and static code analysis at deployment time to build up an index of the code
- Observability is how you architect your code to be able to capture information from its outputs. Instrumentation is where you come in from the outside and use bytecode or machine code manipulation techniques to capture information after the system has been designed and built.
- Bytecode instrumentation is a technique that most companies can benefit from learning a bit more about. Bytecode isn’t machine code - it is a high-level programming language. Being able to read it really helps you understand how the JVM works.
- There are a number of bytecode manipulation tools you can use to work with bytecode - ASM is probably the most well known.
- A fairly small number of events within an application’s life-cycle generate the majority of the log volume. A good practice is to regularly review your log files to consider if what is being logged is the right thing.
OverOps
1m:21s - OverOps allows you to see the source code and variable state for the entire call stack from a tiny-link added to the log file.
3m:00s - To do this in a performant way, with an overhead of less than 1%, the product sits between the JVM and the processor itself.
3m:44s - OverOps uses a mixture of machine code instrumentation and static code analysis at deployment time to build up an index of the code. This avoids needing to use metadata or reflection at run time; instead, OverOps can capture the raw memory state at a low level and reconstruct that into the source code.
SaaS vs On-Premise
5m:44s - OverOps started as a SaaS product, but given that a lot of the data it collects is potentially sensitive, they introduced a new product called Hybrid. Hybrid separates the data into two independent streams: data and metadata.
6m:42s - The data stream is the raw captured data, which is privately encrypted using 256-bit AES keys. The keys are stored only on the production machine and by the user, who needs them to decrypt the data. The metadata stream is not sensitive since it is just an abstract mathematical graph.
7m:18s - Because the data stream is already privately encrypted, that stream can be stored behind a firewall and never needs to leave a company’s network.
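As a rough illustration of what "privately encrypted with a locally held key" can look like, here is a minimal Java sketch using the standard `javax.crypto` API. It is not OverOps's implementation: the class name `LocalSnapshotCrypto`, the choice of GCM mode and the "IV followed by ciphertext" layout are all assumptions made for the example. The podcast only mentions 256-bit AES keys.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper: encrypts captured data with a key that never leaves the machine. */
final class LocalSnapshotCrypto {
    private static final int IV_BYTES = 12;  // nonce size commonly used with GCM (assumption)
    private static final int TAG_BITS = 128; // authentication tag length (assumption)

    /** Generates a fresh 256-bit AES key. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    /** Encrypts the data and returns the random IV followed by the ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }

    /** Reverses encrypt(); fails if the key is wrong or the data was tampered with. */
    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
        return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
    }
}
```

Because only the holder of the key can call `decrypt` successfully, the encrypted bytes can be stored wherever is convenient without exposing the captured state.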
7m:39s - For major banks and telcos where regulatory constraints make it impossible to use even this hybrid SaaS approach, OverOps offers a Docker container of the backend that allows you to run the entire system on premise.
8m:24s - As a start-up you need to be able to do both SaaS and on-premise. If you are SaaS only, you will get locked out of a number of accounts.
Instrumentation vs. Observability
8m:59s - Observability is how you architect your code to be able to capture information from it. Instrumentation is where you come in from the outside and use bytecode or machine code manipulation techniques to capture information after the system has been designed and built.
10m:42s - The JVM allows for bytecode instrumentation so you can patch in additional bytecode and have the JIT compiler take it out and re-compile it. The technique is often used for performance management.
11m:14s - You can also instrument the machine code as OverOps does. The advantage of doing this is that you are operating close to the hardware, which gives you a lot of speed and visibility. However, you have to write for specific processors, losing platform independence, and it’s hard to do.
Working with bytecode
11m:49s - Bytecode instrumentation is a technique that most companies can benefit from learning a bit more about.
13m:32s - Bytecode isn’t machine code - it is a high-level programming language. Being able to read it really helps you understand how the JVM works, for example how garbage collection works, understanding stack traces, profiles and so on.
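A small first step towards reading compiled classes is to look at the class-file header. The sketch below parses the first eight bytes that the JVM class-file format defines: a four-byte magic number, then the minor and major version. It reads only the header, not the instructions themselves, and the class name `ClassFileHeader` is a hypothetical name chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical helper that reads the fixed header at the start of a .class file. */
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE; // every valid class file starts with these bytes

    final int minorVersion;
    final int majorVersion; // for example, 52 corresponds to Java 8

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    /** Parses the header; throws IOException if the bytes are truncated or not a class file. */
    static ClassFileHeader parse(byte[] classBytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            if (magic != MAGIC) {
                throw new IOException("Not a class file: bad magic number");
            }
            int minor = in.readUnsignedShort(); // minor_version comes first
            int major = in.readUnsignedShort(); // then major_version
            return new ClassFileHeader(minor, major);
        }
    }
}
```

For the instructions themselves, the `javap -c` tool that ships with the JDK prints a readable disassembly of a compiled class.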
14m:30s - One of the interesting things about bytecode is that it is malleable in the JVM at runtime - it isn't immutable.
15m:46s - If reflection is the ability to reflect on the structure of objects, bytecode instrumentation is the ability to do that to code.
16m:40s - Weiss did a course for O’Reilly that teaches you how to do it. You can learn it in a day. All you do is add an attribute to the JAR file's manifest to say, “this is going to be an agent.” This gives it permission to manipulate code at the JVM level.
17m:41s - An agent gains a premain() method, a callback that the JVM calls before main(), passing in an instrumentation object that lets you look at all the classes and methods that are loaded and manipulate them.
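The agent mechanism described above can be sketched with the standard `java.lang.instrument` API. This is a minimal illustration rather than anything taken from the podcast or the OverOps product: the names `SketchAgent` and `CountingTransformer` are hypothetical, and the transformer only counts classes instead of rewriting them. It also assumes the agent argument is used as a class-name prefix filter.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical agent class. To load it, the JAR manifest would contain
 * "Premain-Class: SketchAgent" and the JVM would be started with
 * -javaagent:path/to/agent.jar=some/prefix/
 */
final class SketchAgent {
    /** Number of classes seen whose name matched the prefix. */
    static final AtomicLong SEEN = new AtomicLong();

    /** Called by the JVM before the application's main method. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer(agentArgs));
    }

    /** Observes each class as it is loaded; a real agent could return modified bytes. */
    static final class CountingTransformer implements ClassFileTransformer {
        private final String prefix;

        CountingTransformer(String prefix) {
            this.prefix = prefix == null ? "" : prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // Class names arrive in internal form, e.g. "com/example/Foo", and may be null.
            if (className != null && className.startsWith(prefix)) {
                SEEN.incrementAndGet();
            }
            return null; // null tells the JVM to keep the original bytes unchanged
        }
    }
}
```

Returning a new byte array from `transform` instead of `null` is where a bytecode manipulation library would come in.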
18m:20s - There are a number of bytecode manipulation tools you can use to work with bytecode - ASM is probably the most well known.
19m:04s - There is also a plug-in called The ASMifier which shows you the bytecode representation in real-time as you write code in the editor.
20m:12s - The JVM has you covered. The bytecode verifier checks the bytecode regardless of how you add it.
Common errors in Java code
22m:27s - A fairly small number of events within an application’s life-cycle generate the majority of the log volume. As architects, it’s worth mapping this out because so often we have a lot of stuff in log files that we don’t really need, while stuff we do need is missing.
25m:10s - Often errors come from data validation and data processing rather than null pointer problems - number format, index out of bounds errors and so on. These come about because data has a tendency to be corrupt.
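One way to guard against the number format and index out of bounds errors mentioned here is to validate input at the point where it enters the system. The following is a minimal sketch, assuming comma-separated input; the class `RecordFields` and the method `intField` are hypothetical names for illustration.

```java
import java.util.OptionalInt;

/** Hypothetical helper for reading integer fields from untrusted, comma-separated lines. */
final class RecordFields {
    /** Returns the integer in the given column, or empty if it is missing or malformed. */
    static OptionalInt intField(String line, int column) {
        if (line == null || column < 0) {
            return OptionalInt.empty();
        }
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        if (column >= parts.length) {
            // Without this check, parts[column] would throw ArrayIndexOutOfBoundsException.
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(parts[column].trim()));
        } catch (NumberFormatException e) {
            // Corrupt or non-numeric data: report "no value" instead of failing the caller.
            return OptionalInt.empty();
        }
    }
}
```

Whether to return an empty value, log the bad record or reject the whole input is a design decision that depends on the application.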
25m:55s - Class Loading problems and issues around dynamic linkage are another very common cause of problems.
26m:40s - NullPointer is quite rare because developers have become so sensitive to it.
27m:33s - You need to continuously look at the logs and optimise them.
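A simple way to start reviewing log volume is to count how often each kind of message appears. The sketch below groups messages by a crude template, replacing runs of digits with `#`, and returns the most frequent ones. The class `LogVolume` and the digit-collapsing rule are assumptions for illustration; real log formats would likely need a more careful grouping key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that finds which log events account for most of the volume. */
final class LogVolume {
    /** Collapses digits so "user 17 failed" and "user 982 failed" count as one event. */
    static String template(String message) {
        return message.replaceAll("\\d+", "#");
    }

    /** Returns up to 'limit' templates with their counts, most frequent first. */
    static List<Map.Entry<String, Integer>> topEvents(List<String> messages, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String message : messages) {
            counts.merge(template(message), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return sorted.subList(0, Math.min(limit, sorted.size()));
    }
}
```

Running something like this over a day's worth of messages gives a short list of candidates to demote, sample or remove.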
Resources
- InfoQ eMag: Java Agents and Bytecode
- Basic JVM agent project on Github
- Debugging Java and Scala - O'Reilly Course
Bytecode manipulation frameworks
- ASM. Weiss talks about ASM and the Eclipse Plug-in (Bytecode Outline) part way through this JavaOne presentation.
- AspectJ
- BCEL
- Byte Buddy
- CGLIB
- Cojen
- Javassist
- Serp