Classloader Acrobatics: Code Generation with OSGi

Porting great infrastructure to OSGi often means solving complex class loading problems. This article is dedicated to the frameworks that face the hardest issues in this area: those that do dynamic code generation. Incidentally these are also the coolest frameworks: AOP wrappers, ORM mappers, and service proxy generators are just a few examples.

We will examine in order of increasing complexity some typical classloading problems and develop a tiny bit of code to solve the most interesting one. Even if you don't plan to write code generation frameworks any time soon, this article can give you some insight into the low level operation of a modular runtime with statically defined dependencies, such as OSGi.

This article comes with a working demo project that contains not only the code presented here, but also two ASM-based code generators you can play with.

Classload Site Conversion

Porting a framework to OSGi usually requires it to be refactored to the extender pattern. This pattern allows the framework to delegate all classloading to OSGi but at the same time retain control over the lifecycle of application code. The goal of the conversion is to replace the traditional plethora of classloading policies with loading classes from the application bundle. For example we want to replace code like this:

ClassLoader appLoader = Thread.currentThread().getContextClassLoader();
Class appClass = appLoader.loadClass("com.acme.devices.SinisterEngine");
...
ClassLoader appLoader = ...
Class appClass = appLoader.loadClass("com.acme.devices.SinisterEngine");

With:

Bundle appBundle = ...
Class appClass = appBundle.loadClass("com.acme.devices.SinisterEngine");

Although we must do a non-trivial amount of work to get OSGi to load the application code for us, we at least have a nice and correct way to get things working. And now they will work even better than before! Now the user can add/remove applications just by installing/uninstalling bundles into the OSGi container. Also the users can break up their application into as many bundles as they wish, share libraries between applications, and utilize other such capabilities of modularity.

Since the context classloader is the current standard way for frameworks to load application code, it deserves a few extra words. Currently OSGi does not define a policy for setting the context classloader. For this reason developers need to know in advance when a framework relies on the context loader, and set it manually every time they call into that framework. Because this is error-prone and inconvenient, the context loader is almost never used under OSGi. There are efforts under way to define how the OSGi container should automatically manage the context classloader. Until an official standard emerges it is best to convert the sites where it is used into classloads from a concrete application bundle.
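Setting the context loader manually usually means swapping it in around each call into the framework and restoring the previous value afterwards. A minimal sketch of that routine, assuming a hypothetical helper class named ContextLoaderSwap that is not part of OSGi or of any framework mentioned here:

```java
import java.util.function.Supplier;

/** Hypothetical helper: runs a framework call with a chosen context classloader. */
final class ContextLoaderSwap {
  private ContextLoaderSwap() {}

  static <T> T callWith(ClassLoader appLoader, Supplier<T> frameworkCall) {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();

    /* Make the application loader visible to the framework */
    current.setContextClassLoader(appLoader);
    try {
      return frameworkCall.get();
    } finally {
      /* Always restore, even when the framework call throws */
      current.setContextClassLoader(previous);
    }
  }
}
```

Every call site has to remember to go through such a helper, which is the inconvenience that makes converting the classload sites the more attractive option.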

Adapter ClassLoader

Sometimes the code we convert has externalized its classloading policy. This means the classes and methods of the framework take explicit ClassLoader parameters, allowing us to dictate where they load application code from. In this case the conversion to OSGi can become a mere question of adapting a Bundle object to the ClassLoader API. This is a classic incarnation of the adapter pattern:

public class BundleClassLoader extends ClassLoader {
  private final Bundle delegate;
 
  public BundleClassLoader(Bundle delegate) {
    this.delegate = delegate;
  }
 
  @Override
  public Class<?> loadClass(String name) throws ClassNotFoundException {
    return delegate.loadClass(name);
  }
}

Now we can pass this adapter to the converted framework code. We can also add bundle tracking code to create the adapters as new bundles come and go. In this way we can adapt a Java framework to OSGi "externally", avoiding the effort of browsing through the codebase and converting each individual classload site. Here is a highly schematic sample of some code that converts a framework to use OSGi classloading:

...
Bundle app = ...
BundleClassLoader appLoader = new BundleClassLoader(app);

DeviceSimulationFramework simfw = ...
simfw.simulate("com.acme.devices.SinisterEngine", appLoader);
...

Bridge ClassLoader

Many interesting Java frameworks do fancy classworking on client code at runtime. The goal usually is to dynamically build classes out of stuff living in the application's class space. Let's call these generated classes enhancements. Usually the enhancement implements some application-visible interface or extends an application-visible class. Sometimes additional interfaces and their implementations are also mixed in.

Enhancements augment application code - the generated objects are meant to be called directly by the application. For example, a service proxy is passed to the application code to free it from the need to track a dynamic service. Similarly, a wrapper that adds some AOP feature is passed to the application code in place of the original object.

Enhancements start their lives as byte[] blocks, produced by your favorite class engineering library (ASM, BCEL, CGLIB, ...). Once we have generated our class, we must turn the raw bytes into a Class object, i.e. we must make some ClassLoader call its defineClass() method on our bytes. We have three separate problems to solve:

  • Class space completeness - First we must determine the class space into which we can define our enhancements. It must "see" enough classes to allow the enhancements to be fully linked
  • Visibility - ClassLoader.defineClass() is a protected method. We must find a good way to call it
  • Class space consistency - Enhancements mix classes from the framework and the application bundles in a way that is "invisible" to the OSGi container. As a result the enhancements can potentially be exposed to incompatible versions of the same class
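To make the step from byte[] to Class concrete, here is a small sketch that needs no class engineering library. It assembles by hand the bytes of an empty public class that extends a given superclass, then defines it from inside findClass(). The names RawBytesDemo, Target, emptySubclass and definingLoader are made up for this illustration; a real generator such as ASM would produce far richer bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RawBytesDemo {
  /** Stand-in for an application class that is to be enhanced. */
  public static class Target {}

  /** Hand-assembles a class file: an empty public class extending superName. */
  static byte[] emptySubclass(String className, String superName) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(0xCAFEBABE);                    // magic
      out.writeShort(0);                           // minor version
      out.writeShort(49);                          // major version (Java 5)
      out.writeShort(5);                           // constant pool count (entries + 1)
      out.writeByte(1);                            // #1 Utf8: name of this class
      out.writeUTF(className.replace('.', '/'));
      out.writeByte(7);                            // #2 Class -> #1
      out.writeShort(1);
      out.writeByte(1);                            // #3 Utf8: name of the superclass
      out.writeUTF(superName.replace('.', '/'));
      out.writeByte(7);                            // #4 Class -> #3
      out.writeShort(3);
      out.writeShort(0x0021);                      // ACC_PUBLIC | ACC_SUPER
      out.writeShort(2);                           // this_class
      out.writeShort(4);                           // super_class
      out.writeShort(0);                           // no interfaces
      out.writeShort(0);                           // no fields
      out.writeShort(0);                           // no methods
      out.writeShort(0);                           // no attributes
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** A loader that defines one generated class and leaves the rest to its parent. */
  static ClassLoader definingLoader(ClassLoader parent, String generatedName, String superName) {
    return new ClassLoader(parent) {
      @Override
      protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (!name.equals(generatedName)) {
          throw new ClassNotFoundException(name);
        }
        byte[] raw = emptySubclass(name, superName);
        return defineClass(name, raw, 0, raw.length);
      }
    };
  }

  static Class<?> demo() {
    String superName = Target.class.getName();
    String generatedName = superName + "$Enhanced";
    ClassLoader loader = definingLoader(Target.class.getClassLoader(), generatedName, superName);
    try {
      return loader.loadClass(generatedName);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The generated class links only because its defining loader can also supply Target through the parent. That dependency on what the defining loader can see is the completeness problem in miniature.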

Class space completeness

Enhancements are backed by code private to the Java framework that generated them - this implies that the framework should introduce the new class into its own class space. On the other hand, the enhancements implement interfaces or extend classes visible in the application class space, which implies that we should define the enhancement class there. We cannot define a class in two class spaces at the same time, so we have a problem.

Because there is no class space that sees all required classes, we have no other option but to make a new class space. A class space equals a ClassLoader instance, so our first job is to maintain one dedicated ClassLoader on top of every application bundle. These are called Bridge ClassLoaders, because they merge two class spaces by chaining their loaders:

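public class BridgeClassLoader extends ClassLoader {
  private final ClassLoader secondary;
 
  public BridgeClassLoader(ClassLoader primary, ClassLoader secondary) {
    /* The primary space becomes the parent, so it is searched first */
    super(primary);
    this.secondary = secondary;
  }
 
  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    /* Reached only when the primary space cannot supply the class */
    return secondary.loadClass(name);
  }
}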

Now we can use the BundleClassLoader developed earlier:

  /* Application space */
  Bundle app = ...
  ClassLoader appSpace = new BundleClassLoader(app);
 
  /*
   * Framework space
   *
   * We assume this code is executed inside the framework
   */
  ClassLoader fwSpace = this.getClass().getClassLoader();
 
  /* Bridge */
  ClassLoader bridge = new BridgeClassLoader(appSpace, fwSpace);

This loader will first serve requests from the application space - if that fails, it will then try the framework space. Notice that we still let OSGi do lots of heavy lifting for us. When we delegate to either class space, we are in fact delegating to an OSGi-backed ClassLoader - basically, the primary and secondary loaders can delegate to other bundle loaders in accordance with the import/export metadata of their respective bundles.
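The search order can be observed outside of OSGi with two ordinary loaders standing in for the two class spaces. In the sketch below the primary space is a loader that sees only bootstrap classes, and the secondary space is whatever loader defined the demo class. BridgeOrderDemo, Bridge, OnlyInSecondary and tryLoad are names invented for this illustration.

```java
final class BridgeOrderDemo {
  /** Same shape as the bridge loader: parent first, then the secondary space. */
  static final class Bridge extends ClassLoader {
    private final ClassLoader secondary;

    Bridge(ClassLoader primary, ClassLoader secondary) {
      super(primary);
      this.secondary = secondary;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
      return secondary.loadClass(name);
    }
  }

  /** A class that only the secondary space can supply. */
  public static class OnlyInSecondary {}

  /** A stand-in class space that can see nothing beyond the bootstrap classes. */
  static ClassLoader bootstrapOnlySpace() {
    return new ClassLoader(null) {};
  }

  /** Returns the loaded class, or null when the loader cannot supply it. */
  static Class<?> tryLoad(ClassLoader loader, String name) {
    try {
      return loader.loadClass(name);
    } catch (ClassNotFoundException e) {
      return null;
    }
  }
}
```

Asking the bridge for java.lang.String is answered by the primary space. Asking it for OnlyInSecondary fails in the primary space and falls through to findClass(), which consults the secondary space.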

At this point we might be pleased with ourselves. The bitter truth, however, is that the framework and application class spaces combined may not be enough! Everything hinges on the particular way the JVM links classes (also known as resolving classes). There are a variety of explanations for how this works:

The short answer: JVM resolution works on a fine-grained (one symbol at a time) level.

The long answer: When the JVM links a class, it does not need the complete descriptions of all classes referenced by the linked class. It only needs information about the individual methods, fields and types that are really used by the linked class. What our intuition treats as a monolithic whole is, to the JVM, a class name, plus a superclass, plus a set of implemented interfaces, plus a set of method signatures, plus a set of field signatures. All these symbols are resolved independently and lazily. For example, to link a method call site, the class space of the caller needs to supply Class objects only for the target class and for all types used in the method signature. Definitions for the numerous other things that the target class may contain are not needed, and the ClassLoader of the caller will never receive a request to load them.

The formal answer: Class A from class space SpaceA must be represented by the same Class object in class space SpaceB if and only if:

  • A class B from SpaceB exists, that refers to A from its symbol table (known also as the constant pool).
  • The OSGi container has wired SpaceA as the provider of class A for SpaceB. The wire is established based on the static metadata of all bundles in the container.

By example: Imagine we have a bundle BndA that exports a class A. Class A has 3 methods, distributed between 3 interfaces:

  • IX.methodX(String)
  • IY.methodY(String)
  • IZ.methodZ(String)

Imagine also that we have a bundle BndB that has a class B. Somewhere in class B there is a reference A a = ... and a method call a.methodY("Hello!"). To get class B to resolve, we need to introduce into the class space of BndB class A, and class String. That's all! We don't need to import IX or IZ. We don't need to import even IY, because class B does not use it - it uses only A. On the other hand, when the exporting bundle BndA resolves class A it must supply IX, IY, IZ, because they are directly referenced as implemented interfaces. Finally, even BndA does not have to supply any of the super-interfaces of IX, IY, IZ, because they are also not directly referenced.
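Written out as plain Java, the types from this example might look like the sketch below. It is only an illustration: everything sits in one source file rather than in two bundles, and the method bodies and the log field are invented so that the code has something to do.

```java
final class LinkingExample {
  interface IX { void methodX(String s); }
  interface IY { void methodY(String s); }
  interface IZ { void methodZ(String s); }

  /** Plays the role of class A exported by BndA. */
  static class A implements IX, IY, IZ {
    /* Invented for the demo: records which method was called */
    final StringBuilder log = new StringBuilder();

    @Override public void methodX(String s) { log.append("X:").append(s); }
    @Override public void methodY(String s) { log.append("Y:").append(s); }
    @Override public void methodZ(String s) { log.append("Z:").append(s); }
  }

  /** Plays the role of class B living in BndB. */
  static class B {
    static String run() {
      A a = new A();
      /*
       * The static type of the receiver is A, so the call site is compiled
       * as a reference to A.methodY(String). IY is not named at this site.
       */
      a.methodY("Hello!");
      return a.log.toString();
    }
  }
}
```

The class file of A names IX, IY and IZ in its list of implemented interfaces, which is why the space that defines A must be able to supply all three, while the call site in B names only A and String.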

Now let's imagine we want to present an enhanced version of class A from class space BndA to class B from class space BndB. The enhancement needs to extend class A and override some or all of its methods. Because of that, the enhancement needs to see the classes used in the signatures of all overridden methods. However, BndB will import all these classes only if it contains code that calls each overridden method. It is very unlikely that BndB calls exactly the methods of A that we mean to override with our enhancement. Therefore BndB likely does not see enough classes to define the enhancement in its class space. In fact the complete set of classes can only be supplied by BndA. We have a problem!

It turns out that we must bridge not the framework and application spaces, but the framework space and the space of the enhanced class - so, rather than "bridge per application space" we must shift our strategy to "bridge per enhanced space". We need to make a transitive hop from the application to the class space of some third party bundle, from where the application imports the class it wants us to enhance. How do we make that transitive leap? Simple! As we know, every Class object can tell us the class space in which it was defined. For example, all we need to do to get the defining class loader of A is to call A.class.getClassLoader(). In many cases, however, we have a String name rather than a Class object, so how do we get A.class in the first place? Simple again! We can ask the application bundle to give us the exact Class object it sees under the name "A". Then we can bridge the space of that Class with the framework space. This is a critical step because we need the enhanced and original classes to be interchangeable within the application. Out of potentially many available versions of class A, we need to pick the class space of exactly the one used by the application. Here is a schematic of how the framework can maintain a cache of classloader bridges:

...
/* Ask the app to resolve the target class */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");
 
/* Get the defining classloader of the target */
ClassLoader targetSpace = target.getClassLoader();
 
/* Get the bridge for the class space of the target */
BridgeClassLoaderCache cache = ...
ClassLoader bridge = cache.resolveBridge(targetSpace);

Where the bridge cache would look something like:

public class BridgeClassLoaderCache {
  private final ClassLoader primary;
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache;
 
  public BridgeClassLoaderCache(ClassLoader primary) {
    this.primary = primary;
    this.cache = new WeakHashMap<ClassLoader, WeakReference<ClassLoader>>();
  }
 
  public synchronized ClassLoader resolveBridge(ClassLoader secondary) {
    ClassLoader bridge = null;
 
    WeakReference<ClassLoader> ref = cache.get(secondary);
    if (ref != null) {
      bridge = ref.get();
    }
 
    if (bridge == null) {
      bridge = new BridgeClassLoader(primary, secondary);
      cache.put(secondary, new WeakReference<ClassLoader>(bridge));
    }
 
    return bridge;
  }
}

To prevent memory leaks due to ClassLoader retention, we had to use both weak keys and weak values. The goal is to not retain the class space of an uninstalled bundle in memory. We had to use weak values because the value (BridgeClassLoader) of each map entry references strongly the key (ClassLoader), thus negating its "weakness". This is the standard advice prescribed by the WeakHashMap javadoc. By using a weak cache we avoid the need to track a whole lot of bundles and do eager reactions to their lifecycles.
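The behaviour of such a weak cache can be tried out with plain loaders. The sketch below is a trimmed-down variant under assumed names (BridgeCacheDemo, bridgeFor); the bridge itself is reduced to an empty loader on top of the target space, since only the caching is of interest here.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

final class BridgeCacheDemo {
  /* Weak keys and weak values, as discussed above */
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache = new WeakHashMap<>();

  synchronized ClassLoader resolveBridge(ClassLoader targetSpace) {
    WeakReference<ClassLoader> ref = cache.get(targetSpace);
    ClassLoader bridge = (ref != null) ? ref.get() : null;

    if (bridge == null) {
      /* Placeholder bridge: a bare loader whose parent is the target space */
      bridge = new ClassLoader(targetSpace) {};
      cache.put(targetSpace, new WeakReference<>(bridge));
    }
    return bridge;
  }

  /** Keys the lookup on the loader that defined the class. */
  ClassLoader bridgeFor(Class<?> target) {
    return resolveBridge(target.getClassLoader());
  }
}
```

As long as the caller holds on to the returned bridge, repeated lookups for the same class space return the same instance. A Class keeps a strong reference to its defining loader, so a bridge that has defined enhancements stays reachable for as long as those enhancements are in use. Note that getClassLoader() may return null for classes defined by the bootstrap loader; WeakHashMap accepts a null key, and a null parent means the bootstrap loader.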

Visibility

Okay, we finally have our exotic bridge class space. Now how do we define our enhancements in it? The problem, as mentioned earlier, is that defineClass() is a protected method of BridgeClassLoader. It is also final, so we cannot override it; we could expose it through a public method, but that would be rude. Also we would have to code our own checks to see if the requested enhancement has already been defined. It is a better idea to follow the intended design of ClassLoader. This design prescribes that we should override findClass(), which can call defineClass() when it determines it can supply the requested class from an arbitrary binary source. In findClass() we can rely only on the name of the requested class to make decisions. So our BridgeClassLoader must think to itself:

This is a request for "A$Enhanced", so I must call the enhancement generator for a class named "A"! Then I call defineClass() on the produced byte[]. Then I return the new Class object.

There are two remarkable things about that statement.

  • We introduced a text protocol for the names of enhancement classes - We can pass a single item of data to our ClassLoader - a String for the name of the requested class. At the same time we need to pass two items of data - the name of the original class and a flag, marking it as a subject of enhancement. We pack these two items into a single string of the form
    [name of target class]"$Enhanced"
    Now findClass() can look for the enhancement marker $Enhanced and, when it is present, extract the name of the target class. In this way we also introduce a convention for the names of our enhancements. Whenever we see a class name ending with $Enhanced in a stack trace, we know this is a dynamically generated class. To mitigate the risk of name clashes with normal classes, we make the enhancement marker as exotic as Java allows (e.g. $__service_proxy__)
  • Enhancements are generated on demand - We will never try to generate an enhancement twice. The loadClass() method we inherited will first call findLoadedClass(), if that fails it will call parent.loadClass(), and only if that fails it will call findClass(). The fact that we use a strict protocol for the names, guarantees findLoadedClass() will work the second time we get a request to enhance the same class. Couple this with the caching of bridge ClassLoaders and we get a pretty efficient solution, where there is no chance that we will bridge the same bundle space twice or generate redundant enhancement classes
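The text protocol boils down to two string operations: append the marker to obtain the enhancement name, and strip it again to recover the target name. A minimal sketch, assuming a hypothetical class named SuffixNamer:

```java
/** Hypothetical implementation of the "[target]$Enhanced" naming protocol. */
final class SuffixNamer {
  private final String marker;

  SuffixNamer(String marker) {
    if (marker == null || marker.isEmpty()) {
      throw new IllegalArgumentException("marker must not be empty");
    }
    this.marker = marker;
  }

  /** Packs the target class name and the enhancement flag into one string. */
  String map(String targetClassName) {
    return targetClassName + marker;
  }

  /** Recovers the target class name, or returns null when the marker is absent. */
  String unmap(String className) {
    if (className.length() > marker.length() && className.endsWith(marker)) {
      return className.substring(0, className.length() - marker.length());
    }
    return null;
  }
}
```

With the marker "$Enhanced", map("com.acme.devices.SinisterEngine") yields "com.acme.devices.SinisterEngine$Enhanced", and unmap() of a name without the marker yields null, which tells findClass() that the request is not for an enhancement.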

Here we must also mention the option to call defineClass() through reflection. This approach is used by cglib. This is a viable option when we want the user to pass us a ready to use ClassLoader. By using reflection we avoid the need to create yet another loader on top of that, just so we can access its defineClass() method.

Class space consistency

At the end of the day, what we have done is to merge two distinct, unconnected class spaces using the OSGi modular layer. Also we introduced a search order between those spaces similar to the search order of the evil Java classpath. In effect, we have somewhat eroded the class space consistency of the OSGi container. Here is a scenario of how bad things can happen:

  1. Framework uses package com.acme.devices and requires exactly version 1.0.
  2. Application uses package com.acme.devices and requires exactly version 2.0.
  3. Class A refers directly to com.acme.devices.SinisterDevice.
  4. It just happens that class A$Enhanced uses com.acme.devices.SinisterDevice from its internal implementation.
  5. Because we search the application space first, A$Enhanced will be linked against com.acme.devices.SinisterDevice version 2.0, while its internal code has been compiled against com.acme.devices.SinisterDevice version 1.0.

As a result the application will see mysterious LinkageErrors and/or ClassCastExceptions. Needless to say, this is a problem.

Alas, an automated way to handle this problem does not exist yet. We must simply make sure the enhancement-internal code refers directly only to "very private" implementation classes that are not likely to be used by anyone else. We can even build private adapters for any external APIs we might want to use and then refer to those from the enhancement code. Once we have a well defined implementation subspace, we can use that knowledge to limit the class leakage. We now delegate to the framework space requests only for the special subset of private implementation classes. This will also limit the search order problem, making it irrelevant whether we do application-first or framework-first search. One good policy to keep things under control is to have a dedicated package for all enhancement implementation code. Then the bridge loader can check for classes whose name begins with that package and delegate their loading to the framework loader. Finally, we sometimes need to judiciously relax this isolation policy for certain singleton packages like org.osgi.framework - we can feel pretty safe compiling our enhancement code directly against org.osgi.framework, because at runtime everyone inside the OSGi container will see the same org.osgi.framework - it is supplied by the OSGi core.
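The prefix policy can be sketched as a bridge whose findClass() forwards to the framework space only when the requested name starts with an agreed prefix. All names below (FilteringBridgeDemo, FilteringBridge, InternalSupport, LeakyHelper, tryLoad) are invented for the illustration, and the demo uses a class name as the prefix where a real framework would use its dedicated implementation package.

```java
final class FilteringBridgeDemo {
  static final class FilteringBridge extends ClassLoader {
    private final ClassLoader frameworkSpace;
    private final String internalPrefix;

    FilteringBridge(ClassLoader targetSpace, ClassLoader frameworkSpace, String internalPrefix) {
      super(targetSpace);
      this.frameworkSpace = frameworkSpace;
      this.internalPrefix = internalPrefix;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
      /* Only the private implementation classes may leak in from the framework */
      if (name.startsWith(internalPrefix)) {
        return frameworkSpace.loadClass(name);
      }
      throw new ClassNotFoundException(name);
    }
  }

  /** Stand-in for a private implementation class used by the enhancements. */
  public static class InternalSupport {}

  /** Stand-in for a framework class that must not leak into the bridge. */
  public static class LeakyHelper {}

  /** Returns the loaded class, or null when the loader cannot supply it. */
  static Class<?> tryLoad(ClassLoader loader, String name) {
    try {
      return loader.loadClass(name);
    } catch (ClassNotFoundException e) {
      return null;
    }
  }
}
```

Because everything outside the prefix is answered by the target space alone, the framework's own imports never compete with the application's, whichever space is searched first.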

Putting it all together

Everything from this classloading saga can be distilled in the following ~100 lines of code:

public class Enhancer {
  private final ClassLoader privateSpace;
  private final Namer namer;
  private final Generator generator;
  private final Map<ClassLoader , WeakReference<ClassLoader>> cache;

  public Enhancer(ClassLoader privateSpace, Namer namer, Generator generator) {
    this.privateSpace = privateSpace;
    this.namer = namer;
    this.generator = generator;
    this.cache = new WeakHashMap<ClassLoader , WeakReference<ClassLoader>>();
  }

  @SuppressWarnings("unchecked")
  public <T> Class<T> enhance(Class<T> target) throws ClassNotFoundException {
    ClassLoader context = resolveBridge(target.getClassLoader());
    String name = namer.map(target.getName());
    return (Class<T>) context.loadClass(name);
  }

  private synchronized ClassLoader resolveBridge(ClassLoader targetSpace) {
    ClassLoader bridge = null;

    WeakReference<ClassLoader> ref = cache.get(targetSpace);
    if (ref != null) {
      bridge = ref.get();
    }

    if (bridge == null) {
      bridge = makeBridge(targetSpace);
      cache.put(targetSpace, new WeakReference<ClassLoader>(bridge));
    }

    return bridge;
  }

  private ClassLoader makeBridge(ClassLoader targetSpace) {
    /* Use the target space as a parent to be searched first */ 
    return new ClassLoader(targetSpace) {
      @Override
      protected Class<?> findClass(String name) throws ClassNotFoundException {
        /* Is this used privately by the enhancements? */
        if (generator.isInternal(name)) {
          return privateSpace.loadClass(name);
        }

        /* Is this a request for enhancement? */
        String unpacked = namer.unmap(name);
        if (unpacked != null) {
          byte[] raw = generator.generate(unpacked, name, this);
          return defineClass(name, raw, 0, raw.length);
        }

        /* Ask someone else */
        throw new ClassNotFoundException(name);
      }
    };
  }
}

public interface Namer {
  /** Map a target class name to an enhancement class name. */
  String map(String targetClassName);

  /** Try to extract a target class name or return null. */
  String unmap(String className);
}

public interface Generator {
  /** Test if this is a private implementation class. */
  boolean isInternal(String className);

  /** Generate enhancement bytes */
  byte[] generate(String inputClassName, String outputClassName, ClassLoader context);
}

Enhancer captures only the bridging pattern. The code generation logic is externalized into a pluggable Generator. The generator receives a context ClassLoader from where it can pull classes and use reflection on them to drive the code generation. The text protocol for the enhancement class names is also pluggable via the Namer interface. Here is a final schematic code for how such an enhancement framework can be used:

...
/* Setup the Enhancer on top of the framework class space */
ClassLoader privateSpace = getClass().getClassLoader();
Namer namer = ...;
Generator generator = ...;
Enhancer enhancer = new Enhancer(privateSpace, namer, generator);
...

/* Enhance some class the app sees */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");
Class<SinisterDevice> enhanced = enhancer.enhance(target);
...

The Enhancer framework presented above is more than pseudocode. In fact, during the research of this article, it was really built and tested with two demo code generators operating simultaneously in the same OSGi container. The result is loads of fun and is now available on Google Code for everyone to play with.

Those interested in the class generation process itself can examine the two demo ASM-based generators. Those who read the article on service dynamics may notice that the proxy generator uses the ServiceHolder code presented there as a private implementation.

Conclusion

The classload acrobatics that were presented are used in a number of infrastructural frameworks under OSGi. For example, classloader bridging is used by Guice, Peaberry, and Spring Dynamic Modules to get their AOP wrappers and service proxies to work. When we hear the Spring guys say they did serious work on Tomcat to adapt it to OSGi, we can speculate they had to do classload site conversion or a more serious refactor to externalize Tomcat's servlet classloading altogether.

Acknowledgements

Many of the lessons in this article were extracted from the excellent code Stuart McCulloch wrote for Google Guice and Peaberry. For examples of industrial strength classload bridging, look at BytecodeGen.java from Google Guice and ImportProxyClassLoader.java from Peaberry. There you will see how to handle some additional aspects like security, the system classloader, better lazy caching and concurrency. Thank you Stuart!

The author is also obliged to Classy Solutions to Tricky Proxies by Peter Kriens. Hopefully, the explanations on JVM linking in the current article will make a useful contribution to Peter's work. Thank you Peter!

About the Author

Todor Boev has been involved with OSGi for the past eight years as an employee at ProSyst. He is passionate about developing OSGi into a general purpose programming environment for the JVM. Currently he explores this topic both professionally and as a contributor to the Peaberry project. He maintains a blog at rinswind.blogspot.com.
