There is little doubt that the hardest topic in OSGi is how to deal with service dynamics. In this article we will explore in depth the fundamental nature of the problem and then present some innovative strategies to solve it. Although basic familiarity with OSGi is assumed, readers new to the technology should find this an entertaining read as well. The discussion is based directly on the OSGi API and can serve as an informal introduction to the OSGi service layer. At the end of the article we will present some frameworks that capture the API patterns we have developed and thus free developers from having to deal with OSGi directly.
The Problem
There are two key factors that make service dynamics fiendishly hard to get right.
Concurrency
Before we delve deeper we should digest one basic and somewhat startling fact: in practice the OSGi framework does not run threads of its own! It is merely a "dead" threadsafe object structure on the heap. The "main" thread is used to set up this structure and start the initial set of bundles. Then it goes to sleep - its function is merely to prevent the JVM from shutting down due to lack of living threads. The "main" thread is typically awakened only at the end of the framework shutdown sequence, when all other threads are supposed to be dead. It is used to perform some final cleanup before it also dies and lets the JVM exit. This means that all useful work must be done by threads started from bundles during that initial startup sequence. Let's call these "active bundles". The majority of bundles are usually "passive bundles", i.e. they don't start threads from their BundleActivator.start() method. Instead they set up the imports of some service objects, which are then composed into new service objects, which are finally exported. After the start() call returns, the bundle just sits there and waits for a thread to call its exported services. As elegant and lightweight as all this might be, it also means that the OSGi framework does not enforce any threading model - it steps aside and lets the bundles sort it all out between themselves. The framework object structure acts as a "passive bundle" (a bundle with ID 0 in fact), getting animated only when a thread from an "active bundle" calls in to perform some interaction with another bundle or with the framework itself. Because at any time a random number of threads can call into the OSGi core, the framework implementers have their work cut out for them. We as application coders are also not exempt from the suffering.
The concurrency factor is then this: at all times an OSGi application is subjected simultaneously to two independent control flows. We'll call these the "business logic flow" and the "dynamic management flow". The first one represents the useful work done by the app and has nothing to do with OSGi. Here we choose the design (thread pools, work queues, etc.) and code our bundles to follow the rules. The second control flow, however, is entirely out of our hands. Generally it takes place when a management application running concurrently with our own plays with the lifecycle of the bundles (this includes installing and uninstalling). Often there is more than one such application - each with its own threading rules, just like our app. Some examples include provisioning over HTTP, management and monitoring over JMX, even a telnet console interface. Each of these can reach through the OSGi core, call BundleActivator.stop() on a bundle we depend on, and cause the withdrawal of a service we require. When this happens we must be ready to cooperate in the release of the service object. This odd arrangement is explained by the second factor mentioned earlier.
Direct service references
The second factor has to do with the way objects are exchanged between bundles. Here again OSGi is non-intrusive and lightweight: an importing bundle holds a direct reference to the object owned by the exporting bundle. The chief benefit of this design is that OSGi does not introduce method call overhead between bundles - calling a service is just as fast as calling a privately owned object. The downside is that the importing bundle must cooperate with the exporting bundle to properly release the service. If an importer retains a reference to the dead service at least two harmful effects take place:
- Calls to the stale service are likely to crash.
Because the service is no longer backed by a bundle, calling it will likely result in an exception due to unavailable required services or other resources. I.e. the importer will never work properly because of its unwillingness to drop the stale service and try to get a new working one.
- Memory leaks because of ClassLoader retention.
The ClassLoader of the exporter bundle will remain in memory even though the bundle is uninstalled. Obviously each object on the heap must have a concrete implementing class, which in this case is provided by the dead bundle's ClassLoader. This leak will happen even if the importer sees the service object through an interface loaded from a third library bundle.
All this means that the importer must track the availability of the service and guarantee the release of all references to the service object in finite time after it has received a service unregistration event. Conversely when the service goes back online it must be picked up and propagated to the points within the bundle where it is used.
The solution
So far we went to great pains to describe..err..the pain of service dynamics. Now that we are hurting let us discuss the remedy. For the moment we have exhausted the subject of the correct importer policy. Now let's add to this a service export policy. The sum of an import and an export policy should form a complete doctrine about service dynamics. We will explore two export policies with their corresponding doctrines.
Eager
This school of thought shoots for safe service calls. Its motto is:
To export a service is to announce it is ready for use
Consider what this means for services that are composed from imported objects. Such objects are called "required services". A service can also be "optional" - e.g. logging. Under the eager motto, when a required service goes down the export is no longer usable, so it must also be withdrawn from the OSGi service registry. This goes the other way too - when the required service comes back the composite service must be registered once more. This results in cascades of service registrations and unregistrations as chains of dependent services come together and fall apart. Implementing this flickering behavior varies from hard to exceptionally hard. The problem is that the imports and the exports have to be tracked inside common objects with the proper synchronization. Also quite often this dynamic dependency management is further compounded by the need to track events from non-service sources. For example we track a dynamic configuration waiting for it to become valid.
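As an illustration, a flickering export could be implemented with the standard OSGi ServiceTracker along the following lines. This is a minimal sketch, assuming hypothetical Hello, Greeter and GreeterImpl types; a real bundle would also have to cope with multiple candidate services and with its own configuration events:

```java
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import org.osgi.framework.ServiceRegistration;
import org.osgi.util.tracker.ServiceTracker;

/*
 * Eager export policy: the Greeter export "flickers" in sync with the
 * required Hello import. Hello, Greeter and GreeterImpl are hypothetical.
 */
public class EagerExporter extends ServiceTracker {

    private ServiceRegistration registration; // guarded by "this"

    public EagerExporter(BundleContext bc) {
        super(bc, Hello.class.getName(), null);
    }

    @Override
    public synchronized Object addingService(ServiceReference ref) {
        Hello hello = (Hello) context.getService(ref);
        if (registration == null) {
            /* Required service arrived - cascade our export up */
            registration = context.registerService(
                Greeter.class.getName(), new GreeterImpl(hello), null);
        }
        return hello;
    }

    @Override
    public synchronized void removedService(ServiceReference ref, Object service) {
        if (registration != null) {
            /* Required service gone - cascade our export down */
            registration.unregister();
            registration = null;
        }
        context.ungetService(ref);
    }
}
```

Note how even this simplest case needs a common lock for the import and the export - exactly the synchronization burden described above.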
Let us suppose we manage to write all of the boilerplate for each of our bundles. Now imagine how a thread races through the OSGi container when it executes the business control flow (i.e. useful work). It will start its path from the active bundle that drives the particular application. As soon as it calls a service object it will leave its "home" bundle and enter the bundle that exports the service. If that service is in turn implemented via other services the thread will hop into the bundle of each one, and so on. For the original service call to succeed each and every hop must also succeed. It turns out we are trying to achieve a kind of transactional behavior - a call to a chain (or more generally a tree) of services either fully succeeds or cannot be made in the first place because the root service is not registered. Under such strong guarantees the active bundle knows ahead of time (or eagerly) that a certain activity can't be performed and can take alternative actions. E.g. rather than react to an error it proactively performs the respective error handling.
The safety promised by the Eager model hinges on the particular way the OSGi framework delivers service status events. Service tracking is done by hooking a ServiceListener to the OSGi service registry. For better or worse the OSGi framework calls these listeners synchronously - i.e. within the management control flow that performs the actual service unregistration. This gives a chance for the management and application control flows to meet inside the importer bundle. The importer can then synchronize both flows against the same private lock. Why do this? Because we can block out the management control flow if the service is currently being accessed by the application control flow. Conversely, if the management flow gets the lock first it will have cascaded the service chain apart by the time it releases the lock. The application flow can then proceed with a check to see if the service is available.
Unfortunately it is plain to see this kind of safety is impossible. Imagine some management control flow kicks in and stops a bundle two hops removed from the current position of the business control flow. Since the business flow has not yet entered the stopped bundle it will not be able to block the management flow from taking down its services. As a result our thread will run into a wall. Obviously no amount of "local" synchronization by each individual bundle along the path will guarantee the integrity of the entire path. What is needed is a third party - a transaction manager of sorts - to lock the entire path before the business flow starts traversing it. Since such a manager does not currently exist, we can conclude that service flickering can't prevent errors caused by disappearing services. We simply must accept that we need to handle such errors and try to develop an effective scheme to do so.
This brings on the question of whether there is some other benefit to justify the complexity caused by service flickering. We could argue that although we can't guarantee that a service call will succeed, at least service flickering can tell us the precise moment after which a service call is guaranteed to fail. This allows us to perform various auxiliary reactions right after a required service goes down. For example if a bundle draws buttons in your IDE and a direct or transitive dependency goes away it can pop a dialog or hide the buttons from the toolbar. Without the cascading destruction of the service chain the buttons will be right there on the toolbar and the user will get exceptions every time they are clicked. This is a dubious return that hardly justifies the boilerplate investment. It becomes even less attractive if we consider the additional complications. Why should we blow the horn loudly during a routine bundle update that lasts 2 seconds? Maybe we should just "flicker" the buttons on the toolbar and postpone the dialog until the failure persists for more than 10 seconds. Should this period be configurable? Also who should react - only active bundles or every bundle along the service chain? Since we don't want to get flooded by dialogs (or other reactions in general) we must introduce some application-wide policy. In short, we have paid a lot to get back a dubious benefit and as a side effect have introduced a brand new crosscutting concern in our modular application.
Lazy
Having seen how the extreme dynamics cause more problems than they solve, let's try to reduce the service tracking behavior to the essential minimum. Because most of the complexity of the Eager model was a consequence of the overly strong service export policy, we will begin by relaxing it:
To export a service is to declare an entry point into the bundle
Since the export is merely a declaration it does not require any dynamic flickering. We simply accept that calling a service can cause an exception because of a missing direct or transitive service dependency. We call this model "lazy" because here we do not learn about a missing service unless we try to call it. The complete dynamics doctrine then becomes:
- Explicit registration is used only during bundle startup.
Generally a bundle should follow this sequence in BundleActivator.start():
- Import
Set up the tracking of the services the bundle requires.
- Construct
Build the bundle internal structure from imports and implementation classes. Store its roots in non-final fields in the activator.
- Activate
If this is an active bundle, start its threads.
- Export
If there are objects to export, register them now and store their ServiceRegistrations in non-final fields in the activator. How to do this in an easy way is described below.
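The four startup steps could be sketched in an activator like this. Hello, HelloImpl and Worker are hypothetical names standing in for a bundle's real imports, internals and exports:

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;
import org.osgi.util.tracker.ServiceTracker;

/*
 * Sketch of the Import -> Construct -> Activate -> Export sequence.
 * Hello, HelloImpl and Worker are hypothetical stand-ins.
 */
public class Activator implements BundleActivator {

    private ServiceTracker helloTracker;      // import
    private Worker worker;                    // internal structure
    private ServiceRegistration registration; // export

    public void start(BundleContext bc) {
        /* 1. Import: set up the tracking of required services */
        helloTracker = new ServiceTracker(bc, Hello.class.getName(), null);
        helloTracker.open();

        /* 2. Construct: build the bundle internals from the imports */
        worker = new Worker(helloTracker);

        /* 3. Activate: if this is an active bundle, start its threads */
        worker.startThreads();

        /* 4. Export: register our own services last */
        registration = bc.registerService(
            Hello.class.getName(), new HelloImpl(worker), null);
    }

    public void stop(BundleContext bc) {
        /* Mirror image of start() - see the shutdown sequence */
    }
}
```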
- Importers fail fast.
Every imported service must be tracked. When an attempt is made to use a missing service a RuntimeException is thrown. This exception is typically called ServiceUnavailableException (or SUE).
- Service errors are handled like regular RuntimeExceptions (faults).
We handle any unchecked exceptions (including the SUE) thrown when using service objects by the classical fault scheme: propagate to a top-level catch block (fault barrier), do cleanup in try/finally blocks as the stack unwinds or when we arrive at the catch block. Finally, complete the crashed activity in some rational way. In detail:
- If the service is optional we just catch, handle and proceed.
If the service is not critical for the job at hand there's no need to crash. Runtime exceptions are caught on the spot and logged. Any appropriate fail-over actions are taken and the execution happily proceeds. I.e. we convert a fault to a contingency. Whether a service is optional depends on the concrete application. We can even imagine partially optional services where only some of the method calls are wrapped in try/catch fault barriers while others lead to more comprehensive crashes.
- If the service is required and we are a passive bundle we clean up our own resources and let the exception propagate.
Passive bundles don't drive business logic and therefore don't own the activities that call into them. As such they have no right to decide how these activities are completed and should let the exception propagate to the owning active bundle. They still must clean up any internal resources associated with the service call from try/finally blocks. Because good coding style requires such cleanup to be implemented anyway, it turns out that for passive bundles lazy service dynamics cost nothing.
- If the service is required and we are an active bundle we declare the current activity as crashed, log, clean up, try contingency actions.
If we are the bundle that drives the crashed activity it's our responsibility to complete it one way or another. Good design requires that we wrap a fault barrier around the business logic code to absorb crashes. If there is need of resource cleanup we do it as usual. Then we do whatever the application logic dictates: display an error dialog, send an error response, etc.
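The cases above can be sketched in plain Java. Hello, Log and ActiveBundleLogic are hypothetical names, and the ServiceUnavailableException here stands in for whatever SUE the import layer of the bundle throws:

```java
/* Hypothetical stand-ins: a required Hello service and an optional Log service */
interface Hello {
    void greet(String who);
}

interface Log {
    void warn(String msg);
}

/* The SUE thrown by the import layer when a service is missing */
class ServiceUnavailableException extends RuntimeException {
    public ServiceUnavailableException(String msg) {
        super(msg);
    }
}

/* An active bundle: it owns the activity, so it wraps a fault barrier around it */
class ActiveBundleLogic {

    private final Hello hello; // required service (proxy)
    private final Log log;     // optional service (proxy)

    ActiveBundleLogic(Hello hello, Log log) {
        this.hello = hello;
        this.log = log;
    }

    /* Optional service: catch on the spot, convert the fault to a contingency */
    void logSafely(String msg) {
        try {
            log.warn(msg);
        } catch (RuntimeException e) {
            /* Logging is not critical - swallow and proceed */
        }
    }

    /* Required service: the fault barrier completes the crashed activity */
    String runActivity() {
        try {
            hello.greet("InfoQ");
            return "completed";
        } catch (RuntimeException e) {
            /* Declare the activity crashed: log, clean up, take contingency actions */
            return "crashed: " + e.getMessage();
        }
    }
}
```

A passive bundle would have no such barrier - only the try/finally cleanup - and would let the exception fly up to whichever active bundle called it.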
- Explicit service unregistration is used only during bundle shutdown.
All bundles should execute the following sequence in BundleActivator.stop():
- Unexport
Bring down all exported services by calling unregister() on the previously stored ServiceRegistration objects. This is an announcement to all current clients to start dropping their references and to the OSGi framework to stop letting new clients obtain a reference to the service. Doing this at the beginning of shutdown is a courtesy move, since all services will be unregistered automatically right after BundleActivator.stop() completes.
- Deactivate
If we are an active bundle, perform a graceful shutdown of our threads. Release any non-heap resources we own: close sockets and files, release UI widgets, etc. This has to be done in a threadsafe manner because legitimate client calls may still be in progress. The goal of this step is to transform the content of the bundle into a fail-fast ball of objects. The function of this ball from here on will be to cleanly crash any business control flow that has been trapped inside our services at the moment of shutdown. Because deactivation is performed concurrently from the management control flow, it is obvious some synchronization should be in place. This shows that every OSGi service implementation must be prepared for concurrent access, at least in terms of graceful shutdown. Because most OSGi services are concurrent anyway, the cost of an additional close() method is not too high. If the service is stateless, deactivation could be free - trapped business control flows will crash as they try to hop into another bundle through a discontinued service import.
- Detach
This is done by explicitly nulling any fields contained in your BundleActivator. At this point we detach the fail-fast ball we created in the previous step from our bundle. As soon as all business control flows trapped inside crash, the ball will be garbage collected. After stop() completes the bundle should consume memory only for its BundleActivator instance and its ClassLoader. These are both managed directly by the OSGi runtime. The bundle will mushroom again into a runtime structure if some management control flow reaches through the OSGi core to call once more BundleActivator.start() (e.g. a user clicks on "start bundle" in his JMX console).
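The three shutdown steps map onto BundleActivator.stop() roughly as follows. As before, Worker and the registration field are hypothetical stand-ins assumed to be set up in start():

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

/*
 * Sketch of the Unexport -> Deactivate -> Detach sequence.
 * Worker is a hypothetical stand-in for the bundle internals.
 */
public class Activator implements BundleActivator {

    private Worker worker;                    // built in start()
    private ServiceRegistration registration; // stored in start()

    public void start(BundleContext bc) {
        /* ...the startup sequence shown earlier... */
    }

    public void stop(BundleContext bc) {
        /* 1. Unexport: announce to clients and the framework we are going away */
        registration.unregister();

        /* 2. Deactivate: stop our threads, release non-heap resources,
              leaving behind a threadsafe, fail-fast ball of objects */
        worker.shutdownGracefully();

        /* 3. Detach: null the fields so the ball is garbage collected
              as soon as the trapped control flows crash out of it */
        registration = null;
        worker = null;
    }
}
```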
The startup sequence, for its part, is carefully arranged to not expose any part of the bundle to external calls before everything is fully set up. Upon its completion the bundle is started and hooked to the service registry. Its internal structure is spared from garbage collection because it is referenced from within the activator, and the activator in turn is referenced from the OSGi service layer. Now the management control flow can leave the activator and go about its business. If the bundle has started some threads to execute the business flow, they can continue doing their work after the activator is no longer being executed.
The goal of the shutdown policy is to maintain the bundle services in a consistent "work correctly or crash cleanly" state right up to the moment they are garbage collected. In the startup sequence we do not expose objects to control flows before they become consistent, and in the shutdown sequence we keep them consistent until they are garbage collected. This "be consistent at all times" policy can be observed in non-OSGi concurrent Java as well. E.g. the usual way to shut down a thread is to touch a boolean flag (call unregister()), destroy all resources the thread uses (discontinue imports, close sockets, etc.) and let the thread expire from natural causes. The beauty of the Lazy doctrine is that we manage to almost completely fold the hard problem of service dynamics into the easier problem of dealing with exceptions properly. We achieve this by directing crashes caused by inconsistent services into the error handling pathways built into every non-dynamic Java application. It turns out dynamics are not so horrible; they mostly force us to have a good error handling and cleanup design. I.e. service dynamics are all about crashing safely.
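The thread-shutdown analogy above might look like this minimal, self-contained sketch:

```java
/*
 * Non-OSGi analog of graceful deactivation: touch a flag and let the
 * thread expire from natural causes.
 */
public class GracefulWorker implements Runnable {

    /* Analogous to the service registration: true while we are "exported" */
    private volatile boolean open = true;

    /* Called from the management control flow, e.g. BundleActivator.stop() */
    public void close() {
        open = false;
    }

    public void run() {
        while (open) {
            /* Do one unit of work, then re-check the flag */
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                return; // treat interruption as a shutdown request too
            }
        }
        /* Flag dropped: clean up and expire from natural causes */
    }
}
```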
Fighting code distortion
There is a substantial wrinkle in our smooth plan - the service consuming code is still hard to write and quite disruptive to the rest of our application code. Let's go back and examine how Concurrency and Direct Service References conspire to create a maintenance nightmare. In one sentence:
At any time (concurrency), we can be required to flush out all references to the service object (direct references).
This means we must at all times keep under strict control each and every reference - something akin to the resource cleanup problems faced by C++ programmers. As with any object in the JVM, a service object can be referenced in two ways:
- Heap
These references are stored in object fields. They must be explicitly managed by our code. The management control flow touches the heap references from the ServiceListener we use to track service availability. Therefore code in the listener must be synchronized with all other code that touches the heap reference.
- Stack
These references reside on the stacks of active threads and are automatically cleaned up when their call frame expires. The management control flow can't touch the stack references, so no synchronization is required.
To keep things manageable developers shift the balance sharply in favor of the self-cleaning stack references. Usually the entire bundle contains a single heap reference per service. From there the service can be propagated as a method parameter to its usage sites. When this is not possible, long chains of get() methods are built to pull the service to the respective usage site. Soon developers become overwhelmed and apply the Singleton pattern to this service reference. This replaces one flavor of code distortion with another, arguably worse, one.
An alternative way to achieve correct lazy import behavior is to encapsulate the service reference in a holder object. All required synchronization, event handling, and cleanup will be encapsulated in the holder. Then we can propagate this holder to all service usage sites in place of the actual service. Here is a simple lazy holder:
import static org.osgi.framework.Constants.OBJECTCLASS;
import static org.osgi.framework.ServiceEvent.UNREGISTERING;

import org.osgi.framework.BundleContext;
import org.osgi.framework.InvalidSyntaxException;
import org.osgi.framework.ServiceEvent;
import org.osgi.framework.ServiceListener;
import org.osgi.framework.ServiceReference;

public class ServiceHolder<T> implements ServiceListener {
  private final BundleContext bc;
  private final Class<T> type;

  private ServiceReference ref;
  private T service;

  /**
   * Called from BundleActivator.start().
   *
   * (management control flow)
   */
  public ServiceHolder(Class<T> type, BundleContext bc) {
    this.type = type;
    this.bc = bc;

    /* Track events for services of type T */
    try {
      bc.addServiceListener(this, "(" + OBJECTCLASS + "=" + type.getName() + ")");
    } catch (InvalidSyntaxException e) {
      throw new RuntimeException("Unexpected: filter is correct", e);
    }
  }

  /**
   * Called by the app when it needs the service.
   *
   * (application control flow)
   */
  @SuppressWarnings("unchecked")
  public synchronized T get() {
    /* Lazy-bind to a suitable service */
    if (service == null) {
      ref = bc.getServiceReference(type.getName());

      /* Can't find a service - fail fast */
      if (ref == null) {
        throw new RuntimeException("Service " + type + " unavailable.");
      }
      service = (T) bc.getService(ref);
    }
    return service;
  }

  /**
   * Called by the container when services come and go.
   *
   * (management control flow)
   */
  public synchronized void serviceChanged(ServiceEvent e) {
    /* Is a service going away? */
    if (UNREGISTERING == e.getType()) {
      /* Is this the service we hold? */
      if (ref == e.getServiceReference()) {
        /* Release the service */
        service = null;
        ref = null;
      }
    }
  }
}
There! Now this looks like a real programmer article. Let's imagine we want to import the following mission critical service.
interface Hello {
  void greet(String who);
}

In BundleActivator.start() we must set up a ServiceHolder:

public class Activator implements BundleActivator {
  private ServiceHolder<Hello> helloHolder;
  ...
  public void start(BundleContext bc) {
    helloHolder = new ServiceHolder<Hello>(Hello.class, bc);
    ...
  }
  ...
The service consuming code would look something like:
...
private final ServiceHolder<Hello> holder;
...
void salutations() {
  /* Pop the service into a self-cleaning reference */
  Hello hello = holder.get();
  hello.greet("InfoQ");
  hello.greet("OSGi");
}
...
Or if we shorten a bit:
...
void salutations() {
  holder.get().greet("InfoQ");
  holder.get().greet("OSGi");
}
...
This extra level of indirection is still far from optimal. Having our code packed with these holders causes loss of clarity, hinders testing and becomes a refactoring nightmare if we decide to migrate a local class to another bundle and consume it as a service. But it's the only way to be correct. Or is it?
If we squint at the code it becomes obvious we are moving towards the classical Proxy design pattern. Let's complete the proxy by wrapping our holder in the original service interface:
class HelloProxy implements Hello {
  private final ServiceHolder<Hello> holder;

  public HelloProxy(ServiceHolder<Hello> holder) {
    this.holder = holder;
  }

  public void greet(String who) {
    holder.get().greet(who);
  }
}
Now we can create a single HelloProxy in the activator and use it everywhere through Hello-typed references as if it were the original service. Except now we can store the "service" in final fields and pass it to constructors. Combine this with the rest of the Lazy doctrine and we get a clean separation between the dynamics boilerplate, locked inside proxies and the activator, and the business logic code. Also the business code now looks like a regular non-dynamic Java program:
...
private final Hello hello;
...
void salutations() {
  hello.greet("InfoQ");
  hello.greet("OSGi");
}
...
At this point we must make an important observation: the eager and lazy models are not mutually exclusive. As the code above illustrates, in the core of every lazy bundle runs tracking code similar to the code that would support an eager bundle. A lazy bundle wraps this tracking core with a stable layer of proxies that shield the application code (and its control flow) from all the movement happening below. Still, if we need to, we can plug code into the tracking layer and have a hybrid eager (pre-proxy)/lazy (post-proxy) bundle. For example the eager part can do decorations or even complete transformations to the services before they are wrapped in proxies and passed to the lazy part. It turns out that (excluding service flickering) the lazy model is really a natural evolution of the eager model to a higher level of abstraction.
Service Layer Runtimes
All would be great except for the fact that coding proxies by hand can get very tedious. Fortunately such proxy generation is quite easy to code as a library or even better as an active Service Layer Runtime bundle.
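As a sketch of what such generation can look like with nothing but the JDK, java.lang.reflect.Proxy can manufacture the proxy from the service interface at runtime. Here the holder is abstracted as a Java 8 Supplier for brevity; a real SLR would plug in something like the ServiceHolder shown earlier:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

public class Proxies {

    /* Generate a proxy that re-resolves the live service on every call */
    @SuppressWarnings("unchecked")
    public static <T> T proxy(final Class<T> type, final Supplier<T> holder) {
        return (T) Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] { type },
            new InvocationHandler() {
                public Object invoke(Object self, Method method, Object[] args)
                        throws Throwable {
                    try {
                        return method.invoke(holder.get(), args);
                    } catch (InvocationTargetException e) {
                        /* Unwrap so the caller sees the service's own exception */
                        throw e.getCause();
                    }
                }
            });
    }
}
```

The SLRs below do essentially this, except with bytecode generation (ASM) or weaving instead of reflection, which avoids the per-call reflective overhead.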
Since OSGi 4.1 it became possible to implement a special type of bundle that can drive the service interactions of other bundles. Let's call these frameworks Service Layer Runtimes (or SLR) because they hide the raw OSGi service layer from the application developers. Although SLRs come in all shapes and sizes they inevitably include a dependency injection component. This is because DI is a natural match for services, which typically enter the bundle from a single point like the activator and need to be propagated all over the bundle internals. Using tons of singletons instead creates problems and organizing "fire brigades" of getters is tedious. Delegating this task to DI is a huge relief.
Below are very brief reviews of the most popular current day SLRs. Lots of features are omitted, since a comprehensive side-by-side comparison would require at least two more articles.
Peaberry
Peaberry is a pure Java extension to the ever more popular Guice framework. Discovering an effective way to use Peaberry was the main inspiration for this article. The framework feels largely like using pure Guice. All we need to do to get a service proxy is to bind the interface of the service to a special Peaberry provider:
bind(Hello.class).toProvider(Peaberry.service(Hello.class).single());
The proxies are then generated on the fly using ASM. From there normal Guice takes over and injects them as it would any other object. Code written in this way looks a lot like plain old Java SE, with dynamic proxies practically indistinguishable from local objects. Peaberry supports eager/lazy hybrids by allowing us to hook code to the tracking layer and, if required, do decorations to the service objects before they are wrapped in proxies. A unique feature of Peaberry is its ability to seamlessly mix services from different sources - for example objects from the Eclipse registry can be mixed transparently with OSGi services.
Alas, Peaberry is not yet perfect. One area where it lags behind the other SLRs is dynamic configuration: the user has to import the ConfigurationAdmin service to mutate configurations, or to export a ManagedService to receive dynamic configuration. Another drawback of Peaberry is that it still requires the user to code a minimalistic BundleActivator where the Guice Injector gets set up. The good news is that Peaberry is currently under active development and these gaps are sure to be plugged soon.
Spring Dynamic Modules
Dynamics are supported through transparently generated service proxies. Spring DM relies on the Spring component model to do Dependency Injection. Components have dynamic (de)activation lifecycles aligned with the Eager model. While Peaberry inclines the coder to use Lazy by default and go Eager if required, Spring DM seems to default to a 50/50 Eager/Lazy style. Although it seems to provide more features than Peaberry, Spring DM feels much more heavyweight to use.
Spring DM will soon continue its life as an OSGi standard called the Blueprint Service. Much more than a simple change of names, the Blueprint Service is carefully evolved from Spring DM to integrate better with the OSGi runtime.
Declarative Services
This is the only SLR standardized by OSGi so it deserves a bit more attention. OSGi DS tries to solve the dynamics problem with traditional Java means. It is high level in that it has a component model with setter dependency injection. It is low level in that it exposes the components to more of the service dynamics (no proxies). A dependency can either be defined as "dynamic" or "static".
For dynamic dependencies a component must provide a pair of bind()/unbind() callbacks. OSGi DS will do the tracking and call the respective callback. The component then takes over to perform all the swapping and synchronization on its own. In this case OSGi DS saves the developer only the tracking code.
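A dynamic dependency might look like the following sketch. GreeterComponent and the callback names are hypothetical; the wiring of bindHello/unbindHello to the Hello service would live in the component's XML descriptor (not shown):

```java
/* The service interface, as defined earlier in the article */
interface Hello {
    void greet(String who);
}

/*
 * A component with a "dynamic" DS dependency. DS does the tracking and
 * calls bindHello/unbindHello; the swapping and synchronization are ours.
 */
public class GreeterComponent {

    private Hello hello; // swapped by DS, guarded by "this"

    protected synchronized void bindHello(Hello hello) {
        this.hello = hello;
    }

    protected synchronized void unbindHello(Hello hello) {
        /* Only clear the field if the departing service is the one we hold */
        if (this.hello == hello) {
            this.hello = null;
        }
    }

    /* Application control flow: fail fast if the dependency is gone */
    public synchronized void salutations() {
        if (hello == null) {
            throw new RuntimeException("Service Hello unavailable.");
        }
        hello.greet("InfoQ");
    }
}
```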
By default a dependency is "static" and the component only needs to provide a bind()/set() method. Now the component does not have to worry about synchronization or release of the service object. Instead OSGi DS will re-create the entire component whenever the dependency changes. Alas, this is the only way to make sure the old service object is released. Because it is so expensive, the use of static dependencies is discouraged.
OSGi DS also follows the Eager service export policy: if a component is exposed as a service and some of its required dependencies go away, the component is unregistered. The consequence is the cascading deactivation of dependent components. As we saw, this lifecycle scheme cannot prevent exceptions caused by failing transitive dependencies. The user code must still have proper error handling in place.
OSGi DS also supports the non Dependency Injection "lookup" style where your components receive a ComponentContext from which to pull out services.
OSGi DS will get a nice facelift in the soon to be released OSGi 4.2 specification. The improvements concentrate on allowing OSGi DS code to be pure POJO. It will become possible to truly never see any element of the OSGi service API unless we want to.
iPojo
Architecturally iPojo seems to be "OSGi DS but done right". Here as well we deploy components as bundles, with each component having its dependencies managed by a central SLR bundle. As with OSGi DS, a component can be exposed as a service to the OSGi registry. From here, however, iPojo starts to depart significantly from OSGi DS. Most importantly it is highly modular, allowing a component to specify a different pluggable handler for each type of dependency. Unlike OSGi DS, iPojo does a thorough job of protecting the user code from service dynamics. Rather than within proxies, services are stored in a common ThreadLocal cache. The cache is populated when a thread enters the component and flushed when it leaves. All of this is achieved through bytecode weaving magic. This makes iPojo usable on resource constrained devices.
Acknowledgments
The author would like to thank Roman Roelofsen for providing input and critical reviews throughout the long evolution of this article.
About the Author
Todor Boev has been involved with OSGi technology for the past eight years as an employee at ProSyst. During that time he had the opportunity to apply OSGi in a variety of environments ranging from embedded devices to scalable backend servers. He is passionate about developing OSGi as a user friendly, general purpose programming environment for the JVM. Currently he explores the integration between cutting edge Java and OSGi both professionally and as a contributor to the Peaberry project. He also maintains a blog at http://rinswind.blogspot.com/.