Over the last few years, modularity for Java has been an active discussion topic. From the (now defunct) JSR 277 to the recognition of JSR 291 and the ongoing JSR 294, modularity is seen as a necessary step in Java's evolution. Even future JVM-based languages like Scala are considering modularity. This article, the first in a multi-part series on Modular Java, discusses what modularity means, and why you should care.
What is modularity?
Modularity is a general concept which applies to the development of software in a fashion which allows individual modules to be developed, often with a standardised interface to allow modules to communicate. In fact, the kind of separation of concerns between objects in an OO language is much the same concept as for modules, except on a larger scale. Typically, partitioning a system into modules helps minimise coupling, which should lead to easier to maintain code.
The Java language was not designed with modules in mind (other than packages, which are likened to Modula-3 modules in the introduction) but none the less there are many de-facto modules in the Java community. Any Java library is in effect a module, from Log4J to Hibernate to Tomcat. Typically, both open-source and closed-source applications will have one or more dependencies on external libraries, which in turn have transitive dependencies on others.
Libraries are modules too
Libraries are implicitly modules. They may not all have a single interface to communicate with, but often will have 'public' APIs (which should be used) and 'private' packages which have documented use cases. Furthermore, they have dependencies themselves (such as JMX or JMS). This can result in automatic dependency managers bringing in a lot more than is strictly necessary; in the case of Log4J-1.2.15 bringing in over 10 dependencies (including javax.mail
and javax.jms
) even though many of these are never needed by programs that use Log4J.
In some cases, a module's dependencies can be optional; that is, the module can provide a subset of functionality with missing dependencies. In the above example, if JMS isn't present on the runtime classpath, then logging via JMS will not be available but other mechanisms will be. (Java achieves this through the use of deferred linking; by not requiring a class to be present until it is accessed, a missing dependency can be handled by an appropriate ClassNotFoundException
. Other platforms have the concept of weak linking which does much the same runtime checks.)
Typically, modules have an attached version number. Many open-source projects generate released that are named similarly to log4j-1.2.15.jar
. This allows a developer to determine, through manual inspection at runtime, which version of a particular open-source library is being used by consulting the classpath. However, the program is likely to have been compiled against a different version of the library; the implicit assumption is that compiling against log4j-1.2.3.jar
and running against log4j-1.2.15.jar
will be behaviourally compatible. Even upgrading to the next minor version is generally compatible (which is why problems in log4j 1.3 resulted in a new branch 2.0 to signify a break in compatibility). All of these are generally based on conventions rather than constraints that are known at runtime.
When is modularity useful?
Modularity is useful as a general concept to break down an application into different parts, which can then be tested (and evolved) separately. As noted above, most libraries are modules anyway so for those producing libraries for others to consume, modularity is an important concept to understand. Usually, the dependency information is encoded in the build tool (maven pom or ivy-module) and explicitly documented in the library's usage notes. It's not uncommon for an upstream library to develop workarounds for bugs in a lower level library, even when the latest version of the lower level library has been fixed since, to provide a seamless experience in the higher level library. (Sometimes these can cause subtle problems however.)
If a library is being built for consumption by others, then it is already implicitly a module. But in the same way that there are few “Hello World” libraries, there are also few real “Hello World” modules either. It's only once an application becomes sufficiently large (or it's being built with a sufficiently modular build system) that the concept of logically breaking down an application into different parts comes into play.
One aspect that is a benefit to modularisation is that of testing. A smaller module (with a well-defined API) can typically be tested better than a monolithic application. This is especially true of GUI applications, where the GUI itself might not be easily testable but the code which it calls may be.
Another aspect is that of evolution. Although the system as a whole will have a version number, in reality, it is a product of multiple modules and versions under the covers (whether closed source or open source, there will always be some kind of library – even the Java version – that is a dependency of the system). As a result, each module is free to go about evolving in a way suitable for that module. Some modules may evolve faster than others, whilst some may be stable enough to remain fixed for long periods (for example, Eclipse 3.5 has org.eclipse.core.boot
which has remained unchanged since February 2008).
Project management can also benefit from modularisation. Given that a module will end up having a published API to which others can subscribe, it's possible for separate modules to be implemented by separate teams. This inevitably happens on large-scale projects anyway, but sub-teams can be made responsible for the delivery of different modules.
Finally, modularising an application can help to concretely identify which versions of dependent libraries are being used in order to harmonise library dependencies across a large project.
Runtime versus compile time
Java typically has a flat classpath, whether at compile time or at runtime. In other words, applications normally have full visibility to any class that's found on the classpath, regardless of the order of entries in the classpath (assuming that there are no overlaps, at least; otherwise, first one wins). This enables the functionality of dynamic linking in Java; a class loaded from the front of the classpath need not have resolved all references to the classes that may be towards the rear of the classpath until they're actually required.
This is frequently used when working against a set of interfaces to which the implementation isn't known about until runtime. For example, an SQL utility can be compiled against the generic JDBC package, but at runtime (and with an additional piece of configuration information) can instantiate the correct JDBC driver. This is typically achieved through the name of a class (which implements a pre-defined factory interface or abstract class) being supplied to a Class.forName
lookup at runtime. If the specified class doesn't exist (or can't be loaded for any other reason) an error is generated.
It's therefore quite likely that the compile time classpath is (subtly) different from the runtime classpath for a module. Further, each module is often compiled in isolation (module A may be compiled against module C 1.1, and module B may be compiled against module C 1.2) but then combined at runtime in a single path (and in this case, either arbitrarily choosing version 1.1 or 1.2 of module C). This leads quickly to Dependency Hell, especially when it is the transitive closure of these dependencies which forms the runtime classpath. Build systems like Maven and Ivy make modularity visible to developers, if not end users.
Java has an under appreciated feature called ClassLoaders which allow the runtime path to be more segmented. Typically, all classes are loaded from the system ClassLoader; however, some systems partition their runtime space with different ClassLoaders. A good example is Tomcat (or other Servlet engines) which typically have a one ClassLoader-per-WebApp. This allows a WebApp to function normally but not see (accidentally or otherwise) classes defined by other WebApps in the same JVM.
The way this works is that each WebApp loads classes from its own ClassLoader, so that a (local) WebApp's implementation doesn't load classes which conflict with another WebApp's implementation. The requirement is, for any ClassLoader chain, that the class spaces be consistent; this means you can have two Util.class
es loaded from two separate ClassLoaders in your VM at one time, provided that these ClassLoaders aren't visible to one another. (It's also what gives the Servlet engine its ability to redeploy changes without a restart; by throwing a ClassLoader away, you throw away references to its classes as well, making the old version eligible for garbage collection – this then allows the Servlet engine to create a new ClassLoader and re-load the new versions of the classes in at runtime.)
Modules all the way down
Building a modular system is really a way of partitioning an application into (potentially) reusable modules and to minimise the coupling between them. It's also a way of de-coupling a module's requirements; for example, the Eclipse IDE typically has plugins that have separate dependencies on GUI and non-GUI components (e.g. jdt.ui
and jdt.core
). This permits other uses of the non-GUI module (headless builds, parsing and error checking, etc.) outside of the IDE environment.
Other than the monolithic rt.jar
, any system can typically be decomposed into various modules. The question becomes; is it worth it? After all, it's much easier to start with a modular system and build your way up than to take a monolithic system apart and break it into modules.
One of the reasons why this is usually the case is to do with class leakage across module boundaries. For example, the java.beans
package logically shouldn't have any dependencies on any GUI code; however, java.beans.AppletInitializer
, used by Beans.instantiate()
, has a reference to Applet
which of course has knock-on dependencies to the whole AWT chain. So java.beans
technically has an optional dependency on AWT, when common sense dictates that it should not. Had a more modular approach been taken to building the core Java libraries initially, this error would have been caught long before the API was ever made public.
At some point, a module cannot be broken further down into sub-modules. However, sometimes related functions are kept within the same module for ease of organisation, and only decomposed further when necessary. For example, the refactoring support, originally part of Eclipse's JDT, was pulled out into its own module in order to allow other languages (like CDT) to take advantage of the generic refactoring capability.
Plugins
Many systems are extensible through the concept of plugins. In these cases, the host system has a defined API to which the plugin must conform, and a way of injecting that plugin in. Many applications (such as web browsers, IDEs and build tools) offer a way to customise the application by providing a plugin that offers the correct API.
Sometimes these plugins are limited or perform generic operations (decoding audio or video) but equally well can be complex in their own right (e.g. plugins for IDEs). Sometimes, these plugins can provide their own plugin to customise the behaviour further, which can make systems highly customisable. (Increasing the number of levels of indirection can make a system increasingly hard to understand, however.)
The plugin API forms part of a contract which the individual plugins must obey. These plugins are themselves modules, which go through the normal dependency chain and versioning issues that the enclosing system is providing. As the complexity of the (specific) plugin API evolves, so too must the plugin itself (or backward compatible behaviour must be maintained).
One of the reasons for the success of the Netscape plugin API for browsers has been its simplicity; only a handful of functions are needed, and providing that the host browser redirects input with the appropriate MIME type, the plugin can handle processing the rest. However, more complex applications like IDEs typically need far more tightly integrated modules, and therefore, a more complex API to drive them.
Current state of Java modularity
Many module systems and plugin infrastructures exist in Java at the moment. IDEs typically are the well known ones, with IntelliJ, NetBeans and Eclipse all offering their own plugin systems as ways to customise the experience. However, build systems (Ant, Maven) and even end-user applications (Lotus Notes, Mac AppleScript-able applications) have the concept of being able to extend the core functionality of the application or system in question.
Arguably the most mature module system in Java is OSGi, which has been around almost as long as Java itself, first appearing as JSR 8, but more recently accepted as JSR 291. OSGi defines additional metadata in the JAR's MANIFEST.MF to indicate required dependencies on a per-package basis. This permits modules to check (at runtime) that their dependencies are met, and in addition, to permit each module to have its own private classpath (by virtue of having one ClassLoader per module). This helps, but does not completely prevent, the concept of dependency hell mentioned earlier. As with JDBC, OSGi is a specification (currently release 4.2) which has multiple open-source (and commercial) implementations. Since modules don't need to depend on any OSGi specific code, many open-source libraries now embed their meta-information into the manifest for consumption in OSGi runtimes; for those that don't, tools like bnd can post-process an existing JAR file and generate sensible defaults. Eclipse 3.0 switched to OSGi in 2004 from a proprietary plugin system; many other systems that had proprietary kernels (JBoss, WebSphere, Weblogic) have followed suit and based their runtimes on an OSGi kernel as well.
More recently, Project Jigsaw has been created in order to modularise the JDK itself. Although an internal part of the JDK, and with the potential for it not to be supported by other SE 7 implementations, the use of Jigsaw outside the JDK is not prevented. Jigsaw is also likely to be the reference implementation for the aforementioned JSR 294, although work is still ongoing. The requirement for a minimum version of SE 7 (coupled with the fact that there is no Java 7 at the moment) means that Jigsaw is still a work in progress and that it isn't generally available for systems running on Java 6 or below.
To encourage adoption of a standard modularisation format, the JSR 294 expert group is currently discussing the simple module system proposal; one in which producers of Java libraries (such as those found in the Maven repository and from the likes of Apache.org) can provide meta information that will be consumable by both Jigsaw and OSGi systems. Combined with minor changes to the Java language (the most notable being the addition of the module
keyword), this information could be generated at compile time by sufficiently advanced compilers. Runtime systems (like Jigsaw or OSGi) could then use this information in validating the set of installed modules and their dependencies.
Summary
This article discussed the general concepts of modularity and how it is achieved in Java systems. Since the compile time and runtime paths may be different, it's possible to have inconsistent library requirements leading to dependency hell. However, plugin APIs allow many types of code to be loaded which must follow the host's dependency resolution, which increases the possibility of such an inconsistency occurring. To prevent this, runtime modularity systems like OSGi can validate the set of requirements ahead of time to determine whether an application can be correctly started instead of failing in a silent or undetectable manner at runtime.
Finally, work is ongoing on the JSR 294 mailing list to create a module system for the Java Language which can be defined in its entirety in the Java Language Specification, in order to allow Java developers to generate versioned modules with encoded dependency infomration, which can subsequently be used in any module system.