During the NACA project run by Publicitas Ltd., 4m lines of COBOL were automatically trans-coded (migrated) toward their Java equivalent. The company claims that the recurrent annual savings in cash-outs amount to a total of 3m euros and has released the tools from the NACA project under GPL.
Didier Durand and Pierre-Jean Ditscheid made a presentation about it at at Jazoon09 last week, which is available online.
Pierre described the architecture of the "transcoding compiler":
- many levels of cache to maximize performances of the new Java version of the old application. Through them, our Java-transcoded transactions and batches have better performances than their Cobol ancestors used to have on mainframe.
- pre-allocation of all program variable structures (COMMAREA of COBOL) to further improve performances but also to minimize garbage collection that freezes the system while running.
- strongly object-oriented architecture of resulting Java objects in order to maximize the effect of all controls done by compiler. As example, each old COBOL program becomes a Java class whose existence is checked at compile-time rather than at runtime. Very useful when your application is 4 millions lines of code like ours and when you want to track down every typing mistake in a continuous integration architecture like ours
- strong integration with Eclipse IDE for highest productivity for developpers: we even developed a plug-in to facilitate debugging and edition of old COBOL programs from Eclipse
- line-by-line equivalence between old COBOL programs and newly transcoded Java classes. The home developers don't get lost: they receive afterwards a Java application with the exact same structure as the original COBOL version
- support of IBM JVM as well as Sun JVM in order to also allow for the transcoding of stored procedures
- support of distinct character sets and encoding schemes (EBCDIC) between mainframe & Linux. Support of all resulting possibilities for data sorting.
- full management of multi-level COBOL data structures in Java independently of the UTF encoding (2 bytes per char) used by Java
- transparency of wrapping framework (raw JVM, Apache Tomcat, etc...) for the application
- etc...
While Didier emphasized the key aspects of such a project:
- economic motivation as core driver: move from a multi-million (CHF or euros) mainframe environment to an incredibly cheap and nimble farm of Linux Intel-based servers. The massive savings (3 millions euros / year in our case) allow for a quick auto-financing of the project far before its end. The main virtue of Open Source for a company like us remains clearly its very very low price.
- migrate people with technology: we believe that we succeeded in our project because we clearly demonstrated very early on to the people in place that they would find a new interesting job in the final constellation. That generated their full commitment to the project!
- iso-functionality as a must: migrating in such a manner prevents months of discussion about the final target. But, mostly, it allows for 100% automatic migration, a key factor for quality in the transcoding.
- no big-bang but numerous reversible steps: such a total migration with (tens of) thousands of new steps can never successfully reach the ends if you try big steps. Permanent incremental progress toward the goal is a much better approach. The nice consequences: small steps generate smaller local trouble when problems arise. Your users remain much more patient this way! Our experience was so...
The tools can be downloaded and the distribution contains:
The tools that we deliver today (v1.0) in the zip package:
- Doc: a set of documents explaining in details the tools and libraries. Your feedback around this documentation, its missing points, etc. is essential in order to improve it.
- NacaTrans (license GPL - approx. 83′00 lines of code code & 690 Java classes): our transcoder that allowed us to convert 100% automatically the 4 millions lines of our PUB 2000 application in COBOL to Java. It is very much based on compiler technologies. It analyzes the structure of the initial COBOL programs (supposed 100% valid) to bring them in an intermediate xml structure before generating the final Java code that extensively calls functions and uses classes of the runtime library NacaRT, itself depending on JLib. This new Java source code was very carefully designed: each line of Cobol generates very intentionally a single corresponding line of Java. The aim is to look like as much as possible like the original COBOL code in order to ease the maintenance by the original developers / maintainers who master the structure of their original Java programs. The completeness of the accepted syntax for all variants of Cobol is of course not guaranteed. But our own 4 millions of lines of code as well as additional tests on other external application tend to prove that the current coverage of Cobol by NacaTrans is already very high. We want to improve this coverage through your feedback for valid constructs that we don’t support yet.
- NacaRT & Jlib (license LGPL - approx 153′000 lines of code & 890 Java classes): those are the 2 runtime librairie who provide all the runtime transactional services for the application. They emulate all teh functions of a classical transactional monitor like CICS from IBM. At the same time, they also support all the COBOL constructs (for example, COMMARÈA structure with multiple data representation masks, management of specific data format like COMP-X, etc.)
- NacaRTTest (license GPL): this is a test suite allowing us to test the correct output of the transcoder on a set of reference COBOL constructs. It’s the way to validate part of our transcoding algorithms. For a new user of NACA, this is definitely the place to start: when this runs on oyur infrastructure, you can feel pretty confident about your installation of the package.
With a legacy of 50 years of COBOL and around 250bn LOCs in production, it seems there is a considerable market for similar tools.