IBM's WebSphere feature packs are optionally installable product extensions to the application server offering new features. IBM's recently released Feature Pack for XML provides application developers with support for the most recently ratified set of W3C XML standards:
- XQuery 1.0: a newly introduced query and functional programming language designed to query collections of XML data. It uses the XPath expression syntax to address specific parts of an XML document, supplementing it with "FOR, LET, WHERE, ORDER BY and RETURN" expressions. Commonly abbreviated to "FLWOR", these expressions may be used for performing joins across multiple XML streams in a similar manner to SQL.
- XPath 2.0: an expression language for working with XML documents. The result of an XPath expression is typically a selection of nodes from the input documents or an atomic value. XPath 2.0 is now a subset of XQuery 1.0. The most notable change from XPath 1.0 to XPath 2.0 is the introduction of a much richer type system: every value is now a sequence, with a single value being treated as a sequence of length 1. XPath 2.0 supports "Schema Awareness", meaning that elements of the tree carry type annotations which can be used in path expressions. A processor must support the built-in types such as string, number, date and so on. It may optionally support user-defined types, which can greatly simplify the required expressions. For example, an internet mailing company might have an XML document with a billing address and delivery address associated with a customer's order. If both address fields have a common AddressType, the expression //element(*, tns:AddressType)/postalCode would return the post code for both addresses. XPath 2.0 also offers a greatly expanded set of functions and operators. New functions include regular expression matching, new date functions such as current-date, and new numeric functions such as abs and round-half-to-even. New operators include intersect and except.
- XSLT 2.0: a programming language used to transform XML into a new XML format, or into another presentation format such as HTML, XHTML or SVG. Like XQuery 1.0, XSLT 2.0 uses XPath 2.0 as a path language. XSLT 2.0 adds a number of new capabilities including grouping, the ability to output multiple results from a single input document, and the ability to define functions in XSLT that can be called from XPath. As with an XPath 2.0 processor, an XSLT 2.0 processor may optionally be "Schema Aware". Being so offers a number of advantages, such as the ability to validate input trees prior to the XSLT transformation, and to validate output trees to ensure that the XSLT transformation is producing valid output. You are also able to specify data types for variables, input parameters, return values from functions, user-defined functions and templates. XSLT continues to be the primary choice for transforming XML data, whilst XQuery is expected to become the standard for querying XML documents. Whilst both XQuery and XSLT 2.0 use XPath 2.0 as a path language, the XQuery extensions to XPath 2.0 are not of practical relevance to XSLT developers.
To find out more about the feature pack, InfoQ talked to its chief architect, Andrew Spyker:
InfoQ: XML is obviously a widely used format in enterprise computing. Can you provide some examples of the kinds of applications where the new features in these standards may be of particular interest?
Given XML is so popular in enterprise computing, it's very hard to talk about every scenario in which this feature pack will be used. Therefore, the following isn't meant to be complete, only illustrative.
Applications that query and represent data from XML sources. As an example, take a blog feed and the comments and analytics associated with the feed. Consider an application that mines all your blogs looking for questionable content posted in comments and represents the information within web forms that allow you to inspect trends and flag certain comment authors as troublesome. Over time, you’d like to store that list in a database so you can proactively identify troublesome comments. Given that all input sources in this scenario are XML, and that XHTML (a newly supported serialization format of XSLT 2.0) can easily be used for web presentation, using XML based programming models is natural. There's an example application that demonstrates this in the feature pack.
Applications that work with industry standard schemas. Almost every vertical industry works with industry standard schemas and most enterprises extend these schemas to customize them for their business. In the past, with XPath 1.0 (and dependent XSLT 1.0), schema information wasn't available to the XML runtime. This meant that if you were looking for all data elements of type PurchaseOrder, you had to hard code every possible qualified name in your search. At best this produced hard to maintain code, and at worst brittle code that fails when new types in the enterprise extend existing types. In XPath 2.0 (and dependent XSLT 2.0 and XQuery 1.0), you can now search by type in your queries.
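To make the hard-coding problem concrete, here is a minimal sketch using the XPath 1.0 support that ships with the JDK (JAXP). It revisits the billing and delivery address example from the introduction. The order document, its element names, and the class name are assumptions made purely for illustration; none of this is taken from the feature pack.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PostalCodesByName {
    // Hypothetical order document; element names are illustrative only.
    static final String ORDER =
        "<order>"
      + "<billingAddress><postalCode>SW1A 1AA</postalCode></billingAddress>"
      + "<deliveryAddress><postalCode>EH1 1YZ</postalCode></deliveryAddress>"
      + "</order>";

    // XPath 1.0 has no type test, so every element name that holds an
    // address must be listed by hand. A schema-aware XPath 2.0 processor
    // could instead use: //element(*, tns:AddressType)/postalCode
    static final String BY_NAME =
        "//billingAddress/postalCode | //deliveryAddress/postalCode";

    static List<String> postalCodes(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            BY_NAME, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(postalCodes(ORDER));
    }
}
```

If a third address element were added to the schema later, the name-based expression would silently miss it until someone edited the query, which is the brittleness described above.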
Applications that were using XPath 1.0 and XSLT 1.0. This seems obvious, but it's worth mentioning. The XPath 2.0 and XSLT 2.0 standards took seven years of industry usage into account. The new capabilities enable many functional scenarios (some examples: collation support for multiple languages and multiple outputs for XSLT 2.0) that were not possible before. The capabilities also add support for patterns that were complex to express in XSLT 1.0 (example: grouping support), which leads to less code that is easier to maintain and likely performs better than before, as the runtime now offers the support instead of it being coded on top of the runtime.
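As an illustration of the grouping point, the sketch below runs an XSLT 1.0 stylesheet through the JDK's built-in JAXP processor, using the well-known key-based ("Muenchian") idiom to count comments per author. The feed document, key name, and class name are hypothetical. The XSLT 2.0 form appears only as a comment, since the JDK's bundled processor handles XSLT 1.0 only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class GroupingInXslt1 {
    // Hypothetical comment feed, for illustration only.
    static final String FEED =
        "<feed><comment author='ann'/><comment author='bob'/><comment author='bob'/></feed>";

    // XSLT 1.0 grouping: index comments by author with xsl:key, then keep
    // only the first comment of each group by comparing generated ids.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:key name='by-author' match='comment' use='@author'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select=\"//comment[generate-id() = "
      + "generate-id(key('by-author', @author)[1])]\">"
      + "<xsl:value-of select='@author'/>="
      + "<xsl:value-of select=\"count(key('by-author', @author))\"/>;"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // The same grouping in XSLT 2.0 needs no key or id comparison:
    //   <xsl:for-each-group select="//comment" group-by="@author">
    //     <xsl:value-of select="current-grouping-key()"/>=
    //     <xsl:value-of select="count(current-group())"/>;
    //   </xsl:for-each-group>

    static String countsPerAuthor(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countsPerAuthor(FEED));
    }
}
```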
Applications that query data across multiple data sources. While XQuery has been supported for some time in databases that support XML natively (DB2 pureXML for example), having XQuery support in the middleware allows data to be joined between these XML databases and XML stores that exist outside of the database. An example would be a batch file that worked with data in an XML database, enriched it with calls to Web 2.0 APIs, and then stored it in a second database. The XML Feature Pack supports this scenario with the thin client which can be used outside of a server JVM in a standard Java SE application, as long as it's used in support of the Application Server functionality.
As I said before, there are many more scenarios possible. The scope of all possible uses is only bounded by the amount of data or documents you store in XML in the enterprise – which is to say very large.
InfoQ: What advantages do languages like XSLT 2 and XQuery 1 have over using a Java DOM model for working with XML in multi-core or Cloud environments?
There are two main advantages of working with declarative (vs. imperative), XML centric (vs. language centric) languages such as those supported in the XML Feature Pack.
First, declarative programming asks the user what they want to do. This is as opposed to imperative programming (ex: Java code working with the DOM or JAXB APIs), which asks the user how they want to do what they want to do. Declarative programming leads to smaller, easier to maintain code that adapts faster to change. It also allows a user to express to the XML runtime what they are interested in, or how they want to query or transform data, in a way that allows the runtime to optimize in ways not possible when the user tells the runtime exactly how to execute. This difference is very important in multi-core and Cloud environments, as you can imagine optimizations that not only recognize patterns that can be executed better on a single CPU, but also ones that can be executed better on multiple CPUs or across virtual environments. Also, as XSLT and XQuery are functional and side-effect free, you can execute such optimizations with complete safety, something that isn't possible in imperative languages using typical programming APIs. Finally, I personally believe that, long term, higher level declarative languages will have a better fit with the cloud as they will be more portable than lower level languages that assume a specific runtime.
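The contrast between "how" and "what" can be sketched with the JDK's own DOM and XPath 1.0 APIs. Both methods below count the comments left by one author in a hypothetical feed: the first walks the tree by hand, the second hands a single expression to the XPath engine. The document, names, and class are assumptions for illustration only, and nothing here reflects the feature pack's API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CommentCount {
    // Hypothetical feed; structure and names are illustrative only.
    static final String FEED =
        "<feed>"
      + "<entry><comment author='ann'>ok</comment><comment author='bob'>spam</comment></entry>"
      + "<entry><comment author='bob'>spam again</comment></entry>"
      + "</feed>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Imperative: the code spells out how to visit every node.
    static int countImperative(Node node, String author) {
        int count = 0;
        if (node instanceof Element) {
            Element e = (Element) node;
            if ("comment".equals(e.getTagName()) && author.equals(e.getAttribute("author"))) {
                count++;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countImperative(children.item(i), author);
        }
        return count;
    }

    // Declarative: the expression states what is wanted; the engine
    // decides how to evaluate it.
    static int countDeclarative(Document doc, String author) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $author through a resolver rather than concatenating strings.
        xpath.setXPathVariableResolver(
            name -> "author".equals(name.getLocalPart()) ? author : null);
        Double n = (Double) xpath.evaluate(
            "count(//comment[@author = $author])", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(FEED);
        System.out.println(countImperative(doc, "bob"));
        System.out.println(countDeclarative(doc, "bob"));
    }
}
```

Because the second version never fixes a traversal order, an engine is free to choose one, which is the room for optimization described above.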
Second, XML centric (vs. Java or C# or some other language) is all about having the XML type system as the core type system. Mapping the XML type system to the native language type system has two disadvantages. First, there is an XML fidelity issue. Mapping XML to an object representation is very hard and, if done poorly, can result in loss of information. APIs such as JAXB 2.0 do a very good job of handling this problem, but in the end, for a perfect loss-less representation of XML, nothing is better than a pure XML model. Second, as we convert the type system or convert to a DOM model, we add significant performance overhead. Essentially this is a data copy, which in middleware is very expensive and should be avoided. By working with XML data in its native representation, we can be sure no data is lost and the performance is optimal. Having these two benefits is important given XML has become such a popular data interchange format between businesses and within the enterprise.
I should note that we understand that people won't change entire existing applications over to XML programming models. Therefore, we allow an easy way to add existing Java data and logic to the XML runtime execution. This extension is done through the same consistent API regardless of which XML language is being used. This consistency is important as it unifies the programming experience across all three languages and allows data to be pipelined efficiently, for example from XQuery 1.0 to XSLT 2.0. We also designed the API to ensure easy and performant multi-threaded server usage of the technology. Previous XML APIs were designed for client environments, which favor simplicity of single invocation over easy to code performant multi-threaded support. Our API adds thread safety to all shared objects in a way that makes coding for server usage natural and performant.
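The feature pack's own API is not shown in this interview, so the following is only a sketch of the multi-threading concern using standard JAXP. In JAXP, a compiled `Templates` object may be shared across threads, while each `Transformer` must stay confined to one thread, so server code has to compile once and create a fresh transformer per request. The stylesheet, class name, and pool size are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SharedStylesheet {
    // Hypothetical XSLT 1.0 stylesheet, for illustration only.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='/name'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Compile once; the resulting Templates object is safe to share.
    static Templates compile() throws Exception {
        return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    // A Transformer is not thread-safe, so create one per invocation.
    static String transform(Templates templates, String xml) throws Exception {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Run the shared stylesheet over several inputs on a small thread pool.
    static List<String> transformAll(List<String> inputs) throws Exception {
        Templates templates = compile();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String xml : inputs) {
                futures.add(pool.submit(() -> transform(templates, xml)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAll(List.of("<name>Ann</name>", "<name>Bob</name>")));
    }
}
```

The split between shareable and per-thread objects is what the developer has to remember in this style of API; an API whose shared objects are all thread-safe removes that bookkeeping.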
InfoQ: How do the W3C standards compare to Microsoft's LINQ for working with XML?
A key difference is that the XML Feature Pack implements publicly developed W3C standards for XML processing. Having a foundation in standards allows organizations and developers to leverage consistent XML programming model skills and tools across multiple implementations of the standards.
InfoQ: What advantages does the feature pack have over Michael Kay's Saxon implementation?
Let me say first that from all industry feedback I've seen, Saxon is a great implementation of XPath 2.0, XSLT 2.0, and XQuery 1.0. Also, Michael Kay along with contributors from IBM and other companies have done great work driving the W3C standards in the XML space.
Both Saxon and the XML Feature Pack implement a similar set of standards. So from a programming model standpoint there is consistency. From an operations perspective, WebSphere customers can get the assurance of an XML implementation backed by IBM, tested, and supported along with their current WebSphere Application Server V7 entitlement. As we talked about above, there are many scenarios where we see our customers wanting to implement XML based solutions on top of WebSphere. Given this realization, we wanted to offer the new capabilities offered through these standards in a form that our customers can rely on now and in the future.
InfoQ: Do you have a sense of when we can expect to see the technologies standardised into the Java platform? What will happen to your implementations when that happens?
Today, Java supports XPath 1.0 and XSLT 1.0 through JAXP, but not XPath 2.0 and XSLT 2.0. Also, there is a JSR to support XQuery 1.0 called XQJ. We should consider the best way to implement generalized XQuery in Java in the future. Personally, I have concerns about the connection oriented XQJ API and how users would need to learn multiple APIs (JAXP and XQJ) to work across the XML standards. The Java platform seems to be adapting as the industry needs it to, so I expect the platform to adopt and support these standards over time given the industry support and customer need. IBM will continue to drive XML into the Java standardization process in a manner that makes sense for users.
If the Java platform evolved to handle XPath 2.0, XSLT 2.0, and XQuery 1.0, our implementation would remain. IBM ships its own implementation of XPath 1.0 and XSLT 1.0 in the IBM JVM today. IBM has had a history, while supporting the Java standards, of providing performance, reliability, and functional value add in the XML parts of the JVM. These improvements would remain, continuing to power XML processing, web services and SOA across all of the WebSphere products.
More information on the WebSphere XML Feature Pack can be found on the WebSphere community blog. A getting started guide, including installation instructions, is available on YouTube.