Accessing data and services via the World Wide Web and its HTTP protocol is challenging. There have been many attempts to leverage the Web and HTTP through various designs aimed at offering efficient, concise and versionable systems - most under the umbrella of Service-Oriented Architecture. An approach that has gained a lot of attention recently, REST, relies on a URL to identify services and information. However, the Web is a dynamic, constantly changing information environment with new content and URLs being added all the time, whereas implementation code (particularly code which has been deployed and is now publicly accessible) is more difficult to change without causing problems for developers, system administrators and users. What is needed is a mechanism that can fit between the potentially fluid world of URLs and the more static world of compiled and deployed code. Such a mechanism must provide a binding between the URLs and service implementation code as well as be able to buffer and isolate the changes in the former from the code.
In software, a formal grammar defines the syntactic structure of textual information, such as a program, data file, or URI. Programs use grammars to recognize when text adheres to a defined syntax and to parse that text into its parts. Programs can also use grammars to generate text that adheres to the syntactic rules. The following diagram illustrates a Recognizer/Parser program using a supplied grammar to parse the string "Part1Part2Part3" and assign the parts ("Part1", "Part2" and "Part3") to three variables.
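The recognize-and-parse step can be approximated in plain Java with a regular expression that uses named groups. This is only a minimal sketch: the class name PartsParserSketch, the group names first, second and third, and the assumption that each part is "Part" followed by a single digit are all hypothetical and chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartsParserSketch {
    // Hypothetical stand-in for a grammar: three named groups,
    // each assumed to match the word "Part" followed by one digit.
    static final Pattern GRAMMAR =
            Pattern.compile("(?<first>Part\\d)(?<second>Part\\d)(?<third>Part\\d)");

    /** Returns the three parsed parts, or null when the text does not conform. */
    static String[] parse(String text) {
        Matcher m = GRAMMAR.matcher(text);
        if (!m.matches()) {
            return null; // recognition failed
        }
        return new String[] { m.group("first"), m.group("second"), m.group("third") };
    }

    public static void main(String[] args) {
        String[] parts = parse("Part1Part2Part3");
        System.out.println(String.join(", ", parts)); // prints "Part1, Part2, Part3"
    }
}
```

Recognition and parsing happen in one step: a string that does not match the whole pattern is rejected, and a string that matches yields its parts by name.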
While developing NetKernel 4 [1] we realized that a grammar based recognition and parsing technology could be used to process request identifiers and simplify software development on the platform. The NetKernel 4.0 Grammar technology [2] is a bi-directional mapping mechanism that implements this idea; it will both parse an identifier into parts and build an identifier from supplied parts. The NetKernel 4 grammar technology can be leveraged when implementing REST web services to perform the function of recognizing and binding web service identifiers to web service implementation code.
Outside -> In
To start, we will look at the use of a grammar based parser to handle the information coming from the outside, in the form of the REST web service identifier, and convert the parsed identifier text into values associated with internal named arguments. In our example we will use one of the Twitter REST web service APIs [3], which has the following general form:
http://www.twitter.com/statuses/user_timeline/{user-id}.{representation-type}
This diagram illustrates the use of a grammar-driven parser to recognize the Twitter web service identifier, parse the user identification and representation information, and assign that information to the named arguments representationType and twitterID.
The following NetKernel grammar will recognize this set of identifiers:
<grammar>
  http://www.twitter.com/statuses/user_timeline/
  <group name="twitterID"><regex type="alphanum"/></group>
  .
  <group name="representationType"><regex>(xml|json)</regex></group>
</grammar>
The grammar includes fixed text ("http://..." and ".") as well as two groups. Each group defines a section of the identifier that is to be recognized using a regular expression. Because each group has a name attribute, the grammar engine will assign the parsed text portion of the identifier to the specified named argument. For example, the second group will recognize either a trailing "xml" or "json" and assign that value to the named argument representationType.
The following table illustrates how the grammar directs the parsing of example identifiers
URI | twitterID | representationType |
http://www.twitter.com/statuses/ | demo1060 | xml |
http://www.twitter.com/statuses/user_timeline/pjr1060.json | pjr1060 | json |
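For readers without a NetKernel installation, the behaviour of this grammar can be imitated with a plain Java regular expression. The sketch below is not the NetKernel grammar engine. The class name TimelineGrammarSketch is hypothetical, and it assumes that the "alphanum" regex type corresponds to [A-Za-z0-9]+.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimelineGrammarSketch {
    // Fixed text is quoted literally; the two groups carry the argument names.
    // Assumption: NetKernel's "alphanum" type behaves like [A-Za-z0-9]+.
    static final Pattern GRAMMAR = Pattern.compile(
            Pattern.quote("http://www.twitter.com/statuses/user_timeline/")
            + "(?<twitterID>[A-Za-z0-9]+)"
            + "\\."
            + "(?<representationType>xml|json)");

    /** Returns the named arguments, or null when the identifier is not recognized. */
    static Map<String, String> parse(String identifier) {
        Matcher m = GRAMMAR.matcher(identifier);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("twitterID", m.group("twitterID"));
        arguments.put("representationType", m.group("representationType"));
        return arguments;
    }

    public static void main(String[] args) {
        // prints {twitterID=pjr1060, representationType=json}
        System.out.println(parse("http://www.twitter.com/statuses/user_timeline/pjr1060.json"));
    }
}
```

An identifier with an unsupported representation type, such as one ending in ".html", is not recognized at all, so the implementation code never sees it.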
In NetKernel, an endpoint is declared with a grammar and its Java implementation class. In our example, the following endpoint declaration will cause NetKernel to associate the Twitter grammar with an instance of the Java class org.ten60.demo.grammar.UserTimelineAccessor:
<endpoint>
  <grammar>
    http://www.twitter.com/statuses/user_timeline/
    <group name="twitterID"><regex type="alphanum"/></group>
    .
    <group name="representationType"><regex>(xml|json)</regex></group>
  </grammar>
  <class>org.ten60.demo.grammar.UserTimelineAccessor</class>
</endpoint>
When an identifier is presented to the endpoint, the endpoint delegates to the grammar engine the job of recognizing and parsing the identifier and assigning portions of the identifier text to twitterID and representationType. Those values are available to the UserTimelineAccessor instance through the context argument of the onSource(...) method. The following Java code [4] is the implementation of the endpoint functioning as a reflection service [5], simply returning the information provided in the identifier:
package org.ten60.demo.grammar;

import org.netkernel.layer0.nkf.INKFRequestContext;
import org.netkernel.module.standard.endpoint.StandardAccessorImpl;

public class UserTimelineAccessor extends StandardAccessorImpl {
    public void onSource(INKFRequestContext context) throws Exception {
        // Request the portion of the identifier that provides the Twitter ID
        String userID = context.getThisRequest().getArgumentValue("twitterID");
        // Request the portion of the identifier that provides the representation type
        String repType = context.getThisRequest().getArgumentValue("representationType");
        // Return a representation that simply reflects the information parsed from the identifier
        context.createResponseFrom("Request made for [" + userID + "] with type [" + repType + "]");
    }
}
Note that the compiled Java code is de-coupled from the structural form of the identifier. If the identifier for the service changes, a different grammar could be used to map the new identifier structure to the existing code. For example, let's say that the Twitter service introduces a version 2.0 API that provides a new way to request existing services. If the new API 2.0 URL has the form
http://www.twitter.com/2.0/user/timeline/status/{twitter-id}.{representation-type}
Then the new API can be mapped to the existing Java class with the following endpoint declaration:
<endpoint>
  <grammar>
    http://www.twitter.com/2.0/user/timeline/status/
    <group name="twitterID"><regex type="alphanum"/></group>
    .
    <group name="representationType"><regex>(xml|json)</regex></group>
  </grammar>
  <class>org.ten60.demo.grammar.UserTimelineAccessor</class>
</endpoint>
In NetKernel both endpoints can exist simultaneously and use the same implementation class.
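The idea of two identifier forms sharing one implementation can be sketched outside NetKernel as a small registry of patterns. Everything here is hypothetical and for illustration only: the class EndpointBindingSketch, its bind and resolve methods, and the [A-Za-z0-9]+ stand-in for the "alphanum" type. It is not how NetKernel resolves requests internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EndpointBindingSketch {
    /** One binding: a compiled identifier pattern plus the implementation it routes to. */
    private static final class Binding {
        final Pattern grammar;
        final Function<Map<String, String>, String> implementation;

        Binding(Pattern grammar, Function<Map<String, String>, String> implementation) {
            this.grammar = grammar;
            this.implementation = implementation;
        }
    }

    private final List<Binding> bindings = new ArrayList<>();

    /** Binds an identifier form (fixed prefix + ID + "." + type) to an implementation. */
    void bind(String prefix, Function<Map<String, String>, String> implementation) {
        Pattern grammar = Pattern.compile(Pattern.quote(prefix)
                + "(?<twitterID>[A-Za-z0-9]+)\\.(?<representationType>xml|json)");
        bindings.add(new Binding(grammar, implementation));
    }

    /** Finds the first binding that recognizes the identifier and invokes it; null if none. */
    String resolve(String identifier) {
        for (Binding b : bindings) {
            Matcher m = b.grammar.matcher(identifier);
            if (m.matches()) {
                Map<String, String> arguments = new LinkedHashMap<>();
                arguments.put("twitterID", m.group("twitterID"));
                arguments.put("representationType", m.group("representationType"));
                return b.implementation.apply(arguments);
            }
        }
        return null;
    }

    /** The shared implementation: it sees only named arguments, never the URL structure. */
    static String reflect(Map<String, String> arguments) {
        return "Request made for [" + arguments.get("twitterID")
                + "] with type [" + arguments.get("representationType") + "]";
    }

    /** Registers both identifier forms from the article against the same implementation. */
    static EndpointBindingSketch twitterExample() {
        EndpointBindingSketch registry = new EndpointBindingSketch();
        registry.bind("http://www.twitter.com/statuses/user_timeline/", EndpointBindingSketch::reflect);
        registry.bind("http://www.twitter.com/2.0/user/timeline/status/", EndpointBindingSketch::reflect);
        return registry;
    }

    public static void main(String[] args) {
        EndpointBindingSketch registry = twitterExample();
        System.out.println(registry.resolve("http://www.twitter.com/statuses/user_timeline/pjr1060.json"));
        System.out.println(registry.resolve("http://www.twitter.com/2.0/user/timeline/status/pjr1060.json"));
    }
}
```

Both calls in main produce the same response because the implementation depends only on the argument names, which is why the group names must stay the same across grammars.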
Inside -> Out
Now, let's switch this around. Instead of processing requests from the outside, let's use a grammar to create requests inside our code that will allow us to access an outside service. We again use the Twitter service as our example. To create a request to the Twitter service we first define an endpoint that specifies the Twitter grammar:
<endpoint>
  <id>twitter:endpoint:status</id>
  <grammar>http://twitter.com/statuses/user_timeline/
    <group name="twitterID"><regex type="alphanum"/></group>
    .
    <group name="representationType"><regex>(xml|json)</regex></group>
  </grammar>
  <request>
    <identifier>res:/foo</identifier>
  </request>
</endpoint>
The important parts of this endpoint are the id and grammar elements (the request element must be specified but is not used in our example). The grammar element specifies the Twitter grammar that we saw earlier. The id element defines an endpoint identifier that we use in our code to retrieve the grammar. To see how this is done, look at the following code fragment from a NetKernel endpoint implementation:
String repType = "json";
String userID = "pjr1060";
// Create a request that retrieves and binds to the Twitter grammar
INKFRequest request = context.createRequestToEndpoint("twitter:endpoint:status");
// Transfer local variable values to the named arguments in the Twitter grammar
request.addArgument("twitterID", userID);
request.addArgument("representationType", repType);
// Issue the constructed request to Twitter and capture the response
INKFResponseReadOnly response = context.issueRequestForResponse(request);
// Return the response from the external service as our response
context.createResponseFrom(response);
The following diagram illustrates the request object being bound to the Twitter grammar and constructing an identifier from the supplied parts.
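The build direction can also be sketched in plain Java: named values are checked against the same rules the grammar uses for recognition, then assembled into an identifier. The class IdentifierBuilderSketch and its build method are hypothetical, and the [A-Za-z0-9]+ pattern is an assumed equivalent of the "alphanum" type. NetKernel performs this step itself when the request is issued.

```java
import java.util.regex.Pattern;

final class IdentifierBuilderSketch {
    // Fixed text taken from the endpoint declaration above.
    static final String PREFIX = "http://twitter.com/statuses/user_timeline/";
    // Assumption: the "alphanum" regex type behaves like [A-Za-z0-9]+.
    static final Pattern TWITTER_ID = Pattern.compile("[A-Za-z0-9]+");
    static final Pattern REPRESENTATION_TYPE = Pattern.compile("xml|json");

    /** Builds an identifier from named parts, rejecting values the grammar would not accept. */
    static String build(String twitterID, String representationType) {
        if (!TWITTER_ID.matcher(twitterID).matches()) {
            throw new IllegalArgumentException("twitterID is not alphanumeric: " + twitterID);
        }
        if (!REPRESENTATION_TYPE.matcher(representationType).matches()) {
            throw new IllegalArgumentException("unsupported representationType: " + representationType);
        }
        return PREFIX + twitterID + "." + representationType;
    }

    public static void main(String[] args) {
        // prints http://twitter.com/statuses/user_timeline/pjr1060.json
        System.out.println(build("pjr1060", "json"));
    }
}
```

Validating the parts before assembly keeps the mapping bi-directional: any identifier this sketch builds would also be recognized by the corresponding pattern.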
Deep Inside
The concept of using a grammar to parse and build identifiers can be taken to the logical extreme deep within software to decouple a requestor and implementor through an associated identifier. In fact, this is exactly how NetKernel works. It borrows the idea of logical / physical decoupling from the Web and moves it inside software. Within a NetKernel system all functions are just like REST web service calls. For example, instead of making a direct API call to an XSLT processing engine, a request is made for the XSLT service using an identifier such as:
active:xslt+operator@res:/style.xsl+operand@res:/data.xml
This URI uses the active URI scheme [6] and includes the service name, xslt, and two named arguments, operator and operand.
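To make the structure of this identifier concrete, here is a deliberately simplified Java sketch that splits it into a service name and named arguments. The class ActiveUriSketch is hypothetical. It assumes that "+" separates arguments and the first "@" separates a name from its value, and it ignores any escaping or nesting rules the real scheme may define.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ActiveUriSketch {
    private static final String SCHEME = "active:";

    /** Returns the service name, e.g. "xslt". */
    static String serviceName(String uri) {
        return split(uri)[0];
    }

    /** Returns the named arguments in the order they appear in the identifier. */
    static Map<String, String> arguments(String uri) {
        String[] parts = split(uri);
        Map<String, String> arguments = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int at = parts[i].indexOf('@');
            if (at < 1) {
                throw new IllegalArgumentException("Expected name@value but found: " + parts[i]);
            }
            arguments.put(parts[i].substring(0, at), parts[i].substring(at + 1));
        }
        return arguments;
    }

    // Simplifying assumption: "+" never occurs inside an argument value.
    private static String[] split(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not an active URI: " + uri);
        }
        return uri.substring(SCHEME.length()).split("\\+");
    }

    public static void main(String[] args) {
        String uri = "active:xslt+operator@res:/style.xsl+operand@res:/data.xml";
        System.out.println(serviceName(uri)); // prints xslt
        System.out.println(arguments(uri));   // prints {operator=res:/style.xsl, operand=res:/data.xml}
    }
}
```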
Why do this? Well, the Web is malleable, but physical code is harder to change; if we introduce web-like identifiers for resources and services within our software, then our software systems can take on the properties of the Web.
Nice idea, but any reasonably experienced developer will say that the performance will be ... *$"%^$# ! That is a valid concern, but it misses one of the important properties of the Web: the ability to cache representations. Because real-world systems tend to follow statistical distributions, a relatively small cache of already computed values can dramatically increase overall performance. The tricky part is deciding, for any given system, which values to cache. This is almost impossible to predict for hand-coded memoization. NetKernel's cache [7] takes a system-wide view and balances itself as the workload changes. So, when repeated requests are made for a resource identifier, the value can be delivered from cache or computed on demand, from any available CPU core.
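The basic effect of caching by identifier can be shown with a small bounded cache in plain Java. This is a hand-coded least-recently-used memoizer, exactly the kind of local technique the paragraph above contrasts with NetKernel's system-wide cache, and it makes no claim about how NetKernel's cache is implemented. The class RepresentationCacheSketch and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class RepresentationCacheSketch {
    private final Map<String, String> cache;
    private final Function<String, String> compute;
    private int computeCount = 0;

    RepresentationCacheSketch(final int capacity, Function<String, String> compute) {
        this.compute = compute;
        // An access-ordered LinkedHashMap evicts the least recently used entry when full.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the representation for an identifier, computing it only on a cache miss. */
    synchronized String source(String identifier) {
        String cached = cache.get(identifier);
        if (cached != null) {
            return cached;
        }
        computeCount++;
        String value = compute.apply(identifier);
        cache.put(identifier, value);
        return value;
    }

    /** Number of times the underlying computation actually ran. */
    synchronized int computeCount() {
        return computeCount;
    }

    public static void main(String[] args) {
        RepresentationCacheSketch cache =
                new RepresentationCacheSketch(2, id -> "representation of " + id);
        cache.source("res:/data.xml");
        cache.source("res:/data.xml"); // served from cache
        System.out.println(cache.computeCount()); // prints 1
    }
}
```

Because the identifier is the cache key, any requestor that constructs the same identifier benefits from the earlier computation without knowing who performed it.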
Grammar details
The NetKernel grammar supports nested, optional and interleaved groups, and many more features. Please refer to the online documentation for details. When you download and install NetKernel you can use the grammar debugger called "Grammar's Kitchen", one of the many developer tools available within NetKernel. NetKernel also includes the XUnit logical level unit test framework, which allows you to build complete tests of endpoints, grammars, etc.
Companion Videos
The following video tutorials (in two parts due to YouTube's 10 minute video limit) are intended as a companion to this article, and will guide you through the NetKernel download and installation process, importing of the demonstration module defined above, and how the NetKernel Grammar debugger and Visualization tools work.
Part 1
Part 2
Summary
This article has introduced the NetKernel 4.0 grammar technology and shown that it provides critical flexibility at the boundary between REST web service identifiers and compiled code. The grammar is bi-directional and can parse an identifier into named parts or build a properly formed identifier from supplied named part values. To learn more about NetKernel 4's grammar technology, download NetKernel 4 Standard Edition from the 1060 Research web site - http://www.1060research.com. The blog durable scope, by one of the NetKernel architects, provides insights into the design and implementation of the NetKernel platform.
References
[1] The NetKernel 4 Standard Edition open platform is available for download from http://download.netkernel.org
[2] The grammar technology is described by documentation in the NetKernel distribution and in online documentation.
[3] The Twitter REST API is documented on the Twitter Developer Website.
[4] A NetKernel module that includes the source code illustrating the use of the Grammar technology is available for download. To learn how to download NetKernel, install this module and make modifications, please view the companion videos, part 1 and part 2.
[5] The companion videos, part 1 and part 2, show how to augment the UserTimelineAccessor class to do more than just reflect the provided information.
[6] The active URI scheme was proposed by HP.
[7] A discussion about NetKernel caching can be found at Tony Butterfield's blog.