Hypermedia WTF?!
Chances are that if you've heard of the REST architectural style, you've also heard about what some consider its most important constraint, the uniform interface, and in particular the aspect of that interface that constrains the methods that can be invoked on resources. What you may not realize, though, is that there's quite a bit more to the uniform interface than that. In particular, there's a sub-constraint that goes by the unwieldy name of "Hypermedia as the engine of application state", which is arguably the most important constraint of REST in the sense that it alone provides the bulk of the "shape" of RESTful systems as we know them.
In this article we'll take a deep dive into this constraint, trying to figure out what it means and to understand its value.
The definition
Unfortunately, the REST dissertation doesn't expand much upon this constraint beyond providing its name plus a description of what it looks like in action;
The model application is therefore an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations.
While that provides us a useful description, it doesn't, in my opinion, help us really understand the scope of the constraint itself; what exactly it allows, and therefore, what it disallows. To start then, it might be worth seeing what information we can extract from the name of the constraint itself.
"application state" refers to the state that determines "where" the user is in the process of completing a task. For example, when doing personal banking, is the user currently viewing account balances, filling in a bill payment form, or about to order new cheques? Those are each different application states. Some people mistakenly believe that "state" here refers to resource state, which would include, in this example, the balance of the accounts or the list of recent payments made. That isn't the case.
Application state is also known by the name "session state", and is the kind of state referred to by REST's "stateless" constraint, which requires that the client maintain it exclusively. In contrast, if you were using a remote session technology like VNC or Windows Remote Desktop, then the application state is kept entirely on the server.
The word "hypermedia" was coined by Ted Nelson in 1962 as a generalization of "hypertext", an invention of his. Whereas hypertext entailed interlinked textual documents, hypermedia expanded the scope to any form of media. The key with both, of course, is the embedding of links in the content we use.
The constraint in action
REST started getting some traction with developers of Internet-facing services in 2003/2004 (or at least that is when they began associating their services with the moniker "REST"), most visibly with two high-profile, self-described "REST APIs": Flickr and Amazon. Interestingly, both services also offered parallel SOAP-based interfaces, but neither saw much use compared to the "REST API". As a result, the REST community embraced these services and used them as a means to further explain the value and appeal of using the REST style on the Web. Unfortunately, there was a problem with these APIs: they weren't fully RESTful, as they disregarded (at least) one of REST's constraints. In fact, there's a lot more to the problems with the Flickr API (as well as Amazon's, del.icio.us's, and several others) than we can cover here, so we'll just stick to those problems that relate to hypermedia.
Luckily, we need not look far for those problems. Take some sample data returned from the "flickr.contacts.getList" operation which a user can use to get a list of their own contacts;
<contacts page="1" pages="1" perpage="1000" total="3">
  <contact nsid="12037949629@N01" username="Eric" iconserver="1"
           realname="Eric Costello"
           friend="1" family="0" ignored="1" />
  <contact nsid="12037949631@N01" username="neb" iconserver="1"
           realname="Ben Cerveny"
           friend="0" family="0" ignored="0" />
  <contact nsid="41578656547@N01" username="cal_abc" iconserver="1"
           realname="Cal Henderson"
           friend="1" family="1" ignored="0" />
</contacts>
There, the "nsid" attribute contains a value which is a unique identifier for individual contacts, in this case three of them. But once a client has retrieved that document, then what? What if they want to know more about Cal Henderson? A quick check of the Flickr API documentation reveals an operation called "flickr.people.getInfo" which takes an nsid as an argument, and returns more information about the person identified by that nsid string. So the required URI which we can use in an HTTP GET message to find out more about Cal would be;
http://api.flickr.com/services/rest/?method=flickr.people.getInfo&auth_key=xxxx&user_id=41578656547@N01
This is not hypermedia. A hypermedia solution would have used standardized identifiers - URIs, for the Web - instead of proprietary ones, thereby avoiding the need for Flickr-proprietary knowledge for a client to go from a document with a list of people, to a document about one of those people. If standardized identifiers were used, then that first document would look something like this;
<contacts page="1" pages="1" perpage="1000" total="3">
  <contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=12037949629@N01" username="Eric" iconserver="1"
           realname="Eric Costello"
           friend="1" family="0" ignored="1" />
  <contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=12037949631@N01" username="neb" iconserver="1"
           realname="Ben Cerveny"
           friend="0" family="0" ignored="0" />
  <contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=41578656547@N01" username="cal_abc" iconserver="1"
           realname="Cal Henderson"
           friend="1" family="1" ignored="0" />
</contacts>
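To make the difference concrete, here is a minimal Python sketch of a generic client reading a document of this kind. It is only an illustration: the `find_links` helper and the one-contact sample document are my own inventions, and the sample escapes ampersands so that it parses as XML. The point is that the client needs no knowledge of "flickr.people.getInfo"; it only needs to recognize a URI when it sees one.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical hypermedia version of the contact list, trimmed to one contact.
DOCUMENT = """\
<contacts page="1" pages="1" perpage="1000" total="1">
  <contact nsid="http://api.flickr.com/services/rest/?method=flickr.people.getInfo&amp;auth_key=xxxx&amp;user_id=41578656547@N01"
           username="cal_abc" realname="Cal Henderson" />
</contacts>
"""


def find_links(xml_text):
    """Return (tag, attribute, uri) for every attribute holding an absolute http(s) URI."""
    links = []
    for element in ET.fromstring(xml_text).iter():
        for name, value in element.attrib.items():
            parts = urlsplit(value)
            if parts.scheme in ("http", "https") and parts.netloc:
                links.append((element.tag, name, value))
    return links


for tag, attribute, uri in find_links(DOCUMENT):
    # A real client would now issue an HTTP GET on `uri`;
    # printing keeps this sketch offline.
    print(tag, attribute, uri)
```

Nothing in `find_links` is specific to Flickr, so the same code would work against any service that put URIs in its documents.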
Ok, but what would making this change to hypermedia gain them or their users?
Flickr's current approach of requiring that clients possess Flickr-specific knowledge in order to progress from one application state to another is simply another way of saying that they have a proprietary application model. Not only is it proprietary, though, but it's not even a consistent model within the Flickr API itself, as the knowledge needed to go from a list of contacts to information about one of the contacts (as above) is different from the knowledge needed to go from a contact to that contact's list of photos (which is "flickr.photos.getContactsPublicPhotos"). This presents evolvability problems for Flickr, as even simple extensions to the API can easily require new knowledge to be disseminated, in turn requiring changes to client code. A generic client such as a search engine would be unable to index Flickr content via this API, as I'm sure that the maintainers of the search engine would have little interest in upgrading their software each time Flickr - or anybody else using such an application model - extended their API. Again, this isn't specific to hypermedia: any standardized application model would provide the same benefits. Of course, the hypermedia model has proven itself quite popular, even if those using it didn't realize that's what they were doing.
So by using a common application model, one that is not just standardized, but fixed for all time, you are reducing coupling between consumer and producer by permitting each to evolve independently of the other. This way, old and new services can be combined together into a composite application, and old and new clients can be the ones doing that combining. I suppose we take it for granted in our use of the Web that one can simply include a link in a document to a page authored years ago and a consumer of that content can seamlessly navigate between them without having to download a new version of the browser. That's by design, not by accident.
It should also be noted that the Web by no means has a monopoly on the use of hypermedia. Another pervasive application we all use daily on the Internet does too; email. Every email message includes headers which carry email addresses for the sender and the recipients, and possessing one or more of those is sufficient information with which to send another email message.
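As a rough illustration of email's hypermedia, the following Python sketch pulls the addresses out of a message's headers using only the standard library. The message text and the `linked_addresses` helper are made up for the example; each address it returns is enough, on its own, to send a new message.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message used only for this illustration.
RAW_MESSAGE = """\
From: Alice <alice@example.org>
To: Bob <bob@example.org>, carol@example.org
Subject: Lunch?

See you at noon.
"""


def linked_addresses(raw_message):
    """Return every address found in the From, To and Cc headers."""
    message = message_from_string(raw_message)
    headers = []
    for field in ("From", "To", "Cc"):
        # get_all returns the fallback list when the header is absent.
        headers.extend(message.get_all(field, []))
    return [address for _name, address in getaddresses(headers)]


print(linked_addresses(RAW_MESSAGE))
```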
While we're discussing little known facts, you might also be interested to know that an important aspect of the Web itself does not use hypermedia: robots.txt, aka robot exclusion. How it works is that sites that wish to exclude search engines from indexing some of their content simply place a file at their "/robots.txt" URI which describes what shouldn't be indexed. The thing is, though, that as far as I know, hardly anybody links to a robots.txt file. And why should they? "/robots.txt" is a fixed and well known location, especially to search engines: given any URI, they can construct from it the corresponding robots.txt URI for that domain. This is not hypermedia, though, because the link isn't dynamically discovered in another page; it's known a priori by the search engine. That's not to say this is a bad solution, because the hypermedia approach would have required two network round-trips (one to discover a page which linked to robots.txt, and another to grab it), which would be a burden to all parties. So there you have a good example of the cost of hypermedia. Keep in mind, though, that there are few cases where avoiding hypermedia like this is really the best approach. Sitemaps might be one, for the same reasons as robots.txt, but others such as "favicon.ico" and Apple's new iPhone WebClip feature would likely have benefitted somewhat from the use of hypermedia; for example, those icons would be picked up by an image search engine without updating the search engine software.
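The a priori knowledge a crawler carries around for robots.txt is small enough to fit in a few lines. This Python sketch (the `robots_txt_uri` name and the example URI are mine) shows the construction: keep the scheme and authority of whatever URI you were given, and substitute the fixed path.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_uri(uri):
    """Build the robots.txt URI for the site that serves `uri`."""
    parts = urlsplit(uri)
    # Keep scheme and authority; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_uri("http://www.example.com/photos/2008/index.html?page=2"))
# prints http://www.example.com/robots.txt
```

Notice that the "/robots.txt" string lives in the client's code rather than in any document the server sent, which is exactly what makes this not hypermedia.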
Another technology that deserves some attention when considering hypermedia is WADL, the Web Application Description Language. Though a self-described "RESTful description language", there's an important caveat. Consider this example snippet from a WADL file:
<resources base="http://service.example.com/myservices/">
  <resource path="search">
    <method name="GET" id="search">
      <request>
        <param name="query" type="xsd:string" style="query" required="true"/>
      </request>
      <response>
        <representation mediaType="application/xml" element="yn:ResultSet"/>
        <fault status="400" mediaType="application/xml" element="ya:Error"/>
      </response>
    </method>
  </resource>
</resources>
The file declares a "search" resource as part of a collection of "myservices". It describes, via the declaration of use of HTTP GET and of the "query" parameter, how a client can construct a URI given an input string of its choosing.
On the face of it, this appears to be a perfectly RESTful, hypermedia-based solution, very similar to how an HTML form (or URI Templates) is used. So what's the caveat? It's the issue of when the WADL is consumed. Some Web services proponents who have taken an interest in Web-based solutions have been using WADL as they would use WSDL, as a design-time artifact. Using WADL this way, though, is akin to developing a Web browser with built-in knowledge of, say, the Google homepage HTML form at the time the browser was compiled: if Google changes the form in a backwards-incompatible way (i.e. not just adding a new optional parameter) after the browser is deployed, then the browser will not be able to use that resource/service. Using the hypermedia constraint with WADL means that the client should consume the WADL at runtime. So be careful when choosing your WADL tooling, as some tools that try to help you don't.
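Consuming the description at runtime might look something like the following Python sketch. It makes several assumptions: `build_request_uri` is a hypothetical helper, the WADL is the un-namespaced fragment from this article held in a string (a real client would fetch it over HTTP, and a complete WADL file would carry XML namespaces this code does not handle), and only "query" style parameters are supported.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin

# The WADL fragment from the article; a real client would download it at runtime.
WADL = """\
<resources base="http://service.example.com/myservices/">
  <resource path="search">
    <method name="GET" id="search">
      <request>
        <param name="query" type="xsd:string" style="query" required="true"/>
      </request>
    </method>
  </resource>
</resources>
"""


def build_request_uri(wadl_text, resource_path, **inputs):
    """Construct a GET URI from whatever the WADL says right now."""
    root = ET.fromstring(wadl_text)
    resource = next(r for r in root.iter("resource")
                    if r.get("path") == resource_path)
    query = {}
    for param in resource.iter("param"):
        if param.get("style") != "query":
            continue
        name = param.get("name")
        if name in inputs:
            query[name] = inputs[name]
        elif param.get("required") == "true":
            raise ValueError(f"missing required parameter: {name}")
    uri = urljoin(root.get("base"), resource_path)
    if query:
        uri += "?" + urlencode(query)
    return uri


print(build_request_uri(WADL, "search", query="rest hypermedia"))
# prints http://service.example.com/myservices/search?query=rest+hypermedia
```

Because the parameter names and the base URI are read from the description on every call, a service that renamed "query" would break this client only where the caller still passes the old name, rather than breaking a URI pattern frozen into generated code.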
Conclusion
Hopefully the value of the hypermedia constraint is a little more apparent now than it might have been. More than that though, what I really hope is that you're better able to know what practices you need to avoid when you've decided to use it.
Keep in mind that hypermedia is one of the uniform interface constraints, so the latter's more general litmus test applies: if you're developing client-side code which makes assumptions which aren't true of all resources (or server-side "APIs" which require this of clients), then you're not using the uniform interface.
About the author
Mark Baker is well-known in the SOA and Web services community because of his continuous efforts to promote an architectural style called REST (REpresentational State Transfer), criticizing many of the standards and specifications as being ignorant of what made and continues to make the Web successful.