Web applications have become critical across most facets of our personal and business lives. They range from social networks and online shopping to business applications and even home appliance configurators. Despite this proliferation, the user experience of web applications has not reached the level enjoyed by native and desktop applications. This is primarily because web applications rely on HTTP, a protocol in which only the client can initiate an exchange. WebSocket changes this: it adds a new foundational element to browser and server interaction that provides the much-needed basis for truly interactive applications.
Originally, web technologies were based on HTTP, a simple request-response protocol in which all requests originate from the client. This model was sufficient until developers started building web applications for which client-initiated communication was a serious limitation. Several workarounds were proposed, but they were still based on HTTP and used polling or long polling (e.g. Comet). Comet freed request-servicing threads to prevent server resource exhaustion. Since polling mechanisms were unreliable, a full-duplex communication protocol named WebSocket was proposed in 2007. It took four years to turn the original proposal into a standard. Despite being a standard, however, it has seen only very limited use. This article explains the two main reasons that have hampered WebSocket adoption and proposes a design approach that developers can use to rapidly harness the potential of WebSocket and significantly enrich the application experience.
The first reason for the lack of WebSocket adoption has been limited support in application servers and browsers. With the new generation of application servers and browsers, however, this issue has largely been addressed. The second and more important reason is that unlocking the full potential of WebSocket requires significant redesign of a web application. The redesign involves moving from the basic primitive of request-response to the more sophisticated primitive of bi-directional messaging. Application redesign is typically a costly process, and vendors do not see clear benefits in going that route.
We will start by giving a short explanation of WebSocket, then present a methodology for rebuilding an application using WebSocket, and finally present a simple example to illustrate the points.
WebSocket in a nutshell
WebSocket is a framing protocol layered over TCP/IP. A connection is initiated by the client with a special HTTP request to the server; after this initial handshake, the client and server can freely and asynchronously send frames to each other. There are two types of frames: control and data. The minimal frame size is 2 bytes when sent by a server and 6 bytes when sent by a client, because client frames carry an additional 4-byte masking key. Data frames can be either text or binary. Text frames are UTF-8 encoded. Messages can be fragmented, meaning that a large message can be split over several frames. WebSocket does not attach any identifier to frames, so frames of different messages cannot be interleaved; only control frames may appear between the fragments of a large message. A more complex protocol can be defined on top of the base frames; for example, a frame can carry a checksum or a sequence number.
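To make the frame sizes concrete, the following sketch computes the header overhead of a single frame from the framing rules of RFC 6455. The function name frameOverhead is made up for this illustration; it is not part of any WebSocket API, and browsers handle framing internally.

```javascript
// Header overhead (in bytes) of one WebSocket frame under RFC 6455 framing.
// frameOverhead is a hypothetical helper used only for illustration.
function frameOverhead(payloadLength, masked) {
  let bytes = 2; // first byte: FIN + opcode; second byte: mask bit + 7-bit length
  if (payloadLength > 65535) {
    bytes += 8; // 64-bit extended payload length
  } else if (payloadLength > 125) {
    bytes += 2; // 16-bit extended payload length
  }
  if (masked) {
    bytes += 4; // masking key, present on client-to-server frames
  }
  return bytes;
}

console.log(frameOverhead(0, false));    // 2: empty frame sent by a server
console.log(frameOverhead(0, true));     // 6: empty frame sent by a client
console.log(frameOverhead(70000, true)); // 14: large frame sent by a client
```

Compared with the headers repeated on every HTTP request, this overhead stays small even for large payloads.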
APIs for WebSocket
WebSocket is not tied to any particular programming language or operating system. Implementations are available for most popular programming languages, and many browsers support it. Although there are many APIs for different platforms and languages, this article focuses only on WebSocket support in HTML5 JavaScript and in Java (Java EE). On the browser side there are two families of protocol drafts, whose latest versions are Hixie-76 and HyBi-17 (later published as IETF RFC 6455). HyBi is the more advanced of the two, and all modern browsers currently support it. On the server side, Java-based implementations are the most popular. Several early WebSocket implementations were later consolidated into JSR 356 (JSR stands for Java Specification Request). The specification makes implementations consistent and easy to use, and it removes the developer's dependency on a particular vendor. JSR 356 is separate from the servlet specification, although it allows access to certain servlet objects. JSR 356 covers both the client and the server side of a WebSocket connection; the discussion below focuses on the server side used in conjunction with browser JavaScript. JSR 356 is currently part of Java EE 7. All popular open source Java application servers, such as Tomcat, Jetty, GlassFish and TJWS, support it. There are about 20 more standalone WebSocket server solutions for Java, and some of them are also JSR 356 friendly. Commercial application servers from Oracle and IBM support WebSocket as part of Java EE 7.
As noted, WebSocket is a messaging protocol. The API provides methods to send and receive messages at both ends of a connection. There is no classic publisher-subscriber relationship. Only two types of messages exist: text and binary. However, any logical separation of messages can be implemented in the message handlers for these primary types. Java provides a way to work with messages delivered in parts, whereas JavaScript does not offer this level of control. WebSocket is a very generic protocol, so the required logical sub-protocols can be specified during the handshake. This simplifies using WebSocket for system integration, because each system can verify that the connected peer supports certain logical sub-protocols and extensions. The WebSocket frame format also allows negotiable extensions to be layered on top of it, which means that more information can be supplied with frames and additional frame types can be introduced.
Browser JavaScript
Because the WebSocket handshake is initiated by the client, browsers expose a JavaScript WebSocket interface that encapsulates all WebSocket operations.
The interface is standardized [1] and defined in IDL as follows:
[Constructor(in DOMString url, in optional DOMString protocols)]
[Constructor(in DOMString url, in optional DOMString[] protocols)]
interface WebSocket {
    readonly attribute DOMString url;

    // ready state
    const unsigned short CONNECTING = 0;
    const unsigned short OPEN = 1;
    const unsigned short CLOSING = 2;
    const unsigned short CLOSED = 3;
    readonly attribute unsigned short readyState;
    readonly attribute unsigned long bufferedAmount;

    // networking
    attribute Function onopen;
    attribute Function onmessage;
    attribute Function onerror;
    attribute Function onclose;
    readonly attribute DOMString protocol;
    void send(in DOMString data);
    void close();
};
WebSocket implements EventTarget;
The WebSocket constructor takes two parameters:
- The WebSocket URL
- Optionally, a single required sub-protocol or an array of required sub-protocols
A WebSocket URL starts with ‘ws’, indicating the WebSocket protocol; the rest is the same as in an HTTP URL, specifying host, port, path and query. An extra ‘s’ is appended to the scheme name (‘wss’) to request a secured connection.
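As an illustration of the URL rules, here is a small sketch that derives a WebSocket URL from the parts of a page location. The helper name webSocketUrl, the host example.com and the path /app/events are assumptions made for this example only.

```javascript
// Builds a WebSocket URL from location-like parts: "wss" for pages served
// over HTTPS, "ws" otherwise. webSocketUrl is a hypothetical helper.
function webSocketUrl(location, path) {
  const scheme = location.protocol === "https:" ? "wss" : "ws";
  const port = location.port ? ":" + location.port : "";
  return scheme + "://" + location.hostname + port + path;
}

const plain = webSocketUrl(
  { protocol: "http:", hostname: "example.com", port: "8080" }, "/app/events");
const secure = webSocketUrl(
  { protocol: "https:", hostname: "example.com", port: "" }, "/app/events");

console.log(plain);  // ws://example.com:8080/app/events
console.log(secure); // wss://example.com/app/events
```

In a browser the result would typically be passed to the constructor, for example new WebSocket(webSocketUrl(window.location, "/app/events")), optionally followed by the sub-protocol argument.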
Four event handlers can be specified: onopen, onmessage, onclose, and onerror. The send method sends a message, and the close method closes the connection. There is no connect method, so the client has to wait for the open event to confirm that the connection is established; only after that can send be used. Another option is to poll the readyState property of the WebSocket object, but this is not recommended. Naturally, send is always allowed inside an onmessage handler. The browser executes send asynchronously: JavaScript gets control back without waiting for the message to be delivered to the receiver. Text and binary messages are received through the same handler, so the message type has to be checked on the data property of the event passed to onmessage. The WebSocket API exposes several properties for obtaining the connection status, selecting the binary message format, and other purposes. A vendor-specific implementation may include more properties, so consult your browser documentation for details.
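The rule that nothing can be sent before the open event can be handled with a small queue. The sketch below is one possible approach, not a prescribed pattern: createQueuedSender and describeMessage are hypothetical helper names, and the socket argument is any object that follows the WebSocket interface shown earlier.

```javascript
const OPEN = 1; // readyState value from the interface definition

// Wraps a WebSocket-like object so that messages submitted before the
// connection is open are queued and flushed from the onopen handler.
function createQueuedSender(socket) {
  const pending = [];
  socket.onopen = function () {
    while (pending.length > 0) {
      socket.send(pending.shift());
    }
  };
  return function send(message) {
    if (socket.readyState === OPEN) {
      socket.send(message);
    } else {
      pending.push(message);
    }
  };
}

// Text messages arrive as strings; anything else is treated as binary
// (a Blob or an ArrayBuffer, depending on the socket's binaryType).
function describeMessage(event) {
  return typeof event.data === "string" ? "text" : "binary";
}
```

In a page this would be used as var send = createQueuedSender(new WebSocket(url)), after which send can be called immediately without polling readyState.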
Java WebSocket
JSR 356 defines the common (client) part and the server part of the Java WebSocket API. The Java API specifies endpoint and server endpoint objects, similar to the WebSocket object in JavaScript. Annotations mark a Java class as an endpoint, and event handlers are specified with the annotations OnOpen, OnMessage, OnError, and OnClose. The important Session object can be taken as a parameter in all types of handlers. The Session gives access to message-sending capabilities and can also hold state attributes associated with the WebSocket connection. Messages can be sent synchronously or asynchronously, and a timeout can be configured for both kinds of sending. Binary and text data can be converted automatically to any Java object by specifying decoders, and encoders allow arbitrary Java objects to be sent over WebSocket. Only one message handler can be specified for each of the text and binary message types within the scope of one WebSocket URL path. There is no chaining of message handlers, although it can still be organized programmatically. The Java API is straightforward. It provides a configuration object that can be customized to influence the initial handshake, decide which sub-protocols and versions are supported, and give access to important servlet API objects. In addition to annotation-based deployment, endpoints can also be established programmatically.
Rethinking the web application
WebSocket is a natural fit for applications such as:
- Games with real-time player collaboration
- Real-time monitoring systems
- Systems requiring user collaboration, such as chat or shared document editing
However, WebSocket can be applied to traditional web applications with certain benefits.
The majority of web applications are designed around the request-response paradigm. Although AJAX allows asynchronous calls, the application still waits artificially for a response before proceeding to the next step of execution. Since a WebSocket connection is established once, it eliminates the need to re-establish a connection for every data exchange and avoids sending redundant HTTP headers with each message. These benefits are especially important for SSL connections, where the initial connection handshake is a costly operation. Browser WebSocket sends are truly asynchronous, and Java server-side code can send messages without waiting for them to be requested. This freedom can require some bookkeeping to keep the application state consistent. It is also possible to mimic the request-response paradigm using WebSocket, but doing so can forfeit the major benefit of WebSocket as a truly asynchronous bi-directional messaging system. All of the above encourages developers to rethink application design in some cases.
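The bookkeeping needed to mimic request-response over a message channel can be sketched as follows. This is only an illustration under stated assumptions: the {id, payload} JSON envelope and the name createRequester are invented here, and the sendText argument stands for whatever function actually writes to the socket.

```javascript
// Mimics request/response over a message channel by tagging each request
// with an id. The {id, payload} envelope is an assumption of this sketch.
function createRequester(sendText) {
  let nextId = 1;
  const waiting = new Map(); // request id -> resolve callback

  function request(payload) {
    const id = nextId++;
    return new Promise(function (resolve) {
      waiting.set(id, resolve);
      sendText(JSON.stringify({ id: id, payload: payload }));
    });
  }

  // Call this from onmessage; returns false for messages that do not
  // answer a pending request (for example, unsolicited server updates).
  function handleReply(text) {
    const reply = JSON.parse(text);
    const resolve = waiting.get(reply.id);
    if (!resolve) {
      return false;
    }
    waiting.delete(reply.id);
    resolve(reply.payload);
    return true;
  }

  function pendingCount() {
    return waiting.size;
  }

  return { request: request, handleReply: handleReply, pendingCount: pendingCount };
}
```

Every caller that awaits request() is back to waiting for a reply, which is exactly the cost described above; the pattern is best reserved for the few exchanges that really need an answer.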
Consider an application that generates a complex user interface with several areas whose content requires extensive server work. A traditional AJAX-based implementation can render these areas lazily by issuing a content request for each one. With WebSocket, however, the server can deliver content to the browser as soon as it is ready, without responding to particular AJAX requests. The drawback of AJAX requests is that their server-side processing may not happen in an optimal order because the browser serializes requests. Letting the server decide the best order in which to compute content can improve the overall responsiveness of a web application.
A few more considerations are required for efficient use of WebSocket. Since the network connection can be dropped at any time, preventing data from being delivered, some additional bookkeeping may be required for critical data. In general, every received message has to carry sufficient information to identify how to process it, because nothing else indicates what triggered the message: whether the client requested it or the server decided to update something. Web application design can be rethought even further when WebSocket is used. Some JavaScript functionality can be migrated to the server. For example, user input can be sent to the server for processing immediately. This helps in implementing sophisticated data validation for which JavaScript alone is insufficient. User input can also be stored on the backend at that time, which can eliminate the final data submission from the browser to the server and the extra validation step, because the data was already validated when it was stored. This growth in the role of server-side code can be seen as another swing from a thick web client back to a thin one.
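One way to make every message self-describing is to wrap it in an envelope with a type field and route on that field. The sketch below assumes a {type, body} JSON envelope; the names createDispatcher, validation and fieldSaved are hypothetical and chosen only for this example.

```javascript
// Routes each incoming text message by its "type" field, so that replies
// and unsolicited server updates are handled the same way.
function createDispatcher(handlers) {
  return function dispatch(text) {
    const message = JSON.parse(text);
    // Only accept types that were registered explicitly.
    if (!Object.prototype.hasOwnProperty.call(handlers, message.type)) {
      return false; // unknown type: the caller decides whether to log or ignore
    }
    handlers[message.type](message.body);
    return true;
  };
}

const seen = [];
const dispatch = createDispatcher({
  validation: function (body) { seen.push("invalid field: " + body.field); },
  fieldSaved: function (body) { seen.push("saved field: " + body.field); }
});

dispatch('{"type":"validation","body":{"field":"email"}}');
dispatch('{"type":"fieldSaved","body":{"field":"name"}}');
console.log(seen); // [ 'invalid field: email', 'saved field: name' ]
```

In a browser, dispatch would be called from socket.onmessage with event.data.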
What you should be aware of when using WebSocket
Using WebSocket brings particular challenges to web application development. A WebSocket session has no relation to the HTTP session, though it can be used for a similar purpose. Common data can be associated with the session so that all message processing can rely on state maintained there. WebSocket sessions can time out after an idle (inactivity) interval, just like HTTP sessions. However, some systems automatically keep sending ping control messages, which prevents clear detection of a timeout. JSR 356 advises synchronizing the WebSocket session with the HTTP session timeout: if the HTTP session times out, all WebSocket connections created in its scope have to be closed as well. However, a web application can be designed without establishing any HTTP session, or its session timeout can be managed from JavaScript independently of the HTTP session, so this mechanism cannot be relied upon in every case.
Another consideration is that a browser can use a connection pool and reuse connections to the same web site, and as a result certain processing can be serialized. If a browser also pools WebSocket connections, this can be a serious limitation: a WebSocket connection that is never closed stays active forever, and attempts to create new connections can block indefinitely. Therefore the recommended best practice is to use only one WebSocket connection.
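A simple way to follow the single-connection recommendation is to route every part of the page through one lazily created socket. The sketch below is an assumption about how that could look; createSharedSocket is a hypothetical helper, and connect is a caller-supplied factory such as function () { return new WebSocket(url); }.

```javascript
// Lazily creates one connection and hands the same object to every caller.
// A new connection is made only if the previous one is closing or closed.
function createSharedSocket(connect) {
  const CLOSING = 2; // readyState values 2 (CLOSING) and 3 (CLOSED)
  let socket = null;
  return function getSocket() {
    if (socket === null || socket.readyState >= CLOSING) {
      socket = connect(); // first use, or the previous connection is gone
    }
    return socket;
  };
}
```

Components then call getSocket() instead of constructing their own WebSocket, and messages for different components can be told apart by a type field in the payload.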
A browser can’t cache data transferred over WebSocket. Therefore using WebSocket to transfer resources that a browser could otherwise cache isn’t an efficient approach.
WebSocket vs RESTful
There are many discussions on the net comparing REST and WebSocket [2]. However, most of these discussions compare apples and oranges. REST stands for Representational State Transfer and in most cases relies on underlying HTTP, which means that REST follows the request-response model. REST was never standardized, so almost any communication over HTTP can be claimed to be REST in some scope. REST is usually associated with Create, Read, Update, and Delete (CRUD) operations mapped to the corresponding HTTP methods POST, GET, PUT, and DELETE. WebSocket deals with messaging, so an exchange is not confined to the scope of a single RPC. The REST data format is usually limited to JSON and request parameters, whereas a WebSocket message payload can be anything, including pure binary data.
WebSocket can certainly be used for purposes similar to REST, but in most cases this feels artificial. As stated earlier, different design principles have to be applied when WebSocket is used. The table below lists the major differences between the two.
WebSocket | REST
Standardized | Supported
Async messaging | Request-response (sync)
Underlying frames | Underlying HTTP (GET, POST, PUT, DELETE)
Sub-protocols | Operations discoverable
Binary and text | Mostly JSON
Concurrent bi-directional updates | Mostly CRUD
File upload Example
This example demonstrates how a file can be uploaded to a server using WebSocket. A good start is to define a server endpoint:
@ServerEndpoint("/upload/{file}")
public class UploadServer {
Two message handlers are defined: one for receiving the binary data of the uploaded file and another for the command interface. Since WebSocket separates text and binary messages, no additional effort is required to define these two handlers. The command-facing OnMessage handler is defined as
@OnMessage
public void processCmd(CMD cmd, Session ses) {
The CMD class is defined as
static class CMD {
    public int cmd;
    public String data;
}
A decoder has to be specified to convert a text message to a CMD object. It is defined as
public static class CmdDecoder implements Decoder.Text<CMD> {
There is no requirement for text messages to be encoded as JSON, but this particular example uses JSON. Big files are uploaded in chunks to reduce the memory footprint. There is no way to use partial WebSocket frames in a browser, so chunked sending is simulated using full frames. Since a browser sends all messages asynchronously, nothing indicates when the server has actually received a complete file. The command interface is used for the following purposes:
- Notify the server that the upload process is about to start and specify the name of the uploaded file
- Notify the server that a complete file has been sent
- Acknowledge to the client that the file was successfully stored
The same CMD object is reused for all these purposes. Inbound commands are handled as
@OnMessage
public void processCmd(CMD cmd, Session ses) {
    switch (cmd.cmd) {
    case 1: // start: remember the name of the file being uploaded
        fileName = cmd.data;
        break;
    case 2: // finish: acknowledge to the client with cmd = 3
        close(ses);
        cmd.cmd = 3;
        ses.getAsyncRemote().sendObject(cmd);
        break;
    }
}
The implementation assumes that the browser serializes all send activity and that all messages are received in the order in which they were sent. If a client introduces some parallelism, however, a more complicated implementation is needed in which every message carries an id. Another solution is to send an acknowledgment for every received chunk, although this approach negates most WebSocket benefits. Since the CMD object is also used for sending a message to the client, an encoder has to be provided as:
public static class CmdEncoder implements Encoder.Text<CMD> {
Both the decoder and the encoder have to be specified in the ServerEndpoint annotation as
@ServerEndpoint(value = "/upload/{file}", decoders = UploadServer.CmdDecoder.class, encoders=UploadServer.CmdEncoder.class)
public class UploadServer {
The binary message handler is defined as
@OnMessage
public void savePart(byte[] part, Session ses) {
    // Open the target file lazily, once the start command has supplied its name.
    if (uploadFile == null && fileName != null) {
        try {
            uploadFile = new RandomAccessFile(fileName, "rw");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return;
        }
    }
    if (uploadFile != null) {
        try {
            // Append the received chunk to the file.
            uploadFile.write(part);
            System.err.printf("Stored part of %db%n", part.length);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
A handler for OnClose can be added to delete an incomplete file if the connection closes abnormally.
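For the case mentioned above, where a client sends chunks in parallel and every message must carry an id, one possible wire layout is a 4-byte sequence number in front of each binary chunk. The sketch below shows the idea in JavaScript for brevity; the layout and the names packChunk, unpackChunk and reassemble are assumptions of this illustration, and a Java endpoint would need the equivalent logic in its binary handler.

```javascript
// Prefixes a binary chunk with a 4-byte big-endian sequence number.
function packChunk(sequence, bytes) {
  const framed = new Uint8Array(4 + bytes.length);
  new DataView(framed.buffer).setUint32(0, sequence);
  framed.set(bytes, 4);
  return framed;
}

// Splits a packed chunk back into its sequence number and payload.
function unpackChunk(framed) {
  const view = new DataView(framed.buffer, framed.byteOffset, framed.byteLength);
  return { sequence: view.getUint32(0), bytes: framed.subarray(4) };
}

// Restores the original byte order regardless of arrival order.
function reassemble(packedChunks) {
  const parts = packedChunks.map(unpackChunk).sort(function (a, b) {
    return a.sequence - b.sequence;
  });
  const total = parts.reduce(function (sum, p) { return sum + p.bytes.length; }, 0);
  const result = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    result.set(part.bytes, offset);
    offset += part.bytes.length;
  }
  return result;
}
```

A receiver that writes to a file could instead use the sequence number to compute the file offset of each chunk and write it directly, which avoids holding all chunks in memory.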
The client-side implementation uses HTML5 Web Workers. Unfortunately, Firefox does not implement cloning a File object into a worker, so this example can be tested using Internet Explorer or Chrome. If browser portability of the solution is important, the worker-based solution can be transformed into JavaScript code that does not use a worker. Some performance degradation can be observed in this case, since no separate thread (worker) is used. The worker code is:
var files = [];
var endPoint = "ws" + (self.location.protocol == "https:" ? "s" : "") + "://"
    + self.location.hostname
    + (self.location.port ? ":" + self.location.port : "")
    + "/echoserver/upload/*";
var socket;
var ready;

// Sends one chunk; nothing is sent while the socket is not open.
function upload(blobOrFile) {
    if (ready)
        socket.send(blobOrFile);
}

function openSocket() {
    socket = new WebSocket(endPoint);
    socket.onmessage = function(event) {
        // Forward server commands (e.g. the cmd = 3 acknowledgment) to the page.
        self.postMessage(JSON.parse(event.data));
    };
    socket.onclose = function(event) {
        ready = false;
    };
    socket.onopen = function() {
        ready = true;
        process();
    };
}

function process() {
    while (files.length > 0) {
        var blob = files.shift();
        // Announce the upload and the file name.
        socket.send(JSON.stringify({
            "cmd" : 1,
            "data" : blob.name
        }));
        const BYTES_PER_CHUNK = 1024 * 1024 * 2; // 2 MB chunk size
        const SIZE = blob.size;
        var start = 0;
        var end = BYTES_PER_CHUNK;
        while (start < SIZE) {
            var chunk;
            if ('mozSlice' in blob) {
                chunk = blob.mozSlice(start, end);
            } else if ('slice' in blob) {
                chunk = blob.slice(start, end);
            } else {
                chunk = blob.webkitSlice(start, end);
            }
            upload(chunk);
            start = end;
            end = start + BYTES_PER_CHUNK;
        }
        // Tell the server that the whole file has been sent.
        socket.send(JSON.stringify({
            "cmd" : 2,
            "data" : blob.name
        }));
        //self.postMessage(blob.name + " Uploaded Succesfully");
    }
}

self.onmessage = function(e) {
    for (var j = 0; j < e.data.files.length; j++)
        files.push(e.data.files[j]);
    //self.postMessage("Job size: "+files.length);
    if (ready) {
        process();
    } else
        openSocket();
}
Conveniently, the main JavaScript code also communicates with workers using a messaging mechanism. When a user selects files to upload in the browser, they are posted to the worker. The worker takes the files from its queue one at a time, splits each file into slices (chunks), and sends them over WebSocket one by one. A command message with cmd = 2 is sent at the end of each file. The socket's message handler reposts incoming command messages to the main JavaScript, notifying it that the upload of a file is complete. The code stresses the browser when it attempts to send many big files. Therefore, the code can be reorganized to send the next file only after the completion message for the previous one has been received. This modification is left as an exercise for the reader. The complete source code of the example is provided in appendix 1.
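As a starting point for that reorganization, the sketch below shows one possible shape of the queueing logic, separated from the socket code. It is an illustration under assumptions rather than the code from the appendix: createSequentialUploader is a hypothetical name, and sendFile stands for a function that sends the start command, the chunks and the finish command for one file.

```javascript
// Sends queued files one at a time: the next file starts only after the
// server acknowledges the previous one (cmd = 3 in the example's protocol).
function createSequentialUploader(sendFile) {
  const queue = [];
  let busy = false;

  function startNext() {
    if (busy || queue.length === 0) {
      return;
    }
    busy = true;
    sendFile(queue.shift());
  }

  return {
    // Queues a file and starts it immediately if nothing is in flight.
    add: function (file) {
      queue.push(file);
      startNext();
    },
    // Call from socket.onmessage with the parsed command object.
    onCommand: function (command) {
      if (command.cmd === 3) {
        busy = false;
        startNext();
      }
    }
  };
}
```

In the worker, self.onmessage would call add for each received file, and socket.onmessage would pass the parsed command to onCommand before reposting it to the page.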
References
- W3C Candidate Recommendation 20 September 2012
- REST vs WebSocket
- WebSockets versus REST?
- REST vs WebSocket Comparison and Benchmarks
Appendix
The complete code of this example and some others can be found here.
About the Author
Dmitriy Rogatkin heads the Labs division of MetricStream, Inc., where he works to make GRC truly pervasive by researching technologies and application areas. Previously he created the company's GRC applications platform. He has also served as CTO for a couple of Silicon Valley startups. He likes testing out different ideas through the creation of open source software, ranging from multimedia desktop and mobile applications to frameworks and application servers. Among his projects, TJWS is a tiny application server, an alternative for cases where a full Java EE profile application server is too much overhead. Dmitriy likes listening to high fidelity music in his free time, and as part of this hobby he created a DSD music player as an open source project and then wrapped it in an Android alternative named Kamerton.