Introduction
The recent groundswell of interest in the REST style of application architecture has highlighted the elegant design of the web. We are now beginning to understand the inherent scalability and resilience behind the "Architecture of the World Wide Web", and are exploring ways to further embrace its paradigms. In this article, we will explore one of the lesser-known facilities available to web application developers, the humble "ETag Response Header", and how to integrate its use in a Spring Framework dynamic web application to improve application performance and scalability.
The Spring Framework application we will be using is based on the "petclinic" application. The download includes instructions on how to add the necessary configuration and source code so that you can try it out on your own.
What is an "ETag"?
The HTTP protocol specification says that the ETag response header provides the "current value of the entity tag for the requested variant" (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, Section 14.19). Another way of saying this is that the ETag is a token that can be associated with a web resource. The web resource is typically a web page, but could also be a JSON or XML document. The server is solely responsible for figuring out what the token is and means, and transfers it to the client in the HTTP response header.
How can ETags help improve performance?
ETags are used in conjunction with the "If-None-Match" header on a GET request by savvy server developers to take advantage of the client's (e.g. browser) cache. Because the server generated the ETag in the first place, it can use it later to determine if the page has changed. Essentially, the client asks the server to validate its cache by passing the token back to the server.
The process looks like this:
- Client requests a page (A).
- Server sends back page A, plus an ETag for A.
- Client renders the page then caches it, along with the ETag.
- Client requests page A again, passing along the ETag it got back from the server the last time it made the request.
- Server examines the ETag and determines that the page hasn't changed since the last time the client requested it, so it sends back a response of 304 (Not Modified) with an empty body.
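To make the exchange concrete, here is a minimal sketch of the same conversation in plain Java, with no servlet container involved. It is an illustration only, not the code from the download: the class and method names (ConditionalGetSketch, Response, publish, handle) are hypothetical, and the ETag is assumed to be a quoted hash code of the page content simply to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a server that answers GET requests with 200 or 304.
class ConditionalGetSketch {

    // Minimal response holder; a real application would use the servlet API.
    static final class Response {
        final int status;
        final String etag;
        final String body;

        Response(int status, String etag, String body) {
            this.status = status;
            this.etag = etag;
            this.body = body;
        }
    }

    // Current content of each page, keyed by path (stands in for rendering a view).
    private final Map<String, String> pages = new HashMap<String, String>();

    void publish(String path, String content) {
        pages.put(path, content);
    }

    // Assumption: the token is a quoted hex hash code; any opaque value would do.
    static String etagFor(String content) {
        return '"' + Integer.toHexString(content.hashCode()) + '"';
    }

    // ifNoneMatch is the token the client cached, or null on its first request.
    Response handle(String path, String ifNoneMatch) {
        String content = pages.get(path);
        if (content == null) {
            return new Response(404, null, "");
        }
        String etag = etagFor(content);
        if (etag.equals(ifNoneMatch)) {
            return new Response(304, etag, ""); // Not Modified: empty body
        }
        return new Response(200, etag, content); // full page plus its ETag
    }

    public static void main(String[] args) {
        ConditionalGetSketch server = new ConditionalGetSketch();
        server.publish("/a", "<html>page A</html>");

        Response first = server.handle("/a", null);         // first request
        Response second = server.handle("/a", first.etag);  // repeat request with cached ETag
        System.out.println(first.status + " then " + second.status); // prints: 200 then 304
    }
}
```

If the page content changes between the two requests, the tokens no longer match and the second call returns a full 200 response again.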
The remainder of the article will present two approaches that take advantage of ETags in a web application built on the Spring Framework using Spring MVC. First we will use a Servlet 2.3 Filter to apply an ETag generated using an MD5 checksum of the rendered view (a "shallow" ETag implementation). The second approach uses a more sophisticated method to track changes in the model used by the view to determine ETag validity (a "deep ETag" implementation). Although we are using Spring MVC, the techniques apply to any MVC style web framework.
Before we go on, it is important to note that the techniques presented here are intended to improve the performance of dynamically generated pages. Existing optimization techniques should also be considered as part of a holistic optimization and tuning analysis of your application's performance profile (see sidebar).
Web caching top to bottom
This article deals primarily with using HTTP caching technology for dynamically generated pages. When looking at improving the performance of a web application, a holistic, top-to-bottom approach should be taken. To this end, it is important to understand the layers that an HTTP request goes through, and apply the appropriate technology depending on where you see hot spots. For example:
- Apache can be used in front of your servlet container to handle static files such as images and JavaScript, and can also create ETag response headers using the FileETag directive.
- Use optimization techniques for JavaScript files, such as combining the files into one file and compressing whitespace.
- Utilize GZip and Cache-Control headers.
- To help determine where your pain points are in your Spring Framework application, consider using the JamonPerformanceMonitorInterceptor.
- Make sure you fully take advantage of the ORM tool's caching mechanism so that objects are not continually being re-constituted from the database. It is worth the time to figure out how to get query caching working for you.
- Ensure that you minimize the amount of data retrieved from the database, especially with large lists. Large lists should be traversed a page at a time, with each page requesting a small subset of the larger list.
- Minimize what goes into the HTTP session. This frees up memory, and will help when the time comes to cluster the application tier.
- Use a database profiling tool to see what indexes are being used when querying, and that entire tables are not being locked when doing updates.
Of course, the golden adage of performance optimization applies: measure twice, cut once. Oh wait, that is for carpentry, but nonetheless it works here as well!
A Content Body ETag Filter
The first approach we will look at is to create a Servlet Filter that will generate its ETag token based on the content of the page - the "View" in MVC. At first glance, any performance gains using this approach may seem counter-intuitive. We still have to generate the page, and have added computation cycles to generate the token. However, the idea here is to reduce bandwidth utilization. This is particularly beneficial in high-latency situations, such as when your host and client are on opposite sides of the planet. I have seen latency of up to 350 ms with a server in NYC hosting an application used by the Tokyo office. Depending on the number of concurrent users, this can become a significant bottleneck.
The Code
The technique we use to generate the token is based on computing an MD5 hash from the content of the page. This is done by creating a wrapper around the response. The wrapper uses a byte array to hold the generated content and after the filter chain processing completes we compute the token using an MD5 hash of the array.
The implementation of the doFilter method is shown below.
public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
        throws IOException, ServletException {
    HttpServletRequest servletRequest = (HttpServletRequest) req;
    HttpServletResponse servletResponse = (HttpServletResponse) res;

    // buffer the rendered page so it can be hashed before anything is sent
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ETagResponseWrapper wrappedResponse = new ETagResponseWrapper(servletResponse, baos);
    chain.doFilter(servletRequest, wrappedResponse);

    byte[] bytes = baos.toByteArray();
    String token = '"' + ETagComputeUtils.getMd5Digest(bytes) + '"';
    servletResponse.setHeader("ETag", token); // always store the ETag in the header

    String previousToken = servletRequest.getHeader("If-None-Match");
    if (previousToken != null && previousToken.equals(token)) { // compare previous token with current one
        logger.debug("ETag match: returning 304 Not Modified");
        // use the same date we sent when we created the ETag the first time through;
        // headers must be set before the response is committed
        String ifModifiedSince = servletRequest.getHeader("If-Modified-Since");
        if (ifModifiedSince != null) {
            servletResponse.setHeader("Last-Modified", ifModifiedSince);
        }
        // setStatus rather than sendError: 304 is not an error and needs no error page
        servletResponse.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
    } else { // first time through - set last modified time to now
        Calendar cal = Calendar.getInstance();
        cal.set(Calendar.MILLISECOND, 0);
        Date lastModified = cal.getTime();
        servletResponse.setDateHeader("Last-Modified", lastModified.getTime());
        logger.debug("Writing body content");
        servletResponse.setContentLength(bytes.length);
        ServletOutputStream sos = servletResponse.getOutputStream();
        sos.write(bytes);
        sos.flush();
        sos.close();
    }
}
Listing 1: ETagContentFilter.doFilter
You will notice that we also set the Last-Modified header. This is considered good form for server generated content as it caters for clients that don't understand ETag headers.
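HTTP dates carry one-second resolution, which is presumably why the listing zeroes the millisecond field before setting Last-Modified. The following sketch shows the idea with the java.time API; it is not part of the sample application, and the names HttpDateSketch, toHeaderPrecision and format are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical helper illustrating the precision of HTTP date headers.
class HttpDateSketch {

    // Fixed-length HTTP date layout, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    // Drop sub-second precision so the value survives a round trip through a header.
    static Instant toHeaderPrecision(Instant instant) {
        return instant.truncatedTo(ChronoUnit.SECONDS);
    }

    static String format(Instant instant) {
        return HTTP_DATE.format(toHeaderPrecision(instant));
    }

    public static void main(String[] args) {
        Instant rendered = Instant.parse("2007-06-20T18:29:03.750Z");
        System.out.println(format(rendered)); // prints: Wed, 20 Jun 2007 18:29:03 GMT
    }
}
```

A timestamp kept at millisecond precision on the server would never compare equal to the value a client echoes back in If-Modified-Since, so truncating before storing or comparing avoids spurious mismatches.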
The sample code uses a utility class, ETagComputeUtils, to generate a byte array representation of an object and to handle the MD5 digest logic. I have used a java.security.MessageDigest to compute the MD5 hash code.
public static byte[] serialize(Object obj) throws IOException {
    byte[] byteArray = null;
    ByteArrayOutputStream baos = null;
    ObjectOutputStream out = null;
    try {
        // These objects are closed in the finally.
        baos = new ByteArrayOutputStream();
        out = new ObjectOutputStream(baos);
        out.writeObject(obj);
        byteArray = baos.toByteArray();
    } finally {
        if (out != null) {
            out.close();
        }
    }
    return byteArray;
}

public static String getMd5Digest(byte[] bytes) {
    MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("MD5 cryptographic algorithm is not available.", e);
    }
    byte[] messageDigest = md.digest(bytes);
    BigInteger number = new BigInteger(1, messageDigest);
    // left-pad with zeros to get a "proper" 32-character MD5 hash value
    return String.format("%032x", number);
}
Listing 2: ETagComputeUtils
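One detail worth illustrating is zero padding. BigInteger.toString(16) omits leading zeros, so a digest whose first hex digit is 0 comes out shorter than the usual 32 characters. The sketch below, with the hypothetical class name Md5HexSketch, shows the difference using the MD5 digest of the single character "a", which happens to start with a zero.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demonstration class; not part of the sample application.
class Md5HexSketch {

    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // BigInteger drops leading zeros, so the result can be shorter than 32 characters.
    static String unpaddedHex(byte[] input) {
        return new BigInteger(1, md5(input)).toString(16);
    }

    // Zero-pad to the full 32 hex digits of a 128-bit digest.
    static String paddedHex(byte[] input) {
        return String.format("%032x", new BigInteger(1, md5(input)));
    }

    public static void main(String[] args) {
        byte[] a = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(unpaddedHex(a)); // 31 characters: the leading zero is lost
        System.out.println(paddedHex(a));   // prints: 0cc175b9c0f1b6a831c399e269772661
    }
}
```

Either form works as an opaque ETag as long as the server is consistent, but the padded form matches what other MD5 tools print, which helps when debugging.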
Installing the filter in web.xml is straightforward.
<filter>
    <filter-name>ETag Content Filter</filter-name>
    <filter-class>org.springframework.samples.petclinic.web.ETagContentFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>ETag Content Filter</filter-name>
    <url-pattern>*.htm</url-pattern>
</filter-mapping>
Listing 3: Configuration of the filter in web.xml.
Every .htm request will be filtered using the ETagContentFilter, which will return an empty body HTTP response if the page has not changed since the client last requested it.
The approach we have shown here is useful for certain types of pages. However, there are a couple of disadvantages:
- We are computing the ETag value after the page has been rendered on the server, but before sending it back to the client. If there is an ETag match, there was really no need to pull in the data for the model, as the rendered page will not be sent to the client.
- For pages that do things like render the date and time in a footer, each page will be different, even if the content hasn't actually changed.
In the next section, we will look at an alternative approach to the problem that overcomes some of these limitations by understanding more about the underlying data used to build the page.
An ETag Interceptor
The Spring MVC HTTP Request processing pipeline includes the ability to plug in an Interceptor before a controller has the chance to process the request. This is an ideal place to apply our ETag comparison logic so that if we find the data that is used to build a page hasn't changed we can avoid further processing.
The trick here is how do you know if the data that makes up the page has changed? For the purposes of this article, I created a simple ModifiedObjectTracker that keeps track of insert, update and delete operations via Hibernate event listeners. The tracker keeps a unique number for each view in the application, and a map of what Hibernate entities impact each view. Whenever a POJO is changed a counter is incremented for the views that the entity is used in. We use the count as the ETag, so when the client sends it back we know if one of the objects behind the page has been modified.
The Code
We will start with ModifiedObjectTracker:
public interface ModifiedObjectTracker {
    void notifyModified(String entity);
}
Simple enough right? The implementation is a bit more interesting. Any time an entity is changed, we update a counter for each view that is affected by the change:
public void notifyModified(String entity) {
    // entityViewMap is a map of entity -> list of view names
    List<String> views = getEntityViewMap().get(entity);
    if (views == null) {
        return; // no views are configured for this entity
    }
    synchronized (counts) {
        for (String view : views) {
            Integer count = counts.get(view);
            // a view that has not been counted yet starts from zero
            counts.put(view, count == null ? 1 : count + 1);
        }
    }
}
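For readers who want to experiment outside the sample application, the counting idea can be sketched as a small self-contained class. This is one possible variant, not the implementation from the download: the class name ViewChangeCounter and the use of ConcurrentHashMap with AtomicInteger in place of a synchronized block are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, self-contained variant of the tracker described above.
class ViewChangeCounter {

    // entity name -> views that display it (assumed to be configured up front)
    private final Map<String, List<String>> entityViewMap;

    // view name -> number of modifications seen so far
    private final ConcurrentHashMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<String, AtomicInteger>();

    ViewChangeCounter(Map<String, List<String>> entityViewMap) {
        this.entityViewMap = new HashMap<String, List<String>>(entityViewMap);
    }

    void notifyModified(String entity) {
        List<String> views = entityViewMap.get(entity);
        if (views == null) {
            return; // no views are configured for this entity
        }
        for (String view : views) {
            counts.computeIfAbsent(view, v -> new AtomicInteger()).incrementAndGet();
        }
    }

    // The current count doubles as the ETag token; null means "no token yet".
    String getToken(String view) {
        AtomicInteger count = counts.get(view);
        return count == null ? null : Integer.toString(count.get());
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = new HashMap<String, List<String>>();
        config.put("Pet", Arrays.asList("/petclinic/owner.htm"));
        ViewChangeCounter tracker = new ViewChangeCounter(config);
        tracker.notifyModified("Pet");
        System.out.println(tracker.getToken("/petclinic/owner.htm")); // prints: 1
    }
}
```

The atomic counters avoid a global lock, at the cost of views being updated one at a time rather than as a group. For an ETag that only needs to change when something changed, that trade-off seems acceptable.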
A "change" is an insert, update or delete. Here is the listing of the handler for delete operations (configured as an event listener on the Hibernate 3 LocalSessionFactoryBean):
public class DeleteHandler extends DefaultDeleteEventListener {
    private ModifiedObjectTracker tracker;

    public void onDelete(DeleteEvent event) throws HibernateException {
        // let Hibernate perform the delete itself, then record the change
        super.onDelete(event);
        getModifiedObjectTracker().notifyModified(event.getEntityName());
    }

    public ModifiedObjectTracker getModifiedObjectTracker() {
        return tracker;
    }

    public void setModifiedObjectTracker(ModifiedObjectTracker tracker) {
        this.tracker = tracker;
    }
}
The ModifiedObjectTracker is injected into the DeleteHandler via Spring Configuration. There is also a SaveOrUpdateHandler that deals with new and updated POJOs.
If the client sends back a currently valid ETag (meaning our content hasn't changed since the last request), we will want to prevent further processing in order to realize our performance gain. In Spring MVC, we can extend HandlerInterceptorAdapter and override the preHandle method:
public final boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
        throws ServletException, IOException {
    String method = request.getMethod();
    if (!"GET".equals(method)) {
        return true;
    }
    String previousToken = request.getHeader("If-None-Match");
    String token = getTokenFactory().getToken(request);
    // compare previous token with current one
    if (token != null && previousToken != null && previousToken.equals('"' + token + '"')) {
        // re-use original last modified timestamp; headers go in before the status
        String ifModifiedSince = request.getHeader("If-Modified-Since");
        if (ifModifiedSince != null) {
            response.setHeader("Last-Modified", ifModifiedSince);
        }
        // setStatus rather than sendError: 304 is not an error and needs no error page
        response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
        return false; // no further processing required
    }
    // set header for the next time the client calls
    if (token != null) {
        response.setHeader("ETag", '"' + token + '"');
        // first time through - set last modified time to now
        Calendar cal = Calendar.getInstance();
        cal.set(Calendar.MILLISECOND, 0);
        Date lastModified = cal.getTime();
        response.setDateHeader("Last-Modified", lastModified.getTime());
    }
    return true;
}
We first make sure we are dealing with a GET request (ETag in conjunction with PUT can be used to detect conflicting updates, but that is beyond the scope of this article). If the token matches the last one we sent back, we return a 304 Not Modified and bypass the rest of the request processing chain. Otherwise, we set the ETag response header in preparation for the next client request.
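The interceptor compares the If-None-Match header with a single equals() call, which is enough for the browser conversation shown later. The HTTP specification also allows that header to carry "*" or a comma-separated list of entity tags, any of which may have a W/ (weak) prefix. A more tolerant comparison could look like the sketch below; the class IfNoneMatchSketch and its matches method are hypothetical and not part of the sample application.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing an If-None-Match header with the current token.
class IfNoneMatchSketch {

    // One entity tag: optional weak prefix, then a quoted opaque string.
    private static final Pattern ENTITY_TAG = Pattern.compile("(?:W/)?\"([^\"]*)\"");

    // currentToken is the unquoted token for the resource, or null if it has none.
    static boolean matches(String ifNoneMatch, String currentToken) {
        if (ifNoneMatch == null || currentToken == null) {
            return false;
        }
        if ("*".equals(ifNoneMatch.trim())) {
            return true; // "*" matches any current representation
        }
        Matcher m = ENTITY_TAG.matcher(ifNoneMatch);
        while (m.find()) {
            // Weak comparison: the W/ prefix is ignored, only the opaque value counts.
            if (m.group(1).equals(currentToken)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("\"-1\", W/\"1\"", "1")); // prints: true
        System.out.println(matches("\"-1\"", "1"));          // prints: false
    }
}
```

Scanning for quoted tags with a regular expression, rather than splitting on commas, keeps the comparison correct even when a tag value itself contains a comma.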
You will notice I have abstracted the logic for generating the token to an interface so that different implementations can be plugged in. The interface has one method:
public interface ETagTokenFactory {
    String getToken(HttpServletRequest request);
}
To minimize code listings, my simple implementation, SampleTokenFactory, plays the role of both ModifiedObjectTracker and ETagTokenFactory. In this case, we generate the token by simply returning the modified count for the request URI:
public String getToken(HttpServletRequest request) {
    String view = request.getRequestURI();
    Integer count = counts.get(view);
    if (count == null) {
        return null;
    }
    return count.toString();
}
That is it!
The Conversation
At this point, our interceptor will prevent any cycles being spent on gathering data or rendering a view if nothing has changed. Now, let's take a look at the HTTP headers (courtesy of LiveHTTPHeaders) and see what is happening under the covers. The download includes instructions for configuring the interceptor so that owner.htm is "ETag enabled".
The first request we make shows that this user has already looked at this page:
----------------------------------------------------------
http://localhost:8080/petclinic/owner.htm?ownerId=10
GET /petclinic/owner.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364348062
If-Modified-Since: Wed, 20 Jun 2007 18:29:03 GMT
If-None-Match: "-1"
HTTP/1.x 304 Not Modified
Server: Apache-Coyote/1.1
Date: Wed, 20 Jun 2007 18:32:30 GMT
We should now make a change and see if the ETag changes. We add a pet to this owner:
----------------------------------------------------------
http://localhost:8080/petclinic/addPet.htm?ownerId=10
GET /petclinic/addPet.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/petclinic/owner.htm?ownerId=10
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364356265
HTTP/1.x 200 OK
Server: Apache-Coyote/1.1
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store
Content-Type: text/html;charset=ISO-8859-1
Content-Language: en-US
Content-Length: 2174
Date: Wed, 20 Jun 2007 18:32:57 GMT
----------------------------------------------------------
http://localhost:8080/petclinic/addPet.htm?ownerId=10
POST /petclinic/addPet.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/petclinic/addPet.htm?ownerId=10
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364402968
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
name=Noddy&birthDate=1000-11-11&typeId=5
HTTP/1.x 302 Moved Temporarily
Server: Apache-Coyote/1.1
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store
Location: http://localhost:8080/petclinic/owner.htm?ownerId=10
Content-Language: en-US
Content-Length: 0
Date: Wed, 20 Jun 2007 18:33:23 GMT
Because we did not configure any ETag awareness for addPet.htm, no ETag headers are set. Now, we ask for owner 10 again. Notice the ETag is now "1":
----------------------------------------------------------
http://localhost:8080/petclinic/owner.htm?ownerId=10
GET /petclinic/owner.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/petclinic/addPet.htm?ownerId=10
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364403109
If-Modified-Since: Wed, 20 Jun 2007 18:29:03 GMT
If-None-Match: "-1"
HTTP/1.x 200 OK
Server: Apache-Coyote/1.1
Etag: "1"
Last-Modified: Wed, 20 Jun 2007 18:33:36 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Language: en-US
Content-Length: 4317
Date: Wed, 20 Jun 2007 18:33:45 GMT
Finally, we ask for owner 10 again. This time our ETag kicks in and we get a 304 Not Modified:
----------------------------------------------------------
http://localhost:8080/petclinic/owner.htm?ownerId=10
GET /petclinic/owner.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364493500
If-Modified-Since: Wed, 20 Jun 2007 18:33:36 GMT
If-None-Match: "1"
HTTP/1.x 304 Not Modified
Server: Apache-Coyote/1.1
Date: Wed, 20 Jun 2007 18:34:55 GMT
We have saved bandwidth and computation cycles by taking advantage of HTTP caching!
The Fine Print: In practice, we can achieve greater efficiencies by tracking object changes at a more granular level, using the object id for example. However, the idea of correlating modified objects to views is highly dependent on the overall design of the data model used in the application. This implementation (of ModifiedObjectTracker) is illustrative and is intended to provide ideas for further exploration. It is not intended to be used in a production environment (it would not be suitable for use in a cluster, for example). One option for further consideration would be tracking modifications using database triggers, and having the interceptor access the table the triggers write to.
Conclusion
We have looked at two approaches to reducing bandwidth and computation using ETags. My hope is that this article has provided you with food for thought for your current and future web-based projects, and an appreciation for the underutilized ETag response header.
As Isaac Newton famously said: "If I have seen further, it is by standing on the shoulders of giants." At their core, REST style applications are about simplicity, good software design, and not reinventing the wheel. I believe the growing use and awareness of REST style architectural principles for web-based applications is a good move for mainstream application development, and I am looking forward to leveraging it further in my future projects.
About the Author
Gavin Terrill is Chief Technology Officer at BPS Inc. Gavin has been developing software for over 20 years specializing in Enterprise Java applications, yet still refuses to throw out his TRS-80. In his spare time Gavin enjoys sailing, fishing, playing guitar and quaffing quality red wine (not necessarily in that order).
Thanks
I would like to thank my colleagues Patrick Bourke and Erick Dovale for their help in providing feedback for this article.
The code and the instructions can be downloaded HERE.