I’m Tyler Close. I’m currently an engineer at Google, working on Application Security. Before that I was at HP Labs doing security research, and I’ve been working on web security for 15 years now.
All of the major attacks that web application developers are commonly familiar with today result from trying to apply the wrong security models to the web. Most security work is still done thinking in terms of the client-server security model, where a client is assumed to be talking to only one server and is not exchanging messages with other clients. When additional message channels come into play, you get these other attacks. For example, in Cross-Site Request Forgery, the way that works is the user clicks on a link from one site (the attack site) and what gets sent back is an HTML form that points to a resource hosted by a victim server. When the user clicks to submit that form, or when JavaScript in the attack page automatically submits it, the POST request that the browser sends to the victim server is almost entirely under the control of the attack server, because it wrote the HTML form that’s submitting the request. But the browser automatically attaches the user’s cookies for the victim server to that request.
So the victim server receives a request that carries all the proper credentials and looks like a legitimate request, but that was formed by the attacker and submitted automatically without the user’s knowledge. The victim resource gets modified in some unintended way: deleting your account or spending money at your bank or something like that.
People have given all sorts of different labels to all the different attacks, and those labels don’t always make sense, but Cross-Site Request Forgery is very similar to another attack called "Clickjacking". The main difference is that instead of submitting a POST to the victim server, the attack page creates an iframe on a victim resource, positioned so that when the user clicks, they are unknowingly clicking a dangerous button hosted by that victim server, deleting their account or spending money or something like that. You can tell the two attacks are similar: Clickjacking is done with a GET request that puts an iframe in an unexpected position, and Cross-Site Request Forgery with a POST request from a form put in an unexpected place.
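The iframe half of this pattern has a simple server-side mitigation: the `X-Frame-Options` response header tells the browser never to render the page inside another site’s frame. A small sketch using Node’s built-in `http` module follows; the handler and port are made up for illustration, and this header is one mitigation, not a complete framing policy.

```javascript
// Sketch of refusing to be framed: X-Frame-Options tells the browser not
// to render this response inside another site's iframe, which blocks the
// clickjacking variant described above. (Handler and port are illustrative.)
function securityHeaders() {
  return {
    // DENY: never render this response in a frame, on any origin.
    'X-Frame-Options': 'DENY',
  };
}

// Example with Node's built-in http module:
// const http = require('http');
// http.createServer((req, res) => {
//   res.writeHead(200, securityHeaders());
//   res.end('the dangerous button lives here');
// }).listen(8080);
```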
Both of those attacks are particular examples of a kind of attack that’s been known in the research literature for a couple of decades now, called the "Confused Deputy". There is a paper from 1988 by Norm Hardy on the Confused Deputy and it explained exactly the same pattern of message exchange, but in the context of a compiler running on an operating system. It’s even a very analogous scenario: you can think of today’s browsers as being very much like the compiler in that original paper, taking in commands, processing the input and firing off additional commands as a result. The Confused Deputy category of attacks in particular is a great example of the kind of thing people won’t expect when they are thinking solely in terms of a client-server system, because it makes use of an additional message channel to another server.
When the programmers were writing the victim server, they weren’t thinking about messages being generated by other servers. They thought "I’m creating the UI to my bank account and it’s my code running in the browser that will be submitting the request back to my server, like a traditional client-server application." The idea that there could be other servers or other people also contributing to those message exchanges just never enters the thinking. An awful lot of the vulnerabilities that we have in web applications today are the result of that: failing to think about the additional message channels that we have in the browser, the fact that the browser can open up connections to any server in the world, that the user can receive email from anyone in the world. Failing to think about how those messages are going to influence the messages that your application receives is the main cause of most bugs.
The very nature of social computing is making use of those new data channels that we didn’t have in the client-server world. In a client-server application I’ve got a client that’s hard-coded to only talk to one server and that’s all it ever does, but on the social web we take advantage of the fact that the browser can open up connections to all sorts of different servers and can receive messages like tweets and emails and other messages from other users, to get all sorts of other data channels involved in the application. The idea is that we can make the application richer by integrating all those other data stores as they are available. That’s harder to secure because, if what you’ve got is a traditional client-server application, the prototypical banking application, the way you secure it is by doing your best to turn off all of those additional channels. You try to prevent messages from other websites from interfering with your banking operation.
You try to tell your users not to click on links in messages, like emails, because that’s only an attack vector; it doesn’t contribute to the functionality of your application, it’s only an attack. But in a social application there is also work being done over those same channels. It’s often legitimate, and part of the intended purpose of the application, that I get a message from another user that contains a link, and I’m supposed to be able to click on it and do something useful. Now you’ve got a new problem: you can’t just blindly reject all of the messages that come from those other channels. You have to pick out the good ones and decide which are legitimate and need to be processed. It’s a more difficult challenge than a client-server application.
That’s a perfect example of additional channels that are available on these social web applications. In client-server applications the entire UI is under the control of the one application. A mashup is taking UI from many different applications and putting them all together in the same presentation. That’s not a challenge that you have to address in a client-server application, so all of your work goes into preventing that from happening. Now you have to keep the bad guys out but let the good guys in.
I think the way that whole issue turned out with ECMAScript 5 being the standard that was in the end agreed upon by everyone was really the best way it could have turned out. It’s a small change to the language as the programmer experiences it, but as we’ve seen in some of the talks today, it’s very easy to turn that into a secure programming language that makes it much easier to do things like mashups.
I wasn’t involved in the design for that work; I was one of the security reviewers for it. What it does is find all the places in JavaScript where references to dangerous objects may unintentionally or accidentally leak to a part of the program that should not have access to them. It then limits the language that you program in such that those pitfalls aren’t there anymore. The initial Caja effort was to build a translator for JavaScript; it would take in a slightly limited syntax, but mostly plain JavaScript, perform a translation on it and output a program that had the same semantics as the original one, but had been verified and had some additional runtime checks inserted to ensure that the pitfalls hadn’t been triggered.
Unfortunately that turned out to make a very slow program, and working with translated code is often very difficult in the development process because when you hit a bug in your code, you are not looking at your own code any more, you are looking at the output of a translator. One of the major emphases of the ECMAScript 5 work was to make a strict mode of the language, enabled by putting the strict declaration at the start of your code, as Mark showed. That strips away all of the really difficult pitfalls that necessitated the translation approach. Now just a very simple initialization at the start of your script can turn ECMAScript 5 into a secure language to do mashups in. I think that was definitely a victory for technology there.
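Two of the pitfalls that strict mode removes can be seen directly; this is a small sketch of behavior defined by the ECMAScript 5 specification, with the strict declaration placed inside each function so its effect is local.

```javascript
// Two ES5 strict-mode changes that matter for security (a small sketch).

// 1. In sloppy mode, a function called without a receiver gets the global
//    object as `this`, leaking it to any code the function calls; in
//    strict mode `this` is simply undefined.
function whoAmI() { 'use strict'; return this; }

// 2. In sloppy mode, assigning to an undeclared name silently creates a
//    global variable; in strict mode it throws a ReferenceError instead.
function leakGlobal() { 'use strict'; accidentalGlobal = 42; }

console.log(whoAmI() === undefined); // true: no global-object leak
try {
  leakGlobal();
} catch (e) {
  console.log(e instanceof ReferenceError); // true: no silent global
}
```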
The team of researchers that did this security review for Caja also included a member from Facebook who was, I think, one of the main designers of FBJS, so I was able to learn a little bit about it during the security review, but that’s basically the extent of my knowledge there.
I think there are many things that make HTML5 a challenge from the security perspective. One is that a number of the features are still being designed in this client-server mindset; the CORS specification is a good example of that. I think there are a number of pitfalls in there that are very similar to the bugs that people are experiencing now, so variants of the well-known Cross-Site Request Forgery attack are also going to be present in applications that use CORS. There is a perspective problem on some of the APIs and then there is also just a sheer question of volume.
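One concrete way the CSRF pattern resurfaces with CORS: a cross-origin request can still arrive with the user’s cookies attached, so a server that blindly echoes the request’s `Origin` back in `Access-Control-Allow-Origin` effectively re-opens the hole. A minimal sketch of the safer shape, an explicit allowlist, follows; the origin names are made up for illustration.

```javascript
// Sketch of a CORS pitfall akin to CSRF: the server, not the browser,
// must decide which origins to trust. Echoing any Origin back in
// Access-Control-Allow-Origin (with credentials allowed) would let any
// site read authenticated responses. (Allowlist contents are illustrative.)
const TRUSTED_ORIGINS = new Set([
  'https://app.example.com',
  'https://admin.example.com',
]);

function corsHeadersFor(requestOrigin) {
  // Only grant cross-origin access to origins we explicitly trust;
  // everything else gets no CORS headers at all.
  if (!TRUSTED_ORIGINS.has(requestOrigin)) return {};
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    'Access-Control-Allow-Credentials': 'true',
  };
}
```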
I know that there are other security researchers who are only now getting their hands around important parts of that specification and finding subtle issues with them, but that work hasn’t even been made widely known yet, just because the security community can’t keep up with the pace of API innovation that is taking place with HTML5.
10. Do you think it will take some time before the spec that comes out is secure enough?
I’m confident we’ll be finding problems with HTML 5 for many years to come.
I think the new patch deployment systems that all the modern browsers have adopted are much better than the old system of having users download new versions of the browser. Some of them can put out updates very quickly and get them into the user base. I’m less familiar with that aspect of it, but from talking to people who are, it seems that things have gotten a lot better on the browser side in terms of deploying security patches. There is no technical reason why a plug-in should have a better patch deployment mechanism than the browser does.
12. Now that I think of it, plug-ins have a bad record.
Some of them have a particularly bad record. Some of them have left important vulnerabilities wide open for extended periods of time.
I don’t want to comment on what bugs we may find in individual features, because we have yet to discover those, so we can’t say for certain we’ll find them. But there are definitely APIs being defined in HTML5 that are going to have security problems. I am actually worried about how people will react to the need to plug those holes. You could find browser vendors needing to turn off large parts of APIs because they don’t have the granularity needed to put out a quick remediation for a particular problem.
The one that I’ve studied most in depth is the cross-origin messaging spec, and there are a number of features in it that make it easy to shoot yourself in the foot, some of which cannot be used securely. I’m expecting that we’ll be finding vulnerable applications in that area for quite some time as developers start to use it. There is actually only a small subset of the API that is even usable across all browsers. To date, Internet Explorer has only implemented a small subset of that spec, so by sticking to that subset you can avoid some of the less mature parts of it.
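The cross-origin messaging API here is `postMessage`, and its two classic foot-guns are a receiver that trusts messages without checking `event.origin` and a sender that passes `'*'` as the target origin. A sketch of the defensive shape follows; the origin value is made up, and the origin check is factored into a plain function so the browser-only parts stay in comments.

```javascript
// Sketch of defensive postMessage handling (origin value is illustrative).
// Any page that holds a reference to our window can post messages to it,
// so the receiver must filter on the sender's origin before trusting data.
const EXPECTED_ORIGIN = 'https://widget.example.com';

function handleIncoming(origin, data) {
  // Drop messages from any origin we did not expect.
  if (origin !== EXPECTED_ORIGIN) return null;
  return data;
}

// Browser-side usage (not runnable outside a browser):
// window.addEventListener('message', (event) => {
//   const payload = handleIncoming(event.origin, event.data);
//   if (payload !== null) { /* process trusted payload */ }
// });
// When sending, name the intended recipient rather than using '*':
// otherWindow.postMessage(message, EXPECTED_ORIGIN);
```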
I think we need to define what the browser security model is, because there really is no definition for it today. The way it works today is you look at an API and you try to find some unexpected way of using it, and then you try to construct a sexy story around how that unexpected use might come into play. Based on how sexy a story you can make, people decide whether it’s a security problem or not. That’s a hard way to work; it would be better if we had a model that said "This is the security model we have for the browser and all APIs conform to it." If an API violates that model then we say that’s a security problem, even if we can’t come up with a sexy story at the time, because in the fullness of time attackers will find that sexy story, but coming up with it before the browser’s next major release in the next three weeks is sometimes a difficult challenge. I think it would be wonderful if the browser vendors and the W3C actually wrote down what the security model is and then started making all the APIs conform to it.