A buffer overflow bug has caused a small number of requests to Cloudflare proxies to leak data from unrelated requests, including potentially sensitive data such as passwords and other secrets. The issue, which has been named ‘Cloudbleed’, was discovered and documented by Google Project Zero vulnerability researcher Tavis Ormandy. After applying fixes and attempting to clean search engine caches, Cloudflare’s John Graham-Cumming provided a detailed explanatory blog post. Despite some sensitive data being leaked, Cloudflare’s Founder and CEO Matthew Prince tweeted ‘I think we largely dodged a bullet on the actual impact’.
In the explanatory blog post, and in CEO Matthew Prince’s email to customers, Cloudflare took pains to highlight that SSL private keys could not have been compromised because they are used in separate, isolated instances of Nginx. The leak was caused by a combination of Nginx plugins that Cloudflare uses to handle customers’ requests. The introduction of a newer plugin exposed a previously latent issue in an older plugin for a small subset of requests that combined certain features with improperly formatted HTML.
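To make the failure mode concrete, here is a toy simulation of a buffer over-read. It is a sketch under stated assumptions, not Cloudflare's parser: Cloudflare's post attributes the bug to an end-of-buffer check written as an equality test, so that a pointer which jumped past the end was never caught. The `scan` function, the rule that a `<` advances the position by two bytes, and the `PAGE` and `NEIGHBOUR` values are all invented here for illustration.

```python
# Toy model of a buffer over-read (illustrative only, not Cloudflare's code).
# One flat "memory" holds the page being parsed, followed directly by
# unrelated data belonging to another request.

PAGE = b"<p>hi<"                 # hypothetical malformed page: ends on an opener
NEIGHBOUR = b"SECRET=hunter2"    # hypothetical data from an unrelated request
MEMORY = PAGE + NEIGHBOUR


def scan(memory: bytes, start: int, end: int, strict: bool) -> bytes:
    """Copy bytes from memory[start:end] to the output.

    Toy rule (an assumption for this sketch): a '<' makes the scanner
    advance two positions instead of one, so a '<' as the final byte
    moves the position past `end` without ever being equal to it.
    """
    out = bytearray()
    p = start
    while p < len(memory):  # stand-in for the limit of readable memory
        out.append(memory[p])
        p += 2 if memory[p] == ord("<") else 1
        # strict=False models an equality-only end check; strict=True
        # models the safer "at or beyond the end" check.
        reached_end = (p >= end) if strict else (p == end)
        if reached_end:
            break
    return bytes(out)


if __name__ == "__main__":
    print(scan(MEMORY, 0, len(PAGE), strict=True))   # stops at the page boundary
    print(scan(MEMORY, 0, len(PAGE), strict=False))  # runs on into NEIGHBOUR
```

With a well-formed page such as `b"<p>hi"` both modes return the same output, which is why a check like this can stay latent until a rare input, or a change elsewhere in the stack, makes the position skip over the boundary.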
The issue is estimated to have impacted 1 in every 3,300,000 HTTP requests through the Cloudflare proxies between 13 and 18 February 2017. Those odds were compared to ‘winning the lotto’ by security expert Troy Hunt in his post ‘Pragmatic Thoughts on #CloudBleed’, though the problem here is twofold: firstly, the sites whose data leaked are innocent victims of the other sites whose pages triggered the bug, and secondly, there’s no way to ‘check your ticket’ to find out whether you’ve been a leak victim (which applies equally to sites using Cloudflare and to their visitors). Hunt makes a good point that those visiting and hosting public sites like blogs have nothing to be concerned about, but it’s not the same for sites holding sensitive data, such as dating and banking services. AgileBits took pains to point out that 1Password was safe, whilst Monzo pointed out that the issue only potentially impacted a small number of developer API users.
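The per-request figure can be turned into rough odds for a given volume of traffic. The sketch below assumes, purely for illustration, that every request independently has the quoted 1 in 3,300,000 chance of leaking; real traffic would not be that uniform, and the request counts in the usage lines are hypothetical.

```python
import math

# Per-request leak rate quoted in the article.
LEAK_RATE = 1 / 3_300_000


def expected_leaks(requests: int, rate: float = LEAK_RATE) -> float:
    """Expected number of leaking requests among `requests` requests."""
    return requests * rate


def prob_at_least_one(requests: int, rate: float = LEAK_RATE) -> float:
    """Probability that at least one of `requests` requests leaks.

    Computes 1 - (1 - rate) ** requests using log1p/expm1, which stays
    accurate when `rate` is tiny. Assumes requests are independent.
    """
    return -math.expm1(requests * math.log1p(-rate))


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 3_300_000, 100_000_000):  # hypothetical volumes
        print(f"{n:>11,} requests: expected leaks {expected_leaks(n):.4f}, "
              f"P(at least one) {prob_at_least_one(n):.4f}")
```

Under that assumption an individual visitor making a few thousand requests is very unlikely to have been affected, while a proxy handling billions of requests will almost certainly have leaked something, which is the tension the lottery comparison glosses over.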
Cloudflare held back on notification whilst they worked with search engines to purge cached data - tracking down 770 request URIs from 161 unique domains. This was done to prevent exploitation of private data that would have found its way into those search engine caches. Despite Cloudflare and Google working together from the beginning, it seems that there was some tension on this issue, and that the purges may not have been as quick and complete as initially hoped.
Graham-Cumming’s blog post suggests that Cloudflare’s post-incident actions were guided by comprehensive logging, which allowed the company to identify the scope and scale of the issue and to target remedial action. Prince’s view that they dodged a bullet might well be true not only for Cloudflare and its customers, but also for web users at large. Cloudflare is estimated to serve more than 10% of all web traffic, and most of us won’t get through a day without using something behind Cloudflare. If the bug had been less obscure, then it would surely have been noticed sooner, but it might also have caused far greater damage to Cloudflare and its customers. The risks of using services like Cloudflare, which act as a ‘man in the middle’ between web users and the services they access, have to be balanced against the alternatives: with herd immunity comes the potential for herd contagion. The other side of this coin is that very few organisations would be able to respond to such a failure in their own infrastructure as quickly and thoroughly as Cloudflare did; things like its use of global feature flags allowed it to rapidly neuter the parts of the stack that were causing issues.
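As a rough illustration of why global feature flags make that kind of response quick, here is a minimal kill-switch sketch. It is not Cloudflare's mechanism: the `KillSwitches` class, the `render` function and the `email_obfuscation` flag name are assumptions made up for this example.

```python
class KillSwitches:
    """In-memory registry of features that can be turned off globally."""

    def __init__(self, features):
        # Every known feature starts out enabled.
        self._enabled = {name: True for name in features}

    def kill(self, name: str) -> None:
        """Disable a feature everywhere; unknown names raise KeyError."""
        if name not in self._enabled:
            raise KeyError(name)
        self._enabled[name] = False

    def is_enabled(self, name: str) -> bool:
        # Unknown features are treated as disabled (fail closed).
        return self._enabled.get(name, False)


def render(html: str, switches: KillSwitches) -> str:
    """Apply an optional transformation only while its flag is enabled."""
    if switches.is_enabled("email_obfuscation"):  # hypothetical flag name
        html = html.replace("@", " [at] ")
    return html


if __name__ == "__main__":
    switches = KillSwitches(["email_obfuscation"])
    print(render("mail me: a@example.com", switches))  # transformed
    switches.kill("email_obfuscation")
    print(render("mail me: a@example.com", switches))  # passed through untouched
```

The point of the design is that the risky code path is skipped by flipping one value, with no redeploy, so the faulty feature can be taken out of service while a proper fix is prepared.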
A common question after issues like this is ‘should I change my passwords?’, and the risk-averse consensus is yes: better safe than sorry. On balance, though, the chances of any one person’s secrets being leaked in an exploitable manner are very low (and likely much lower than the chance of the same secrets being leaked through malware or some other means). Cloudflare and the broader web security community will have learned a lot from this, but we can be sure that something similar will happen again whilst unsafe languages like C continue to be used in security-critical infrastructure (and yes, goto was there in the middle of things once again).