Envato, which runs online marketplaces for digital creative assets, migrated its edge network to a single provider, unifying its Distributed Denial of Service (DDoS) protection and Web Application Firewall (WAF) under one vendor. An automated, test-based approach to infrastructure changes, combined with monitoring and continuous feedback on regressions, let the team make the move without any downtime.
Envato serves its sites through a content delivery network (CDN) edge network. An edge network serves content to users from a location that is geographically closer to them. Edge servers are proxy caches that fetch content from the origin servers where it is actually hosted and then serve it to users. Most CDN vendors build security measures into the edge network to guard against DDoS and other attacks.
Envato’s edge network servers perform DDoS scrubbing and also function as a WAF, so security issues are mitigated at the edge and the origin infrastructure is isolated from such attacks. DDoS scrubbing involves detecting and mitigating likely DDoS traffic. WAFs of this kind are usually rule-based, with rules derived from analysis of past requests. A request that looks like an attack can be blocked, slowed down, or made to pass a human-verification step such as a CAPTCHA. Vendors such as Cloudflare, Akamai and Fastly use similar techniques. Incidentally, worldwide DDoS and web application attacks grew by 14% and 10% respectively over the past year, according to Akamai’s latest State of the Internet Security Report (PDF).
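The rule-based filtering described above can be sketched in a few lines of Ruby. This is only an illustration of the idea, not the logic of any particular vendor: the rule names, the pattern, and the threshold are all hypothetical, and a real WAF evaluates far richer request data.

```ruby
# A request as the edge might see it; the fields are illustrative assumptions.
Request = Struct.new(:query, :user_agent, :requests_last_minute, keyword_init: true)

# A rule pairs a predicate (matcher) with the action to take when it matches.
Rule = Struct.new(:name, :action, :matcher, keyword_init: true)

# Hypothetical rules covering the three responses named in the text:
# block, slow down (throttle), and human verification (challenge).
RULES = [
  Rule.new(name: "sql-injection-probe", action: :block,
           matcher: ->(req) { req.query.to_s.match?(/union\s+select/i) }),
  Rule.new(name: "request-burst", action: :throttle,
           matcher: ->(req) { req.requests_last_minute > 600 }),
  Rule.new(name: "missing-user-agent", action: :challenge,
           matcher: ->(req) { req.user_agent.to_s.empty? })
].freeze

# Returns the action of the first matching rule, or :allow when none match.
def decide(request, rules = RULES)
  rule = rules.find { |r| r.matcher.call(request) }
  rule ? rule.action : :allow
end

probe = Request.new(query: "id=1 UNION SELECT password",
                    user_agent: "curl/8.0", requests_last_minute: 3)
puts decide(probe) # prints "block"
```

Rules are checked in order and the first match wins, so the most severe responses are listed first in this sketch.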
In Envato’s previous setup the two systems were daisy-chained: user requests passed through the DDoS scrubber first, then the WAF, before reaching the origin.
Image courtesy: https://webuild.envato.com/blog/migrating-edge-providers/
Several problems drove the move to a single system hosting both the DDoS scrubber and the WAF: debugging was difficult, new deployments were hard to roll out, automation differed between the two systems, and unified alerting could not be put in place. The two systems generated request IDs differently and alerted on potential security issues in different ways, which made debugging harder. Because the underlying service providers differed, infrastructure automation was possible for only some of the systems. Maintaining compatibility was hard and meant extra effort whenever new changes were rolled out.
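To see why mismatched request IDs make debugging harder, consider tracing one request across two daisy-chained systems. With no shared ID, log entries can only be paired heuristically. The sketch below is a hypothetical illustration, not something the source describes the team doing: the log fields, the IDs, and the one-second window are all assumptions.

```ruby
# One log line from either system; the field names are illustrative assumptions.
LogEntry = Struct.new(:request_id, :time, :client_ip, :path, keyword_init: true)

# Pairs scrubber and WAF entries that share a client IP and path and where the
# WAF entry is at most `window` seconds after the scrubber entry. The request
# IDs cannot be joined on directly because each system generates its own.
def correlate(scrubber_log, waf_log, window: 1.0)
  scrubber_log.flat_map do |s|
    waf_log
      .select do |w|
        w.client_ip == s.client_ip && w.path == s.path &&
          (w.time - s.time).between?(0, window)
      end
      .map { |w| [s.request_id, w.request_id] }
  end
end

# Hypothetical entries; 203.0.113.0/24 is a documentation-only address range.
scrubber_log = [
  LogEntry.new(request_id: "scrub-1", time: Time.at(100.0),
               client_ip: "203.0.113.7", path: "/item/1")
]
waf_log = [
  LogEntry.new(request_id: "waf-9", time: Time.at(100.2),
               client_ip: "203.0.113.7", path: "/item/1"),
  LogEntry.new(request_id: "waf-10", time: Time.at(105.0),
               client_ip: "203.0.113.7", path: "/item/1")
]

p correlate(scrubber_log, waf_log) # prints [["scrub-1", "waf-9"]]
```

A join like this can return several candidates or none when traffic is heavy or clocks drift, which is the kind of ambiguity a single provider with one request ID avoids.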
The migration had to be achieved without downtime for end users. The team's strategy was to prepare a test suite, roll out changes to a limited set of users with the tests as a safety net, and then do a full rollout after fixing the issues found during the partial rollouts. The test suite was meant to prevent regressions in working infrastructure, just as tests do for application code. RSpec, a Ruby framework for Behaviour Driven Development (BDD), was already in use by the engineering teams at Envato. The team wrote a library called HttpSpec that handles HTTP interactions (request/response) and runs within RSpec. They also adopted Spotify’s open source library rspec-dns for DNS checks. A related project, ServerSpec, allows declarative testing of server configuration through RSpec tests. An in-house tool that transformed API call responses between providers eased the migration from one provider to the other.
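The idea of treating edge behaviour as something to regression-test can be sketched in plain Ruby. This is not the HttpSpec or RSpec API, whose details the source does not give. It is a minimal stand-in with hypothetical URLs, header names, and expected values, and it uses canned responses so that it runs offline.

```ruby
# Minimal stand-in for an HTTP response; a real suite would wrap an HTTP client.
Response = Struct.new(:status, :headers, keyword_init: true)

# Each expectation names a behaviour the edge must preserve after migration.
# The URLs, headers, and values are illustrative assumptions.
EXPECTATIONS = [
  { name: "redirects plain HTTP to HTTPS",
    url: "http://example.com/",
    check: ->(res) { res.status == 301 && res.headers["location"] == "https://example.com/" } },
  { name: "serves static assets with a cache header",
    url: "https://example.com/assets/app.css",
    check: ->(res) { res.status == 200 && res.headers.key?("cache-control") } }
].freeze

# Runs every expectation through `fetch` (any callable mapping URL to Response)
# and returns the names of the expectations that failed.
def failed_expectations(fetch, expectations = EXPECTATIONS)
  expectations
    .reject { |e| e[:check].call(fetch.call(e[:url])) }
    .map { |e| e[:name] }
end

# Canned responses standing in for a correctly configured edge.
CANNED = {
  "http://example.com/" =>
    Response.new(status: 301, headers: { "location" => "https://example.com/" }),
  "https://example.com/assets/app.css" =>
    Response.new(status: 200, headers: { "cache-control" => "max-age=3600" })
}.freeze

healthy_edge = ->(url) { CANNED.fetch(url) }
puts failed_expectations(healthy_edge).inspect # prints []
```

Because the fetcher is just a callable, the same expectations could be pointed at the old provider, the new provider, or a partial rollout, and any behaviour that differs would show up as a named failure.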
Finally, changes were rolled out in an order prioritized by traffic and possible customer impact. A constant cycle of automated tests kept confidence high throughout.