Kelly Shortridge from Capsule8 talked at the Velocity conference in Berlin about how chaos engineering can help integrate Infosec within a DevOps culture. Shortridge discussed how distributed, immutable, and ephemeral infrastructure, or the D.I.E. triad, is an organizationally friendly way of building security by design. With this triad, users can continuously raise the cost of an attack.
The D.I.E. triad is about the ability to be resilient and recover effectively, whether in the face of threats to performance or security. The triad promotes quality while making systems more secure, which helps Infosec integrate with DevOps. Shortridge stressed that the infosec industry has for decades espoused the idea of building in security by design. The D.I.E. triad is an organizationally friendly way to do so because it supports the work the team already does to have reliable systems.
D.I.E. is an acronym where D is for distributed, meaning that service outages, like a denial of service, are less impactful. I is for immutable, meaning that changes are easier to detect and reverse. And E is for ephemeral, where users try to reduce the value of assets, from the attackers' perspective, to as close to zero as possible. These are the system properties that chaos security principles help to build, resulting in systems that are secure by design. The starting point is the expectation that security controls will fail and that organizations must prepare accordingly. From there, organizations embrace the ability to respond to security incidents instead of trying to avoid them.
Shortridge recommended using game days to practice potentially risky scenarios in a safe environment. She also recommended using production-like environments to get a better understanding of how things will work in a complex system, and starting with simple testing before moving on to more sophisticated testing. For instance, build tests that users can run effectively with accessible scenarios, such as phishing or SQL injection.
When talking about distributed systems, Shortridge mentioned that multi-region services are a way of misleading attackers. With load balancing in place, teams can rapidly redeploy services and change the composition of how services look and where they're set up; for instance, by shuffling IP blocks and changing them regularly. Or, if using a service mesh, teams can configure the mesh so that attackers are forced to escalate privileges, for example to the iptables layer, in order to access and modify access control capabilities. The net result is to change the lateral movement game for attackers, that is, how they move from resource to resource.
Then, Shortridge talked about how to continue applying chaos security principles with immutable infrastructure. Attackers can’t reliably store data for exfiltration on a local disk, as the disks are ephemeral and will be periodically destroyed and replaced like a phoenix, as Martin Fowler puts it.
Immutable systems restrict the ability of teams to write to or modify systems in any way. Ensuring immutability involves testing for unauthorized changes, then ensuring they're being detected and reversed. Users either preemptively shut down specific instances that are under attack (these instances then respawn at a different location), or preemptively shut down and re-initialize instances in order to mitigate potential performance problems (such as restarting an application that has a subtle memory leak).
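The detect-and-reverse loop described above can be illustrated with a small experiment. The following is a minimal sketch, not something shown in the talk: it assumes a "golden" directory stands in for the immutable image and a "live" directory stands in for a running instance, and the helper names (`snapshot`, `detect_drift`, `revert`, `run_experiment`) are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def detect_drift(baseline: dict[str, str], current: dict[str, str]) -> set[str]:
    """Return paths that were added, removed, or modified since the baseline."""
    return {
        path
        for path in baseline.keys() | current.keys()
        if baseline.get(path) != current.get(path)
    }


def revert(golden: Path, live: Path) -> None:
    """Discard the live tree and redeploy it from the golden image."""
    shutil.rmtree(live)
    shutil.copytree(golden, live)


def run_experiment() -> tuple[set[str], set[str]]:
    """Inject unauthorized changes, then check they are detected and reversed."""
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "golden"
        live = Path(tmp) / "live"
        golden.mkdir()
        (golden / "app.conf").write_text("debug = false\n")
        (golden / "run.sh").write_text("#!/bin/sh\nexec app\n")
        shutil.copytree(golden, live)
        baseline = snapshot(golden)

        # Simulated attack (illustrative): modify one file, drop a new one.
        (live / "app.conf").write_text("debug = true\n")
        (live / "backdoor.sh").write_text("#!/bin/sh\n")

        detected = detect_drift(baseline, snapshot(live))
        revert(golden, live)
        remaining = detect_drift(baseline, snapshot(live))
        return detected, remaining


if __name__ == "__main__":
    detected, remaining = run_experiment()
    print("detected:", sorted(detected))
    print("remaining after revert:", sorted(remaining))
```

In a real deployment the revert step would more likely be the orchestrator killing and respawning the instance from its image; the directory copy here only models that behavior.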
Shortridge also stated that infrastructure that could die at any moment is a nightmare for attackers because it generates a formidable level of uncertainty when they try to persist. One example is to completely restrict shell access to servers. If shell access is disabled, it's much harder for attackers to access or modify servers without being noisy in their operations.
Finally, Shortridge covered the ephemeral portion of the D.I.E. triad. Most security bugs are state related; if users get rid of the state, they get rid of those bugs and vulnerabilities. Ephemerality reduces the ability of attackers to persist in the system, and because ephemeral systems don't rely on persistent storage, the window of opportunity for an attacker to seize data is minimal.
Chaos testing ephemerality can include checks that the system no longer accepts outdated resources. For instance, a test can change API tokens to simulate the "sign out of all sessions" functionality in a browser. Then, by injecting old API tokens, users can confirm whether the API is still accepting expired tokens. The goal is to ensure the verification process is working and applications aren't accepting old tokens, which would defeat the point of ephemerality.
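A token-replay check of this kind could look roughly like the sketch below. It is an illustration under stated assumptions rather than a real API: `TokenStore`, `handle_request`, and `run_experiment` are hypothetical names, and the in-memory store stands in for whatever verification service an application actually uses.

```python
import secrets


class TokenStore:
    """Hypothetical in-memory stand-in for an API's token verification."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # token -> user

    def issue(self, user: str) -> str:
        """Create and register a fresh random token for the user."""
        token = secrets.token_hex(16)
        self._active[token] = user
        return token

    def revoke_all(self, user: str) -> None:
        """Simulate 'sign out of all sessions' for one user."""
        self._active = {t: u for t, u in self._active.items() if u != user}

    def is_valid(self, token: str) -> bool:
        return token in self._active


def handle_request(store: TokenStore, token: str) -> int:
    """Return an HTTP-style status code: 200 if accepted, 401 if rejected."""
    return 200 if store.is_valid(token) else 401


def run_experiment() -> dict[str, int]:
    """Rotate one user's tokens, then replay the old token against the API."""
    store = TokenStore()
    old_token = store.issue("alice")
    other_token = store.issue("bob")
    before = handle_request(store, old_token)

    store.revoke_all("alice")          # the injected change
    new_token = store.issue("alice")

    return {
        "old_before_rotation": before,
        "old_after_rotation": handle_request(store, old_token),
        "new_after_rotation": handle_request(store, new_token),
        "other_user": handle_request(store, other_token),
    }


if __name__ == "__main__":
    for check, status in run_experiment().items():
        print(f"{check}: {status}")
```

The experiment passes only if the replayed token is rejected while the fresh token and the unrelated user's token still work, so it also catches an overly broad revocation.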
Shortridge closed by saying that chaos resilience represents a natural home for Infosec. For Infosec to evolve from a silo model to become embedded throughout the SDLC, responsibility and accountability have to be unified, just as dev and ops had to go through the same evolution.