Guy Podjarny on OSS Security, Serverless, and the Equifax Hack

In this podcast Wes talks to Guy Podjarny, founder and CEO of Snyk. The two discuss the security of open source software and third-party dependencies, including the Equifax hack (and what we can learn from it) and the role of serverless architectures today (and what it means for an application's attack surface), and finally wrap up with security hygiene best practices for OSS and serverless.

Key Takeaways

  • The majority of security vulnerabilities that exist in applications today come from vulnerable third-party libraries, rather than the application’s own code.
  • An application shouldn’t permit total leak of all data because of a single vulnerability - defence in depth is important.
  • Equifax couldn’t have failed more spectacularly in the way they handled it.
  • The Equifax hack serves as a wake-up call to pay attention to vulnerabilities in dependencies.
  • If your build system automatically breaks the build when a vulnerable dependency is found, the fix will be applied sooner.

Show Notes

Tell us a bit about the work you do at Snyk?

  • 3:00 At Snyk we help companies stay secure while using open source code.
  • 3:05 We try to tackle any risks of using open source software for mission critical systems.
  • 3:30 Specifically, today we focus on known vulnerabilities in open source libraries, building developer-friendly tools that highlight when vulnerable versions of libraries are used.

What kind of analysis do you use?

  • 4:15 We have a vulnerability pipeline that does analysis, such as listening to the GitHub firehose looking for issues being created.
  • 4:30 All of these feeds go through our security team for manual vetting, and are ingested into our database that end users consume via our tools.
  • 4:45 Our tools don't do static analysis of your code; they just look at the dependencies you are using, via the dependency manifest, in your CI tools.
  • 5:10 The tools can summarise the vulnerabilities and offer fixes that upgrade the vulnerable versions.
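The manifest-based check described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Snyk's implementation: the package names, version ranges and the `Advisory` structure are invented for the example, and the naive numeric version parsing would need to be replaced with each ecosystem's own ordering rules (semver, Maven, and so on).

```python
# Sketch only: the advisory data is hypothetical, and real tools use
# ecosystem-aware version ordering instead of this naive numeric parse.
from dataclasses import dataclass


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.3.20' into (2, 3, 20) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class Advisory:
    package: str
    introduced: str  # first vulnerable version (inclusive)
    fixed: str       # first fixed version (exclusive upper bound)


def check_manifest(manifest: dict[str, str],
                   advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (package, installed version, suggested upgrade) per finding."""
    findings = []
    for adv in advisories:
        installed = manifest.get(adv.package)
        if installed is None:
            continue  # the application does not use this package
        version = parse_version(installed)
        # A deterministic range check: the pinned version either falls in the
        # known-bad range or it does not, with no heuristics involved.
        if parse_version(adv.introduced) <= version < parse_version(adv.fixed):
            findings.append((adv.package, installed, adv.fixed))
    return findings


# Hypothetical advisory database and dependency manifest.
ADVISORIES = [
    Advisory("example-web-framework", "2.3.5", "2.3.32"),
    Advisory("example-json-lib", "1.0.0", "1.4.2"),
]
MANIFEST = {"example-web-framework": "2.3.20", "example-json-lib": "1.4.2"}

for package, installed, fix in check_manifest(MANIFEST, ADVISORIES):
    print(f"{package} {installed} is vulnerable; upgrade to {fix} or later")
```

Because the match is an exact range comparison against a curated advisory, a hit means the pinned version really is in the known-bad range; the quality of the result depends on the quality of the advisory data.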

So it’s looking at the dependencies?

  • 5:30 It’s looking at the threat model of using insecure dependencies.
  • 5:40 It's much easier for attackers to exploit a known vulnerability - like when you don't patch your servers or operating system.
  • 6:00 The same premise happens for applications, but it’s less conscious as there isn’t a repeated nag dialog to upgrade.
  • 18:30 A single vulnerability can have many victims, especially since the same vulnerable library is often found in many applications.
  • 6:25 Applications are more assembled than built these days.
  • 6:40 The bigger security vulnerabilities that exist in applications today come from vulnerable third-party libraries, rather than the application’s own code.
  • 7:30 I used to build black-box testing tools for AppScan, which also did static analysis of source code; that approach leads to false positives.
  • 8:10 With dependencies, if a versioned dependency has a known issue, and the code depends on that known bad version, it’s not going to give a false positive.

With regards to the Equifax hack, can you explain a little more about what happened?

  • 9:25 It’s uncommon that a breach of that magnitude comes down to such a clear-cut single source of attack.
  • 9:40 It was a trivially exploitable remote execution vulnerability in Struts 2.
  • 9:50 You can see a demo at https://github.com/snyk/java-goof which shows how to identify the vulnerability used.
  • 10:15 The vulnerability is severe but not unique; there are exploits that get released frequently.
  • 10:30 A remotely exploitable vulnerability in a popular open source library happens a few times per year.
  • 10:40 The timeline was that the vulnerability in Struts 2 was identified in March 2017, caused a lot of noise, and although Equifax was aware of it they missed this particular application.
  • 11:05 They were hacked some time between four days and two months after the disclosure, and found out about it in July 2017, although it wasn't reported publicly until September 2017.
  • 12:00 It’s a loss of very personal data - dates of birth, social security numbers, previous addresses - and it impacted half of America, along with tendrils into Canada and other countries.
  • 12:20 The way that Equifax handled it wasn't great - and even if an attacker can exploit one vulnerability, that shouldn't be sufficient to access 140 million records.
  • 12:30 They went on to handle everything afterwards poorly, from standing up a phishing site for ascertaining whether you had been breached, to blaming it on one person.
  • 12:55 They couldn’t have failed more spectacularly in the way they handled it.

What lessons can be learned from this hack?

  • 13:15 The most immediate one has to do with open source libraries.
  • 13:20 If you have a really good system for patching your servers and doing that on a regular basis, it wouldn’t have saved you.
  • 13:25 You have to understand that this type of threat exists in the application layer as well.
  • 13:40 The libraries used in an application form part of its threat model, but aren't often considered part of the code.
  • 14:00 People don’t tend to like upgrading if there’s no need.
  • 14:15 The hack serves as a wake-up call to pay attention to vulnerabilities in dependencies.

How can this kind of hack be avoided?

  • 14:20 One of the key components is to avoid using vulnerable libraries.
  • 14:30 How can it be solved at scale?
  • 14:40 I don’t like the blame game - there’s no benefit in pointing fingers.
  • 14:45 I wrote a blog post on the subject at https://snyk.io/blog/equifax-breach-vulnerable-open-source-libraries/
  • 14:50 It seems trivial to ask why that library wasn’t upgraded, but things that are easy to do for a single unit are hard to do at scale.
  • 15:05 Patching your application is easy; assess it, test it, and then patch it.
  • 15:10 Patching a thousand applications, or ten thousand, and doing that regularly and knowing which application is using which library - that’s a much harder problem.
  • 15:30 Wrangling applications and knowing what you have isn’t simple.
  • 15:35 Don’t just think about a single action - think about how you build systems to solve this at scale.
  • 15:55 An interesting anecdote - Equifax took ages to patch their system, but the attacks started happening very quickly.
  • 16:05 There are some statistics from Imperva and AlienVault that track these things, and show within hours of the security disclosure happening there were attacks in the wild.
  • 16:40 The defender has to block these issues all the time, but the attackers only have to succeed once.
  • 16:50 It's not about identifying how to fix a single issue, but about how you prevent these from becoming issues at scale.
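Knowing which application uses which library is largely an inventory problem: when a disclosure lands, the first question is where the affected versions are deployed. A minimal sketch, assuming each application's resolved dependencies have already been exported as a name-to-version mapping (for example from a lockfile during CI); all application and library names are hypothetical.

```python
# Sketch only: application and library names are hypothetical.
from collections import defaultdict


def build_index(apps: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """Map library -> version -> set of applications that use that version."""
    index: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for app, deps in apps.items():
        for library, version in deps.items():
            index[library][version].add(app)
    return index


def affected_apps(index: dict[str, dict[str, set[str]]],
                  library: str, bad_versions: set[str]) -> set[str]:
    """Every application pinned to one of the disclosed bad versions."""
    versions = index.get(library, {})  # .get does not create empty entries
    return {app for v in bad_versions for app in versions.get(v, set())}


# Resolved dependencies per application, e.g. exported from lockfiles in CI.
APPS = {
    "billing-api": {"web-framework": "2.3.31", "json-lib": "1.4.2"},
    "dispute-portal": {"web-framework": "2.3.31"},
    "marketing-site": {"web-framework": "2.5.13"},
}

index = build_index(APPS)
print(sorted(affected_apps(index, "web-framework", {"2.3.31", "2.3.30"})))
# ['billing-api', 'dispute-portal']
```

The point of the index is that it is built continuously, before any disclosure, so answering "who is affected?" becomes a lookup rather than an investigation across thousands of repositories.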

It’s not just about one language or one vulnerability?

  • 17:35 Absolutely - open source is everywhere.
  • 17:45 Today, package managers and libraries are a core component of a programming language.
  • 17:50 Maven is one of the older platforms, and its usage stats are growing in leaps and bounds.
  • 18:00 NPM is the fastest growing package manager, with over half a million different packages.
  • 18:15 It's also true for Go, Rust, Elixir … 
  • 18:20 People from other languages like to point a finger at NPM and suggest it's ridiculous, because it has a 5-line library that's downloaded 20 million times per month.
  • 18:30 You could argue whether that's a good thing; on the one hand, you have the richness at your fingertips, but on the other it adds potential fragility.
  • 18:45 In the end it doesn’t matter what the size of the library is; it’s the dependence on the open source packages that is the common factor.
  • 19:00 They all have some kind of vulnerability potential, like the Heartbleed disclosure, deserialisation vulnerabilities, denial of service issues, etc.
  • 19:15 So Struts 2 (or NPM) shouldn’t be specifically blamed for issues like these; rather, it’s a general problem.
  • 19:25 Different ecosystems have different implementations, which lead to different vulnerabilities, but fundamentally the vulnerabilities are similar across all of them.

How do you keep track of these issues?

  • 19:50 The answer is to use Snyk - we do a lot of work in this space!
  • 19:55 We really need to use tooling to solve this problem. It’s not a core competency for general developers.
  • 20:05 In the same way that you use Red Hat or Canonical to get your operating system updates, you should use the appropriate tools for your environment.
  • 20:15 I’ve written a book for O’Reilly: “Securing Third Party Code” [http://shop.oreilly.com/product/0636920051695.do].
  • 20:25 At the high level: you need to use tools to look out for those vulnerabilities, and they need to be integrated into your existing development workflow.
  • 20:40 If your developers have to go to a different tool and remember to do so each time, they won't do it regularly.
  • 20:45 On the other hand, if you have a system that breaks the build and mandates a fix before the code can move forward, it will be much more successful.
  • 20:50 So, find tools that support your stack.
  • 21:00 There are existing open source tools: the OWASP dependency checker, RubySec, and the Node.js security checker.
  • 21:05 They aren’t as good as the commercial tools in this space, but they are far better than doing nothing.
  • 21:10 You can use commercial tools (like Snyk) that support a broader set of ecosystems and have a broader database.
  • 21:20 Whatever tool you use, make sure that you can integrate it in your system smoothly.
  • 21:35 You need to keep in mind that the goal is not to find issues, it’s to fix them when they are found.
  • 21:45 Don’t look for the tool that alerts you to the issues fastest; you’re going to get a lot of e-mails anyway, and if you don’t end up filtering them then they are more likely to be missed.
  • 22:05 Think about your strategy in four phases: Find, Fix, Prevent and Respond.
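Breaking the build is what turns a scanner's report into a fix. The sketch below assumes the scan results have already been reduced to (package, severity) pairs; the severity names, default threshold and ignore list are illustrative choices rather than any particular tool's interface.

```python
# Sketch only: severity names, threshold and ignore list are illustrative,
# not the interface of any specific scanner.
import sys

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(findings: list[tuple[str, str]], threshold: str = "high",
         ignored: frozenset[str] = frozenset()) -> int:
    """Return an exit code: 0 lets the build pass, 1 breaks it."""
    blocking = [
        (package, severity) for package, severity in findings
        if package not in ignored
        and SEVERITY_RANK[severity] >= SEVERITY_RANK[threshold]
    ]
    for package, severity in blocking:
        print(f"BLOCKING: {package} has a known {severity}-severity vulnerability",
              file=sys.stderr)
    return 1 if blocking else 0


# Hypothetical scanner output for one build.
FINDINGS = [("example-web-framework", "critical"), ("example-string-pad", "low")]
exit_code = gate(FINDINGS)
print("build", "broken" if exit_code else "passed")
```

In a pipeline step, the script would end with `sys.exit(gate(findings))` so that a non-zero exit code fails the job. Keeping exceptions in an explicit, reviewed ignore list, rather than lowering the threshold, keeps accepted risks visible.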

What’s the value proposition for serverless computing?

  • 22:55 I’m a big fan of serverless computing.
  • 23:00 The word itself is annoying, however.
  • 23:05 I think of it as “server management”-less, because it frees you from managing the server itself.
  • 23:20 The value proposition for serverless is really about lower cost of ownership, which manifests itself as less people time, less machine time and lower operating costs - it's a much more cost-effective way to run a system.

Are serverless environments more secure?

  • 24:05 Serverless reshuffles security priorities.
  • 24:10 There are a lot of great improvements from a security perspective.
  • 24:20 Amongst them is that you don’t need to patch your servers - and as we mentioned, it is hard to do at scale.
  • 24:35 There are also statistics suggesting that the vast majority of successful exploits today happen because of known vulnerabilities in dependencies.
  • 24:45 You need to trust your platform operator to do the patching, but it means you have one less thing to worry about.
  • 24:55 It reduces the potential for a persistently compromised server; when you think about how attacks happen today, they don't happen in one go - they happen in multiple phases.
  • 25:20 Since there is no stateful system in serverless architectures, it takes that risk away.
  • 25:25 There’s a lot to celebrate with security in serverless computing.
  • 25:30 But: attackers aren’t going to give up - if we move the applications there, and they’re sufficiently targeted by hackers, then we have to deal with that somehow.
  • 25:50 One key area of security that serverless architecture highlights is application security.
  • 26:00 All application security concerns - from SQL injection to authentication to vulnerable libraries - become more relevant in this space.
  • 26:25 If you’re in a serverless environment, put your focus on that.

What did you mean by “functions are the security perimeter of your application”?

  • 26:55 That’s the area where serverless increases the security concern.
  • 27:00 Serverless creates more flexibility, and with it, increases the attack surface.
  • 27:05 Serverless gets you to a place where it becomes free to deploy additional functions.
  • 27:20 Each of these functions is an opportunity for an attacker to come in.
  • 27:30 The very same flexibility that allows you to build functions and assemble them together gives additional entry paths for opportunities to break in, or for unintended consequences to occur.
  • 27:50 It increases the complexity of the attack surface, and in doing so, increases the risk.
  • 28:00 In reality, enterprises think of applications as a sea of these functions.
  • 28:10 What often happens is that security controls are put at the entry point of the application, and then reduced or removed further in.
  • 28:20 These might be input sanitisation, permission or authorisation checks.
  • 28:25 You end up with a combination of behaviours where there are a lot of functions, and each one might be a potential point of breach.
  • 28:35 Many of the functions rely on other (security) functions being ahead of them, but that's not coded in any way.
  • 28:50 This implicit trust between the functions evolves naturally because they're on the same network, or the same system.
  • 29:00 That's a fragile system - an M&M security model; a hard shell on the outside but soft and gooey in the middle.
  • 29:10 As soon as you get in, you can roam to other functions and cause a lot of damage.
  • 29:20 The good thing is that these systems give you the controls to think about every function as its own independent autonomous unit, and therefore secure it.
  • 29:35 If compromised, this allows you to constrain the damage that it would cause.
  • 29:40 You could put some schema validation on the API gateway to say what the service is allowed to do.
  • 29:45 There's a lot of goodness in the platforms that you can take advantage of; think of each function as having its own perimeter.
  • 29:55 Some of it is just applying good permissions by the platform, and some of it is in the code to create good shared input validation libraries that are going to be used across functions.
  • 30:15 Make each function a fort unto itself; it’s hard to do this at scale.
  • 30:25 So what concerns me about serverless is that we'll find ourselves in a few years' time with organisations that have thousands of these functions, light controls and permissive models.
  • 30:40 Permission models never contract; they always expand and expand, and it’s hard to claw back permissions.
  • 30:55 My concern is that we’re going to trade hard-to-wrangle servers for hard-to-wrangle functions if we don’t plan ahead.
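Treating each function as its own perimeter means each one validates its own input rather than trusting whatever ran before it. A minimal sketch of a shared validation helper, assuming events arrive as plain dicts; the decorator, schema format and `refund_handler` are hypothetical, and a real project would more likely use an established schema library.

```python
# Sketch only: the schema format, field names and handler are hypothetical.
import functools


class ValidationError(ValueError):
    """Raised when an event fails this function's own input checks."""


def validate_event(schema: dict[str, type]):
    """Decorator: reject events with missing, unexpected or wrongly typed fields."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event: dict, context=None):
            unexpected = set(event) - set(schema)
            if unexpected:
                raise ValidationError(f"unexpected fields: {sorted(unexpected)}")
            for field, expected in schema.items():
                if field not in event:
                    raise ValidationError(f"missing field: {field}")
                # Exact type match, so e.g. True is not accepted as an int.
                if type(event[field]) is not expected:
                    raise ValidationError(f"{field} must be {expected.__name__}")
            return handler(event, context)
        return wrapper
    return decorator


@validate_event({"account_id": str, "amount_cents": int})
def refund_handler(event, context=None):
    # Runs only after this function's own checks pass, even if an upstream
    # function was supposed to have validated the request already.
    return {"status": "queued", "account_id": event["account_id"]}


print(refund_handler({"account_id": "acct-1", "amount_cents": 500}))
try:
    refund_handler({"account_id": "acct-1", "amount_cents": "500", "admin": True})
except ValidationError as err:
    print("rejected:", err)
```

Sharing the decorator across functions keeps the checks consistent, while applying it per function means a compromised or misconfigured upstream function does not automatically become a path into this one.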

How do you manage updating security policies?

  • 31:30 Taking a page from the core DevOps manifestos, it’s about automation and visibility.
  • 31:35 You need to measure everything: you need to know and track how many buckets and functions are being created.
  • 31:45 You need automation, to inspect or generate the security policies.
  • 31:50 For example, iRobot generates their IAM user role management from a set of configurations as part of their deployment model.
  • 32:10 You can disallow deploying multiple copies of the same serverless function in the configuration YAML.
  • 32:15 If you cannot deploy functions without this YAML, then it forces everyone to build functions in that way.
  • 32:30 It comes back to trying to mechanise the controls within the system.
  • 32:40 The functions are being managed by a small number of platforms, such as AWS Lambda.
  • 33:50 You can use the APIs of those platforms to enumerate all of your functions, and to test them.
  • 33:00 It's a slightly newer area for Snyk, but it's proving to be quite good, because we can connect to Lambda, use the APIs to download the ZIPs, see which libraries are in them, and flag which ones are vulnerable.
  • 33:20 We can do this in a recurring fashion, and the good thing is that you can actually do it in an easy-to-use flow.

Do you support cloud environments other than AWS, like GCP or Azure?

  • 33:55 Not at the moment - in general, the tooling in this space is very nascent.
  • 34:05 Lambda is king in this space; the other systems are growing in popularity, but there is no doubt the vast majority of users are in Lambda, so any tool that comes up is going to be first and foremost Lambda.
  • 34:25 Tools need to reach a second level of maturity before they support other cloud environments.
  • 34:30 The space is so new that very few tools have reached that second level yet.
  • 34:40 There are concepts that apply across these platforms; so if you're a Google Cloud user and you see a vendor that has a Lambda tool, odds are you could talk to them and help build the critical mass for supporting other platforms.
  • 35:00 Usually these tools aren’t inherently tied to one platform; they just need a critical mass of demand to support other platforms.

What does the anatomy of a serverless hack look like?

  • 35:30 Hacks happen in stages.
  • 35:55 There are two classes of attackers. There are statistical attackers, who run automated scripts that try to find a victim.
  • 36:05 Then there are targeted attackers, who have a specific target in mind, such as entertainment companies or financial institutions, or a political motivation.
  • 36:35 Most people aren’t targets of the second class - they’re targets of the first class.
  • 36:40 The hackers have automated tools that try and compromise anyone.
  • 36:45 They work on patterns; the tools attack many systems, and use a series of exploits or heuristics to break in.
  • 37:10 Once they manage to get in, they try to get to the next step, such as installing an agent.
  • 37:15 This is sometimes referred to as a post-exploit phase; you proceed deeper and deeper into the network.
  • 38:40 The next step is to explore what's around you, while avoiding detection by moving at a certain pace.

How do you work in stages in a serverless environment?

  • 38:05 The server is still around; it’s not entirely stateless!
  • 38:15 Once you’re in, the containers in all of these systems aren’t disposed per execution; they remain warm.
  • 38:35 If you made a small stream of calls to the same function in Lambda, you could see that it’s running on the same instance again and again.
  • 38:55 If you exploit this vulnerability, and have a remote command execution in the Lambda, you can modify the files stored in that container so that any future victim that comes along could be compromised.
  • 39:15 It is more short-lived, but it’s very much not single use.
  • 39:30 Even if the container was discarded after each use, if you break into the function you can explore what it is allowed to do, like enumerate databases or access other functions.
  • 39:45 There are persistent entities out there - you can poison data or extract data.
  • 39:55 So although serverless improves your security posture, it doesn't make attacks impossible.
  • 40:05 At the same time, you now need to monitor for suspicious behaviour.
  • 40:15 It’s easy to do for one entity, but hard to do it for many of them.

Do you have any security hygiene tips for serverless architectures?

  • 40:35 Security hygiene (at scale) is the most important aspect of security for an organisation.
  • 41:00 The statistical attacker is the one that is targeting you the most.
  • 41:10 You really just need to be better than the next target to defeat those automated attackers.
  • 41:15 If it is more effort to attack your systems than others’ systems, then you can divert the attack.
  • 41:25 You need to implement the basics of security at scale across your system.
  • 41:30 That implies patching open source libraries and servers; those are the most prevalent vulnerabilities.
  • 41:40 In the application layer, you need to update and deploy open source library dependencies.
  • 41:50 You need to do both of these at scale, across all applications.
  • 42:00 Alongside that you can talk about permissions and access - we just saw Accenture leak a bunch of information because they didn’t properly configure access permissions on their S3 buckets.
  • 42:10 Configuring access permissions for a single S3 bucket is very, very easy; but configuring bucket permissions for a hundred thousand such buckets is hard.
  • 42:20 It all comes back to being able to do the basics at scale very well.
  • 42:30 It's not about the fancy, super-disastrous hurricane that would make you crash - most of the time, it's a lot of small resilience measures that you didn't put in place.
  • 42:50 The last point is defence in depth; don’t assume that a sequence of events or a single layer of protection is invulnerable.
  • 43:00 Understand that everything is fragile and the world is dynamic.
  • 43:10 When you talk about applying security basics at scale, think of solving for a single atomic unit; the core concept applies beyond the systems that are running.
  • 43:30 This applies to developers working on your team; can you minimise permissions so that they are unable to see all of the customers’ data?
  • 43:35 Do you really need your dev or staging environment to have a full copy of the production database?
  • 43:55 If you deal with security hygiene at scale well, then you have solved 99% of your risk.
  • 44:05 There might be some cyber arm of an army that would still be able to break in, but to be in danger of that you need to be in a different league.
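Doing the basics at scale often means running the same simple check over every policy you own. A minimal sketch, assuming bucket and role policies have already been exported as dicts in the standard IAM JSON shape; the resource names are hypothetical, and flagging wildcard actions and a `"*"` principal on Allow statements is only a starting point (Deny statements and `NotAction` are not analysed), not a complete audit.

```python
# Sketch only: two simple red flags, not a complete policy audit.


def _as_list(value) -> list:
    """IAM fields may hold a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def audit_policy(name: str, policy: dict) -> list[str]:
    problems = []
    for i, statement in enumerate(_as_list(policy.get("Statement", []))):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        if any(action == "*" or action.endswith(":*") for action in actions):
            problems.append(f"{name}: statement {i} allows wildcard actions {actions}")
        principal = statement.get("Principal")
        if principal == "*" or (isinstance(principal, dict)
                                and "*" in _as_list(principal.get("AWS", []))):
            problems.append(f"{name}: statement {i} grants access to any principal")
    return problems


def audit_all(policies: dict[str, dict]) -> list[str]:
    """Apply the same checks to every policy, however many there are."""
    return [problem for name, policy in sorted(policies.items())
            for problem in audit_policy(name, policy)]


# Hypothetical exported policies (names and ARNs are made up).
POLICIES = {
    "reports-bucket": {"Version": "2012-10-17", "Statement": [{
        "Effect": "Allow", "Principal": "*", "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-reports/*"}]},
    "orders-fn-role": {"Version": "2012-10-17", "Statement": [{
        "Effect": "Allow", "Action": ["dynamodb:GetItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/example-orders"}]},
    "legacy-fn-role": {"Version": "2012-10-17", "Statement": [{
        "Effect": "Allow", "Action": "s3:*", "Resource": "*"}]},
}

for finding in audit_all(POLICIES):
    print(finding)
```

Each check is trivial for one bucket or role; the value comes from running it automatically over every policy on every deployment, so permissions that quietly expanded are caught.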
