Transcript
Rao: I'm Sindhuja Rao. I have with me my colleague, Deepank Dixit. We are both network security engineers at Cisco and work closely on zero trust solutions for customers across the APJC region. This session aims to share some insights on zero trust security for your organization and your customers, and how you can get started with it.
Are You Guilty?
I want you to take a moment and think about when you last changed your credentials without a warning from your company, or thought about what encryption actually goes into the web application you've been designing. If you did change something recently, did you think about revoking your previous access to resources? Being guilty of any of these things doesn't make anyone a bad employee. It is one of the many glitches we have in our organizations: we just don't do user behavior analysis properly. I say that because 34% of data leakage cases come from inside an organization, and 90% of attacks come from phishing emails, which very often trigger ransomware attacks. Last year, it was roughly estimated that a ransomware attack took place every 11 seconds.
Unfortunately, we have been seeing more of these attacks recently. Look at the supply chain attack on SolarWinds, or the Colonial Pipeline attack in the U.S. The way REvil attacked Kaseya's on-prem VSA servers was through sideloading, which means the actual malware ran under the pretense of a genuine Windows anti-malware application. One reason this went unnoticed is that those anti-malware tools were probably never updated. Unfortunately, many businesses and critical services like hospitals and nuclear power plants also go decades without upgrading their systems or improving their security features, because they cannot afford any downtime. For example, about 92 ransomware attacks took place each day on the healthcare sector during the pandemic. This affected 600 different organizations, losses of $21 billion were incurred, and about 18 million patient records were compromised and leaked.
Ransomware Has Evolved
It's astonishing to see how ransomware has evolved into something called the blackmail-ware model. With this, attackers don't want to hurt their media reputation. Most likely they are going to give back your data, but that doesn't mean they don't already have a copy of it ready to be sold on the black market or the dark web. As we saw, it was the need of the hour: attackers knew that hospitals, being so critical, needed their services back ASAP, so they knew the ransom would be paid. To paint a picture, imagine you're working in a power plant that is fully automated. One day the whole grid goes on lockdown, and you're not getting out of it without the attacker's mercy. The plant had the code. It had the automation. It had lock-ins. Still, why? Probably the old IoT security, the software upgrade deadlines that were missed, the people, the organization's ignorance, or it might as well be the Illuminati if you ask me. That's worth a Mr. Robot episode we would not want to be part of.
The Need for Zero Trust
Where I'm getting at is that despite companies spending so many resources on development and security, data breach headlines are always out there. We have seen data leakage cases at renowned MNCs with some of the biggest IT infrastructures. When their services go down, that acts as a very good opportunity for attackers to create new backdoors into web applications, data centers, or your social media accounts. It is very costly. Companies need to pay a lot of money not only to save their businesses, but in litigation, government disclosures, and more. Also, the way we work has changed drastically. You're now able to log in and work from your iPads and phones. The current security scenario just can't keep up with it. The same rule checks you run for your workstation or on-prem devices won't work for your personal devices. What would work are the exploits. Finally, it is important to understand that just because you connect to a secure organization, the endpoint, meaning you, the user, and your device, doesn't become secure itself. It is still as vulnerable as it was before, and it poses a threat not only to itself, but enables lateral movement of traffic inside the organization, far more quickly than you can stop it. Protecting customer data is no longer our only requirement; the organization's data is now equally vulnerable. Hence, what zero trust basically talks about is building trust with resources and devices equally, inside and outside your network, and that starts by not trusting either of them by default.
The Multiplayer Surprise
What does zero trust bring to the table that you don't already know? The current approach is based more on static, network-based perimeters like network location or IP addresses. What's new is that we now focus more on the users, assets, and resources instead of the network itself, which makes more sense because the originator of traffic and exploits will be among these. Also, think about the implicit deny rule as a default ACL that you might have heard of. That's the approach. Just because your username is admin, you shouldn't be given full privilege; privilege escalation attacks are often exploited this way. Rules should be built around denying access by default, rather than allowing even some level of access. Also, the linear approach to security revolves around the threat by nature: how to recover from it, then how to mitigate it. The zero trust approach is to limit those chances rather than take them. Of course, that doesn't mean threat hunting is any less a part of zero trust; rather, zero trust takes threat hunting more seriously, and it can be modeled to act in real time using orchestration and by studying IOCs.
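To make that default-deny idea concrete, here is a minimal sketch in Python. The rule names and attributes are invented for illustration; the point is simply that access is denied unless an explicit rule matches, and the username admin by itself earns nothing.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user: str
    role: str
    resource: str
    action: str

# Explicit allow rules; anything not matched below is denied by default.
ALLOW_RULES = [
    {"role": "hr-analyst", "resource": "payroll-db", "action": "read"},
    {"role": "network-admin", "resource": "edge-firewall", "action": "configure"},
]

def is_allowed(req: Request) -> bool:
    """Implicit deny: return True only if an explicit allow rule matches."""
    for rule in ALLOW_RULES:
        if (req.role == rule["role"]
                and req.resource == rule["resource"]
                and req.action == rule["action"]):
            return True
    return False  # default ACL: deny everything else

# The username "admin" by itself earns no privilege.
print(is_allowed(Request("admin", "guest", "payroll-db", "read")))       # False
print(is_allowed(Request("priya", "hr-analyst", "payroll-db", "read")))  # True
```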
Zero Trust eXtended (ZTX) Framework
Zero trust is not a new concept, honestly. It began as a discussion in about 2004 in the Jericho Forum. Why are we talking about it now? The problem was that even though the advantages of zero trust were known, it was not easy to implement, as networks changed a lot through the years. Now we are at a point where we can accept the fact that, just like the IT perimeter, our networks are also changing. That means we can't run the same decade-old security structure. We're also able to implement complex scenarios and network designs more easily, thanks to the available resources in technology and its advancements, as well as brilliant engineers, architects, and network admins. Forrester then changed their original design a bit in 2017 and said, data is the center of the universe, so we should focus on how to manage it, classify it, categorize it, and most importantly, protect it. This is the Zero Trust eXtended framework that is widely used today. When we talk about the ZTX framework, there are seven pillars that Forrester uses in their audits to determine who in the industry is doing a better job at implementing these zero trust principles, for their customers and for themselves as well. Based on these criteria, we have some leaders and strong performers in this field.
When you want to implement the ZTX framework, in addition to selecting which vendors you can get your products and services from, you can answer the following questions, which will give you an idea of how to implement zero trust. For example, what firewall rules are you using? How are you encrypting your data? How are you controlling it? How do people in your organization actually understand security internally? How are you identifying devices, which is one of the main concerns, and what does the complete IAM picture look like?
NIST 800-207 Zero Trust Architecture
To understand zero trust architecture, this is the ideal model for zero trust, developed by the National Institute of Standards and Technology. In this, the policy decision point is the brain of the operation. It consists of the policy engine, which is responsible for the ultimate decision to grant, deny, or revoke access to the resource if required, and the policy administrator, which is like the middleman between the control plane and the data plane. The policy administrator relays whatever action it receives from the policy engine to the policy enforcement point, which is part of the data plane and sits between the requester and the resource at all times. This is the PEP. The policy enforcement point is what the requester interacts with at all points, say your firewall, or maybe a login page to a web application. For this PDP to work, the architect or the admin should feed in some external constraints and rules as per their network. For example, should the devices meet a particular industry standard, or if you're using certificates, what should your CA be, and other constraints. Also note that a lot of companies nowadays prefer building their own models and architectures, so you might find an additional trust engine and other components. They're mostly based on this architecture itself.
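As a rough, hypothetical illustration of those roles (not any vendor's implementation), the sketch below models the policy engine making the decision, the policy administrator relaying it, and the PEP as the only component the requester ever touches. The class names, constraints, and rule values are all assumptions made for the example.

```python
class PolicyEngine:
    """Makes the ultimate grant/deny decision based on externally fed rules."""
    def __init__(self, required_standard: str, trusted_ca: str):
        self.required_standard = required_standard
        self.trusted_ca = trusted_ca

    def decide(self, request: dict) -> bool:
        return (request.get("device_standard") == self.required_standard
                and request.get("cert_ca") == self.trusted_ca)

class PolicyAdministrator:
    """Middleman: relays the engine's decision down to the enforcement point."""
    def __init__(self, engine: PolicyEngine):
        self.engine = engine

    def command(self, request: dict) -> str:
        return "ESTABLISH_SESSION" if self.engine.decide(request) else "TEAR_DOWN"

class PolicyEnforcementPoint:
    """Sits between requester and resource; the only thing the requester touches."""
    def __init__(self, administrator: PolicyAdministrator):
        self.administrator = administrator

    def handle(self, request: dict) -> str:
        action = self.administrator.command(request)
        return "access granted" if action == "ESTABLISH_SESSION" else "access denied"

pep = PolicyEnforcementPoint(PolicyAdministrator(PolicyEngine("IEC-62443", "corp-root-ca")))
print(pep.handle({"device_standard": "IEC-62443", "cert_ca": "corp-root-ca"}))  # access granted
print(pep.handle({"device_standard": "legacy", "cert_ca": "unknown-ca"}))       # access denied
```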
What Is a Cybersecurity "Mesh"?
One of the variations of zero trust is microsegmentation. As the name suggests, it segments your network to install multiple policy enforcement points for your smaller perimeters, which shapes the cybersecurity mesh. It shifts the focus from implementing PEPs at the perimeter to a custom, identity-based verification approach, while still keeping orchestration central so that any threat remediation can happen simultaneously on all nodes, which helps minimize the lateral movement of the attack vectors we discussed. Now your IT team can create smaller perimeters based on certain aspects. For example, one layer can be for remote users working in application sandboxing; other layers can be for on-premise workloads. This gives cyber criminals and hackers less scope to exploit an entire network, again meaning less lateral movement of attacks.
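A toy way to picture this in code, with segment names and rules invented for the example: each segment carries its own small enforcement policy, while a central remediation step can quarantine a host across every segment at once.

```python
# Each segment gets its own small enforcement policy (its own "PEP").
SEGMENTS = {
    "remote-users": {"allowed_dst": {"app-sandbox"}, "quarantined": set()},
    "on-prem-workloads": {"allowed_dst": {"erp", "build-servers"}, "quarantined": set()},
}

def allowed(segment: str, src_host: str, dst: str) -> bool:
    """Per-segment check: the flow must match that segment's own policy."""
    seg = SEGMENTS[segment]
    return src_host not in seg["quarantined"] and dst in seg["allowed_dst"]

def quarantine_everywhere(host: str) -> None:
    """Central orchestration: remediation is applied on all segments at once."""
    for seg in SEGMENTS.values():
        seg["quarantined"].add(host)

print(allowed("remote-users", "laptop-17", "app-sandbox"))  # True
quarantine_everywhere("laptop-17")                           # threat detected somewhere
print(allowed("remote-users", "laptop-17", "app-sandbox"))  # False: lateral movement limited
```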
Get Started With ZTX or Cybersecurity Mesh
Be it ZTX or the cybersecurity mesh, the goal for both approaches is the same: protect the resources and the requesters from harming each other, and the entire network. To make that decision on the PDP and implement your PEPs, one should primarily know what types of assets are in the deployment, because the rules for company assets will differ from the rules for personal devices or the IoT devices that belong to the network infrastructure. Additionally, packet flow visibility is very important, as this defines how we can continuously maintain the trust we've been defining. Intrusion prevention system integration on the PEP, like a next-generation firewall over the network, is something one should consider. Also, never let the requester interact with a resource directly. It's like inviting Satan for dinner; it's not going to go well. Obviously, there are many other approaches and network requirements to consider, which will change based on the types of devices available, the importance of the resources, how scalable your deployment is, the cost that can be allocated for this network, and other factors.
Remote User Use Case
As an example of what we have seen so far, we can take the use case of the work-from-home scenario. To implement that with zero trust, what do you need? First, a headend that is smart, where the users can terminate their sessions. This can be your next-generation firewall or an existing firewall with some IPS capabilities. Next, assuming that you're always on a public network, it's a good idea to incorporate encryption in your business. That can be done using VPN, which, unfortunately, a lot of us think has been replaced by zero trust, but that is not at all true. Zero trust doesn't talk about the security principles of confidentiality, integrity, and availability for data by default. It embodies those values, but we need some tool to implement them, and working under the assumption that you're always on a public network is the safe way to go. Then choose a policy engine. This will help with authenticating the users. Again, just usernames and passwords will not work; your beloved Admin123 password can be cracked in less than a second. We definitely need more than that. That brings us to multi-factor authentication and certificate-based authentication, along with EAP-TLS.
Again, authentication by itself is not enough. We need dynamic policy, as you might have noticed in the architecture, because static rules are no good in zero trust. Often missed but very important, when you want to go back and find out what was done, is logging your decisions. It can be done through syslog servers or your monitoring tools. A major pillar of zero trust is to continuously monitor the traffic in the network. It's the cherry on the cake if you can have a threat defense tool integrated at all possible intersection points of potential attacks. Notice why I didn't put these ingredients in a linear list: that's the beauty of ZTX, or zero trust. You don't have to start at a single point. You can first choose your IAM server, for example, make rules there, and then integrate the other sections of the framework. Similarly, you can begin with your threat orchestration tool or logging method, and then move to your headend implementation. This holds true for a remote user or an on-premise user or application anywhere. We'll revisit this with a short example after Deepank walks you through the policy engine and PDP, in terms of identity and access management.
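As one small, concrete example of the logging ingredient, this snippet sends each policy decision to a syslog collector using Python's standard library. The collector address and message fields are placeholders you would adapt to your own monitoring setup.

```python
import logging
import logging.handlers

# Point this at your own syslog collector; localhost/514 is just a placeholder.
handler = logging.handlers.SysLogHandler(address=("localhost", 514))
logger = logging.getLogger("zt-policy-decisions")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def log_decision(user: str, resource: str, decision: str, reason: str) -> None:
    """Record every grant/deny so it can be reviewed later."""
    logger.info("user=%s resource=%s decision=%s reason=%s",
                user, resource, decision, reason)

log_decision("sindhuja", "finance-app", "deny", "posture check failed")
```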
AAA (What is A, A, and A?)
Dixit: I will take this further by discussing a little bit about IAM, which stands for Identity and Access Management. Starting with AAA, this is the technology behind IAM. It stands for Authentication, Authorization, and Accounting. To quickly recap: authentication establishes who is requesting access. Authorization determines what permissions they get based on who is accessing the resource. Accounting largely helps you track sessions and monitor the resources a user consumes.
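A compact way to keep the three A's apart, using purely made-up data: authentication answers who you are, authorization answers what you may do, and accounting records what you actually consumed.

```python
USERS = {"deepank": "correct horse battery staple"}          # authentication data
PERMISSIONS = {"deepank": {"read:wiki", "write:lab-notes"}}  # authorization data
USAGE_LOG = []                                               # accounting data

def authenticate(user: str, secret: str) -> bool:
    """Authentication: who is requesting access?"""
    return USERS.get(user) == secret

def authorize(user: str, permission: str) -> bool:
    """Authorization: what is this user allowed to do?"""
    return permission in PERMISSIONS.get(user, set())

def account(user: str, resource: str, bytes_used: int) -> None:
    """Accounting: track sessions and resource consumption."""
    USAGE_LOG.append({"user": user, "resource": resource, "bytes": bytes_used})

if authenticate("deepank", "correct horse battery staple") and authorize("deepank", "read:wiki"):
    account("deepank", "wiki", 2048)
print(USAGE_LOG)
```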
The Trust Algorithm
In any enterprise with zero trust implemented, we have the trust algorithm as the primary thought process behind granting or denying access for a user. It is broadly categorized into five groups. The first is the Access Request, which is the request from the supplicant down to the IAM server. The Subject and Asset Databases contain information on users, their context, and their assets. The Resource Policy Requirements are the minimum policy requirements you have to meet before you can gain network access. These requirements are generally put in place by network administrators or data custodians who understand the impact of assigning an incorrect policy to the wrong user. Threat intelligence is the last one, which is concerned with looking for any malicious activity in your live network.
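Roughly speaking, the trust algorithm combines those inputs into a single grant or deny decision. The sketch below is a simplified, hypothetical rendering of that idea, not the NIST specification itself; the field names and checks are assumptions.

```python
def trust_decision(access_request: dict,
                   subject_db: dict,
                   asset_db: dict,
                   resource_policy: dict,
                   threat_intel: set) -> str:
    """Combine the trust-algorithm inputs into one grant/deny decision."""
    user = subject_db.get(access_request["user"])
    asset = asset_db.get(access_request["device"])

    if user is None or asset is None:
        return "deny: unknown subject or asset"
    if access_request["device"] in threat_intel:
        return "deny: device flagged by threat intelligence"
    if user["clearance"] < resource_policy["min_clearance"]:
        return "deny: below minimum policy requirement"
    if not asset["managed"] and resource_policy["managed_only"]:
        return "deny: unmanaged asset"
    return "grant"

print(trust_decision(
    {"user": "rao", "device": "laptop-42", "resource": "payroll-db"},
    {"rao": {"clearance": 3}},
    {"laptop-42": {"managed": True}},
    {"min_clearance": 2, "managed_only": True},
    threat_intel=set(),
))  # grant
```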
How to Quantify the Trust
So far, we have discussed the term trust a lot, but for the IAM server to understand this terminology, the meaning of the word trust has to be quantified. The way we do the quantification is by thinking in terms of assigning brownie points for each predefined criterion that is met. For example, imagine yourself at a bar entrance, and say you have to meet at least four out of five guidelines before you can enter the bar. This is much like how profiling works: the IAM server tries to profile a device based on a similar scoring system. In this example, we assume that 20 points is what is needed for a PC to be profiled as a valid Windows 10 workstation.
Probes Aren't Magic
Let's look in detail at what this scoring system looks like. Here we are using probes, which are just protocols used to collect as much information on the endpoint as possible. Our journey starts from zero points for a PC, because the IAM server knows nothing about it. Using the RADIUS protocol, RADIUS Access-Request messages are sent from the PC to the IAM server. The IAM server will look into the RADIUS packet attributes and identify that the OUI matches a Microsoft OUI. Now we have a little bit of an idea that this might be a Microsoft device. If configured with DHCP, we can have a copy of the endpoint's DHCP request sent down to the IAM server, and the IAM server will look into the DHCP class identifier field and say that this definitely is a Microsoft device. To dig deeper, we have a third probe: if the endpoint tries to browse the internet, the IAM server will read the user agent field from the HTTP packet and see that it's actually a Windows device, not just a generic Microsoft device. Lastly, we have AD probes and Nmap probes working together, which behave similarly to the previous probes. They are able to dig deeper still and understand that this is a Windows 10 PC, which has now acquired the 20 points we decided on. This is how profiling works for any endpoint in a zero trust environment.
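To make the scoring tangible, here is a small sketch of how points from the different probes might add up until the device crosses the profiling threshold. The probe names, point values, and the 20-point threshold follow the example above; everything else is invented.

```python
# Hypothetical certainty points contributed by each probe's evidence.
PROBE_POINTS = {
    "radius_oui_matches_microsoft": 5,
    "dhcp_class_id_says_microsoft": 5,
    "http_user_agent_says_windows": 5,
    "ad_and_nmap_say_windows_10": 5,
}
PROFILE_THRESHOLD = 20  # points needed to profile as a valid Windows 10 workstation

def profile(endpoint_evidence: set) -> str:
    """Sum the points for every probe that produced matching evidence."""
    score = sum(points for probe, points in PROBE_POINTS.items()
                if probe in endpoint_evidence)
    if score >= PROFILE_THRESHOLD:
        return f"profiled as Windows 10 workstation (score={score})"
    return f"unknown endpoint (score={score})"

# The server starts knowing nothing, then evidence accumulates probe by probe.
print(profile({"radius_oui_matches_microsoft"}))
print(profile({"radius_oui_matches_microsoft", "dhcp_class_id_says_microsoft",
               "http_user_agent_says_windows", "ad_and_nmap_say_windows_10"}))
```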
CoA - Trust Is Earned, and It's Temporary
Moving on to CoA: CoA, which stands for Change of Authorization, is one of the most crucial elements of a zero trust architecture. For example, imagine I come into the office and connect my company laptop to the wired network, and everything is good. Ten minutes later, I decide to turn off my antivirus because I want to download a game off the internet. Does my threat profile change after this particular action, as far as the IAM server is concerned, or does it remain the same? It must change, because I just shut down a crucial antivirus application on my PC. In the worst case, I myself am now a potential threat to devices on the same network, should my device be compromised. In such cases, the IAM server re-authenticates my PC with a restricted access policy, which is what this flow shows here.
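As an illustration of the CoA idea (not a real RADIUS CoA implementation), the sketch below reacts to a posture-change event by pushing a restricted authorization profile onto an existing session. The session and policy names are hypothetical.

```python
SESSIONS = {"deepank-laptop": {"policy": "full-access", "antivirus_running": True}}

def change_of_authorization(endpoint: str, new_policy: str) -> None:
    """Re-authorize an existing session with a different policy, mid-session."""
    SESSIONS[endpoint]["policy"] = new_policy
    print(f"CoA pushed: {endpoint} now has policy '{new_policy}'")

def on_posture_event(endpoint: str, antivirus_running: bool) -> None:
    """Trust is temporary: a posture change triggers re-authorization."""
    SESSIONS[endpoint]["antivirus_running"] = antivirus_running
    if not antivirus_running:
        change_of_authorization(endpoint, "restricted-access")

on_posture_event("deepank-laptop", antivirus_running=False)
print(SESSIONS["deepank-laptop"]["policy"])  # restricted-access
```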
What Is Posture? Why in Zero Trust?
Moving on to posture. After CoA, it's another central element of the zero trust architecture. One question it helps us answer is: is the endpoint compliant with the company's security policy or not? The way we answer that is via some condition checks. In the example here we have an antivirus check, a latest-patch check, and even a USB-plugged-in check to see if the device onboarding to the network has any USB device plugged in, because that's not a good sign; it may indicate that someone is trying to introduce a rogue device into the network. You can create your policies accordingly and say that any devices which do not meet these condition checks should not be granted access. Or if they are granted access, it should be very limited access, because they will be marked as non-compliant after the posture check completes.
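Those condition checks translate naturally into something like the following hypothetical compliance function; the specific checks and the limited-access outcome are just the ones named in the example.

```python
def posture_check(endpoint: dict) -> str:
    """Mark an endpoint compliant only if it passes every condition check."""
    checks = {
        "antivirus_enabled": endpoint.get("antivirus_enabled", False),
        "latest_patch_installed": endpoint.get("latest_patch_installed", False),
        "no_usb_plugged_in": not endpoint.get("usb_plugged_in", True),
    }
    failed = [name for name, passed in checks.items() if not passed]
    if failed:
        return f"non-compliant ({', '.join(failed)}): grant limited access only"
    return "compliant: grant normal access"

print(posture_check({"antivirus_enabled": True,
                     "latest_patch_installed": True,
                     "usb_plugged_in": False}))
print(posture_check({"antivirus_enabled": True,
                     "latest_patch_installed": False,
                     "usb_plugged_in": True}))
```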
Native Supplicant - Cisco AnyConnect
Let's have a quick walkthrough of how to provide zero trust for a workplace; I'll suggest Cisco's way of implementing an IAM solution. On the left, we have users coming in via different mediums, trying to gain access to network resources. Their access request is sent to an IAM server, which then tries to authenticate the users with the help of SAML providers, LDAP servers, or a bunch of Active Directory servers on the backend. The authentication protocol in most cases would be PEAP-MSCHAPv2 or EAP-TLS for user or machine authentication.
Threat Detection and Incident Response
We now move over to the part where how you manage trust and how you handle threats come together. One of the key things that threat detection and incident response focuses on is the continuous evaluation of the traffic flow in a network. This idea of continuous evaluation stems from the fact that, in the security world of today, it's always good to be suspicious. You can achieve that in many ways. You can integrate your threat detection tools with your IAM server, you can use your policy enforcement points like firewalls, you can lease cloud-based services, or you can use a combination of them. The idea is to make it as robust as you can.
Demo
Whatever we have seen so far, we can now see through a demo. Let's understand the topology first. We have a remote VPN client, Cisco AnyConnect in our example; it can be any VPN client. We also have a malware protection module installed on top of this VPN client, so that it's able to detect malware on the device itself. In addition, if this enterprise takes its security seriously, then it at least has a two-factor authentication solution in place. Duo is one such two-factor authentication solution, but obviously it can be any solution you want to put into the network.
For starters, let's assume that all of this works well, and the VPN client connects to the firewall. This firewall is then connected to an IAM server for primary authentication, and to Duo for secondary authentication. That is how multi-factor authentication is achieved here. This firewall is also connected to an orchestrator on its left, SecureX in this example; it can be any orchestrator. Orchestrators are a great tool for creating flowcharts of how many devices you have, how they fit into the topology, and what actions to take when malicious activity is detected in your network. Assume everything is working fine, but at some point later on malware is detected on the PC. Let's see how you handle that.
The VPN endpoint sends an IPS alert to the firewall, where IPS is the Intrusion Prevention System. This IPS alert then triggers an SSE alert to the orchestrator, where SSE is the Security Services Exchange. The orchestrator now initiates a workflow which we had created to handle such situations. What does the orchestrator do? It sends a CoA to the IAM server to terminate this endpoint's session, because now we want to contain this threat to just that one compromised endpoint. By doing this, we are essentially limiting the lateral movement of any attack within the network. The next, optional, step could be to send a block-user message to the Duo application to mark the user as blocked in the Duo database. Lastly, you can have your malware protection solution, for example Cisco AMP (it can be any), get hold of that file and study it using a threat intelligence solution, to gain more insight into why the attack happened and how it could have been prevented.
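The containment workflow the orchestrator runs in this demo could be sketched roughly as below. The function names are placeholders standing in for the real SecureX, IAM, Duo, and AMP integrations, each of which has its own API; the sketch only shows the order of the steps.

```python
def send_coa_terminate(iam_server: str, endpoint: str) -> None:
    # Placeholder: in the real workflow this is a CoA terminate sent to the IAM server.
    print(f"[{iam_server}] session terminated for {endpoint}")

def block_user_in_mfa(mfa_service: str, user: str) -> None:
    # Placeholder: mark the user as blocked in the two-factor provider (optional step).
    print(f"[{mfa_service}] user {user} blocked")

def submit_for_analysis(threat_intel: str, file_hash: str) -> None:
    # Placeholder: hand the suspicious file to a threat-intelligence tool for study.
    print(f"[{threat_intel}] analyzing sample {file_hash}")

def on_ips_alert(endpoint: str, user: str, file_hash: str) -> None:
    """Containment workflow triggered by the IPS alert from the VPN endpoint."""
    send_coa_terminate("iam-server", endpoint)       # contain: kill the session
    block_user_in_mfa("duo", user)                   # optional: block the user
    submit_for_analysis("threat-intel", file_hash)   # learn: study the malware

on_ips_alert("vpn-client-07", "deepank", "a3f5deadbeef")
```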
Government Compliance
A little about government compliance and zero trust. We have included the white paper so that you have enough time to go through it later on, and understand how governments around the world have not just been adopting but also mandating the implementation of zero trust.
Some Hiccups
Let's be honest, it's a lot to take in, and we understand that. It's not so simple. There are definitely some hiccups along this path of implementing zero trust. Many of the tenets we have discussed have merits, but at the same time, some can be extremely complex to implement for a few reasons. The most notable one, which knits them all together, is that many organizations today carry technical debt, meaning that they are running applications more than a few years old and have been building their own software for consumption around the same infrastructure. As a result, redesigning and redeploying your architecture, or shifting to an entirely new one for that matter, can be quite costly and service-disruptive. Service disruption is bad for business, be it a major bank, a military, a university, or a hospital. Technical debt is a major factor here.
Summary
To summarize the big points: when we follow the zero trust ideology, we want to grill the endpoint on proving its identity. We want to periodically re-authenticate and follow least-privilege access principles. Since static rules do not apply in this kind of network, you need to understand your network to design a better zero trust architecture. Lastly, there's no company that can sell you a box and say, "There you go, you now have a complete zero trust architecture implemented." That's just not possible. It's not the nature of the zero trust paradigm.
Key Takeaways and Action Items
The first takeaway would be giving up access to unused resources. Giving up access to such resources not only increases data security, but also reduces data wastage. As many here might be software developers, knowing secure coding is a very good practice; you can look up the OWASP Top 10 vulnerabilities that might be relevant to your code. Network engineers can think about whitelisting rather than the whole blacklisting approach, because honestly, if you think about it, no one is going to go and manually blacklist the thousands of ads that show up every second. That's something that we as a security community should start focusing on. Zero trust, eventually, is a journey for all of us as network professionals, a tough journey indeed. Looking at the future, I think it's one definitely worth taking.
Questions and Answers
Westelius: I think a lot of companies would want to be able to implement a zero trust architecture. To your point, Deepank, you can't really buy a box, implement it, and be done with it. That would be wonderful, but it is not really realistic. For any company working from largely an open perimeter, or a closed perimeter with no internal trust systems, this can introduce some issues for users, who are suddenly not able to access everything within the ecosystem. How would you recommend one implements a zero trust architecture without introducing too much friction for users? Are there any immediate things that come to mind?
Dixit: A lot of the time, we run into these situations where customers want to basically change the nature of their entire network. The reason why they are doing it, obviously, is because they want to implement more robust policies from the security perspective. What ends up happening is that when we get an issue in the network, it's mostly because the technical documentation wasn't read thoroughly. You cannot start implementing anything before you have gone through the complete documentation that the technology, the product, or the company has put out on the internet. That is definitely one of the factors. It takes a lot of effort to go through the documentation and figure out, ok, this is something that you specifically need, and these are the things that you can circumvent because you don't particularly need them. That's the first thing.
The second one would be to not have too many network engineers making changes, because that often makes things very bad. For example, if I am managing a network, then having more than a few people involved at a time leads to clashes between different ideas. It makes things worse down the line, because you want to stick to one ideology. If you have a particular topology in mind and it has been approved, then that is what should go forward and be implemented, given its pros and cons. If you keep making changes to a dynamic network, that messes things up down the line and makes things very difficult to manage. Once things are in place, too many changes make the network harder to manage.
Westelius: Also, in terms of suddenly introducing more friction, if, say, a user can no longer access a particular resource, where do you see the value of self-serviceability for escalating those privileges? What does that look like from your perspective?
Dixit: From an IAM point of view, issues like that are inevitable; it's just that when you are implementing, you have to have a maintenance window in place, so that you can run those tests over a period of at least 6 or 24 hours. There is a network disruption that is bound to happen, but it depends on how smartly you manage it. If you have an upgrade planned on software that's been running for a few years, you can try to segment that upgrade into a few steps, so you don't take one entire upgrade over the course of a few days; separate it out into windows of maybe four hours. Have failover in place, so that whenever a node is being upgraded, you have a failover to serve those users in the meantime.
In terms of privilege escalation, if you have made your policies correctly, then you possibly won't run into issues like incorrect access or insufficient access of any sort. If you are expected to get some access, that is the only thing you should be hitting. If that's not the case, then it should be caught in the testing scenario and brought to us as network engineers, or to the specialists who are well aware of the implications of those policies. That's where troubleshooting comes into the picture. It's not just one path you can take; you can definitely plan better to implement better.
Rao: I think one of the most important things is also to find a good consultant for your network, someone who understands your network very well and knows the impact the changes are going to have on the users who are coming in. For new users, what type of changes are you looking for? Are your users comfortable with that or not? How scalable is the idea you're trying to build on your network? With all of this in place, I think good consultants are what a lot of companies are looking for. If anyone is looking to build that skill as well, I think it's a great choice in security: helping your IT team with what would be good for your company, and with implementing new security scenarios and upcoming ideas. I think that's very important. Make a little bit of change at a time, so you don't drastically limit users. Wherever you're making changes, create test scenarios. Pen testing with simulated attacks, all of these are very important.
Westelius: Both of you referred to defining the right policies and finding the right people who understand the network and the users. Especially when implementing zero trust at the beginning, I think it's really important to get those starting policies right, because otherwise, if you implement something, that change can be quite significant for your users. I'm curious how you get to a good starting point for a policy, and how you partner with the organizations you would impact to establish that good starting point.
Dixit: First of all, you have to draw it out. I think there is no way to work around it, because you have a lot of moving components in a network. You have administrators, you have network users, you might have scientists or students, faculty, everyone, if it's from a university's perspective. There can be a lot of types of users coming in. Some users might be accessing via the wired network, some from the wireless network, some from their personal device or as a VPN user. There are a lot of variables, but you have to list them and say: if this user is coming in via this route, and this is the type of user, then this is the least-privilege access I want to give them under these circumstances. I would base it largely on three or four variables: what type of user is it? What medium are they coming through? What kind of access do they deserve? I think those would be the three major variables, in this case, to ascertain what access we want to give to that user.
There are a lot of devices, and on the devices there are a lot of privilege levels. The variables are there; you can go as granular as you want. It's just that you have to start very simply, in a way that you're not over-granting access to anyone, because if the access is not there, you can always come back and tweak the policy to grant more access. You don't want to give anyone unauthorized access at any point; that's the whole point here. Draw it out. Start somewhere. Go through the technical documentation and reserve time for it, because without understanding the implications of the commands you're putting into a device, it can get really nasty sometimes and you might suffer a really huge downtime. Those are the things to keep in mind. That's what we advise, basically, on a daily basis to all the customers who get in touch with us. That's what the purpose is; that's the ultimate goal.
Westelius: Just expanding on that slightly: what role do you see usage playing in dynamic access or dynamic privilege when setting policy? In terms of a zero trust policy, should you continuously make that evaluation and adjust policies based on what your user groups are actually utilizing within the network?
Rao: I think one of the key things we are moving towards, or have already implemented, is automation. It can help you drastically if you're going with zero trust and evolving with your networks. Make use of whatever you already have. Try out the tools that you have. Understand the types of users. As Deepank said, it's very important, because now we're moving towards user-based everything. We're giving so much importance to users, so you had better know what types of users you have. Once you have that data with you, you can automate a lot of things, knowing what you want to implement and how drastically you want to change those dynamic policies. Create workflows and orchestration. Orchestrator services are one of the key things that can help you with creating dynamic policies: what should your IAM server actually implement for your users without your intervention, or the admin's intervention? Network engineers are actually very underrated, so you can consult one of them.
Dixit: At one point, I spoke about Change of Authorization. That's something that's built into a lot of IAM servers. If you want to manage things dynamically, you can configure an IAM server to take actions according to what's happening in the network. If I get an event stating that something malicious has been detected, then the IAM server should be automated to take an action against it, try to isolate the endpoint, and keep that attack vector from spreading in the network. Change of Authorization is extremely important, because you can't manually watch the network all the time; there are so many moving components, you cannot possibly do that. If you have policies in place to detect certain behaviors, you can create policies to take action against the behaviors you want prevented. That's a big help in such cases, definitely.