Transcript
Bedra: This session is going to explore a lot of the options that are available today for computing with hardware enclaves. The different providers. The different platforms. The different structures. The things that we can do, or consider, when we're making a choice about using or not using enclaves. Really, it's all about the trusted execution environment. This is really the pitch. The 10-second pitch is you can do something with secrets in a trusted environment where nobody else has access or visibility into what's going on. It's the ability to encrypt, decrypt, sign, provide attestation. A general purpose environment that is structured separately from the rest of the runtime. It changes the model for security. I think this is a really important concept to have going forward for traditional server hardware, for mobile devices, for tablets, for IoT, really anything that requires some sensitive computing should be considered in the realm of trusted execution environments. Hardware enclaves are really a subset of trusted execution environments.
How Do You Solve Secure Execution for Critically Sensitive Data?
There's a big problem that we want to solve: how do you solve for the problem of secure execution for critically sensitive data? This is a notoriously hard problem. There are lots of things involved here. How do you keep key material secret? How do you keep other people from spying on or seeing the transaction happen with sensitive data? How do you search encrypted data or search sensitive data? How do you sign things? How do you go about anything that might have some sensitive execution? There's a notorious set of problems associated with this, especially around the bootstrapping problem. You might have secrets stored. You'll need a secret to get to your secrets, and how do you secure the secret that gets to your secrets? There's a whole bunch of cascading problems associated with it. It's really cool to look at these trusted execution environments and these hardware enclaves as a way to maybe tackle some of the problems that were otherwise unsolved or otherwise intractable in some kinds of environments.
This simple drawing demonstrates what the state of the world was like before these came around. You effectively had some process running on some machine that would contact a hardware security module and perform an operation. That could be sign. It could be encrypt. It could be attestation. It could be whatever. Ultimately, it was some transmission to a remote process. It may not always be a network thing, it could be something that was plugged in over USB or on the PCI bus, but there was some "remote communication." It would typically happen over a PKCS#11 channel, or as the different cloud providers came online, they started offering HTTP APIs or REST APIs for the same kinds of transactions. If you wanted something encrypted, you would send it to the service and it would come back encrypted, keeping the key material safe, tamper resistant, all kinds of protections around the key material. Then if you wanted to decrypt, you would send that encrypted blob and you would get a decrypted blob back. Pretty simple idea, very complicated setup, very costly to use. Ultimately, redundancy and resiliency were difficult problems to solve. They're a great tool to have in your toolkit. You should always be thinking about using them for general-purpose encrypt, decrypt operations, but they don't fit all the problems. While this is good, we need something else as well.
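As a rough illustration of that model, here is a minimal PKCS#11 sketch of handing plaintext to an HSM and getting ciphertext back while the key never leaves the module. It assumes a vendor-supplied Cryptoki library, a pre-provisioned AES key whose object handle has already been looked up (the C_FindObjects* lookup is elided), and placeholder slot and PIN values; the exact header and macro setup varies by vendor.

```cpp
// Minimal PKCS#11 sketch: send plaintext to an HSM, receive ciphertext back.
// Assumes a vendor Cryptoki module, a provisioned AES key handle (lookup via
// C_FindObjects* elided), and placeholder slot/PIN values.
#include <pkcs11.h>   // header and CK_* macro conventions vary by vendor

CK_RV encrypt_with_hsm(CK_SLOT_ID slot, CK_OBJECT_HANDLE aes_key,
                       CK_BYTE *plaintext, CK_ULONG plaintext_len,
                       CK_BYTE *ciphertext, CK_ULONG *ciphertext_len) {
    CK_RV rv = C_Initialize(NULL_PTR);
    if (rv != CKR_OK && rv != CKR_CRYPTOKI_ALREADY_INITIALIZED) return rv;

    CK_SESSION_HANDLE session;
    rv = C_OpenSession(slot, CKF_SERIAL_SESSION, NULL_PTR, NULL_PTR, &session);
    if (rv != CKR_OK) return rv;

    CK_UTF8CHAR pin[] = "1234";                     // placeholder PIN
    rv = C_Login(session, CKU_USER, pin, sizeof(pin) - 1);
    if (rv != CKR_OK) { C_CloseSession(session); return rv; }

    CK_BYTE iv[16] = {0};                           // fixed IV for illustration only
    CK_MECHANISM mech = { CKM_AES_CBC_PAD, iv, sizeof(iv) };

    rv = C_EncryptInit(session, &mech, aes_key);
    if (rv == CKR_OK)
        rv = C_Encrypt(session, plaintext, plaintext_len, ciphertext, ciphertext_len);

    C_Logout(session);
    C_CloseSession(session);
    return rv;                                      // the key never left the HSM
}
```

The point of the sketch is the shape of the interaction: the operation happens on the module, and only the request and the result cross the boundary.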
The trusted execution environment changes the model a little bit. It effectively says, I'm going to live inside of the execution, inside of the runtime of the application. Really, I'm going to sit inside of the processor or be adjacent to the processor. As the machine instructions are run, the trusted execution environment can be invoked within a process. Data is shipped in and out of the process runtime, and special instruction sets or special machine code can be run to perform the enclave or trusted environment execution. This changes things a little bit, and we'll dive a little bit into the threat model and what happens here. This is a more copacetic environment when you're in the Internet of Things world, or you're in a world where there maybe isn't a reliable network, or the ability to remotely communicate with an HSM. It provides a different set of parameters.
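In Intel SGX terms, the untrusted side of that looks roughly like the sketch below: load a signed enclave image into your own process, then invoke an ECALL to ship data across the boundary. `sgx_create_enclave` and `sgx_destroy_enclave` are real SGX SDK calls from `sgx_urts.h`; the `ecall_seal_secret` proxy and the `enclave_u.h` header are hypothetical stand-ins for what the SDK's edger8r tool would generate from your own EDL file.

```cpp
// Untrusted (host) side sketch: load a signed enclave and invoke an ECALL.
// ecall_seal_secret and "enclave_u.h" are hypothetical generated names.
#include <cstdio>
#include <sgx_urts.h>
#include "enclave_u.h"   // generated untrusted proxy header (hypothetical name)

int main() {
    sgx_enclave_id_t eid = 0;
    sgx_launch_token_t token = {0};
    int token_updated = 0;

    // Debug flag = 1 for development; a production deployment loads a
    // release-signed enclave. Linking against libsgx_urts_sim instead of
    // libsgx_urts runs the same code in simulation mode without SGX hardware.
    sgx_status_t status = sgx_create_enclave("enclave.signed.so", 1, &token,
                                             &token_updated, &eid, nullptr);
    if (status != SGX_SUCCESS) {
        std::fprintf(stderr, "failed to create enclave: 0x%x\n", status);
        return 1;
    }

    // Marshal the secret into the enclave; only the sealed blob comes back out.
    const char secret[] = "example secret";
    uint8_t sealed[1024] = {0};
    uint32_t sealed_len = sizeof(sealed);
    sgx_status_t ecall_ret = SGX_SUCCESS;
    status = ecall_seal_secret(eid, &ecall_ret,
                               reinterpret_cast<const uint8_t *>(secret),
                               sizeof(secret), sealed, &sealed_len);

    sgx_destroy_enclave(eid);
    return (status == SGX_SUCCESS && ecall_ret == SGX_SUCCESS) ? 0 : 1;
}
```

The enclave is part of your process, but everything that crosses into it goes through that explicit, generated boundary.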
Not All Created Equal
A big idea here is that not all of these are created equal. I'm going to list through some of the options here, and just know as I'm exploring these, your mileage may vary on what's possible in your environment, what your architecture provides access to, and what the features of these hardware enclaves are really providing at the end of the day. When I talk about these, I'm going to either talk about them in the frame or the context of a single vendor, or maybe generally, but I always encourage you to go through and think about what the real, true offering is in your environment.
Available Options
There are a few available options. This is a little bit of an asterisk: there are more available options than this. I'm going to talk about the more broadly available things. Intel has a platform called Software Guard Extensions, or SGX. I'm going to go a little bit more into detail here. The examples I'm going to provide at the end of this talk, including a full end-to-end example of how to seal and unseal a secret inside of SGX, will be woven throughout this. AMD has a similar platform, Secure Encrypted Virtualization, their SEV environment. It is similar to SGX but there are a few notable differences. I definitely encourage you to read the documentation on what's provided and what isn't. ARM has a technology called TrustZone. This is available on a majority of the ARM system-on-a-chip environments. Apple has their Secure Enclave. This has been around for actually quite a while. I think it's probably, maybe not the first player, but one of the first popular players in the enclave space. AWS has an offering called Nitro Enclaves, which offers a similar trusted environment. Notably off this list, but worth mentioning: the Google Cloud and Microsoft Azure clouds have similar enclave offerings, but they tend to rely on either the AMD SEV or the Intel SGX world. They're building on top of one of the other providers in this list, so it's redundant here. It's worth noting that the major cloud providers provide access to enclaves in different forms. The nice part about that is you get a uniform layout for that particular cloud provider. You can do some better management of expectations, where a custom-owned or custom data center environment may be a little more difficult to pull off.
Hardware Enclaves Represent a Change in the Threat Model
What's really important here is that hardware enclaves represent a change in the threat model. When you're thinking about how to attack a system or how to attack software, you consider the threat model as a way to think about what could go wrong. That threat model is a really important thing to consider here, because we go from attacking the entire runtime, where the application, the OS, any virtualization layers, anything that might exist, are all vectors for attack. This is a diagram from Intel. It's a little bit more Intel specific, but the theme exists throughout, I think, really all of the vendors, where it really changes to potential leakage from inside your application if you design it wrong, and very specific attacks against the hardware itself to leak these secrets.
Intel SGX Application
Again, this is another Intel diagram. A lot of these have a very similar model. I happen to like these diagrams, I think they illustrate the point very well. Effectively, inside of the runtime of your program, you have a bunch of untrusted code. When you need to operate on a secret, you'll enter trusted code or a trusted environment to perform your operations. You'll ship the code or the data you want to encrypt or decrypt into the enclave, let it perform its duties, and it will ship you back the results. You have a hard wall between them.
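On the trusted side of that wall, the SGX SDK's sealing API is what does the work in the end-to-end example linked later. The sketch below uses the real `sgx_tseal.h` functions (`sgx_calc_sealed_data_size`, `sgx_seal_data`, `sgx_unseal_data`); the ECALL names and the EDL fragment in the comment are illustrative only. By default, `sgx_seal_data` derives its sealing key under the MRSIGNER policy, so the sealed blob can only be unsealed by enclaves signed with the same key on the same platform.

```cpp
// Trusted (enclave) side sketch. An EDL exposing these ECALLs might look like:
//
//   trusted {
//       public sgx_status_t ecall_seal_secret(..., [out] uint8_t *sealed, ...);
//       public sgx_status_t ecall_unseal_secret(...);
//   };
//
// ECALL names and EDL are illustrative; the sgx_tseal.h calls are the real API.
#include <cstdint>
#include <sgx_tseal.h>

sgx_status_t ecall_seal_secret(const uint8_t *secret, uint32_t len,
                               uint8_t *sealed, uint32_t *cap) {
    uint32_t need = sgx_calc_sealed_data_size(0, len);
    if (need == UINT32_MAX || need > *cap) return SGX_ERROR_INVALID_PARAMETER;

    // Encrypts and MACs the secret with a key derived inside the CPU.
    // Default key policy is MRSIGNER: only enclaves signed by the same key,
    // on the same platform, can unseal this blob.
    sgx_status_t ret = sgx_seal_data(0, nullptr, len, secret, need,
                                     reinterpret_cast<sgx_sealed_data_t *>(sealed));
    if (ret == SGX_SUCCESS) *cap = need;
    return ret;
}

sgx_status_t ecall_unseal_secret(const uint8_t *sealed,
                                 uint8_t *out, uint32_t *out_len) {
    // The plaintext only ever exists inside enclave memory during this call.
    return sgx_unseal_data(reinterpret_cast<const sgx_sealed_data_t *>(sealed),
                           nullptr, nullptr, out, out_len);
}
```

Notice that nothing in the trusted code ever hands the raw secret back to the untrusted side; only the sealed blob crosses the wall.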
First Impressions - Ergonomics: Apple
Not all things being created equal, I think it's worth noting that there are a few things to consider here. For me, first impressions go a long way. I want to note a couple of things that I experienced as I went through the different vendors and looked through what was available. I think the first thing is ergonomics. How easy is it to use? What are the available SDKs? How can you quickly load and develop and work with it? The ergonomics award absolutely goes to Apple here. The ease of use in Apple's Secure Enclave environment, awesome. Great documentation, available in Swift and in Objective-C. It builds very easily in Xcode. It feels batteries included. It's a really nice environment.
Broad Application: ARM
My racehorse here, the thing that I think is going to go the furthest and the broadest, is really ARM TrustZone: the system-on-a-chip applications, the mobile devices, the tablets and phones. The ARM platform is just going everywhere these days. I really think that ARM has the ability to really penetrate the market in terms of what it means to have a nice secure enclave, especially in the Internet of Things category, where we just don't have that reliable network connection and that kind of isolation. I think that TrustZone, because of how it's positioned and how it's used, definitely has some advantages.
Most Depth: Intel
I think in the depth category, Intel wins. They seem to have really gone deep on this problem and offered a lot of different ideas. Unfortunately, I think it probably is one of the more difficult ones to get going and use. That's one of the reasons I provided some examples. I do think that Intel is thinking about a really complete picture here. It's important to note that I think Intel has a lot of depth in their offering.
Considerations - Performance
Some considerations though, as we think about how to work with enclaves, how to think about their application, there are some things you want to consider. It would be remiss of me in a hardware or mechanical sympathy track not to talk about performance. Using hardware enclaves is going to come at a cost in performance. If you're simply encrypting data with OpenSSL, there are a lot of optimizations that have been done on the chip with intrinsics, with just speeding up general hashing and encryption operations. You really can't beat that native, chip-enabled OpenSSL performance you're going to get out of the box. You're going to take a performance hit. You're going to take a little bit of a throughput hit. You're going to jump protection rings. You're going to run different hardware instructions. You're going to encounter marshaling, or shipping data back and forth to other parts of the hardware that may or may not be as fast. There's going to be a bit of a performance hit.
There's going to be a link at the end of this talk to a paper written specifically about SGX. I think it's a good paper that documents the kinds of performance impacts you're going to see as you go to hardware enclaves. For me, I don't really tend to think about these as something that I need a lot of throughput or a lot of sensitivity around. I tend to think of them as a really great way to solve the secrets bootstrapping problem, a way to maybe create an environment where you could run a good certificate authority. Things that are maybe not high volume or high traffic, but really more extremely sensitive.
Threat Model Evolution
One thing to note, though, is that the threat model has changed, but it still exists. There are still threats. You can still create a side channel in your application by leaking the data that you're putting in and out of the enclave in the wrong fashion. There are still ways to attack your software in a meaningful way to extract that data. You want to be careful: while the threat model changes, and while side channels are a lot more difficult to pull off, it doesn't mean they don't exist. You really want to think through how you're designing your software as you're designing for the enclaves, and to include an updated and legitimate threat model.
Fixing Microcode Issues
It's really hard to fix microcode issues, when an instruction set is shipped and microcode issues have to be corrected. SGX had an issue early on with performance due to some OpenSSL initialization. Those updates are impactful.
They can be fixed by shipping updates to the operating system. A lot of times that requires a reboot. Reboots can be expensive in some environments. Sometimes it requires flashing; these BIOS-level updates need to happen to fix issues as well. That's downtime, and potentially hands on a machine in a data center. It's a much more expensive, potentially more impactful and difficult thing to fix. If you're relying on this and something is wrong, it's going to take a lot longer for that fix to come out in some cases. You want to consider, what is the cost to fix if something goes wrong?
Speculative Execution Vulnerabilities
Speculative execution. There's the Spectre class of attacks. While this provides avenues out of some of these situations or scenarios, it doesn't mean that you're bulletproof or completely free of speculative execution. There have been multiple instances of these types of attacks impacting hardware enclaves. It's not that all of them impact them, but some, in some degree, do impact the hardware enclaves. You have to be really careful and really sensitive to the speculative execution attacks and some of the things that are going on in the security landscape right now, paying attention and patching when those things come around. If you are using these environments, make sure that you're always up to date with what's going on there.
Designs Rely on Protection Rings vs. True Separation
Some designs rely on protection rings versus true separation. Some designs even have separate chips. It's not just a separate instruction set, but a separate chip that's provided on the system. They're really truly separating everything. Some of them rely on protection rings. Effectively, maybe you're at a kernel level at ring 0, and you're going to drop to ring -3 to do the enclave transaction. Maybe you're in user space, and you're going to drop down to ring -3, that management engine style that Intel provides. I don't want to suggest that there's one true way to do this. Some designs just have a different set of consequences. You want to think through those when you're thinking about what your threat model requires.
Implementations Gated by Vendors
Some implementations are also gated by the vendor, unfortunately, and require registration with the vendor: registering a set of keys, authorizing your keys to run inside of an environment. It's not just plug and play, deploy at will, and manage this wherever you want; you have to do registration. That requires some back and forth. It can be a little bit cumbersome. If you have a broad application, it's certainly worth going through. You want to think about the consequences of what the vendor offers and what the hurdles are to get to a true production environment. The demonstration that I'm going to link to at the end is something that can run in simulation mode and in hardware debug mode. You'd have to have a true set of keys that are blessed by Intel to run in a real release mode.
Implementations That Are Difficult To Use
Some implementations are also difficult to use. It's a little unfair to say they're difficult; these are low-level things. They're not going to have warm, fuzzy APIs to use. You're going to use more raw, native-style code. It's going to be C, C++ in a lot of cases, potentially Rust in some cases I've seen as well. You're going to need maybe a lower-level, more machine-friendly language to use some of the implementations. It requires special drivers, kernel drivers, special devices. The stack can be a little difficult to set up. It's not always clearly documented how to go about writing software for these environments. There are a lot of questions. If you dig, you can really get the answers you're looking for, but it's not complete. I've offered a lot of links and different things at the end to maybe help guide you a little bit. It's going to be some hands-on critical thinking time to really work through how to use some of these environments.
Limited Availability
Some suffer limited availability. In the ARM category, it seems like, here are the chips that are supported. It's a pretty well-defined thing. Apple supports it for a broad majority; I don't think they make anything new that doesn't support this. There has been some stagnation on Intel's part for the Intel platforms. Desktops are pretty readily available with SGX hardware, and the prosumer-level servers, the pizza boxes, the smaller servers with a single socket, you can tend to buy with SGX-ready hardware. Some of the more capable enterprise-grade servers don't come with any support for SGX. It seems like Intel is noting that the next generation will support these or make them more freely available, but you have to pick and choose how you run that and where you run that. Limited availability is more in terms of what you can buy that has that batteries-included support for SGX.
Hardware Enclaves Are a Key Component in Advancing Design of Secure Software
Ultimately, and really the main message here, is that hardware enclaves, I think, represent a pretty key component in advancing software security and the design of secure software. That has been a sorely missed piece of designing secure applications, designing things that handle critical information in an interesting way. It's very easy, in some cases, to attack a system where, even though something might be encrypted, the encryption keys are lying around or not protected very well, or where attestation of an environment may be difficult or tricky. You want to be able to really nail it and really prove who's doing what, where. It's a good missing piece. I think secure design is one of those tactical things. It's something you want to do. Make critical use of that important hardware and solve some of those more difficult trust problems. When I thought of giving a security-twisted talk on a mechanical sympathy track, really, this is the first thing that came to my mind: how do we use hardware to solve some of these more interesting and almost unsolved security problems?
Links and References
I provided a few links and references here. The first is a repository that I pushed out that is a complete end-to-end example of sealing and unsealing inside of a hardware enclave using SGX. The second is Getting Started documentation. There's documentation in all of Intel's various repositories on how to set things up; this is more of a point-in-time, start-to-finish, type-these-commands kind of thing that puts it all together. I find it helpful for recreating my environment. Next is a performance paper on the performance expectations of using a hardware enclave, specifically SGX, but I think it's applicable to a lot of the other platforms as well. Then the next few are just some general brochure-ware links into Intel, AMD, ARM, and Amazon's offerings. The last one I think is really worth noting is the Open Enclave project, which aims to bring a ubiquitous API across all the enclaves. You can plug and play different enclave technologies across a broad variety of environments, whether you have a hybrid environment or whether you're across multiple cloud vendors, giving you a better offering to design a more holistic solution that spans lots of different data centers and architectures.
Questions and Answers
Montgomery: I saw Aaron's posts on SGX, and I had done some initial evaluation work on SGX years ago in college for our class. I am wondering, if there's a bug in the SGX implementation, what constitutes your patch? Because I've heard that there's a limit to how many patches you can burn into the chip. Are there limitations about changing the microcode? If those are unresolved, there are side channel attacks that leak memory; I believe there was something that had to do with that as well. There's some question about microcode-based OSes as an alternative, and other things like that.
Bedra: I'm not familiar with the number-of-flashes limitation. I don't think I can speak to that in any expert way. That barrier is difficult to solve for. There really isn't an instance where you can just, in real time, update the microcode or flash the BIOS where you're not taking some downtime or adding some complexity to what's going on. If you're going to do this, and you're going to do it with your own hardware, now you need two and probably three of everything, so that you can deal with the cost of fixing, patching, rolling. The enclaves are typically enclave specific. If you seal something in one enclave, you can't throw that sealed data to another enclave and automatically just have it unseal. It's going to be specific or unique to the enclave. There are other things you have to do, some of them attestation tools, other things that help you make it more uniform and give you maybe that active-passive versus active-active environment. Yes, you want to plan that accordingly, of course.
Montgomery: In the case of SGX and SEV, which are supported by some server and HEDT platforms, such as EPYC and TR Pro, are there challenges when coding and testing on consumer-level desktops?
Bedra: In terms of challenges, I think, with SGX in particular, they gave you the simulation mode. With the simulation mode, it doesn't even matter if you're on Intel hardware. As long as you have the SDK and the SDK can be compiled and installed, you can compile and run SGX-based applications in simulation mode, which is great. I think these x86 instruction sets in general are all that's required. I think the simulation mode, if I understand it correctly, is still just using OpenSSL under the hood. You're getting all of the same types of operations you'd get. OpenSSL is inside of the enclave as well. It's going to be very uniform in terms of behavior, in terms of what's going on. Performance will be different, of course. You're going to use some of the first-class, normal intrinsics for OpenSSL in the simulation mode, whereas the enclave will take the actual marshaling hit in running those instructions inside the enclave. Otherwise, yes, I think it's a pretty good experience on a consumer desktop. All the examples that I have, I made on a laptop I had, and then a desktop that I had. I shared them across the two environments. The laptop was all simulation. The desktop was a real SGX enclave.
Montgomery: Something that hit me was you mentioned hardware security modules and availability. I'm curious, as we have started to see chipmakers embrace this, the enclaves and other types of security modules themselves, has that actually changed things, so that the viability of things like hardware security modules and some of the cloud vendor offerings is declining, or do you expect them to decline? If you're looking at this as something that you need to adopt, a hardware enclave, should you just stick to the chips or should you see what other things are out there?
Bedra: I think it's more of a design problem at the end of the day. If you have a fleet of thousands of machines in a fully networked environment that's very capable, I think the scale problem, HSMs are going to solve that in a more meaningful way. You have the network HSMs with 10,000 machines, yes, it's going to be way easier to deal with getting all of that out the door quickly and easily. I think that's going to be a big thing. If you're designing an IoT system, I think it's a whole different conversation. I think the architecture of your system, of your platform should really dictate how you look at solving this problem. I think that to me is going to be the winner in terms of how you proceed.
Montgomery: That's an interesting point about the performance angle. I hadn't thought about that from the HSM's perspective, that it could potentially be a way to at least claw back some performance gains.
In terms of the performance side, is there a rule of thumb that you would look at in terms of what hit you take by using the enclave approach, as opposed to what you normally get from specialized instructions?
Bedra: Yes, you're going to take a couple of different hits. One is marshaling. You need to marshal in and out of the enclave. The size of the data you're putting in and out is going to make a difference. You're going to take the cost of marshaling, whereas maybe in a hot path you had that data loaded into cache. You have different things. You're going to basically take the cost of the context switch, the protection ring jump, the marshaling, and then the enclave instruction set, which is just not going to be as optimized. It's meant to do a very specific thing in a separate, protected set of memory that may not be as fast because it's hardened. You've got all those problems working against you. I think measurement is the best thing you can do: really measure what your particular implementation gives you. There's nothing I would recommend other than measurement, and really beat this thing up. Make sure it does exactly what you think it does. Get really good predictive benchmarks, and then design from that and figure out where you want to go from there.
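A rough sketch of the kind of measurement being described: time a local, intrinsics-accelerated OpenSSL AES-GCM encryption as the baseline, then time the same payload pushed through whatever enclave boundary you have built, and compare. The OpenSSL EVP calls are real; the commented-out `run_enclave_encrypt` is a hypothetical stand-in for your own ECALL.

```cpp
// Benchmark sketch: local OpenSSL AES-GCM as a baseline for an enclave round trip.
// run_enclave_encrypt() is a hypothetical stand-in for your own ECALL path.
#include <chrono>
#include <cstdio>
#include <vector>
#include <openssl/evp.h>

static void openssl_encrypt(const std::vector<unsigned char> &in,
                            std::vector<unsigned char> &out) {
    static const unsigned char key[32] = {0}, iv[12] = {0};  // fixed for benchmarking only
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, tail = 0;
    EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), nullptr, key, iv);
    EVP_EncryptUpdate(ctx, out.data(), &len, in.data(), static_cast<int>(in.size()));
    EVP_EncryptFinal_ex(ctx, out.data() + len, &tail);
    EVP_CIPHER_CTX_free(ctx);
}

template <typename F>
static double time_us(F &&f, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++) f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count() / iterations;
}

int main() {
    const int iterations = 10000;
    std::vector<unsigned char> payload(4096, 0xAB), out(4096 + 16);

    double local = time_us([&] { openssl_encrypt(payload, out); }, iterations);
    std::printf("local OpenSSL AES-GCM: %.2f us per 4 KiB block\n", local);

    // double enclave = time_us([&] { run_enclave_encrypt(payload, out); }, iterations);
    // The enclave path pays for the transition, the copy across the boundary,
    // and the less-optimized in-enclave crypto; measure it the same way.
    return 0;
}
```

Varying the payload size in a harness like this is a quick way to see how much of the cost is the boundary crossing versus the crypto itself.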
Montgomery: That's always a good piece of advice, always measure when you can, guess when you have to, is usually what I think about. That's good.
Going back to Intel and SGX for a second, given some of the recent history of SGX, and other chipmakers' history of vulnerabilities and patches and things like that, are we placing a lot of trust in the chipmakers to get this stuff right? Of course we are, and we're trusting them with something. I'm going to follow this up with the gatedness of the vendors. It's probably the safe and right bet, but is there a way to lower this risk that we are taking in some way?
Bedra: Yes, we're putting a lot of faith in vendors, much like we put a lot of faith in the hardware security module vendors. There used to be a lot of them, and they keep buying each other. If you're thinking about vendor trust, the vendor trust is actually shrinking in the HSM space, as everybody buys each other up and stops competing. I think there are only a couple left. There might be a lot of different model names in the universe of options, but it's really controlled or owned by a couple of companies. I think even in that space, the trust is, in my opinion, eroding. You want to have some versatility and security. The diversity of implementation is a big win in terms of versatility and security. Huge thing. Open Enclave, or some of the uniform APIs that people are providing across different enclave vendors, is probably the best way out of this. Where you're saying, I have an enclave, here's what it looks like. I'm going to let you plumb the pieces here and there, the different environments, but I have one design on my system. You can run inside of AMD SEV. You could run inside TrustZone. You can run inside of SGX. It doesn't really matter, we don't care. Design and run across a multitude of things so that if you have one hit, or one environment that's compromised, you might be able to swing all of your resources to another while you're patching and then swing back. I think that's probably the better way out of this.
Montgomery: I would agree with that as well. I think that one of the ways to lower risk in the enterprise is to not let yourself as a system designer or system implementer get tied into too much vendor lock-in. Sometimes you have no choice, or your choices are somewhat limited, so lowering the risk by using a more standardized API, or at least a de facto API that gives you breadth, I think does lower risk, and probably is the only way that we can lower risk.
Bedra: Unfortunately, it's one of those things too, where if you have any optimized code, it's not going to work here. If you have anything instruction-set specific, say assembly code you've written, it's not going to be viable across those. You are going to have a tradeoff, there is a hit. That idea is worth recognizing, but I think for the sake of just flexibility, there's probably a better way out there.
Montgomery: In terms of the gatedness of the vendors themselves, doesn't this mean that the vendor is now a trusted party in a way, because you're giving them an awful lot of information? Which, honestly, if I think about it from a security model perspective, seems unnecessary. Why does the chipmaker need to be involved in this? Just additional thoughts you may have on that.
Bedra: Yes, it is providing more information than I would otherwise want to give to a vendor. You're signaling, I have something worth protecting. I think that's the first thing. I understand some of their positioning here, which is, we're going to control the universe so that you can rely on some root of trust, some anchor that says only these blessed or signed encryption keys may push into the enclave. We can control who can run and not run on the enclave. They're trying to provide a mechanism so you can at least say, I can trust the enclave. I understand the position here. I don't know that I agree with the whole implementation, the whole idea, but I understand at least where they're coming from.
Montgomery: It's interesting that as we move to something like secure enclaves to give us more trust, in effect by doing this, we're changing the attack surface. From a warfare perspective, if you put more defenses in one area, that means that you might get attacked in another area. Do you see other attack surfaces that maybe haven't been hit as hard becoming more vulnerable to attack because they're now being focused on?
Bedra: Absolutely. It's asymmetric warfare in general. The attackers have 100% of their time to attack and you have some very small fraction of your time to defend, unless your entire business is security and protection. You have a business to run, and that's where you need to spend all of your time. Your focus should be there. You're going to spend less time defending than they are attacking. When you change the threat model, you have to pay attention to what is now at risk. The enclaves actually give you a smaller defense area, but as these become more popular, I very much expect the Spectre-style execution attacks to become more prevalent.
There was another version of Rowhammer that came out as well, depending on how the enclaves are using memory. I don't think it works in this world, I think it is protected well. I don't think any of the Rowhammer attacks have exposed that quite yet. It doesn't mean that it can't, down the road. A very important piece is, what do the other types of attacks mean to us? You have to think about it. You have to stop and ask, does Rowhammer, the use of memory, or flipping memory bits affect my enclave? You have to think about those things versus what you've otherwise done by shipping data in and out of an HSM.
Montgomery: Are there equivalent technologies available for software that run primarily on graphics cards, FPGAs, and ASICs?
Bedra: In the world of ASICs and FPGAs, you only get what you design. It's a very specific thing. Unless maybe the vendor is going to produce an enclave that you can then program against. I think that's probably more the FPGA world than the ASIC world. In the ASIC world it's, here is a chip, you're going to burn it and you're going to run it. There's not much flexibility there. They're great at performance, but you're going to sacrifice a lot in the process. If you're designing an ASIC, you probably want to have a companion chip on the board you're designing that's doing the enclave specifically. I think you have to think about how you're going to design that. Of course, graphics cards are not going to have those. Graphics cards are designed to go fast. That's all they do. They go fast. They do very complicated mathematical operations in parallel. I really don't think there's a place for enclaves in the graphics card. I think that's going to be a separate world.
Montgomery: If someone is looking at application security best practices around enclaves right now, what do you recommend they concentrate on? I know you gave a nice set of links and everything else. I thought I'd just let you have the last word on where to go to here.
Bedra: The most critical thing here is, in your application, preventing the side channel. As you're marshaling data, make sure that you're not leaking the key material. The key material goes into the enclave sealed. It stays sealed. You're never exporting key material out. Think of the OCALL versus the ECALL: the sensitive work needs to always happen in the ECALL, and then what's shipped out across the OCALL boundary needs to be vanilla, plain, not secret. You want to make sure you're always keeping those separate. That's the best piece of advice I can give: draw very clear lines. If you're using a typed language, make sure it fails to compile if you're trying to ship or marshal data across the wrong boundary. It just can't do that. Make it a compile error. Make it something that just crashes your program. Make sure you stop that at all costs.
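One way to get that "make it a compile error" property, sketched in C++: wrap sealed and plaintext material in distinct types and let the outward marshaling function accept only the non-sensitive one. The type names and the `send_ocall` wrapper are hypothetical; the point is that handing secret material to the outward boundary simply does not compile.

```cpp
// Sketch of enforcing the ECALL/OCALL boundary in the type system.
// SealedBlob, Plaintext, and send_ocall() are hypothetical names.
#include <cstdint>
#include <vector>

struct SealedBlob {                 // safe to leave the enclave
    std::vector<uint8_t> bytes;
};

class Plaintext {                   // secret material: never crosses the boundary
public:
    explicit Plaintext(std::vector<uint8_t> bytes) : bytes_(std::move(bytes)) {}
    Plaintext(const Plaintext &) = delete;            // no accidental copies
    Plaintext &operator=(const Plaintext &) = delete;
    const std::vector<uint8_t> &bytes() const { return bytes_; }
private:
    std::vector<uint8_t> bytes_;
};

// The only function allowed to marshal data outward accepts SealedBlob only.
void send_ocall(const SealedBlob &blob) {
    (void)blob;   // would copy blob.bytes across the OCALL boundary
}

void example(const SealedBlob &sealed, const Plaintext &secret) {
    send_ocall(sealed);       // fine: sealed data may leave
    // send_ocall(secret);    // does not compile: no conversion from Plaintext
    (void)secret;
}
```

Deleting the copy constructor on the secret type also makes it harder to smuggle a copy into a structure that does cross the boundary.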