Key Takeaways
- The EU’s GDPR is the most forward-leading privacy regime on the planet, with fines of up to four percent of global revenue
- According to a study by Symantec, a whopping 96 percent of companies admit to not understanding the GDPR
- GDPR is about data privacy (when, what and why data is accessible), and that means the classic encryption and role-based access controls that organizations have been relying on won’t cut it
- There must be a data abstraction layer to enforce the audit and privacy of the data
- Organizations need to divide their data governance model into three “buckets”: collection (focused on user accounting), usage (focused on managing access controls), and audit (focused on GDPR traceability requirements)
The European Union approved a new General Data Protection Regulation (GDPR) in late April 2016, which will come into effect in May 2018. This regulation will require companies to adopt a comprehensive data governance approach, including data profiling, data quality, data lineage, data masking, test data management, data analytics and data archival.
The regulation's protections will extend to genetic data and to email and IP addresses, among other types of data. Users will also have finer-grained rights over the protected data that companies keep. An organization will need explicit consent to use individual pieces of information, such as an email address or a phone number, and using those pieces in combination will also require explicit consent.
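To make the consent rule above concrete, here is a minimal sketch of a per-subject consent record in which each field, and each combination of fields, needs its own explicit grant. `ConsentRecord`, `grant`, and `allows` are hypothetical names chosen for illustration, and the sketch assumes the simplest possible reading: a combination is permitted only if that exact set of fields was consented to.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical consent record for one data subject.

    Each entry in `granted` is a frozenset of field names the subject has
    explicitly agreed may be used together; a single-field consent is a
    one-element set.
    """
    subject_id: str
    granted: set[frozenset[str]] = field(default_factory=set)

    def grant(self, *fields: str) -> None:
        """Record explicit consent for this field, or this exact combination."""
        self.granted.add(frozenset(fields))

    def allows(self, *fields: str) -> bool:
        # Every field must have been consented to individually...
        if not all(frozenset([f]) in self.granted for f in fields):
            return False
        # ...and, when several fields are used together, that exact
        # combination must also have been consented to.
        return len(fields) <= 1 or frozenset(fields) in self.granted
```

For example, after `grant("email")` and `grant("phone")`, `allows("email", "phone")` is still `False` until `grant("email", "phone")` is also recorded. A real system would likely need richer rules (for instance, whether consent to a larger combination covers its subsets), which this sketch deliberately leaves out.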
InfoQ talked with Immuta’s Andrew Burt, chief privacy officer and legal engineer, and Steve Touw, chief technology officer, to better understand the implications and challenges of the GDPR.
InfoQ: Can you try to summarize what the GDPR is and why organizations should care about it?
Burt: The General Data Protection Regulation is the EU’s primary data governance regulation and applies to any business using data from EU data subjects. It is the most forward-leading privacy regime on the planet, with fines of up to four percent of global revenue. With such staggering fines, breaching the GDPR is a risk that many enterprises quite literally may not be able to afford.
InfoQ: So we're less than a year and a half away from this legislation coming into effect in the European Union. How prepared do you think organizations are to address it?
Burt: Not very. There have been some studies on this recently, and surveys of those charged with data privacy and security in large organizations illustrate that they’re just waking up to the risk they face. According to one such study conducted by Dell, about 70 percent of the professionals they surveyed reported that their organization was either not ready or unaware of how to prepare for GDPR. Only three percent reported having any type of plan at all. According to another study by Symantec, a whopping 96 percent of companies admit to not understanding the GDPR. And that says a lot. How can organizations prepare for the GDPR when they don’t understand it? 2017 is really shaping up to be the year that enterprises start to realize the compliance risk they face and start looking for comprehensive solutions.
InfoQ: What is the number one cost or penalty for organizations failing to comply with regulations?
Burt: The number one cost is the potential four percent fine on global revenue. To put that into perspective, global revenue for Apple was over $200 billion each of the last two years. So privacy violations under the GDPR could cost Apple upwards of $8 billion. By contrast, privacy violations in the U.S. frequently incur fines that range from hundreds of thousands to a few million dollars. The GDPR is more serious than other data regulations by orders of magnitude.
Aside from monetary fines, though, it’s also important to put the GDPR into perspective and ask: what is the regulation protecting? The answer for many businesses is trust: between them and their customers, between them and other businesses, and really throughout the data landscape. The cost of a breach isn’t just financial; it can cause serious reputational damage and lead to a breakdown in trust between the enterprise and the consumer.
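The four percent figure is a ceiling, not a fixed penalty. Under Article 83(5) of the GDPR, the upper-tier maximum is EUR 20 million or 4 percent of total worldwide annual turnover for the preceding financial year, whichever is higher. The sketch below computes that ceiling; the function and constant names are illustrative, and treating Apple's roughly $200 billion revenue as if it were euros is a simplification that ignores exchange rates.

```python
# Upper-tier GDPR fine ceiling (Art. 83(5)): the higher of a fixed amount
# or a percentage of worldwide annual turnover of the preceding year.
FINE_FLOOR_EUR = 20_000_000
FINE_RATE_PERCENT = 4


def max_upper_tier_fine(annual_turnover_eur: int) -> int:
    """Return the maximum possible upper-tier fine, in whole euros."""
    # Integer arithmetic avoids floating-point rounding on large amounts.
    percentage_cap = annual_turnover_eur * FINE_RATE_PERCENT // 100
    return max(FINE_FLOOR_EUR, percentage_cap)


print(max_upper_tier_fine(200_000_000_000))  # 8000000000
print(max_upper_tier_fine(100_000_000))      # 20000000
```

For a company with 200 billion in turnover the percentage dominates, giving the 8 billion figure cited above; for smaller companies the fixed EUR 20 million amount is the binding ceiling.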
InfoQ: What are the biggest challenges for compliance, from a technical perspective? Will they require new technologies and processes to be developed? And how will those be retrofitted or integrated with existing solutions?
Touw: Some misconstrue the GDPR as being all about data security and data breaches. Data security matters, but it is actually a relatively small part of the GDPR. The GDPR is really about data privacy, and that means the classic encryption and role-based access controls that organizations have been relying on won’t cut it when it comes to protecting privacy. Instead, organizations need to measure analytical “bang” against compliance “buck,” and this requires measuring the analytical purpose of the data: how it’s been used, for what purposes, and towards what gain. This is not as simple as aligning database roles to all possible purposes; that would be an implementation nightmare across all organizational data silos.
Instead, there must be a data abstraction layer to enforce the audit and privacy of the data, and that’s where we at Immuta have focused much of our time and energy. Imagine, for example, an analyst who works on one task in the morning and sees personally identifiable information because it’s relevant to the purpose of that task. In the afternoon, the analyst switches tasks and now sees an anonymized version of the same data, through the same connection, because that level of detail isn’t required or authorized for the new task. That’s the type of regime we need to have in place for analysts across the enterprise.
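The morning/afternoon scenario can be sketched as a single query function that sits between analysts and the data and masks columns according to the purpose the analyst declares. This is not Immuta's implementation; the policy table `CLEAR_COLUMNS_BY_PURPOSE`, the purpose names, and the `query` function are hypothetical. Note that the truncated SHA-256 hash used here is pseudonymization, which is weaker than true anonymization; it simply stands in for whatever masking technique a real system would apply.

```python
import hashlib

# Hypothetical policy: the columns each purpose may see in the clear.
# Any column not listed for a purpose is masked before it leaves the layer.
CLEAR_COLUMNS_BY_PURPOSE = {
    "fraud-investigation": {"name", "email", "country", "purchase_total"},
    "sales-trends": {"country", "purchase_total"},
}


def pseudonymize(value: str) -> str:
    # Deterministic, so masked values can still be grouped or joined.
    # Unsalted hashing is pseudonymization, NOT anonymization.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def query(rows: list[dict], purpose: str) -> list[dict]:
    """Serve the same rows through the same interface, masked per purpose."""
    if purpose not in CLEAR_COLUMNS_BY_PURPOSE:
        raise PermissionError(f"purpose {purpose!r} is not authorized")
    clear = CLEAR_COLUMNS_BY_PURPOSE[purpose]
    return [
        {col: val if col in clear else pseudonymize(str(val))
         for col, val in row.items()}
        for row in rows
    ]


rows = [{"name": "Ana", "email": "ana@example.com",
         "country": "PT", "purchase_total": 120}]
morning = query(rows, "fraud-investigation")  # email visible
afternoon = query(rows, "sales-trends")       # name and email masked
```

The design point is that the policy lives in the layer rather than in per-purpose database roles: adding a purpose means adding one policy entry, not provisioning new roles in every data silo.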
InfoQ: In the past we already saw considerable commotion when the EU forbade citizens' personal data from being transferred outside the EU. How does that compare to the scale of changes required by the new GDPR?
Burt: The GDPR builds on the Data Protection Directive, which is the current regulation that applies to EU data. In general, the GDPR is very similar to the Directive in its approach to privacy, but it has much sharper regulatory teeth. Data transfers outside the EU may be one of the few areas where the GDPR actually makes things easier for many enterprises. The GDPR spells out a number of additional conditions allowing data transfer that didn’t exist under the Directive. Under the Directive, “standard contractual clauses,” which are EU-approved contractual clauses used to establish a certain level of safeguards between data-sharing parties, required the approval of member-state data protection authorities; under the GDPR, no such approval is necessary. And there are a host of other modifications when it comes to transferring data between parties.
InfoQ: So we're not just talking about changes in data storage, but also data governance and even the actual roles and structure inside organizations?
Touw: Yes. While data storage is a component, it’s more about the decisions on when, what, and why that data is accessible from the storage system. Typically, data is collected for an original purpose. Later, the data “exhaust” created from that original purpose ends up being valuable for other revenue streams. Under regulations like the GDPR, however, you may not be allowed to use that data for other purposes. So it’s important to bring people into the data workflow who understand these different regulations and can embed them within it. The GDPR is also very specific in mandating that data-intensive organizations have what’s called a “data protection officer”, who is empowered to make sure the regulation is being adhered to.
Burt: At Immuta we’ve focused on developing an interface in our platform specifically for these types of governance personnel, to ensure that the right data is always used for the right purposes and is only seen by the right people in the right state. These controls can spur governance personnel to ask questions like: have I anonymized the data sufficiently to use it for another purpose? Or: is the purpose I intend to allow the data to be used for a legitimate purpose, as my organization defines it? These can be gray areas that, in some cases, require humans to make the decisions, and then require technology to enforce those decisions and to audit them quickly and easily. If these people make the wrong decision, or if their decisions aren’t implemented by technology, heads will roll.
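Whether data has been "anonymized sufficiently" is ultimately a legal and human judgment, but technology can supply evidence for it. One widely used (and, on its own, insufficient) measure is k-anonymity: every combination of quasi-identifying attributes, such as a truncated postal code and an age band, must be shared by at least k records. The sketch below checks that property; the function name and the example columns are assumptions for illustration, and k-anonymity does not by itself guard against attacks such as every record in a group sharing the same sensitive value.

```python
from collections import Counter


def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if every combination of quasi-identifier values appears in at
    least k rows, i.e. no record is distinguishable within a group smaller than k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    group_sizes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in group_sizes.values())


rows = [
    {"zip3": "100", "age_band": "30-39", "diagnosis": "A"},
    {"zip3": "100", "age_band": "30-39", "diagnosis": "B"},
    {"zip3": "101", "age_band": "40-49", "diagnosis": "A"},
]
print(is_k_anonymous(rows, ["zip3", "age_band"], k=2))  # False: one record stands alone
```

A governance reviewer could run a check like this before approving a new use of a dataset, then record the decision so it can be enforced and audited later.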
InfoQ: Do you think online companies that operate in the EU but are based outside it are aware of the implications for them as well?
Burt: As a whole, we’re seeing companies just starting to wake up to the risk burden created by the GDPR, both inside and outside Europe. Firms everywhere are really only now beginning to realize the extent of their liability, and because the EU is one of the largest economies in the world, the GDPR really applies to any company that considers itself global.
InfoQ: Given that the regulation aims to give users finer-grained control of their data (including retrospectively) than we've ever seen before, how can organizations effectively and efficiently "translate" those requirements into a clear plan of action that is feasible yet flexible?
Touw: The first step is to divide your data governance model into three “buckets”: collection, usage, and audit. Each “bucket” needs to sync with the others. Collection needs to focus on user accounting: which user’s data is stored where, what the user has consented to, and how consent is captured. Usage is the most complex area, because it needs to manage access controls consistently across data silos. Usage also needs to regulate when company personnel can access specific data, how that data will be perturbed, masked, or restricted, what purposes it can be used for, and more. Finally, auditing requires capturing all requirements of the GDPR in terms of “data actions” so that you can build reports, sometimes within a relatively short period of time, such as when a data subject demands to see how their data is being used in your organization.
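A minimal sketch of how the three buckets might connect is shown below, assuming an in-memory model: collection records where each subject's data lives and which purposes they consented to, usage checks both the user's authorization and the subject's consent, and audit logs every attempted "data action" so that a per-subject report can be produced on demand. All class, field, and method names are hypothetical; a production system would persist these records and integrate with real data stores.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataAction:
    """One auditable event: who touched whose data, how, and for what purpose."""
    timestamp: datetime
    user: str
    subject_id: str
    action: str
    purpose: str
    allowed: bool


@dataclass
class GovernanceModel:
    # Collection bucket: where each subject's data is stored, and the
    # purposes the subject has consented to.
    locations: dict[str, set[str]] = field(default_factory=dict)
    consents: dict[str, set[str]] = field(default_factory=dict)
    # Usage bucket: the purposes each user is authorized to work under.
    user_purposes: dict[str, set[str]] = field(default_factory=dict)
    # Audit bucket: append-only record of every attempted data action.
    audit_log: list[DataAction] = field(default_factory=list)

    def collect(self, subject_id: str, system: str, purposes: set[str]) -> None:
        """Record where a subject's data landed and what they consented to."""
        self.locations.setdefault(subject_id, set()).add(system)
        self.consents.setdefault(subject_id, set()).update(purposes)

    def use(self, user: str, subject_id: str, action: str, purpose: str) -> bool:
        """Allow the action only if the user is authorized for the purpose
        AND the subject consented to it; log the attempt either way."""
        allowed = (purpose in self.user_purposes.get(user, set())
                   and purpose in self.consents.get(subject_id, set()))
        self.audit_log.append(DataAction(datetime.now(timezone.utc), user,
                                         subject_id, action, purpose, allowed))
        return allowed

    def subject_report(self, subject_id: str) -> dict:
        """Answer a data subject's request: where is my data, and how was it used?"""
        return {
            "stored_in": sorted(self.locations.get(subject_id, set())),
            "consented_purposes": sorted(self.consents.get(subject_id, set())),
            "actions": [
                (a.timestamp.isoformat(), a.user, a.action, a.purpose, a.allowed)
                for a in self.audit_log
                if a.subject_id == subject_id
            ],
        }
```

Logging denied attempts as well as permitted ones is a deliberate choice here: an audit trail that records only successes cannot show that controls were actually enforced.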
InfoQ: Your company, Immuta, has been actively trying to tackle some of the challenges with GDPR and data protection in general. Could you tell us a bit more about that work?
Touw: Immuta is focused on enabling data science; if the data scientist isn’t happy, enterprises aren’t going to get the most out of their data. Our software platform focuses on giving data scientists a unified view into all the data they’re allowed to see and use and makes that experience as easy and as empowering as possible. Under the hood, though, we’ve put a lot of effort into embedding laws and policies into the software so that data usage policies are applied to the data on-the-fly, as it’s queried, without the data scientist needing to stop what they’re doing and focus on the rules. In order to do this we provide a customizable interface for governance personnel, where they can set rules and policies across their organization and between user groups and regions.
Burt: Oftentimes, when people think about data protection policies, they think it’s just about who can see what data. But GDPR and other regulations require so much more. Immuta’s platform can enforce not only access rules (who can see what) but also purpose-based restrictions on data, which only allows users to use the data for specific reasons, or specific periods of time, when certain conditions are met.
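Purpose-based restrictions of the kind described above can be sketched as grants that tie a user and a dataset to one purpose, a validity window, and an arbitrary condition that must hold at the time of use. `PurposeGrant` and `may_use` are hypothetical names, and the `case_open` condition in the example is an assumption made only to show how a runtime condition plugs in.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class PurposeGrant:
    """Hypothetical grant: `user` may use `dataset` for `purpose` between
    `valid_from` and `valid_until` (inclusive), while `condition` holds."""
    user: str
    dataset: str
    purpose: str
    valid_from: date
    valid_until: date
    condition: Callable[[dict], bool] = lambda ctx: True


def may_use(grants: list[PurposeGrant], user: str, dataset: str,
            purpose: str, on: date, ctx: dict) -> bool:
    """True if any grant covers this user, dataset, purpose, date, and context."""
    return any(
        g.user == user and g.dataset == dataset and g.purpose == purpose
        and g.valid_from <= on <= g.valid_until
        and g.condition(ctx)
        for g in grants
    )


grants = [PurposeGrant("bob", "claims", "fraud-review",
                       date(2017, 1, 1), date(2017, 6, 30),
                       condition=lambda ctx: ctx.get("case_open", False))]
print(may_use(grants, "bob", "claims", "fraud-review", date(2017, 3, 1), {"case_open": True}))  # True
print(may_use(grants, "bob", "claims", "fraud-review", date(2017, 7, 1), {"case_open": True}))  # False
```

Unlike a plain access-control list, the same user and dataset can be allowed or denied depending on why, when, and under what circumstances the data is used.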
InfoQ: What advice would you give, both from a technical and a business perspective, on how to cope with these new regulations?
Burt: Start planning now. Behind the daunting rules in the GDPR are privacy and security requirements that companies will be able to meet as long as they take their planning seriously. This will require developing a roadmap that covers both how your organization is going to protect the data it uses, on the collection side as well as the usage side, and what types of technical solutions you’re going to implement to keep that data secure. You’re going to want to invest in solutions that are secure and proven, and that can empower individuals in your organization and ensure compliance without getting in the way. This technical aspect is very much what we’re focused on at Immuta: embedding laws and policies into our software platform so that data scientists can concentrate on maximizing the value of their data rather than worrying about the rules.
About the Interviewees
Steve Touw is co-founder and CTO of Immuta, a unified data platform for the world's most secure organizations. Steve has a long history of designing large-scale geo-temporal analytics across the US Intelligence community, including some of the very first Hadoop analytics and frameworks to manage complex multi-tenant data policy controls. This experience drove him and his co-founders at Immuta to build a software product that frees data science teams to access and work with high-value data.
Andrew Burt is Chief Privacy Officer & Legal Engineer at Immuta, where his focus is on automating regulatory compliance within big data environments. He is also a visiting fellow at Yale Law School’s Information Society Project. Previously, Andrew served as Special Advisor for Policy to the head of the FBI Cyber Division, where he served as chief compliance and privacy officer for the division, and as lead author on the FBI’s after action report for the 2014 attack on Sony.