NoSQL databases offer alternative data storage options for storing semi-structured or unstructured data compared to the traditional relational databases. Though the NoSQL database implementations have been getting lot of attention lately, the security aspects of storing and accessing the NoSQL data haven't been given much emphasis. This virtual panel discussion article focuses on the security considerations and best practices in accessing the NoSQL databases.
Panelists:
- Ben-Gurion University Team (BGU): The BGU team includes:
- Lior Okman
- Nurit Gal-Oz
- Ehud Gudes
- Yaron Gonen
They have recently published a research paper "Security Issues in NoSQL Databases" in proceedings of the IEEE TrustCom-11. Nov 2011, Changsha, China.
- John Heasman: John is General Manager at NGS Secure. His areas of expertise include network and application penetration testing, vulnerability research, black-box testing and source code review.
- Emil Eifrem: Emil is the founder of the Neo4j graph database project and CEO of Neo Technology.
Questions:
- What all security vulnerabilities the application developers need to be aware of when using NoSQL data stores?
- What types of NoSQL implementations (e.g. Document based, Graph based etc) are prone to different types of security vulnerabilities (Connection Pollution, JSON Injection, Key Brute Force etc.)?
- What is the current state of security in NoSQL space in terms of authentication, access control, and encryption?
- What tools and techniques are available for architects and developers to "build security in" to their applications that are accessing the data stored in NoSQL databases?
- What are the advantages and disadvantages of using NoSQL solutions compared to relational databases (RDBMS) when it comes to security?
- How does the security validation efforts in NoSQL databases differ from those in traditional DB and application development which typically include automatic static and dynamic code analysis, penetration testing and manual security code reviews?
- What are the security best practices and gotchas in designing and implementing NoSQL data based applications?
- What are the emerging trends in NoSQL database security space?
InfoQ: What all security vulnerabilities the application developers need to be aware of when using NoSQL data stores?
BGU: The security vulnerabilities that application developers need to be aware of when using NoSQL data stores are exactly the same as the ones that applications based on relational databases need to be aware of: insufficient or ineffective input validation, errors in the application level permissions handling, weak authentication, insecure communication, illegal access to unencrypted data, etc. Most web-facing software today uses middleware and effectively hides the database (regardless of type) behind a thick application level. Permissions are rarely handled in the database level, usually the database is not aware of the specific user actually being served by a connection, due to the use of connection pooling. In some cases, a relational database's ACID capabilities are also ignored (e.g. when using optimistic locking in ORM frameworks) in order to increase concurrency and allow horizontal scaling on the middleware level. In this sense, the distinction between RDBMS and NoSQL data stores really isn't that well-defined.
In spite of the above general statement, we identify three major security vulnerabilities that should be addressed by application developers that use NoSQL data store: insecure connection between web application and database, insufficient support for special authorized users (e.g., DBA) and insufficient authentication.
In addition, there are vulnerabilities that integrators and architects need to be aware of when using NoSQL data-stores. A NoSQL data-stores' built-in horizontal scaling capabilities (i.e. clustering, sharding, replication) opens a new attack surface where the data-store is vulnerable to node impersonation attacks. Most cluster solutions don't really provide a robust mechanism to counter-act that.
John: Many NoSQL databases are designed to work out of the box, i.e. post-installation there is little or no configuration required. Authentication is often not required. This is great for testing out a new product in a lab environment but obviously requires addressing prior to deployment in a production environment. Furthermore it's a misconception that injection attacks are not possible. Many NoSQL databases provide an interface that serializes objects directly into NoSQL storage (e.g. documents); if this interface is exposed directly to the user it can be abused.
Emil: There is no silver bullet. Data, however it is stored, will need to be guarded. The shape of the data doesn’t make much difference generally.
However, when working with data, Graph stores have the advantage that they don’t need any accompanying map-reduce (or similar) framework to represent business information, since the database itself contains rich information without the need for post-processing. That means there are far fewer moving parts to coordinate, and therefore a smaller attack surface compared with, say, Hadoop.
With that said, because it is possible to write queries which mutate data, care should be taken to decouple user-level searches from the queries which fulfill the requests.
InfoQ: What types of NoSQL implementations (e.g. Document based, Graph based etc) are prone to different types of security vulnerabilities (Connection Pollution, JSON Injection, Key Brute Force etc.)?
BGU: There are tens of different implementations of NOSQL databases. Our study consists of two types of NoSQL implementations, namely Column based and Document based. The document based databases include MongoDB and CouchDB which address security in a different manner. Therefore we find it difficult to classify security vulnerabilities according to implementation types.
In general, any piece of software (including the simple “hello world” program) can be subverted to do something it was not intended to do. As complexity increases, the attack surface of an application increases with it. Any network oriented software is vulnerable to connection pollution and MITM (proxy) attacks. Any software that requires input from a user is vulnerable to data manipulation attacks from the user, from SMTP software (SMTP headers) to RDBMS (SQL injection) to NoSQL systems (JSON/BSON/JavaScript injection attacks). In this sense any NoSQL data store that has a query interface is exposed to injection attacks.
Emil: We're not vulnerable to any of these things since: we’re not a JSON store (though we can store JSON), and since we’re embedded, the app (and its security infrastructure) can help.
If you have access to the graph store, you can however corrupt the contents in a way that is unhelpful to applications/users. If you have access to the file system, it is possible (though difficult in practice) to directly corrupt the database, including re-writing parts of the graph.
InfoQ: What is the current state of security in NoSQL space in terms of authentication, access control, and encryption?
BGU: There is no standard in the NoSQL space for authentication, authorization and encryption. Generally, there is the understanding that these things are required, but there is no recognized “best-way” to do this yet. The current best-practices in the field usually involve placing the security in the middleware layer, and ignoring security on the cluster level.
Specifically in our study, we address authentication and authorization for Cassandra MongoDB and CouchDB.
Cassandra exposes the IAuthenticate interface with two out of the box implementations. The AllowAllAuthenticator basically disables authentication while the SimpleAuthenticator allows passwords to be passed as plaintext ad stored as plaintext or unsalted MD5. The SimpleAuthenticator is not considered production ready yet.
In Mongo prior to version 2.0 which was recently released, authentication was supported only if the cluster is not sharded which is actually meaningless.
CouchDB provides three authentication handlers out of the box: HTTP basic authentication, Cookie-based and OAuth protocol based authentication.
Authorization is handled in Cassandra at the Column Family granularity level. The current set of permissions includes the READ and WRITE permissions. Cassandra provides two implementations of IAuthority interface. The first is a pass-through implementation that always allows full permissions, regardless of the user, and the second uses a flat Java properties file to allow matching permissions to usernames.
The Mongo data store supports two types of users: read-only and read-write at the database granularity level. CouchDB provides a very limited version of role-based access control at the database granularity level.
Data at rest is unencrypted in all three data stores. In MongoDB encryption is available for Intercluster Network communication but not for client communication. In couchDB SSL (https) is supported in version 1.1.0 and up for both Intercluster Network communication and client communication.
Emil: Though not available natively, these services can easily be integrated with Neo4j. Access control is a particularly natural addition to any application domain, because it is easy to directly annotate business entities with permissions information. A common practice is for the entire ACL to attach directly to the domain within the graph, making authorization checks simple.
InfoQ: What tools and techniques are available for architects and developers to "build security in" to their applications that are accessing the data stored in NoSQL databases?
BGU: We have found no tool that provides "built-in security", such as Virtual Private Database (VPD) for RDBMS. However, we identified that the ease of creating new namespaces in the NoSQL world can be used to offer such security. For example, read-only documents on one namespace, and read-write ones on another.
Most middleware software includes support for authentication, authorization and access control. In Java, you would use the JAAS framework, or the Spring Security framework, or the Apache Shiro framework, or the J2EE framework, or any of the other libraries available. Additionally, in NoSQL clusters that utilize HTTP as a transport, security can also be validated on proxy servers and load-balancers in the middle of the way. This has nothing to do with NoSQL databases, as much as it has to do with an application design that fits the problem being solved. The current "best-practice" is to place the security at the middleware level.
Emil: Embedded - When deployed as an embedded database, security is just part of the application, therefore anything can be used in order to maintain consistency with corporate standards.
Server - Developers can simply implement Security Rules to plumb in anything they need to maintain consistency with corporate standards.
InfoQ: What are the advantages and disadvantages of using NoSQL solutions compared to relational databases (RDBMS) when it comes to security?
BGU: Since most application developers using NoSQL datastores usually place the security handling in the middleware, most NoSQL databases don't provide any support for enforcing it explicitly in the database. This is not as huge an advantage of the older relational model as it might seem – the ability to enforce security in the database level is usually ignored. In most applications that I've seen, the security layer in the database is not utilized because the developer wants to use connection pooling, and re-authenticating a connection after it was already established is not supported by most databases (ORACLE supports it via “Proxy Authentication”). NoSQL solutions are usually more cluster oriented, which is an advantage in speed and availability, but a disadvantage in security. The problem here is more that the clustering aspect of NoSQL databases isn't as robust or grown-up as it should be.
However, when accessing the DBMS directly and not via an application, RDBMS has a huge advantage over NoSQL due to the availability of built-in security measures.
John: NoSQL databases are in general less complex than their traditional RDBMS counterparts. This lack of complexity is a benefit when it comes to security. Most RDBMS come with a huge number of features and extensions that an attacker could use to elevate privilege or further compromise the host. Two examples of this relate to stored procedures:
1) Extended stored procedures - these provide functionality that allows interaction with the host file system or network. Not only are many of these dangerous in themselves (e.g. xp_cmdshell on SQL Server allows admin users to execute operating system commands) but they have also had a high number of security problems such as buffer overflows.
2) Stored procedures that run as definer - RDBMS such as Oracle and SQL Server allow standard SQL stored procedures to run under a different (typically higher) user privilege. This is analogous to a "setuid" program in Linux. There have been many privilege escalation vulnerabilities in stored procedures due to SQL injection vulnerabilities.
One disadvantage of NoSQL solutions is their maturity compared with established RDBMS such Oracle, SQL Server, MySQL and DB2. With the RDBMS, despite a checkered security past discussed later, the various types of attack vector are well understood and have been for several years. NoSQL databases are still emerging and it is possible that whole new classes of security issue will be discovered.
Emil: For supporting authentication and authorization, graphs are clearly superior to RDBMS/KV stores because security policies can be interconnected with data. Users may belong to many groups, and have individual access rights to various levels of systems. Computing the aggregate authorisation is hard even with trees like LDAP, but in a graph it’s straightforward.
InfoQ: How does the security validation efforts in NoSQL databases differ from those in traditional database and application development which typically include automatic static and dynamic code analysis, penetration testing and manual security code reviews?
BGU: From the application side there is no difference. Every open port and input provided to the application needs to be checked. The difference is in clustered environments – more work needs to be done making sure that the cluster itself is not vulnerable to impersonation attacks or other cluster-level data-theft attacks.
John: NoSQL databases are fairly new technologies. New technologies invariably contain vulnerabilities. If you look at the security of traditional RDBMS, they have had a very checkered security history. Almost all of the major RDMBS have had serious issues leading to unauthenticated remote code execution at one point or other. While the rate of security vulnerabilities has generally fallen, some such as Oracle still regularly publish updates to resolve serious issues.
NoSQL databases have themselves had some scrutiny but nowhere near the level of that Oracle, SQL Server and MySQL have had. The Memcached developers, for example, resolved several integer overflow issues in 2009 that could have led to execution of arbitrary code. I expect there to be a wide range of issues in NoSQL databases and I expect many of these to be discovered in the next 3 years as NoSQL databases gain in popularity and attract attention from security researchers. It certainly helps that a large number of NoSQL databases are open source and so the code can be downloaded and analyzed.
Emil: Such facilities are not yet available in Neo4j.
InfoQ: What are the security best practices and gotchas in designing and implementing NoSQL data based applications?
John: It is possible to inadvertently make an application susceptible to "NoSQL injection" by accepting user input and serializing it into the NoSQL storage format directly. In order to avoid this, developers should ensure they call the recommended functions for creating or modifying entries instead of taking shortcuts and should always validate input.
NoSQL databases such as MongoDB are designed to work out of the box so developers can get up and running very quickly. A default MongoDB database will not be configured for authentication. While this is great for speeding up the initial stages of application development, security concerns such as authentication and authorization should be addressed well before it is time to deploy the application to a production environment. Access to the NoSQL database should also be reviewed; it is not uncommon on a penetration test to find an instance of a key-value store such as Memcached accessible remotely (and without requiring authentication).
Emil: As with any database, the application should insulate the user from direct data access. Avoid exposing database identifiers and decouple any user-level search requests from query expressions.
InfoQ: What are the emerging trends in NoSQL database security space?
BGU: Security in NoSQL data stores is a major concern among developers as can be learned from professional blogs. Reading these blog we realized that developers actually avoid using NoSQL databases (see e.g., https://jira.mongodb.org/browse/SERVER-1105). Consequently we can see that as more developers join the NoSQL community, vendors attempt to meet their security needs continuously and provide more inherent solutions to the basic requirements such as authentication and authorization.
John: Microsoft recently announced their intention to port Hadoop to Windows and the Azure cloud. Security researchers from iSec Partners have already presented on the Hadoop design at the Black Hat USA security conference in 2010 but it is likely that this announcement from Microsoft will garner additional interest.
Emil: In the past, an enterprise had to balance access to a database with security concerns, but today data storage is always accessed through a domain-aware persistence layer. Rather than managing security on multiple levels, the responsibility for authentication and access has migrated up the stack to the application level, allowing a unified approach. This is true even for web applications accessing an RDBMS – there is typically a single “application account” for access to the database. All users and permissions are otherwise managed by the application itself.
About the Panelists
Lior Okman received his B.A. from Tel-Aviv Jaffa Academic College, Israel, and is currently studying towards his M.Sc. from the Open University of Israel. He has 15 years of experience in the industry as a Java and databases consultant and developer. Currently, he is working as a software team leader at a start-up company developing a network visibility product.
Nurit Gal-Oz received her B.Sc. and M.Sc. (Hons) degrees from the Department of Mathematics and Computer Science at Ben-Gurion University, Israel. Prior to her Ph.D. studies she headed the R&D of several companies in the software industry in various domains including timetabling solutions, business intelligence and distributed web-based applications. Currently she is a postdoctoral researcher in the Deutsche Telekom Laboratories at Ben-Gurion University. Her current research interests include Trust and Reputation systems, Privacy, Database security and Data mining with special focus on role mining.
Ehud Gudes received his B.Sc. and M.Sc. from the Technion - Israel Institute of Technology, and his Ph.D. in Computer and Information Science from the Ohio State University in 1976. Following his Ph.D., he worked both in academia (Penn State University, Ben-Gurion University, Florida Atlantic University), where he did research in the areas of Database systems and Data security, and in Industry, where he developed Query languages, CAD software, and Expert systems. He has published over 120 papers in the above general areas, and was the chair of several international conferences, including the 2002 and 2009 IFIP WG11.3 conference on Data and Application security. He is currently a Professor in Ben-Gurion University heading the Computer Science Department and leading several research projects in data security. Ehud Gudes is a member of both ACM and IEEE Computer Society, and an active member of the IFIP WG11.3 group on Database security. His research interests encompass the domain of knowledge and databases, data security and Data mining especially Graph mining and sequence mining.
Yaron Gonen received his B.Sc. from Bar-Ilan University, Israel and his M.Sc. from Ben-Gurion University, Israel. Before his studies he worked as a DBA and as a software team leader in the information systems world and between the degrees he worked as a senior software developer at a Start-up company in cellular technology. Currently he is a Ph.D. student under the supervision of Prof. Ehud Gudes. His current research interests include data mining, cloud computing,NoSQL databases and the MapReduce framework.
Emil Eifrem is the founder of the Neo4j graph database project and CEO of Neo Technology. He created a text role-playing game (that is still being played 15 years later) in C, but is better known for being a developer, an evangelist, mentor, and consulting architect for graph databases while preaching the demise of tabular solutions everywhere.
John Heasman is General Manager at NGS Secure and joined the organization as a Senior Security Consultant in 2003. He has a Master's Degree in Engineering and Computing from Oxford University. He has invaluable experience in network and application penetration testing, vulnerability research, black-box testing and source code review. He has released numerous advisories in enterprise-level software, including Microsoft Windows, Exchange Server, PostgreSQL and Java. He is frequently interviewed in the IT security press and is considered a subject matter expert on hardware attacks and rootkits, an area in which he has previously presented ground breaking research. John has recently redeveloped NGS Secure's web application security training course to include over 100 labs mirroring real world vulnerabilities such as SQL injection; it also includes several NoSQL-specific labs.