Key Takeaways
- Blockchain technologies can help with data governance in the areas of transparency and data provenance.
- Blockchain is used less for confidentiality and more for authenticity and integrity.
- Based on the combination of massive replication, well-designed incentives, and cryptography, Blockchain enables immutable data shared by all parties.
- Data management in supply chain systems is a good example of where Blockchain is useful. Keeping track of data between hundreds of inter-operating vendors is challenging, and Blockchain helps these participants create data of their own and track related data of others.
- Companies whose data retention policies require data to be destroyed after certain periods of time should typically not use Blockchain for those applications.
Seth James Nielson recently hosted a tutorial workshop at the Data Architecture Summit 2018 conference about Blockchain technology and its impact on data architecture and data governance.
Nielson said that a blockchain is an immutable, append-only ledger where data, once “entered,” cannot be changed. Data can only be added to the end of the blockchain.
He talked about the differences between public and private blockchains. A public blockchain can have any participant. The entire system is decentralized and only consensus makes it work. A private blockchain has controls over who can participate. In a private blockchain, proof-of-stake or proof-of-authority can be used for consensus. Microsoft Azure and Ethereum support this model.
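To make the append-only idea concrete, here is a minimal sketch that is not taken from the workshop. It assumes a toy, single-machine ledger in which each block stores the SHA-256 hash of the previous one; the names `Ledger`, `block_hash`, and `is_valid` are hypothetical. Real blockchains add replication and consensus on top of this linking, which the sketch leaves out.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Toy append-only ledger (illustrative only, no network or consensus)."""

    GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self.blocks = []

    def append(self, data):
        """Add a new block at the end, linked to the current last block."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS_PREV
        index = len(self.blocks)
        block = {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        """Recompute every hash and check every link back to the start."""
        prev_hash = self.GENESIS_PREV
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            expected = block_hash(block["index"], block["data"], block["prev_hash"])
            if block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True
```

Changing the data in any earlier block makes `is_valid()` return `False`, because the stored hash no longer matches the recomputed one. That is the sense in which entered data "cannot be changed" without the change being detected.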
Some of the use cases that can benefit from blockchain solutions are healthcare, supply chain, and IoT devices.
Nielson also discussed smart contracts in Blockchain technology. A smart contract is code that runs as part of the distributed ledger. It's a mini-program that's executed when a new block is mined.
Blockchain is about eliminating trusted third-party verification and validation. It can help with the following aspects of data governance.
- Transparency
- Data provenance
- Incentives for interoperability
InfoQ spoke with Nielson about how Blockchain technologies are influencing data architecture and database professionals.
InfoQ: What's the impact of Blockchain technology on Data Architecture?
Seth James Nielson: Truthfully, that remains to be seen. Some believe that we are in the early stages of a massive Blockchain revolution. Others think that the technology is an over-hyped fad, with little truly useful value. The key idea behind the technology is a completely distributed, append-only ledger. Because it is distributed, there's no central authority and, in theory, the parties can trust the authenticity of the data in the ledger without any third party to prove it. By way of example, when you buy something at a store with a credit card, both the buyer and the seller require a third party (the credit card company) in order to execute the transaction. The concept behind Blockchain is that a combination of massive replication, well-designed incentives, and cryptography enables immutable data shared by all parties. Transactions, once inserted into this ledger, cannot ever be removed or altered. This enables parties to conduct business with each other independent of other organizations.
The problem is, this overly simple version isn't the whole truth. A trusted third party has been traded for a trusted crowd. Data in a blockchain can be altered if the majority agrees to the alterations. There are also a number of interesting problems with scale.
In all fairness, the Blockchain community is aware of all of these problems and is working on neat solutions right now. But the problems have to be solved if the revolution is going to move forward.
InfoQ: What are some use cases where Blockchain can help with data security, data integrity and data management in general?
Nielson: Blockchain is a great way to create a trusted record among untrusted parties. Public Blockchains allow anyone to join, and the untrusted parties are people, organizations, and even governments. So long as there is no single coalition large enough to control a majority of the Blockchain resources, data integrity is guaranteed.
Data security and data management are much more complicated. Every member of the Blockchain must preserve and protect a private key. If that key is ever compromised by an unauthorized party, there is little that can be done to revoke the compromised key. Perhaps just as bad, if the key is lost (e.g., accidentally deleted), that user's access to the system is permanently lost as well. It is estimated, for example, that 20% of all the Bitcoins in the world are lost in this manner.
Finally, by itself, Blockchain doesn't really offer much for data management. Rather, it enables new forms of data management. Supply chain is a great example where Blockchain appears to be having some great success. When you look at world-wide, complicated supply chains, keeping track of data between hundreds, or even thousands, of inter-operating vendors is extremely challenging. Creating a Blockchain for these participants to create data of their own, and track related data of others, is a fantastic fit.
One last side note about data management. One of the interesting effects of an append-only ledger is that data cannot be deleted! That leads to some interesting problems. Many companies have data retention policies that require data to be destroyed after certain periods of time or under certain circumstances. Data of this kind should typically not be stored in Blockchains.
InfoQ: Can you talk about some Blockchain design patterns in terms of data management, data encryption etc?
Nielson: I think my answer to your previous question is a start. Interestingly, data in public Blockchains is not encrypted by the Blockchain itself. It inserts whatever you give it. If you give it encrypted data, that encrypted data will always be stored in encrypted form forever (or for the life of the Blockchain, rather). If you store it in plaintext, it will be stored for the entire world to see for so long as the Blockchain is operational.
Blockchain is typically used less for confidentiality and more for authenticity and integrity. With that said, you can use the Blockchain to attest to all kinds of cryptographic operations. For example, you could put encrypted data on the Blockchain and then later insert the decryption key as well. This is a way of "proving" to the world that you had the data at a point in time, without revealing what that data actually says until a much later point in time. Perhaps the encrypted data is the solution to an exam question. By adding it to the Blockchain, you have evidence that you had those answers on a certain date even if you don't reveal the answers until later.
I know some great cryptographers at Johns Hopkins who are doing all kinds of interesting things with this kind of algorithm. But it is still in its infancy, so we're still learning all the patterns.
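The "prove you had it earlier, reveal it later" pattern Nielson describes can be sketched in a few lines. This is an illustration, not his method: instead of publishing encrypted data and later the decryption key, it swaps in a salted hash commitment, because Python's standard library has hashing but no symmetric cipher. The function names `commit` and `verify` are hypothetical, and the step of actually writing the digest to a blockchain is assumed and not shown.

```python
import hashlib
import hmac
import secrets


def commit(message: bytes):
    """Return (digest, nonce). Publish the digest now; keep the nonce
    and the message secret until it is time to reveal them."""
    nonce = secrets.token_bytes(32)  # random salt so the message cannot be guessed
    digest = hashlib.sha256(nonce + message).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, message: bytes) -> bool:
    """Check that a revealed (nonce, message) pair matches the published digest."""
    expected = hashlib.sha256(nonce + message).hexdigest()
    return hmac.compare_digest(expected, digest)
```

In the exam example, the examiner would publish the digest of the answer key on the ledger before the exam and reveal the nonce and the answers afterward. Anyone can then run `verify` to confirm the answers existed, unchanged, on the earlier date.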
InfoQ: What is a “smart contract”? How can “smart contracts” be used when developing applications based on Blockchain technology?
Nielson: Conceptually, smart contracts are mini programs that execute automatically upon certain triggers or conditions. In the Blockchain world, certain Blockchain technologies, notably Ethereum, support code that executes as part of inserting a new transaction.
But it's critical to remember that Blockchain is a distributed ledger. There are some limitations with Blockchain smart contracts because every machine in the entire network must execute the same code and get the same answer. Otherwise, the distributed ledger would not have consensus and integrity. Unfortunately, sometimes people hype up their explanations of what a Blockchain-based smart contract can do.
To get around the limitations of a smart contract, a certain portion of the processing typically has to be done "off chain" (outside of the smart contract itself). For example, one of the suggested uses of a smart contract is to automatically pay a farmer insurance money if there is a drought. It's a nice idea, but how does the smart contract determine the weather? It cannot use sensors, websites, or any other weather sources directly because every node in the network would have to get the same result when executing the distributed code. That simply doesn't scale. So, instead, some kind of trusted third party has to push the temperature to the Blockchain as a piece of data. Once the smart contract can see this data within the ledger, then it is easy to process.
But some people point out that by going "off chain," you've undone many of the advantages smart contracts were built for in the first place. Decentralization, for example.
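The drought-insurance example can be sketched as follows, as an illustration under stated assumptions and not Ethereum code. The contract logic is a pure function that reads only entries already in the ledger, so every node computes the same result. The entry layout, the field name `rainfall_mm`, the threshold, and the payout amount are all invented for this sketch, and the trusted off-chain party that writes the readings is assumed.

```python
def drought_payout(ledger_entries, farmer, threshold_mm=50, payout=1000):
    """Deterministic contract logic: decide a payout using only ledger data.

    It makes no network calls and reads no sensors, so every node that
    runs it over the same entries gets the same answer.
    """
    readings = [
        entry["rainfall_mm"]
        for entry in ledger_entries
        if entry.get("type") == "rainfall_report"
    ]
    if not readings:
        return None  # no reported data yet, so nothing to decide
    if sum(readings) < threshold_mm:
        return {"type": "payout", "to": farmer, "amount": payout}
    return None


# Off chain: a trusted reporter (hypothetical) pushes readings as plain data.
entries = [
    {"type": "rainfall_report", "rainfall_mm": 10},
    {"type": "rainfall_report", "rainfall_mm": 20},
]
print(drought_payout(entries, farmer="farmer-1"))
```

The trade-off Nielson raises is visible here: the function is deterministic only because someone outside the chain is trusted to supply the readings.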
InfoQ: What online resources can you recommend for database developers who want to learn more about Blockchain?
Nielson: Unfortunately, I think that there's so much hype right now that the only good advice is to look for absolutely every "point of view" you can find. Make sure that you are looking for critical ones to balance out the overly positive. I've found Coindesk to be a pretty good source for a lot of information.
InfoQ: What development tools are available for database developers?
Nielson: I'd actually recommend looking to some of the big players like Microsoft and IBM. Microsoft Azure, for example, is really looking to build out a Blockchain story. They have case studies, including examples related to supply chain, insurance, and others. Although it's Microsoft, they can connect into multiple backend Blockchains.
In the interest of full disclosure, my expertise is more in the realm of applied cryptography and data security. I haven't built products with these tools, so take my recommendations with several grains of salt.
One last thing. On my to-do list is to try creating some smart contracts using an Ethereum Simulator. This is the easiest way to try out some smart contract code without actually deploying to the actual Blockchain.
Nielson also talked about the potential of Blockchain technologies for organizations' current projects and initiatives.
Blockchain is really exciting, and I say that despite all the hype. This technology really has some interesting potential and it will be interesting to see how it evolves and develops. This year is the 10-year anniversary of the whitepaper that introduced Bitcoin and catapulted Blockchain technology forward. Even still, it's only been in the last couple of years that the research and development efforts have really exploded. I imagine that there are many advances yet to come and we really are just seeing very early prototypes of what this technology will eventually become.
But it's good to get in early and see how it works right now! You're just in time for a front-row seat to some paradigm-changing innovations. Don't wait!
About the Interviewee
Dr. Seth James Nielson is the Founder and Chief Scientist of Crimson Vista, Inc., a boutique computer security consulting firm. Dr. Nielson has consulted with everyone from small technology start-ups to large medical device manufacturers on matters of cryptography, computer security, and computer networking. He has also served as a testifying expert in various high-tech litigation. Dr. Nielson is also the Director of Advanced Research Projects at the Johns Hopkins University Information Security Institute (JHUISI). He teaches the network security and advanced network security courses with his own custom curriculum and lab work and mentors master's students in capstone projects. Under a grant from Cisco, he is currently driving the development of a cryptographic knowledge base designed to help non-experts know how to correctly use cryptography in their organizations.