Last year I purchased the book Blockchain Revolution by Don Tapscott. The book posits that once some of blockchain's privacy and commerce kinks are worked out, the technology will be positioned to revolutionize many aspects of society (e.g., finance, government, voting).
I've been thinking about blockchain as a storage system and have concluded that it is evolutionary, not revolutionary. It's an interesting thought exercise to trace storage evolution and draw your own conclusion as to whether you agree. I've re-read some of my storage-related posts over the years and looked for a pattern of evolution on which blockchain builds (like adding another block onto a chain).
I've concluded that blockchain is an evolution of a decades-long challenge to provide trusted storage platforms. It can also provide a surprising new twist: a data valuation protocol.
The first observation I would make is that storage functionality advances because applications evolve. New storage features typically emerge when new applications outpace existing systems (e.g., they need more capacity, speed, scale, or availability).
When I graduated from college, storage was attached to the end of a cable. Applications drove the evolution from a single disk to multiple disks (RAID) by demanding increased capacity and performance. RAID advanced the concept of "trust by math": new approaches to data-integrity bits, parity schemes, and non-volatile RAM provided better trust in more circumstances. The disk array era began. Cabinets filled with disk drives, and increasingly large amounts of raw (block) data flowed onto these systems. For the first time, an application could ask for data on a failed disk and the data would have to be mathematically rebuilt. Storage users therefore trusted that when data from a failed drive was returned to them, the math had regenerated it correctly. Once the market accepted that this form of trust worked well enough, the disk array era took off.
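The "trust by math" behind a RAID rebuild can be shown with a toy XOR parity calculation. This is a hypothetical sketch for illustration (disk contents and block sizes are made up), not any vendor's actual implementation:

```python
# Toy RAID-style parity rebuild: three data "disks" plus one parity disk.
# XOR of the survivors and the parity regenerates a lost disk's data.

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Write path: parity is the XOR of all data blocks.
disk0, disk1, disk2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(disk0, disk1, disk2)

# Read path after disk1 fails: rebuild it mathematically from the survivors.
rebuilt = xor_blocks(disk0, disk2, parity)
assert rebuilt == disk1  # the math regenerated the data correctly
```

Because XOR is its own inverse, any single missing block can be recovered from the remaining blocks plus parity, which is the essence of the trust users extended to early arrays.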
Over the years many layers of functionality were added on top of this layer of trust: mirrored caching, storage area networks (SANs), multi-pathing, and network-attached storage (NAS). The trust of the underlying mathematical layer was augmented by redundancy and availability. Failures could now occur at the disk layer, the processor layer, the switch layer, and all the way up to the application layer. Applications asked to receive their data quickly no matter what failed (redundancy/availability), with mathematical correctness in the face of disk failure (data integrity). The metadata traveling alongside this data also required trusted stewardship. More and more mission-critical data flooded onto these trusted systems until the requirements of applications again outpaced these architectures.
New application requirements insisted that all data be checksummed from the application down to the storage to preclude tampering, and in some cases, to enforce data retention and prevent deletion. These features resulted in the rise of content-addressable storage (CAS) and ultimately the object-based systems that are still popular today. Object-based systems started shipping roughly fifteen years ago, and perhaps the most remarkable characteristic of these systems was the extensive use of cryptography from the application layer down to the storage system. Systems like Centera became some of the most "trusted" systems in the industry (in fact, the engineering team ended up achieving six nines of availability to go along with the integrity provided by the cryptography).
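The content-addressing idea behind CAS can be sketched in a few lines. This is a simplified illustration; the function names and the choice of SHA-256 are my assumptions, not Centera's actual design:

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive the object's address from its content; tampering changes the address."""
    return hashlib.sha256(data).hexdigest()

store = {}  # address -> immutable object

def put(data: bytes) -> str:
    addr = content_address(data)
    store[addr] = data  # identical content naturally dedupes to one entry
    return addr

def get(addr: str) -> bytes:
    data = store[addr]
    # Integrity check on every read: recompute the address from the content.
    if content_address(data) != addr:
        raise ValueError("object has been tampered with")
    return data

addr = put(b"archived record #123")
assert get(addr) == b"archived record #123"
```

Because the address is a cryptographic digest of the content, a stored object cannot be silently altered: any change would move it to a different address.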
All of the systems described above began to fill up beyond their capacities. Applications also became increasingly global and expected to read and write data from anywhere in the world. Worldwide availability requirements resulted in the creation of globally-scalable object stores that were driven by trust policies. For example, a copy of a video stored in Los Angeles could be checksummed, encrypted, and stored in four different locations around the world. One of the first storage systems released by EMC with this capability was Atmos (and eventually ECS). Now these globally distributed, trusted data stores are more commonplace.
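One way a trust policy like "four copies worldwide" can be made deterministic is rendezvous (highest-random-weight) hashing, where every client computes the same placement without consulting a central directory. The sketch below is hypothetical; the site names, copy count, and hashing choice are my assumptions, not how Atmos or ECS actually works:

```python
import hashlib

SITES = ["los-angeles", "london", "singapore", "sydney", "sao-paulo", "tokyo"]
COPIES = 4  # example policy: keep four checksummed copies worldwide

def placement(object_id: str, sites=SITES, copies=COPIES):
    """Deterministically choose which sites hold copies of an object.

    Rendezvous hashing: score each site by hashing (object, site) and
    take the top scorers. Every client computes the same answer.
    """
    scored = sorted(
        sites,
        key=lambda s: hashlib.sha256(f"{object_id}:{s}".encode()).hexdigest(),
        reverse=True,
    )
    return scored[:copies]

locations = placement("video-0042")
assert len(locations) == COPIES
```

A useful property of this scheme is stability: if one site disappears, only the objects that had a copy there need to move, while all other placements stay the same.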
If we consider blockchain as the next evolution of trusted storage, can't we just think of it as a variant of globally-distributed object stores? While we need to recognize that a blockchain doesn't store large data sets like an object store does, we can certainly draw some comparisons:
- Applications insert data into a blockchain from anywhere in the world.
- That data is immediately checksummed.
- The data is accompanied by identity metadata which must also be preserved.
- The data is replicated to multiple, global locations.
- The data is tamper-proof and cannot be deleted.
- Applications across the world have access to the data.
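The tamper resistance in the list above comes from hash chaining: each entry commits to its predecessor, so rewriting history invalidates every later entry. A minimal sketch follows (deliberately simplified; real blockchains add digital signatures, consensus, and Merkle trees):

```python
import hashlib
import json

def make_block(data: dict, prev_hash: str) -> dict:
    """Create a tamper-evident entry that commits to its predecessor's hash."""
    body = {"data": data, "prev_hash": prev_hash}
    body_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": body_hash}

def verify_chain(chain) -> bool:
    """Recompute every hash and check every back-link."""
    for i, block in enumerate(chain):
        body = {"data": block["data"], "prev_hash": block["prev_hash"]}
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block({"msg": "genesis"}, prev_hash="0" * 64)
chain = [genesis, make_block({"msg": "entry 1"}, genesis["hash"])]
assert verify_chain(chain)

# Tampering with earlier data breaks the chain of hashes.
chain[0]["data"]["msg"] = "rewritten history"
assert not verify_chain(chain)
```

This is what makes a blockchain entry effectively undeletable and unmodifiable once the chain is replicated: an attacker would have to rewrite every subsequent block on every copy.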
There are, however, a couple of evolutionary additions that blockchain brings.
First of all, every blockchain transaction has to be digitally signed by the owner via a private key. The storage systems described above don't work that way. This feature permanently associates each storage entry with a specific private key. This enables a wide variety of use cases, not the least of which is data ownership.
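The sign-with-a-private-key, verify-with-a-public-key pattern can be shown with textbook RSA on deliberately tiny numbers. This is a toy for illustration only; the key sizes are wildly insecure, and real blockchains typically use elliptic-curve signatures (ECDSA or EdDSA) rather than RSA:

```python
import hashlib

# Textbook-RSA toy keypair built from tiny primes (p=61, q=53).
# N and E are public; D is the private exponent only the owner holds.
N, E, D = 3233, 17, 2753

def sign(data: bytes) -> int:
    """Only the holder of the private exponent D can produce this value."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big") % N
    return pow(digest, D, N)

def verify(data: bytes, signature: int) -> bool:
    """Anyone can check the signature using only the public key (N, E)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big") % N
    return pow(signature, E, N) == digest

entry = b"transfer asset 42 to Alice"
sig = sign(entry)
assert verify(entry, sig)
```

Because the signature can only be produced with the private key, every entry is permanently bound to a specific owner, which is exactly the association that enables data-ownership use cases.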
The second interesting addition is the insertion of data value into blockchain's storage protocol.
This statement is subtle. The very first blockchain (the Bitcoin blockchain) transported value in the form of a cryptographic token (subsequently associated with fiat currencies). Newer blockchains can now transfer data AND tokens. The industry can now start building a trusted data exchange platform for digital business.
These data exchange platforms may or may not embed value transfer directly into their protocols, but the trust inherent in global ledgers will remove friction and unlock value to the bottom lines of next-generation digital companies that participate in blockchain ecosystems.
Value transfer, when combined with the concept of data ownership, means people can start getting paid directly (and sometimes indirectly) for their digital assets.
As a proof point for this trend, one only needs to look at the research into an internet standard for value transfer at the Digital Currency Initiative at the MIT Media Lab.
There is, however, a problem that still exists. There is currently no blockchain storage implementation that can scalably carry forward the performance, availability, resiliency, and trust represented by the decades of evolution described above.
Fortunately, some of the world's experts in building scalable blockchain implementations work within Dell Technologies. They have developed new algorithms that will bring enterprise-class storage to the blockchain applications that need it (I will describe these algorithms in upcoming posts).
These algorithms, of course, can't exist in a vacuum. They must be packaged and deployed into a solution that runs alongside an enterprise's existing mission-critical data assets.
We're working on that ;>)
Steve
Twitter: @SteveTodd
Hi Steve,
It seems this is an example of an alternative to cloud storage for cold data:
https://www.enterprisetech.com/2018/03/12/docker-founder-joins-blockchain-storage-startup/
This might become enterprise-level storage (a low tier). The idea will definitely evolve, and the main point of improvement is data security (at the very least, users should be able to trust the model ;).
Posted by: Vitaly Kozlovsky | March 23, 2018 at 08:05 AM