I love talking and writing about the process of innovation, and looking for the patterns that lead to products that genuinely benefit customers.
At the time of my retirement I was in the final stages of delivering the last invention of my career. We called the idea a Data Confidence Fabric (DCF).
I thought it would be interesting to explore the evolution of the idea, how it advanced into working code, and how it eventually made it to market.
Here is the history:
- 1986-2011: The first 25 years of my career could be summarized as "inventing products that deliver trusted data to applications." I enjoyed solving the myriad problems that our customers faced as digital transformation became an essential part of any company. These customers deployed more and more applications that directly impacted the balance sheet in the form of revenue, operational savings, or risk reduction.
- In the 1980s, delivering trusted data to applications meant moving the data directly from the disk to an application that was "nearby" (e.g., on the other side of a cable). I called this "application nearness." Over time, applications stretched further and further away from where the data was stored. And then they started to move back towards the storage again. I became fascinated with the problem of how applications could trust their data in scenarios where it was travelling a long distance.
- 2013: The Big Data hype was in full swing. EMC CTO John Roese and EMC Storage CTO Barry Burke began to discuss why customers might want to store their data on a Symmetrix ($10/GB) versus an Atmos device ($0.50/GB). They began classifying mission-critical data as "vital data" (a term that never quite took off outside the company). But John wanted to test a hypothesis: does more valuable data belong on more expensive, mission-critical storage? And how does one calculate data's value? He decided to turn to an academic expert in the study of Data Value: Dr. Jim Short of UC San Diego.
- 2014-2015: Dr. Short and I started studying the monetary value of the data sitting on our customers' EMC storage systems. How much revenue or operating expense did a data set represent? How much monetary loss would a customer experience if they didn't handle their data properly? Every enterprise customer that Jim and I interviewed cared A LOT about how to calculate the value of their data. But not only did they have no idea what that value was; they had enough trouble just trying to manage the data in the first place. Jim and I published a paper about our findings, and I also spoke publicly, wrote a white paper, and wrote a number of blog posts on the topic. An industry colleague named Doug Laney had a set of equations for calculating data's value, but the research did not indicate that enterprise customers prioritized performing data valuation.
- 2015: Inside the company we began to understand that customers cared less about data value and more about data trust; in other words, trusted data is more valuable data. The linkage between trust and value became apparent, and we decided to focus on trust first and valuation/monetization later.
- 2016: I began studying blockchain with the goal of creating a white paper that represented Dell Technologies' technical point of view on the topic. I discovered that Dell.com used to accept Bitcoin when customers purchased Dell products. I studied how the Bitcoin blockchain worked under the covers.
- 2017: I worked in the Dell EMC Dojo for a month, studying several different types of Distributed Ledger Technologies (DLTs): Ethereum, Hyperledger, MultiChain, etc. I especially had an eye on how DLTs could help our customers increase revenue, reduce OPEX, and reduce risk. I began to understand the elements of blockchain and how that overlapped with Dell Technologies' product portfolio. I started to imagine using the technology as a mechanism to assist in the trustworthy delivery of data (e.g., a trusted data exchange mechanism). We published a white paper describing Dell's technical point of view on blockchain.
- 2017: IoT and Edge computing hype took off. Dell formed a (version 1.0) IoT/Edge business unit by combining VMware and Dell employees. As I studied IoT and Edge computing, it became clear that "trusted data delivery" would be a complex challenge. The distance between an application and the source of its data would become ridiculously large. Very few people were talking about how an application could trust data that was crossing time zones, geographies, networks, and vendors. Could DLTs play a role in the trustworthy delivery of data from IoT sensors to applications?
- 2018: I began to talk to customers about using blockchain/DLTs as a way to facilitate "trusted data exchange." I also noticed that as data was exchanged using DLT, the value of that data could flow alongside it, because transfer of value was part of the DLT protocol. And then it hit me: distributed ledgers are trustworthy storage systems at global scale. I studied VMware's blockchain consensus algorithm and Dell Boomi's use of blockchain for data anchoring. We put these pieces together in the UC San Diego Lab and built a prototype (publishing a paper in the process).
- 2018: I decided to leave Dell and join the new IoT / Edge business unit. My new boss was Jason Shepherd. He, like me, believed that DLTs would be a critical technology on the Edge. Jason was one of the founders of the EdgeX Foundry open source IoT platform. It was in this new group that I met Trevor Conn, who was the Lead Architect of the Core Technical Working Group for EdgeX Foundry. Although we weren't on the same team at the time, I discovered that Trevor also shared a keen interest in DLTs and wanted to write some code to understand them better. We began to imagine using a DLT to register data at its birth, and then prepare that data for monetization by moving it (in a trustworthy fashion) across an edge ecosystem, towards applications and perhaps data marketplaces. These initial ideas led to a white paper called Getting Started with IoT Data Monetization. We decided to call this "trusted data pipeline" approach a Data Confidence Fabric (DCF), and we began building an internal prototype. The prototype's goal was to assign a "confidence score" to a piece of data. One of the key interns who helped us build the first PoC was Sulav Adhikari. Jason Shepherd reached out to DLT company IOTA, and IOTA's Mat Yarger became the earliest industry partner and evangelist for DCF.
- 2019: We introduced the term "Data Confidence Fabric" to the industry for the first time. We told the industry that DCF brings trust to the edge in the same way that enterprise storage stacks brought trust to the data center. We also began thinking about open-sourcing DCF; there was no way one vendor could implement a sensor-to-gateway-to-edge-server-to-core-to-cloud data pipeline. We started laying out the complexity of delivering trusted edge data as a precursor to pushing for open-sourcing DCF. We completed the internal prototype and started approaching external partners to discuss the open-source potential, chief among them Intel and the distributed ledger company IOTA. I reached back across corporate boundaries to my old team (the Office of the CTO) and persuaded them to approve the open-sourcing of Data Confidence Fabric.
- October 28, 2019: The Linux Foundation announced Dell's intention to donate the DCF codebase to LF Edge. The codebase (and the community developing it) became known as Project Alvarium. Dell supported and evangelized the idea by publishing a blog about our internal prototype: Building the First Data Confidence Fabric. We also published a white paper: Project Alvarium: The Future of Edge Data. Trevor proposed a scoring architecture that leveraged Open Policy Agent to create scoring policies and algorithms based on the trusted handling of the data (e.g., trust insertion points). He also proposed a "view model" that allowed for faster calculation of the score (instead of querying a ledger, which would be slower). His algorithms would become the first scoring example in the open-source repository.
- 2020: The version 1.0 IoT/Edge BU folded, and the version 2.0 Edge BU started within Dell. Trevor and I transferred back into Dell proper, and a team of experts helped harden the code and ready it for donation: Michael Estrin, Jeremy Phelps, and Jason Bonafide. As Project Alvarium got off the ground, we needed a distributed ledger partner to join the community and help build the first "industry" Data Confidence Fabric. IOTA was that partner, and Mat Yarger and Dyrell Chapman (both now working at Demia) worked with Dell to integrate their IOTA Tangle technology into Dell's internal PoC. This allowed the community to demonstrate the first-ever industry DCF. Dell's Nicole Reineke stressed the importance of vetted data with DCF, and Intel's Paul O'Neil talked about the benefits of DCF for privacy and confidential computing on the edge.
- 2021: It took a while for Dell to formally donate the code, but the GitHub repository came online. Trevor Conn took over as lead architect for the Technical Steering Committee for Project Alvarium. The first priority was to find a customer use case and build a DCF on real hardware for the first time. IOTA's Mat Yarger came to Dell and requested that we donate a server for a climate use case. The focus on climate allowed Dell's Nicole Reineke to convince Dell's sustainability team to donate a server to the project. The rest of the year was spent building out the use case in partnership with Tom Baumann (CEO and co-founder of ClimateCheck) and Kathy Giori. Trevor Conn, Nicole Reineke, and I also began to evangelize inside Dell, and we formed an internal partnership with Dell Technologies Fellow and Edge BU CTO Dan Cummins to prepare the way for Dell to build Data Confidence Fabrics as a feature inside Dell's soon-to-be-announced Edge Platform: Project Frontier.
- 2022: The Project Alvarium community finished building the first-ever "live" Data Confidence Fabric at a winery in Molina, Chile. The winery had installed a brand-new biodigester that could capture methane emissions from organic feedstock (e.g., manure, plant waste) and then convert it to biogas or burn it using a flare. A carbon emissions statement was produced, along with a "confidence score" describing the trustworthiness of the statement (an industry first). The Alvarium community did a webcast describing the project, and several members of the team presented at the United Nations Environment Programme. The first-ever live DCF generated a significant amount of press and buzz. Another important development: Dell Technologies joined the Hedera Governing Council and began an (internal) integration of Alvarium and Hedera as a second-source ledger (in addition to the IOTA Tangle). The Dell DCF team began experimenting with Hedera smart contracts (and cryptocurrency) for data monetization.
- 2023: Trevor Conn invented a new method of calculating confidence for edge data that builds upon confidence scores by also calculating scores for the underlying hardware, operating system, and container layers. He presented this new approach to the Alvarium community, and work began to socialize an implementation with the member companies. After gaining their approval, several Dell Technologies software engineers worked on the implementation: Karim Elghamry, Ali Amin, Omar Eissa, and Michael Mikhail. Dell researcher Ahmed Khalid, as part of European Project CLEVER, worked with University College Cork researchers to prove that leveraging DCF confidence scores as part of AI model training results in more trustworthy analytic models. The work was presented at the 2023 CEC workshop in Iceland. Analog Devices (ADI) approached Dell Technologies to discuss data monetization, and plans began to build a DCF with ADI by embedding Project Alvarium code into Dell's Native Edge framework.
- 2024: Dell researchers presented how the combination of DLT smart contracts and DCF allows for monetization of edge data and edge services. ADI and Dell demonstrated a working Data Confidence Fabric at Dell Technologies World 2024 and recorded a joint video describing DCF benefits. Trevor Conn contributed an article, "Data confidence begins at the edge," to CIO Magazine.
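To make the "confidence score" idea from the 2019 scoring architecture more concrete, here is a rough sketch of trust-insertion-point scoring. This is an illustrative toy, not Project Alvarium's actual API or policy set; the trust insertion point names and weights below are my own hypothetical examples.

```python
# Hypothetical sketch of trust-insertion-point scoring.
# The names and weights are illustrative assumptions, not
# Project Alvarium's actual policies or API.

# Each trust insertion point contributes a weight when the data
# passed through it with the corresponding protection applied.
TRUST_INSERTION_WEIGHTS = {
    "device_signature": 2.0,   # data signed at the sensor/gateway
    "tpm_attestation": 2.0,    # host hardware attested via TPM
    "secure_transport": 1.0,   # TLS between hops
    "immutable_storage": 1.5,  # payload anchored to a ledger
    "authenticated_app": 1.0,  # producing application authenticated
}

def confidence_score(annotations: dict[str, bool]) -> float:
    """Return a 0..10 confidence score for one piece of data.

    `annotations` maps a trust insertion point name to whether it
    was satisfied as the data moved through the fabric.
    """
    max_score = sum(TRUST_INSERTION_WEIGHTS.values())
    earned = sum(
        w for name, w in TRUST_INSERTION_WEIGHTS.items()
        if annotations.get(name, False)
    )
    return round(10 * earned / max_score, 2)

# Data that was signed, transported securely, anchored, and produced
# by an authenticated app, but whose host was not attested:
sample = {
    "device_signature": True,
    "tpm_attestation": False,
    "secure_transport": True,
    "immutable_storage": True,
    "authenticated_app": True,
}
print(confidence_score(sample))  # → 7.33
```

In a real fabric, these annotations would be recorded at each hop and the policy (here, a simple weighted sum) would live in an engine such as Open Policy Agent rather than hard-coded weights.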
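The layered idea from the 2023 entry (scoring the hardware, operating system, and container layers beneath the data) could be sketched as follows. The layer names and the min-based combination rule are my assumptions for illustration, not the implementation presented to the Alvarium community.

```python
# Illustrative sketch of layered confidence scoring. The combination
# rule (scale by the weakest layer) is an assumption for this example,
# not the actual Alvarium implementation.
from dataclasses import dataclass

@dataclass
class StackScores:
    """Per-layer confidence, each normalized to 0..1."""
    hardware: float   # e.g., secure boot / TPM attestation checks
    os: float         # e.g., patch level, integrity measurement
    container: float  # e.g., signed image, runtime policy

def layered_confidence(data_score: float, stack: StackScores) -> float:
    """Scale a data-level confidence score (0..10) by the weakest
    underlying layer: data can be no more trustworthy than the
    stack that produced it."""
    stack_factor = min(stack.hardware, stack.os, stack.container)
    return round(data_score * stack_factor, 2)

# A well-scored piece of data loses confidence when it was produced
# by a container layer that only partially passed its checks:
print(layered_confidence(8.0, StackScores(hardware=1.0, os=0.9, container=0.75)))  # → 6.0
```

Taking the minimum across layers encodes a weakest-link assumption; a real implementation might instead weight or multiply the layers, but the point is the same: the stack's trustworthiness bounds the data's.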