I've been looking at data valuation through the lens of a data insurance use case. In my last post I mentioned that as data architects move siloed data sets into a Business Data Lake, they should simultaneously build a governance structure known as a Metadata Lake.
I like to think of a Metadata Lake as an infrastructure construct that enables partitioning of an overall Data Lake. For the data insurance use case, the Metadata Lake becomes a one-stop shop for an insurer to subsequently audit the data sets it insures.
One of the key proof points that a data insurer would be looking for is data set compliance with agreed-upon data protection levels. If the insured cannot conclusively prove that the data set is (or was) protected according to the data insurance policy, then the agreement may well be null and void.
My theory is that Metadata Lakes, while clearly useful for auditing insured data sets, can also enable a much larger ecosystem supporting data valuation and many other advanced IT capabilities.
At a minimum, a data insurance auditor would want visibility into (a) the insured data set(s), (b) the agreed-upon data protection policy for each data set, and (c) the actual level of data protection currently employed for each data set. To visualize this, please refer to the graphic below.
The applications covered by the insurance policy need to be registered in the lake as a form of application metadata. The policies themselves should also be stored and associated with those applications. At the same time, the location of the data sets themselves, and the current data protection levels at each location, should be gathered from the infrastructure and time-stamped as an auditable proof point of compliance.
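To make that registration and audit flow concrete, here is a minimal sketch of what such metadata records might look like. All class, field, and function names here are my own illustrative assumptions, not an actual Metadata Lake API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProtectionRecord:
    """A time-stamped observation gathered from the infrastructure."""
    location: str            # where the data set currently lives
    protection_level: str    # e.g. "encrypted", "replicated"
    observed_at: datetime    # the auditable proof-point timestamp

@dataclass
class InsuredDataSet:
    name: str
    application: str                 # registered application metadata
    agreed_protection: str           # level required by the insurance policy
    history: list = field(default_factory=list)

    def record_observation(self, location: str, protection_level: str) -> None:
        """Gather the current protection level at a location and
        time-stamp it as an auditable proof point."""
        self.history.append(
            ProtectionRecord(location, protection_level,
                             datetime.now(timezone.utc)))

    def is_compliant(self) -> bool:
        """An auditor's check: does the latest observation meet the
        agreed-upon protection level?"""
        if not self.history:
            return False  # no proof point means compliance cannot be shown
        return self.history[-1].protection_level == self.agreed_protection

# Usage sketch
ds = InsuredDataSet("claims_2024", "claims-app", "encrypted")
ds.record_observation("lake/zone-a", "encrypted")
print(ds.is_compliant())
```

The key design point is the `history` list: an insurer auditing a claim cares not only whether the data set is protected now, but whether it was protected at the time of loss, which is why each observation carries its own timestamp.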
If your company is considering data insurance, the Metadata Lake construct will be mandatory. This brings us back to the first step in the data underwriting process:
A Metadata Lake is a form of Data Audit and Inventory System (DAIS). If a data insurer discovers during their initial visit that no such structure exists, then some form of it will need to be created.
Here is some advice for Data Architects who are in the midst of migrating siloed data sets to a Data Lake: build a Metadata Lake at the same time. While doing so, pay careful attention to the mapping of each data set to the underlying trusted (data-protected) infrastructure, or at least consider how it could be segmented in the future.
I'd welcome any thoughts on this topic from Data Architects who are creating Data Lake ecosystems at this time.