The article Metadata Lakes and Data Value, written in February of this year, stated the following:
...as data architects go through the process of moving silo'ed data sets into a Business Data Lake, they should simultaneously build a governance structure known as a Metadata Lake.
The diagram below emphasizes that the Metadata Lake should contain, among other things, application policy metadata and infrastructure metadata.
The idea of a Metadata Lake arose in response to the Data Insurance use case. A data set insurer would need access to a Data Audit and Inventory System (DAIS). In the context of data insurance I wrote:
A Metadata Lake is a form of a Data Audit and Inventory System (DAIS). If a data insurer discovers that this structure doesn't exist during their initial visit, then some form of it must surely be created.
A Metadata Lake can certainly function as a DAIS. When the goal is to place an application (and its data) onto the appropriate infrastructure based on data value, however, a Metadata Lake can be used for much more than a data inventory system.
For example, in my last post I highlighted how tools like CloudFoundry can perform manual application placement by having an operator choose to place an app on one of seven cloud choices (e.g. VMware, Amazon, Google):
Given the current state of the art, the diagram above raises several questions:
- How can the industry move towards an automated approach where application deployment is constrained by the anticipated economic value?
- If the line of business for the app can articulate the anticipated value, can tools such as CloudFoundry automatically constrain application placement?
- Can the deployment layer contain business logic that matches line-of-business value against trusted infrastructure services?
If a Metadata Lake contains (at a minimum) the two types of attributes depicted below, not only will it serve as a Data Audit and Inventory System, but it also can be referenced by PaaS platforms that wish to introduce Governed Placement Services into the deployment process.
The diagram highlights two forms of metadata that must be added into a Metadata Lake:
- Trust Services. These services are essentially the trust dimensions, or trust taxonomy, surfaced by all members of a cloud infrastructure and now advertised as services offering different levels of trust support by each of the candidates depicted in the diagram above.
- Data Policies. These policies represent business statements about the value (and therefore the level of trust required) of the data set being generated by the application. Example policies could include the need for data retention, encryption, immutability, etc.
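To make the two metadata types more concrete, here is a minimal Python sketch of how they might be recorded. Everything in it is an assumption for illustration: the class names (TrustService, DataPolicy, MetadataLake), the field names, the provider names "cloud-a" and "cloud-b", and the application name "billing-app" are hypothetical and do not come from CloudFoundry or any existing product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrustService:
    """A trust capability advertised by one cloud infrastructure (hypothetical schema)."""
    provider: str     # hypothetical provider name, e.g. "cloud-a"
    capability: str   # e.g. "encryption", "retention", "immutability"


@dataclass(frozen=True)
class DataPolicy:
    """A business statement about what an application's data set requires (hypothetical schema)."""
    application: str
    requirement: str  # assumed to share a vocabulary with TrustService.capability


@dataclass
class MetadataLake:
    """In-memory stand-in for a Metadata Lake holding both metadata types."""
    trust_services: list = field(default_factory=list)
    data_policies: list = field(default_factory=list)

    def capabilities_of(self, provider):
        # All trust capabilities advertised by the given provider.
        return {s.capability for s in self.trust_services if s.provider == provider}

    def requirements_of(self, application):
        # All policy requirements attached to the given application's data.
        return {p.requirement for p in self.data_policies if p.application == application}


# Illustrative entries only; not real provider capabilities.
lake = MetadataLake()
lake.trust_services += [
    TrustService("cloud-a", "encryption"),
    TrustService("cloud-b", "encryption"),
    TrustService("cloud-b", "immutability"),
]
lake.data_policies += [
    DataPolicy("billing-app", "encryption"),
    DataPolicy("billing-app", "immutability"),
]
```

The sketch assumes that policies and trust services are expressed in one shared vocabulary, so that a requirement such as "encryption" can be compared directly with an advertised capability of the same name.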
Once these policies and services are stored in the Metadata Lake (along with an API for accessing them), new Governed Placement Services can be developed and added to a PaaS framework.
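One possible shape for such a Governed Placement Service is a simple filter: keep only the infrastructures whose advertised trust services cover every policy attached to the application's data. The Python sketch below is an assumption about how that check could look, not a description of how CloudFoundry or any PaaS behaves today. The function name governed_placement, the provider names, and the capability strings are all hypothetical.

```python
def governed_placement(required, advertised):
    """Return providers whose advertised trust services cover every required policy.

    required:   set of policy requirements for one application's data set
    advertised: dict mapping provider name -> set of trust capabilities
    """
    return sorted(
        provider
        for provider, capabilities in advertised.items()
        if required <= capabilities  # subset test: every requirement is met
    )


# Illustrative values only; not real provider capabilities.
advertised = {
    "cloud-a": {"encryption"},
    "cloud-b": {"encryption", "retention", "immutability"},
    "cloud-c": {"retention"},
}
required = {"encryption", "immutability"}

candidates = governed_placement(required, advertised)
print(candidates)
```

With these sample values only "cloud-b" satisfies both requirements, so an operator (or an automated deployer) would be constrained to that single choice. An application with no policies at all would remain eligible for every candidate.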
In an upcoming post I will explain some of the flow and logic that can allow tools like CloudFoundry to change their current mode of operator-specified placement and shift to an automated, governed placement approach.
Steve
EMC Fellow