This week I began a set of summary blog posts from Evanta's Global CIO Executive Summit at the Skytop Lodge in Pennsylvania. During one of the sessions I presented a keynote focused on Data's Economic Value in the Age of Digital Business. In focusing on data value I followed the three themes of the summit: Innovation (summarizing several years of innovation in the area of data value), Execution (a status update on our internal execution of those ideas), and Results (the industry results for calculating the value of data). I presented five key insights about data value, the first of which recommended that CIOs insert themselves into the conversation about data's value. In today's post I'd like to describe the second key insight: the IT touch points for data valuation.
Second Key Insight: Data Workflow and Ingest are the IT Touch Points for Measuring Data's Value
This second insight identifies exactly where within an IT architecture the data valuation algorithms should be located. Before we highlight these areas, let's review the algorithms themselves.
In my last post I introduced the problem space: traditional product companies were struggling to increase the percentage of revenue coming from data products and services (as opposed to their traditional products and services). Dr. Jim Short conducted an extensive survey in this area and identified a sample set of data valuation problems. Corporations were struggling to...
- accurately price data assets for sale/purchase
- identify which data assets within an organization are the "most monetizable"
- turn those assets into new products and services
- understand which specific analytic models within an organization have produced business results
In spite of these and many other problems, the good news for CIOs is that industry luminaries such as Bill Schmarzo have made significant progress outlining a framework for the conversation about data's value. Below is an image of Bill sharing his insights with Wikibon's Peter Burris, along with two articles Bill contributed to CIO.com to publicly document his approach (one on calculating data value, the other on calculating analytic value). The full video of Bill's interview with Peter can be found here.
Bill's recommendations for calculating value could certainly be automated once the CIO has gained a full understanding of how data maps to specific business decisions. This would allow, for example, a valuation algorithm to be codified in a similar fashion to the algorithms described by Doug Laney as part of his Infonomics research.
Laney's equation calculates the Business Value of Information (BVI) by multiplying together several variables that represent data characteristics. Chief among them is the data's relevance: a summation, across all of the lines of business the data is mapped to, of how relevant (on a scale of 0 to 1) the data set is to each particular line of business.
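To make this concrete, below is a minimal sketch of how such a relevance-driven calculation might be codified. The line-of-business mapping, the relevance scores, and the two companion variables (validity and timeliness) are illustrative assumptions, not Laney's published formula or a production implementation.

```python
# Illustrative sketch of a relevance-driven data valuation score.
# The lines of business, relevance scores, and characteristic
# weights below are hypothetical examples, not production values.

# Relevance of one data set to each line of business (scale 0-1).
LOB_RELEVANCE = {
    "sales":     0.9,   # heavily used for pipeline forecasting
    "marketing": 0.6,   # used for campaign targeting
    "finance":   0.2,   # only occasionally referenced
}

def business_value(relevance_by_lob, validity, timeliness):
    """Multiply data characteristics to estimate business value.

    relevance_by_lob: mapping of line of business -> relevance (0-1)
    validity:   fraction of records that are accurate (0-1)
    timeliness: how current the data is relative to need (0-1)
    """
    # Relevance is a summation across all mapped lines of business.
    relevance = sum(relevance_by_lob.values())
    return relevance * validity * timeliness

if __name__ == "__main__":
    score = business_value(LOB_RELEVANCE, validity=0.95, timeliness=0.8)
    print(f"Estimated business value score: {score:.2f}")
```

The key point is that once the mapping from data sets to lines of business exists, the arithmetic itself is trivial to automate; the hard work is building and maintaining that mapping.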
So when considering where these algorithms should run, CIOs have a variety of choices (as depicted below).
If we use the example of calculating business relevance, we see five potential options:
- Calculate the value of content at rest (e.g. after it has been stored in a data lake). Earlier this year I highlighted potential methods of implementing this approach, which parses content and maps it against relevant lines of business (see the sketch after this list). Two shortcomings of this approach are (a) there is too much data to parse, and (b) many CIOs do not want to disrupt a production system.
- A second option that may be more attractive is to perform this valuation within the context of the data protection ecosystem. This not only allows valuation algorithms to execute outside of a production system, but it also gives them access to a rich set of protection metadata that can inform the valuation (e.g. application metadata, user metadata, backup schedules, etc.). This approach has the shortcoming, however, that the valuation algorithms may be evaluating older (potentially stale) content.
- A third option is to perform valuation upon ingest. This option is often preferred because options #1 and #2 can be more difficult given the vast amount of legacy data that would need to be valued. CIOs can use frameworks such as Apache Storm for real-time and in-memory valuation.
- A fourth option is to perform valuation via a tight integration with application deployment frameworks. In particular, if a devops team can correlate the frequency of continuous delivery with specific data sets, those data sets become more valuable as newer versions of applications generate and store new forms and types of data.
- The final option is to track the usage of data in the context of data scientists who are performing analytic workflows (which are ultimately intended to drive the business decisions that produce value).
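To illustrate the first option, here is a minimal sketch of what parsing content at rest and mapping it against lines of business might look like. The keyword lists, file paths, and scoring scheme are hypothetical assumptions for illustration only.

```python
import os

# Hypothetical keyword map: terms that suggest a file is relevant
# to a given line of business. Real mappings would come from the
# business-decision analysis described above.
LOB_KEYWORDS = {
    "sales":     ["quota", "pipeline", "opportunity"],
    "marketing": ["campaign", "segment", "lead"],
    "finance":   ["invoice", "ledger", "forecast"],
}

def relevance_at_rest(path):
    """Parse one stored file and score its relevance per line of business."""
    with open(path, errors="ignore") as f:
        text = f.read().lower()
    scores = {}
    for lob, keywords in LOB_KEYWORDS.items():
        hits = sum(text.count(k) for k in keywords)
        # Normalize raw hit counts to a rough 0-1 relevance score.
        scores[lob] = min(hits / 10.0, 1.0)
    return scores

def scan_data_lake(root):
    """Walk a directory tree (standing in for a data lake) and value each file."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            yield path, relevance_at_rest(path)
```

Both shortcomings called out in the first option show up immediately in this sketch: the full-tree walk touches every byte in the lake, and running it against production storage competes directly with production workloads.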
After considering all of these options it became clear that option #5 was the lowest-hanging fruit. Data scientists are often exploring data in the context of a specific business problem, which offers the opportunity to join business context together with the data itself. Joining business and technical metadata in this way maps nicely to CIO Insight #1: inserting the CIO into the conversation about data's value to the business.
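Here is a minimal sketch of what option #5 could look like in practice: a thin wrapper that records which data sets an analytic workflow touches, tagged with the business problem being worked. The log format, file-based loader, and `business_context` tag are assumptions for illustration; a real system might write to a metadata catalog instead.

```python
import csv
import time

# Hypothetical usage log; a production system would likely write to a
# metadata catalog rather than a local CSV file.
USAGE_LOG = "dataset_usage_log.csv"

def log_dataset_access(dataset, business_context, analyst):
    """Record one data-set access, joined with its business context."""
    with open(USAGE_LOG, "a", newline="") as f:
        csv.writer(f).writerow([time.time(), dataset, business_context, analyst])

def load_for_analysis(dataset, business_context, analyst):
    """Wrapper a data scientist calls instead of opening the data directly."""
    log_dataset_access(dataset, business_context, analyst)
    with open(dataset) as f:   # stand-in for a real loader (e.g. a dataframe)
        return f.read()

# Example usage (hypothetical data set and business problem):
# data = load_for_analysis("churn_2017.csv",
#                          business_context="reduce customer churn",
#                          analyst="jsmith")
```

Because every access is tagged with a business problem, the resulting log is exactly the join of business and technical metadata that Insight #1 calls for.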
Our research also concluded that option #3 (valuation upon ingest) is highly desirable. We therefore recommend that the CIO advise the infrastructure team to explore IT touch points for ingest in addition to monitoring workflows for value.
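A production deployment of ingest-time valuation would typically run inside a streaming framework such as the Apache Storm approach mentioned above; the sketch below expresses the same per-record logic in plain Python for readability. The scoring rules and field names are hypothetical assumptions.

```python
# Minimal sketch of per-record valuation at ingest time. In production
# this logic would run inside a streaming framework (e.g. as a bolt in
# an Apache Storm topology); here it is plain Python. The scoring rules
# below are hypothetical.

def value_on_ingest(record):
    """Attach a rough value score to a record as it arrives."""
    score = 0.0
    # Assumed rule: records carrying a customer identifier are more
    # monetizable because they can be joined to revenue data.
    if record.get("customer_id"):
        score += 0.5
    # Assumed rule: fresher records are worth more.
    age_hours = record.get("age_hours", 0)
    score += max(0.0, 0.5 - age_hours / 100.0)
    record["value_score"] = round(score, 2)
    return record

def ingest_stream(records):
    """Value each record in-memory, before it lands in storage."""
    for record in records:
        yield value_on_ingest(record)

if __name__ == "__main__":
    sample = [{"customer_id": "c-123", "age_hours": 2},
              {"age_hours": 48}]
    for tagged in ingest_stream(sample):
        print(tagged)
```

The advantage over options #1 and #2 is clear in the shape of the code: each record is valued exactly once, in memory, at arrival, so no legacy backlog ever needs to be re-parsed.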
The overall recommendation, therefore, is to focus on content workflow first. We have already implemented workflow valuation frameworks internally. Exactly how we perform this valuation is the third key insight, which I will cover in the next post (Insight 3 of 5).
Fellow, Dell Technologies