This is the last in a series of posts describing a methodology (EMC's Data Analytics Lifecycle) for using analytics to measure innovation at a multi-national corporation. This lifecycle is taught in the Data Science and Big Data Analytics course created by EMC, and I've blogged my way through each phase of the lifecycle, arriving now at the end (Phase 6).
As a review, here is a graphical view of the lifecycle, followed by a summary of all the posts written thus far:
Phase 5
Global Knowledge Flight Patterns
Phase 4
Phase 3
Phase 2
Phase 1
Introduction to Innovation Analytics
Given the foundation of the first five phases, let's finish with the final phase.
Phase 6 is called "Operationalize". My team and I have not yet reached this phase, but my understanding of Phase 6 is already influencing our journey through the earlier steps. The journey my team has undergone so far can be summarized as follows:
Running analytics against a sandbox filled with notes, minutes, and presentations from innovation activities has yielded great insights into EMC's innovation culture.
Phase 6 moves the analytic models out of the sandbox and into production. The course advises a production "pilot" be run first (as opposed to deploying the model on a wide-scale). This approach minimizes risk. Smaller-scale deployment allows the team to learn about the performance and make adjustments before a full deployment.
Phase 6 may require a new team of people to join the initiative: the people responsible for running the production environment. They will help feed data sets into the production model. During execution of the model in production, it is important to detect anomalies in the inputs before they are fed into the model. This may not be 100% achievable, but consider training a classifier, such as a logistic regression, on a training set of the data to flag suspect inputs.
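To make that last point concrete, here is a minimal sketch of screening production inputs with a logistic regression. Everything in it is hypothetical: the feature vectors, the labels, and the `screen_inputs` helper are illustrative stand-ins, not part of our actual pipeline.

```python
# Hypothetical sketch: flag anomalous inputs before they reach the
# production model. A logistic regression is trained on past inputs
# that were labeled valid (0) or anomalous (1).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assumed training data: 3 numeric features per input (e.g. document
# length, word count, metadata completeness), with known labels.
valid = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
anomalous = rng.normal(loc=4.0, scale=1.0, size=(100, 3))
X_train = np.vstack([valid, anomalous])
y_train = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X_train, y_train)

def screen_inputs(batch):
    """Return only the rows the classifier considers valid."""
    flags = clf.predict(batch)  # 1 means "likely anomaly"
    return batch[flags == 0]

# A new batch: five normal-looking inputs plus two outliers.
new_batch = np.vstack([rng.normal(0, 1, (5, 3)),
                       rng.normal(4, 1, (2, 3))])
clean = screen_inputs(new_batch)
```

The design choice here is simply to screen inputs *before* scoring, so that a bad feed degrades into dropped records rather than corrupted model output.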
What does this specifically mean for the project I've been running? The points mentioned below are key aspects to remember for any company wishing to run innovation analytics:
- We need more data, which means we need a marketing initiative to convince people to submit information about their innovation/research activities (or at least to inform the global community of them).
- This data is sensitive, and some thought needs to go into who can run the model and who can see the results.
- In addition to running models, a parallel initiative will likely be to make the repository searchable (people want to search for innovation/research initiatives). This search workload may impact the performance of the analytics.
- We need a mechanism to continually re-evaluate the model after deployment. Assessing the benefits is one of the main goals of this stage, as well as defining a process to retrain the model as needed.
This last point represents a challenging and often overlooked aspect of Phase 6. The team needs to assess whether the model is meeting goals and expectations, and whether the desired changes are actually occurring. The input data may change over time, or live data may drift to the point where the model needs to be updated or retrained.
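One common way to trigger that re-evaluation is a drift check: compare the distribution of a feature in live data against the training baseline and alert when they diverge. The sketch below uses the Population Stability Index (PSI) with the usual rule of thumb that values above 0.2 suggest significant drift; the data and threshold are illustrative assumptions, not figures from our project.

```python
# Hypothetical sketch of the re-evaluation step: measure how far a
# feature's live distribution has drifted from the training baseline.
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
baseline = rng.normal(0, 1, 5000)        # data the model was trained on
stable_live = rng.normal(0, 1, 5000)     # live data, same distribution
drifted_live = rng.normal(1.5, 1, 5000)  # live data after a shift

# Rule of thumb: PSI > 0.2 means retraining should be considered.
stable_drift = psi(baseline, stable_live)
large_drift = psi(baseline, drifted_live)
```

Running this check on a schedule (and logging the PSI values) gives the production team an objective signal for when to revisit the model rather than relying on anecdotal reports that results "look off".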
As I reach the end of this series of blog posts, I'd like to thank Dave Dietrich, who has proofread nearly all of my posts for accuracy!
As the efforts of the data scientists come to a close, Dave has one final piece of advice:
Hold a post-mortem with the analytic team to discuss what would change in the process or project if you had the chance to do it over again.
Steve
Director, EMC Innovation Network