This series of blog posts has been describing a methodology (EMC's Data Analytics Lifecycle) for using analytics to measure innovation at a multinational corporation.
There are eight different hypotheses that the project is attempting to prove. My last few posts have focused on one specific hypothesis:
H5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.
Phase 4 of the lifecycle focuses on running analytic models against high-quality data contained in an analytic sandbox. The chosen analytic method (Social Network Analysis) seemed to indicate that measuring knowledge transfer activity does identify geographic boundary spanners. We confirmed this by focusing on a specific innovator, Jidong Chen of EMC Labs China, and observing that he had attended a large number of meetings with geographically dispersed innovators.
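The post does not say which Social Network Analysis metric the team used. As a minimal sketch, assuming each meeting is recorded as a list of attendee IDs and each person has a single home region (both hypothetical data layouts, with made-up people), one simple boundary-spanning score links everyone who attended a meeting together and then counts the distinct regions, other than a person's own, among that person's co-attendees:

```python
from collections import defaultdict
from itertools import combinations


def co_attendance_graph(meetings):
    """Undirected person-to-person graph: two people are linked if they
    attended at least one meeting together."""
    neighbors = defaultdict(set)
    for attendees in meetings:
        # set() drops duplicate entries; sorted() keeps iteration deterministic
        for a, b in combinations(sorted(set(attendees)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def region_span(person, neighbors, region_of):
    """Number of distinct foreign regions among a person's co-attendees."""
    home = region_of[person]
    return len({region_of[n] for n in neighbors[person]} - {home})


def rank_boundary_spanners(meetings, region_of):
    """(person, score) pairs, highest score first, ties broken by name."""
    neighbors = co_attendance_graph(meetings)
    scores = {p: region_span(p, neighbors, region_of) for p in neighbors}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical example data (not the project's sandbox contents).
region_of = {
    "jidong": "China", "cn2": "China", "ru1": "Russia", "ru2": "Russia",
    "eg1": "Egypt", "ie1": "Ireland", "in1": "India", "us1": "US",
}
meetings = [
    ["jidong", "ru1", "ru2", "eg1", "ie1", "us1"],  # a brownbag-style session
    ["jidong", "cn2"],                               # a local meeting
    ["jidong", "in1"],                               # a one-on-one abroad
]

ranking = rank_boundary_spanners(meetings, region_of)
print(ranking[0])  # ('jidong', 5)
```

Counting foreign regions rather than raw meeting counts keeps a person who meets many colleagues in one office from outscoring someone who bridges several geographies.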
Our hypothesis, however, insists on identifying research-specific boundary spanners.
Can the "voice" of an innovator be mapped to one or more specific research themes? If the answer is "yes", then our global team of data scientists has come up with a model that proves the hypothesis.
As part of Phase 4 activities, EMC Data Scientist Tao Chen (also of EMC Labs China) ran topic modeling algorithms against the data in the analytic sandbox. In a previous post I highlighted the twenty-five topics that emerged from this analysis (numbered 00-24 in the graphic below):
If Jidong's minutes, notes and presentations from all of his meetings were mapped against this topic model, what would it look like? As a reminder, the following events that Jidong participated in were queried from the analytic sandbox. I have color coded several of them in order to map them to the model.
- In 2011 Jidong attended the SIGMOD conference in Greece
- Jidong visited EMC employees in France who are part of the IIG business unit (e.g. Documentum)
- Jidong presented his thoughts on the SIGMOD conference at a Virtual Brownbag session (GREEN) attended by:
  - Three employees in Russia
  - One employee in Cairo
  - One employee in Ireland
  - One employee in India
  - Three employees in the U.S.
  - One employee in Israel
- In 2012 Jidong attended the SDM 2012 Conference in California (RED)
- On the same trip he visited innovators and researchers at Greenplum (PURPLE) and VMware (ORANGE)
- Later on that trip he stood before the monthly CTO Council and introduced two of his researchers (and his research) to dozens of EMC innovators and researchers (YELLOW)
The Stanford Topic Modeling Toolbox can take the minutes, notes, and/or presentations from these meetings and map them against all twenty-five topics. For each color-coded activity listed above, here is a visualization of how Jidong's activities map to the topics:
Four out of five of Jidong's activities map to topic #12. The fifth activity (his meeting with VMware) maps to topic #1, just barely edging out topic #12.
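The post does not describe how the toolbox scores a new document against the trained model. As a rough illustration only, here is a sketch that swaps in a simple technique: expectation-maximization "fold-in" with the topic-word probabilities held fixed, which is a simplified stand-in for full LDA inference rather than the toolbox's own procedure. It assumes the trained model is available as a matrix `phi` (topics by vocabulary, each row summing to 1) and that a document is a list of word IDs; `phi`, the function names, the `smoothing` parameter, and the toy numbers are all hypothetical. The second function reports the winning topic, the runner-up, and the margin between them:

```python
import numpy as np


def infer_topic_mixture(word_ids, phi, smoothing=0.1, iters=100):
    """Estimate a document's topic proportions with the topic-word matrix
    phi (n_topics x vocab_size, rows summing to 1) held fixed."""
    n_topics, vocab_size = phi.shape
    counts = np.bincount(word_ids, minlength=vocab_size)
    present = np.nonzero(counts)[0]  # distinct word ids in the document
    theta = np.full(n_topics, 1.0 / n_topics)  # start from a uniform mixture
    for _ in range(iters):
        # E-step: how responsible each topic is for each distinct word
        resp = theta[:, None] * phi[:, present]
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: expected word count per topic plus additive smoothing
        theta = (resp * counts[present]).sum(axis=1) + smoothing
        theta /= theta.sum()
    return theta


def dominant_topic(theta):
    """Return (best topic, runner-up topic, margin between their weights)."""
    order = np.argsort(theta)[::-1]
    best, runner_up = int(order[0]), int(order[1])
    return best, runner_up, float(theta[best] - theta[runner_up])


# Toy model: 3 topics over a 6-word vocabulary (hypothetical numbers).
phi = np.array([
    [0.45, 0.45, 0.025, 0.025, 0.025, 0.025],
    [0.025, 0.025, 0.45, 0.45, 0.025, 0.025],
    [0.025, 0.025, 0.025, 0.025, 0.45, 0.45],
])
meeting_notes = [2, 3, 3, 2, 2, 0]  # word ids of one hypothetical document
theta = infer_topic_mixture(meeting_notes, phi)
best, runner_up, margin = dominant_topic(theta)
```

Reporting the margin alongside the winner makes close calls visible: a small margin, like topic #1 narrowly beating topic #12 for the VMware meeting, suggests a document that sits between two themes.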
What is topic #12? The graphic below shows its word cloud, in which the theme of "Big Data" begins to emerge:
Interestingly, Jidong's VMware meeting mapped to the following word cloud:
This post has demonstrated that Phase 4 of the Data Analytics Lifecycle allowed us to prove one of our eight hypotheses: analyzing a global repository of research and innovation activity identifies research-specific boundary spanners.
There are seven more hypotheses to prove (or disprove). As our global team of data scientists runs its models, I will continue to publish the results.
Steve
Twitter: @SteveTodd
Director, EMC Innovation Network