In a previous post I described a set of hypotheses about innovation at my corporation (EMC). One of the hypotheses focused on the role of boundary spanners in transferring knowledge across geographic regions:
H5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.
When I described Phase 3 of the Data Analytics lifecycle in my last post, I mentioned that:
“Phase 3 represents the last step of preparations before executing the analytical models and, as such, requires you to be thorough in planning the analytical work and experiments in the next phase.”
The third phase also gives the data scientist another opportunity to explore the data in ways that are specific to the set of hypotheses. In the case of identifying boundary spanners, EMC Distinguished Engineer and Data Scientist John Cardente already had a gut feel for the model that he wanted to use:
"I believe that the boundary spanner hypothesis can be explored via social network analysis. I increased my knowledge of SNA by reading Albert-Laszlo Barabasi's excellent book, "Linked: How Everything is Connected to Everything Else and What it Means". While exploring some of the data in the analytic sandbox (e.g. EMC’s 2011 Innovation Showcase data), I confirmed that I had enough information to re-construct the social network associated with the contest entries. The social dynamics of EMC’s Innovation Showcase have always fascinated me. Was it the result of lone geniuses or large teams of collaborators? How connected was EMC's innovation network? Were there key innovators that acted like "hubs" tying ideas together?
I decided to use R for the social network analysis. I chose the igraph R package after watching a great talk by Drew Conway entitled "Social Network Analysis in R". Thanks to the power of R and packages like plyr, it only took a small amount of code to transform the names associated with each contest submission into a form suitable for use with igraph. I produced the social network graph that was published as part of the “Irish Butterfly” post. From there, I explored the capabilities of the igraph package and experimented with using cliques, components, degrees, and betweenness metrics to identify individuals and groups of highly effective innovators."
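To make the transformation John describes a little more concrete, here is a minimal sketch of the idea. It is written in plain Python rather than R and igraph, and it is not John's code: the contest entries, the names, and the helper functions build_network and components are all hypothetical. The assumption is that two people are linked whenever they appear on the same submission.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical contest entries: each entry lists the people who submitted it.
submissions = {
    "idea-1": ["Ann", "Bob", "Cara"],
    "idea-2": ["Cara", "Dev"],
    "idea-3": ["Eli"],
    "idea-4": ["Fay", "Gus"],
}

def build_network(entries):
    """Return an adjacency map linking every pair of co-submitters."""
    graph = defaultdict(set)
    for names in entries.values():
        for name in names:
            graph[name]  # touch the key so lone submitters still appear
        for a, b in combinations(sorted(set(names)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def components(graph):
    """Group people into connected components with a depth-first search."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

network = build_network(submissions)
# Degree: how many distinct collaborators each person has.
degree = {name: len(neighbors) for name, neighbors in network.items()}
groups = components(network)
```

In this toy data, a degree of zero marks a lone submitter, and each component is a set of people connected through shared entries, which is roughly the "lone geniuses or large teams" question John raises.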
John’s exploration of his model choice (Social Network Analysis) took only a few lines of R code, but it had an immediate impact within EMC: it identified a highly active network of Irish innovators. As a result of his work, I contacted the Irish team and invited them to share their effective approach with other countries.
The igraph package also allowed John to explore the “betweenness” metrics of individual innovators, and we began to consider the possibility that betweenness may help us to prove that research-specific boundary spanners are alive, well, and active at EMC (which is one of our hypotheses).
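For readers unfamiliar with the metric, betweenness counts how often a person sits on the shortest path between two other people in the network. The sketch below is one way to compute it, using Brandes' algorithm on an unweighted, undirected graph. It is plain Python, not the igraph call John used, and the two teams and the person bridging them are hypothetical.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from source.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical network: two tight teams joined only through Gil.
edges = [
    ("Ann", "Bob"), ("Ann", "Cara"), ("Bob", "Cara"),   # team one
    ("Dev", "Eli"), ("Dev", "Fay"), ("Eli", "Fay"),     # team two
    ("Cara", "Gil"), ("Gil", "Dev"),                    # the bridge
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

scores = betweenness(graph)
spanner = max(scores, key=scores.get)
```

Here Gil has only two connections, yet every shortest path between the two teams runs through him, so he has the highest score. That is the kind of signature we would expect a boundary spanner to leave, which is why the metric looked promising for H5.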
John’s success in this regard allows us to ask a pivotal question for moving to the next phase:
“Do we have a good idea about the type of model to try?”
In this case, the answer is yes. For other hypotheses, we have found the answer to be no. I will dive further into these Phase 3 scenarios in future posts.
Steve
Twitter: @SteveTodd
Director, EMC Innovation Network