Every IT organization (and IT vendor for that matter) has one of those performance geeks who answers every question with "It depends".
When I designed storage systems, I tried to justify implementation choices with our performance team. I would ask our performance guru (Malcolm) specific questions about I/O workloads, but I could never get a straight answer out of him, so I stopped asking!
Dave Vellante and his Wikibon team are publishing interesting research on the use of Amazon public cloud in the enterprise. One article in particular contains a discussion on speed and agility using the AWS approach. The conclusion...
'However, for complex applications with significant interdependencies, this ratio will decline substantially and in some cases flip in the negative direction. For customers the reality is, "It depends."'
As customers build hybrid data centers that mix private and public cloud implementations, what percentage of their data center will be public versus private? What are the rules for determining what part of the infrastructure is public versus private?
The answer, of course, is that it depends.
Wikibon has also published a fascinating use case that highlights the volatility of the private/public phenomenon. Zynga moved from a 100% private implementation to a public (AWS) approach, and then modified the public approach by re-introducing private infrastructure.
I encourage you to take a deeper look at the current (and future) Wikibon research in this area.
I also encourage you to consider the data analytic questions that arise as a result of constantly shifting public/private assets:
Do you think companies like Zynga disabled their ability to run analytics as they shifted from private to public and back again? Did they have to re-write their analytic models to fit each new architecture?
The beauty of the hybrid approach is that (in theory) it should improve "analytic correctness". Correctness is correlated with data volume and variety. The enterprise that can run analytics across private data AND massive amounts of public (or partner) data will win.
A Pivotal Decision
Providing a platform that allows the enterprise to abstract away the hybrid cloud balance is the answer. Analytic models should be written without regard for data center plumbing and service provider choice.
CIOs can derive much greater value and innovative output if their programmers can develop such a framework. Development teams will be presented with a variety of architectural options for such a framework, and two main concerns need to be overcome:
- They don't want analytic downtime when they rebalance or add public/private assets.
- They don't want their algorithms to produce wrong (or stale) answers because they run too far away from the data.
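To make the abstraction concrete, here is a minimal Python sketch (all class and function names are hypothetical, not part of any Pivotal product): an analytic model written against a storage-agnostic interface, so rebalancing public/private assets means swapping the backend, never rewriting the model.

```python
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Storage-agnostic interface: the model never knows where data lives."""

    @abstractmethod
    def scan(self, dataset: str):
        ...


class PrivateStore(DataStore):
    def scan(self, dataset: str):
        # In practice: read from on-premises storage.
        return [("private", dataset, 1), ("private", dataset, 2)]


class PublicStore(DataStore):
    def scan(self, dataset: str):
        # In practice: read from a public cloud object store.
        return [("public", dataset, 3)]


def run_model(store: DataStore, dataset: str) -> int:
    """A toy 'analytic model': count records, regardless of backend."""
    return sum(1 for _ in store.scan(dataset))


# Rebalancing private/public means swapping the store, not the model.
print(run_model(PrivateStore(), "clicks"))  # 2
print(run_model(PublicStore(), "clicks"))   # 1
```

The same `run_model` runs unchanged against either backend, which is the "no analytic downtime, no rewrite" property the two bullets above are asking for.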
The recently-announced Pivotal Initiative is building a hybrid analytic framework using the following building blocks:
- Domain Expertise: Many enterprise developers are accustomed to building analytic applications on top of private/proprietary infrastructure. They need consulting and training on how to migrate their apps to a hybrid environment. This consulting will be provided by Pivotal Labs, which has the domain expertise required to run analytics across hybrid environments.
- App Development Platform for Hybrid: Enterprise developers utilizing platforms such as Node.js, Rails, Python, Spring, etc., need an underlying, elastic platform that can span clouds. This platform is the open-source Cloud Foundry platform, which facilitates application portability across clouds.
- Data Grids: the insertion of an in-memory data grid across hybrid environments accelerates analytics (I wrote about this in a previous post). The Pivotal Initiative can leverage GemFire and/or SQLFire for this layer.
- Hybrid Analytic Capability: Analytic platforms such as open-source Hadoop are being augmented for hybrid environments, and Pivotal can feed the results into analytic applications such as Cetas. Cetas can actually run as an analytic service. Analytic capability can be run closer to the data using the Greenplum parallel data flow framework.
- Data Scientist Collaboration: App developers often work at the behest of distributed data scientists. In a hybrid environment, these data scientists may exist inside and outside of the enterprise. They need a Facebook-style collaboration tool that grants permission to socialize and discuss different modeling approaches around specific data sets. Data scientist collaboration will be enabled by Chorus (learn more about Open Chorus here).
- Cheap and Deep Storage: Analytics begets more data, so the framework needs inexpensive, scalable storage underneath it!
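As a rough illustration of the data-grid layer described above (a generic read-through cache sketch, not GemFire's or SQLFire's actual API), an in-memory region can front slower hybrid stores so repeated analytic reads stay local:

```python
import time


class SlowBackingStore:
    """Stand-in for a remote (public or private) store with high latency."""

    def __init__(self, data: dict):
        self.data = data

    def get(self, key: str):
        time.sleep(0.01)  # simulate network/disk latency
        return self.data[key]


class InMemoryRegion:
    """Toy read-through cache in the spirit of an in-memory data grid."""

    def __init__(self, backing: SlowBackingStore):
        self.backing = backing
        self.cache = {}

    def get(self, key: str):
        if key not in self.cache:
            # Cache miss: fetch from the slow store once, then keep it local.
            self.cache[key] = self.backing.get(key)
        return self.cache[key]


store = SlowBackingStore({"user:1": {"clicks": 42}})
region = InMemoryRegion(store)
first = region.get("user:1")   # slow: goes to the backing store
second = region.get("user:1")  # fast: served from memory
print(first == second)  # True
```

The analytic code talks only to the region; whether the backing store happens to be private or public at any given moment is invisible to it, which is exactly why a grid layer helps when assets are constantly rebalanced.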
Each of these categories represents an area that I hope to explore in more detail in future posts.
Balancing analytics on top of shifting hybrid assets will certainly be an area of focus for CIOs in 2013. The Pivotal portfolio has the right technologies (and people) to navigate this shifting landscape.