A year and a half ago I heard Paul Saffo of Stanford speak at the World Innovation Forum. He described his insight that intelligent sensors would be built into all kinds of devices, and that harvesting information from these sensors would become fertile ground for innovation.
For example, sensors can be embedded in running shoes. These sensors can transmit GPS coordinates and running data. Other types of sensors can report weather conditions, or environmental measurements from water and/or soil.
Creating IT infrastructures to harvest all of this data is a critical goal, especially when the sensors are embedded in products that are sold globally.
Think about the logistics of gathering sensor data from all points around the globe. The bottom line is that sensor-producing corporations want to "turn loose" their programmers to do real-time queries on incoming sensor data:
- What percentage of the Eastern Hemisphere is sunny RIGHT NOW?
- What was the average distance covered today by all runners wearing a certain brand of running shoe?
- Which continent is currently experiencing the poorest air quality?
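To make that last scenario concrete, here is a minimal sketch of what such a query might look like in Java against GemFire's OQL interface (GemFire is the data fabric covered under Software Components below). The locator address, the /SensorReadings region, and its field names are assumptions for illustration only:

```java
import com.gemstone.gemfire.cache.client.ClientCache;
import com.gemstone.gemfire.cache.client.ClientCacheFactory;
import com.gemstone.gemfire.cache.query.Query;
import com.gemstone.gemfire.cache.query.QueryService;
import com.gemstone.gemfire.cache.query.SelectResults;

public class ShoeDistanceQuery {
    public static void main(String[] args) throws Exception {
        // Connect to the distributed fabric through a (hypothetical) locator.
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("locator-host", 10334)
                .create();

        QueryService qs = cache.getQueryService();

        // Pull today's distances for one shoe model; the region and field
        // names are made up for this sketch.
        Query query = qs.newQuery(
                "SELECT r.distanceKm FROM /SensorReadings r"
              + " WHERE r.productLine = 'TrailRunnerX'"
              + " AND r.readingDate = '2010-11-18'");
        SelectResults results = (SelectResults) query.execute();

        // Average on the client; early GemFire OQL lacks aggregate functions.
        double total = 0;
        for (Object distance : results) {
            total += (Double) distance;
        }
        System.out.println("Average distance (km): "
                + (results.isEmpty() ? 0 : total / results.size()));

        cache.close();
    }
}
```

The point is that the programmer addresses one logical region, with no idea which continent the matching entries actually live on.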
This week I was perusing our internal social media site (EMC ONE) when I read a blog post that caused my jaw to hit the floor. A few of our customer-facing engineers designed a "back-of-the-napkin" global solution with one of our customers, and then jointly built a proof of concept.
And it worked. Exactly like they had written on the napkin. (Actually, it was probably a whiteboard; global architectures are tough to fit on a napkin.) I gleaned what I could from the post and then ran a few questions by Subramanian Kartik, an EMC Distinguished Engineer, PhD, and buddy of mine. Kartik and Dave Cohen diagrammed the solution with a customer at EMC's Santa Clara lab.
It's worth describing how such a system can be built.
Requirements
When the design medium is napkins and whiteboards and the goal is a proof of concept, the requirements end up being fairly high-level and simple:
- The hardware architecture needs to be globally federated.
- The customer wants to manage this global architecture very easily themselves (e.g., as a private cloud), or to upload their queries to, and retrieve results from, a service provider that manages the deployment for them.
- The programmers want no knowledge of the underlying hardware location or implementation; they just want to run big queries against all sensor output.
- The programmers can be anywhere in the world, and if a set of programmers in a certain region starts to generate a massive number of queries, the architecture must be fluid enough to dynamically shift workloads.
Software Components
In order to satisfy the above requirements, the first thing drawn on the napkin was GemFire. As I wrote previously, GemFire is a geographically distributed database that is very fast due to the use of distributed in-memory database tables. Programmers can write queries on recent sensor data, and they can also subscribe to events from sensors (e.g. product was turned on/off, product is changing location).
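The event-subscription side maps naturally onto GemFire continuous queries: register a query once, and the fabric pushes matching events to the client as sensors report in. A minimal sketch, again with hypothetical region and field names:

```java
import com.gemstone.gemfire.cache.client.ClientCache;
import com.gemstone.gemfire.cache.client.ClientCacheFactory;
import com.gemstone.gemfire.cache.query.CqAttributes;
import com.gemstone.gemfire.cache.query.CqAttributesFactory;
import com.gemstone.gemfire.cache.query.CqEvent;
import com.gemstone.gemfire.cache.query.CqListener;
import com.gemstone.gemfire.cache.query.CqQuery;
import com.gemstone.gemfire.cache.query.QueryService;

public class PowerEventSubscriber {
    public static void main(String[] args) throws Exception {
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("locator-host", 10334)
                .setPoolSubscriptionEnabled(true)  // required for CQ event delivery
                .create();

        QueryService qs = cache.getQueryService();

        // Listener invoked each time a matching sensor event arrives.
        CqAttributesFactory caf = new CqAttributesFactory();
        caf.addCqListener(new CqListener() {
            public void onEvent(CqEvent e) {
                System.out.println("Device event: " + e.getNewValue());
            }
            public void onError(CqEvent e) {
                System.err.println("CQ error: " + e.getThrowable());
            }
            public void close() { }
        });
        CqAttributes cqa = caf.create();

        // Continuous query: fire on every power-state change, worldwide.
        CqQuery cq = qs.newCq("powerEvents",
                "SELECT * FROM /SensorReadings r WHERE r.eventType = 'POWER_STATE'",
                cqa);
        cq.execute();
        // ... application keeps running; events stream into the listener.
    }
}
```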
GemFire is part of VMware/SpringSource.
Alongside GemFire, the other critical software component in the architecture was vFabric Hyperic. This software can detect when the programming load increases to the point of bottleneck, and that detection can trigger the addition of new servers/VMs and/or the shifting of workloads to different geographic locations.
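I don't know exactly how the team wired Hyperic's alerting into the provisioning layer, so the sketch below is purely illustrative: MetricSource and WorkloadBalancer are hypothetical stand-ins (not Hyperic APIs) meant only to show the shape of the detect-then-rebalance loop:

```java
// Hypothetical interfaces: in a real deployment these would sit in front of
// the monitoring feed and the VM provisioning layer.
interface MetricSource {
    Iterable<String> regions();
    double queryRate(String region);      // queries/sec observed in a region
    String leastLoadedRegion();
}

interface WorkloadBalancer {
    void addServer(String region);        // spin up capacity in place
    void shiftQueries(String from, String to);  // reroute load elsewhere
}

public class QueryLoadMonitor {
    private static final double QUERIES_PER_SEC_LIMIT = 5000.0;  // illustrative

    private final MetricSource metrics;
    private final WorkloadBalancer balancer;

    public QueryLoadMonitor(MetricSource metrics, WorkloadBalancer balancer) {
        this.metrics = metrics;
        this.balancer = balancer;
    }

    // Poll per-region query rates; react when a region saturates.
    public void checkOnce() {
        for (String region : metrics.regions()) {
            if (metrics.queryRate(region) > QUERIES_PER_SEC_LIMIT) {
                balancer.addServer(region);
                balancer.shiftQueries(region, metrics.leastLoadedRegion());
            }
        }
    }
}
```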
These two components, of course, join the many other software components that must be present to manage a cloud deployment. I mention these two specifically because they are germane to the global sensor discussion.
Hardware Components
The main hardware component is a Vblock. These Vblocks would be located at geographically strategic points around the world. The VMware environment running within these Vblocks is, of course, the context in which the GemFire and vFabric Hyperic components run.
The geographical distribution of data blocks and workloads falls to the VPLEX device.
And finally, the long-term destination for the enormous amounts of historical sensor data is Atmos. GemFire can only realistically hold the most recent sensor data. An additional data store is required to hold ALL OF IT, and storage at global scale is what Atmos does.
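One plausible way to wire that tiering, sketched under assumptions: readings expire out of the GemFire region after a retention window, a GemFire cache listener catches each expiring entry, and a hypothetical AtmosClient wrapper (standing in for the Atmos REST interface) persists the value:

```java
import com.gemstone.gemfire.cache.EntryEvent;
import com.gemstone.gemfire.cache.util.CacheListenerAdapter;

// Hypothetical stand-in for an Atmos client wrapper; the names here are
// illustrative only, not the actual Atmos bindings.
interface AtmosClient {
    void storeObject(String namespacePath, byte[] payload);
}

// As recent entries age out of the GemFire region, persist them to Atmos so
// the in-memory fabric stays lean while Atmos accumulates the full history.
public class AtmosArchiveListener extends CacheListenerAdapter {
    private final AtmosClient atmos;

    public AtmosArchiveListener(AtmosClient atmos) {
        this.atmos = atmos;
    }

    @Override
    public void afterDestroy(EntryEvent event) {
        // Fired when an entry is destroyed (e.g., expiration with a DESTROY
        // action); archive the outgoing value under a namespace path.
        atmos.storeObject("/sensor-archive/" + event.getKey(),
                          (byte[]) event.getOldValue());
    }
}
```

Registered on the region via its attributes, a listener like this drains aging entries into Atmos without the querying programmers ever noticing.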
This last piece of the hardware architecture is actually the piece that the customer holds most dear. Why? Because after the programmers have run their real-time queries on how their products are being used around the globe, the entire historical body of sensor data can be repurposed for other uses.
In particular, this information may have great monetary value to any number of global customers. And these customers are not the same ones that buy the products the sensors are embedded in. They may want access to environmental data, for example. Or travel patterns. Or exercise habits.
In other words (as Chuck likes to say), corporations that build global sensor architectures might need a CFO to track the monetary value of their historical sensor data.
Field Innovation
The proof-of-concept happened at a customer site last week. The pieces worked together in the exact same architecture that was first envisioned.
I'm hoping that Kartik and others can keep us posted on the progress. I write a lot about innovation from the product development perspective. Our field team innovates in a much different way, and I love it when they document the things they come up with. In fact, last night the latest class of EMC Distinguished Engineers was inducted, and it seemed that half of the new inductees were either field-based talent or technologists from EMC's IT department.
They have to innovate by creatively combining the different products that are currently shipping.
And in this particular example, there was a wide set of technologies to choose from.
Steve
Twitter: @SteveTodd
Hi Steve,
Looks like it makes sense to add Greenplum for historical data analysis. Am I right?
Cheers,
Ronaldo
Posted by: Account Deleted | November 18, 2010 at 09:21 AM
Hi Ronaldo,
Yes you are right, it makes a lot of sense!
Posted by: Steve Todd | November 18, 2010 at 01:03 PM