I visited UC Berkeley last week to hear about their AMP Lab for the very first time. For me it was more than just another visit. When I first started working in the industry, my boss gave me a copy of the RAID paper from UC Berkeley and said "build this". This gave me the chance to design my very first software system, which eventually became the core of the VNX and VNXe product lines.
It turns out that two of the authors of the original RAID paper (Patterson and Katz) sit in the AMP Lab, surrounded by students and other faculty. The AMP Lab is a five-year initiative that logically follows from the successful conclusion of the five-year RADLab effort.
I met with Michael Franklin (the Director of AMP Lab) and he said that the RADLab experience was squarely aligned with the advent of cloud computing. In the same way, AMP Lab has squarely positioned the faculty and the students for the advent of Big Data.
Algorithms, Machines, People
Michael shared the main problem that AMP Lab is trying to solve:
The normal application of current technology doesn’t enable users to obtain timely and cost-effective answers of sufficient quality to data driven questions.
This problem statement has caused the team to research new ways of combining the vectors of timeliness, cost, and quality. The focus is on three areas:
1. Algorithms: improve scale and quality of machine learning and analytics to increase value
2. Machines: use cloud computing to get value from Big Data and enhance data center infrastructure to cut costs of Big Data Management
3. People: leverage human activity and intelligence
It's the last item (people) that I found most interesting. The team at the AMP Lab is beginning to explore software stacks that call out to people. I've embedded a picture of one of these stacks below (taken from the CrowdDB project page).
For Big Data, some queries may best be satisfied by the crowd. This includes large data sets with incomplete data, queries over visual pictures that may not necessarily be tagged, or queries using "synonyms" (e.g. "IBM" and "Big Blue"). One of the more popular platforms for assigning work tasks to the crowd is Amazon's Mechanical Turk, and the AMP Lab has already started research with that platform.
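To make the idea concrete, here's a minimal sketch of what "calling out to people" could look like from the software side. It is not CrowdDB's or Mechanical Turk's actual API: `ask_crowd`, `fill_missing`, `same_entity`, and the canned `fake_crowd` answers are all hypothetical names I made up for illustration. The machine answers what it can, and only the gaps (a missing value, a possible synonym) get routed to a human.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a crowd platform. A real system would post a task
# and wait for human answers; here it is just a function from question to answer.
CrowdFn = Callable[[str], str]

Row = Dict[str, Optional[str]]


def fill_missing(rows: List[Row], column: str, ask_crowd: CrowdFn) -> List[Row]:
    """Return copies of the rows, asking the crowd for any missing value in `column`."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        if row.get(column) is None:
            row[column] = ask_crowd(f"What is the {column} of {row['name']}?")
        filled.append(row)
    return filled


def same_entity(a: str, b: str, ask_crowd: CrowdFn) -> bool:
    """Exact matches are decided by the machine; anything else goes to the crowd."""
    if a.strip().lower() == b.strip().lower():
        return True
    answer = ask_crowd(f"Do '{a}' and '{b}' refer to the same thing? (yes/no)")
    return answer.strip().lower() == "yes"


def fake_crowd(question: str) -> str:
    """Canned answers that simulate human workers (illustrative only)."""
    canned = {
        "What is the hq of IBM?": "Armonk",
        "Do 'IBM' and 'Big Blue' refer to the same thing? (yes/no)": "yes",
    }
    return canned.get(question, "unknown")


companies = [
    {"name": "IBM", "hq": None},        # incomplete data: the crowd fills this in
    {"name": "EMC", "hq": "Hopkinton"},  # complete: no crowd call needed
]

completed = fill_missing(companies, "hq", fake_crowd)
is_synonym = same_entity("IBM", "Big Blue", fake_crowd)
```

Passing the crowd in as a plain function keeps the query logic independent of whichever platform actually recruits the workers, which is roughly the appeal of a layered stack.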
I'd be interested in experimenting with this type of layered stack (the PowerPath stack comes to mind).
The portfolio of Big Data research going on at AMP Lab was cool. I walked out of there feeling pretty stoked.
Or should I say AMPed?
Steve
Twitter: @SteveTodd
Director, EMC Innovation Network