Yesterday marked the culmination of years of preparation: the National Science Foundation (NSF) visited MIT. The NSF is an independent US federal agency, and a team from its Office of Cyberinfrastructure is in the final phases of evaluating grant requests as part of the DataNet initiative.
The NSF will grant five awards of $20 million apiece (spread over five years) for proposals that target the following vision:
Chapter 3 (Data, Data Analysis, and Visualization) of NSF’s Cyberinfrastructure Vision for 21st Century Discovery presents a vision in which “science and engineering digital data are routinely deposited in well-documented form, are regularly and easily consulted and analyzed by specialists and non-specialists alike, are openly accessible while suitably protected, and are reliably preserved.” The goal of this solicitation is to catalyze the development of a system of science and engineering data collections that is open, extensible and evolvable.
Stuart Madnick and MacKenzie Smith of MIT have pulled together a proposal called DataSpace. They've also assembled a team of participants who will work on the project should the proposal be funded. More information about the motivation behind DataSpace can be found by reading Dr. Madnick's interview with On Magazine.
The full-day session was put together by Stuart and MacKenzie. Having never participated in a grant process before, I found it quite interesting to experience the flow of the meeting. In general, most of the DataSpace presenters fell into the following categories:
- Scientists: the DataSpace project would initially target neuroscience and oceanography.
- Digital Curators: Curators from Oregon State, Georgia Tech, Rice, and MIT discussed issues and solutions for the management of digital scientific data.
- Industry: Google, HP, Microsoft, and EMC described technologies and research directions that are applicable to DataSpace.
- Software: several team members presented software technologies and open source tools that are applicable to DataSpace.
- Program management: MIT staff discussed how the project (and therefore the grant money) would be managed.
The meeting ran for about ten hours, and I spoke for five minutes. My presentation stated that EMC has a set of technologies that are applicable to DataSpace, and that EMC could contribute people, equipment, and lab space to the effort. If you pick apart the NSF quote, there is a set of technologies that maps quite nicely onto it:
- "science and engineering digital data are routinely deposited in well-documented form". The term "well-documented" implies metadata, and object-based storage systems would provide strong binding between scientific content and metadata.
- "are openly accessible". If object-based access methods are used, XAM provides an open standard for that access.
- "while suitably protected". In addition to protection capabilities like parity, mirroring, backup, remote copy, etc., time-stamping and content addressing can provide extra levels of protection and authentication.
- "and are reliably preserved". Data preservation is a key part of XAM, given that it has built-in support for migration over multiple generations of technology. EMC is also doing research on data provenance and long-term preservation; these research directions have a strong overlap with DataSpace.
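To make the mapping above a little more concrete, here is a minimal Python sketch of how an object store might bind scientific content to its metadata, time-stamp it, and derive a content address that can later be used to check authenticity. This is only an illustration under my own assumptions: the `ObjectStore` class, its `put` and `get` methods, and the record layout are hypothetical names invented for this example. It is not the XAM API, nor any EMC product interface.

```python
import hashlib
import json
import time


class ObjectStore:
    """Toy in-memory object store (hypothetical; NOT the XAM API)."""

    def __init__(self):
        # Maps content address -> serialized record (content + metadata).
        self._objects = {}

    def put(self, content: bytes, metadata: dict) -> str:
        """Store content together with its metadata; return the content address."""
        record = {
            "metadata": metadata,          # the "well-documented" part
            "stored_at": time.time(),      # time stamp recorded with the object
            "content_hex": content.hex(),  # the scientific data itself
        }
        # Serialize deterministically so the same record always hashes the same.
        blob = json.dumps(record, sort_keys=True).encode("utf-8")
        # The address is derived from content AND metadata, binding them together.
        address = hashlib.sha256(blob).hexdigest()
        self._objects[address] = blob
        return address

    def get(self, address: str):
        """Return (content, metadata, stored_at) after verifying integrity."""
        blob = self._objects[address]
        # Re-hash on read: any change to the stored bytes changes the hash.
        if hashlib.sha256(blob).hexdigest() != address:
            raise ValueError("object does not match its content address")
        record = json.loads(blob)
        content = bytes.fromhex(record["content_hex"])
        return content, record["metadata"], record["stored_at"]


# Example use with made-up oceanography data (values are illustrative only).
store = ObjectStore()
address = store.put(
    b"depth_m,temp_c\n10,14.2\n",
    {"discipline": "oceanography", "format": "csv"},
)
content, metadata, stored_at = store.get(address)
```

Because the address is computed over the content and the metadata together, neither can be altered after the fact without the mismatch being detected on the next read. A real system would add the other protections mentioned above (mirroring, remote copy, and so on) underneath this layer.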
The presentation to the NSF had a great mix of vision, people and technologies. I'm pleased to have my name on the proposal. These technologies are right up my alley, and it would be a privilege to work with such a diverse and talented team.