Stargazers will be looking heavenward tonight, but where's the data going?
Tonight is the peak night for getting a great look at the Perseid meteor shower (if you happen to live in the Northern Hemisphere). The Earth is currently hurtling through the debris from the comet known as Swift-Tuttle. On the morning of August 12th we'll pass through the densest part of this debris. These meteors are called "Perseids" because they radiate outward from the constellation known as Perseus. Here are some tips for viewing the Perseids. Lie down in any large, flat area, preferably one that is not paved and has no orange lines running down the center.
When it comes to the study of space phenomena, the amount of data generated and stored is, well, astronomical. I've often wondered how scientists configure their storage systems to handle this amount of data.
One group of scientists on the job capturing data about meteors is this team from NASA's Meteoroid Environment Office. The MEO is the "NASA organization responsible for meteoroid environments pertaining to spacecraft engineering and operations." I grabbed this picture of them from their website, and I'm glad that they are working hard and wearing ties.
I started digging around to understand where and how the world is storing scientific data about outer space. Part of my motivation is to understand more about the re-use of scientific data as part of the MIT DataSpace proposal. I happened upon an effort called the Virtual Meteor Observatory (VMO), which has a goal of "improved access to astronomical data and computing resources".
In December of last year there was a meeting of the International Space Science Institute (ISSI) to discuss the architecture of a VMO. The meeting was attended by a variety of meteor observer teams, such as the Polish Fireball Network, the University of Western Ontario in Canada, and the IAU Meteor Center. The full report from the meeting can be found here.
The VMO has agreed upon a standard XML format for the description of meteoric data, which I've cut directly from the report and pasted here (for what it's worth I was unaware that Heidi Klum was involved in this effort):
<?xml version="1.0" encoding="utf-8"?>
<vmo xmlns="http://www.imo.net">
<fireball>
<time>2008-11-23T15:24:13</time>
<brightness>as the full moon</brightness>
<observer>Heidi Klum</observer>
<location_latitude>
35.24351
</location_latitude>
<location_longitude>
-89.62907
</location_longitude>
<country_code>US</country_code>
...
</fireball>
</vmo>
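To get a feel for how a record like this could be consumed, here is a minimal Python sketch that parses a fireball entry with the standard library. It assumes the element names use underscores throughout (`location_latitude`, `location_longitude`, `country_code`), as the opening tags in the sample suggest, and it leaves out the fields that the report elides. The function name `parse_fireballs` is my own and is not part of the VMO effort.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace taken from the sample record; the "vmo" prefix is only a local alias.
NS = {"vmo": "http://www.imo.net"}

# Sample values copied from the record above; elided fields are omitted.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<vmo xmlns="http://www.imo.net">
<fireball>
<time>2008-11-23T15:24:13</time>
<brightness>as the full moon</brightness>
<observer>Heidi Klum</observer>
<location_latitude>
35.24351
</location_latitude>
<location_longitude>
-89.62907
</location_longitude>
<country_code>US</country_code>
</fireball>
</vmo>
""".encode("utf-8")


def parse_fireballs(xml_bytes):
    """Return one dict per <fireball> element (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    records = []
    for fireball in root.findall("vmo:fireball", NS):

        def text(tag):
            # Strip the whitespace that surrounds multi-line values.
            node = fireball.find(f"vmo:{tag}", NS)
            if node is None or node.text is None:
                return None
            return node.text.strip()

        records.append({
            "time": datetime.fromisoformat(text("time")),
            "brightness": text("brightness"),
            "observer": text("observer"),
            "latitude": float(text("location_latitude")),
            "longitude": float(text("location_longitude")),
            "country_code": text("country_code"),
        })
    return records


records = parse_fireballs(SAMPLE)
```

Note that every lookup has to be namespace-qualified, because the sample declares `http://www.imo.net` as the default namespace.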
With the agreed-upon metadata format, the team then determined five different types of "data formats" to be stored in the VMO repository:
- Visual meteor observations
- Video and still camera data
- Forward and backward scatter radio observations
- Fireball observations
- Orbit data
Existing scientific software that generates data and metadata must be converted to emit the VMO format, and ultimately everything is stored in a PostgreSQL database. Discussions are ongoing about whether all scientific data will be stored in one large database or whether a distributed approach will be used.
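The conversion step might look something like the following Python sketch. Everything about the "legacy" side is an assumption on my part: the field names (`utc`, `obs_name`, `lat`, and so on) are hypothetical, as is the single-table layout. I also use an in-memory SQLite database purely as a stand-in so the example runs anywhere; the VMO itself is described as using PostgreSQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

VMO_NS = "http://www.imo.net"
# Serialize the VMO namespace as the default namespace (no prefix).
ET.register_namespace("", VMO_NS)

# Hypothetical mapping from a legacy tool's field names to VMO element names.
LEGACY_TO_VMO = {
    "utc": "time",
    "mag_note": "brightness",
    "obs_name": "observer",
    "lat": "location_latitude",
    "lon": "location_longitude",
    "cc": "country_code",
}


def to_vmo_xml(legacy_record):
    """Build a VMO-style XML document from a legacy dict (illustrative only)."""
    root = ET.Element(f"{{{VMO_NS}}}vmo")
    fireball = ET.SubElement(root, f"{{{VMO_NS}}}fireball")
    for legacy_key, vmo_tag in LEGACY_TO_VMO.items():
        child = ET.SubElement(fireball, f"{{{VMO_NS}}}{vmo_tag}")
        child.text = str(legacy_record[legacy_key])
    return ET.tostring(root, encoding="unicode")


def store(conn, kind, document):
    """Insert one document into a hypothetical observation table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, document TEXT NOT NULL)"
    )
    cursor = conn.execute(
        "INSERT INTO observation (kind, document) VALUES (?, ?)",
        (kind, document),
    )
    conn.commit()
    return cursor.lastrowid


# Values reused from the sample record in the report.
legacy = {
    "utc": "2008-11-23T15:24:13",
    "mag_note": "as the full moon",
    "obs_name": "Heidi Klum",
    "lat": 35.24351,
    "lon": -89.62907,
    "cc": "US",
}

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
doc = to_vmo_xml(legacy)
row_id = store(conn, "fireball", doc)
```

Keeping the mapping in a plain dictionary means each existing tool only needs its own small table of field names, while the XML generation and storage code stay shared.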
Two thoughts that I had as I read through the information about this effort:
- The researchers are involved with the submission and ingest of scientific data, as well as the dissemination of the data for scientific analysis. This workflow is very similar to the standard workflow described by the Open Archival Information System specification (depicted in Figure 4-1). The researchers could benefit from being aware of this standard.
- I was unable to find any mention of the system's scalability limits or of its storage requirements in general. Given that nearly all of the data being generated is fixed content (otherwise called reference information), the XAM specification is a good candidate for abstracting away the dependency on PostgreSQL.
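To illustrate what abstracting away the storage dependency could mean for fixed content, here is a small Python sketch of a write-once store that hands back an opaque identifier for each object. To be clear, this is not the XAM API; the interface name `FixedContentStore`, its methods, and the SHA-256 based identifiers are all assumptions of mine, chosen only to show the idea of applications talking to an interface rather than to a specific database.

```python
import hashlib
from typing import Dict, Protocol, Tuple


class FixedContentStore(Protocol):
    """Hypothetical interface; application code depends only on this."""

    def put(self, content: bytes, metadata: Dict[str, str]) -> str: ...

    def get(self, object_id: str) -> Tuple[bytes, Dict[str, str]]: ...


class InMemoryStore:
    """Toy backend; a PostgreSQL or archive backend could expose the same methods."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, Dict[str, str]]] = {}

    def put(self, content: bytes, metadata: Dict[str, str]) -> str:
        # Fixed content never changes, so a digest of the bytes can serve as its ID.
        object_id = hashlib.sha256(content).hexdigest()
        # Write once: storing the same bytes again keeps the first copy.
        self._objects.setdefault(object_id, (content, dict(metadata)))
        return object_id

    def get(self, object_id: str) -> Tuple[bytes, Dict[str, str]]:
        content, metadata = self._objects[object_id]
        return content, dict(metadata)


def archive_observation(store: FixedContentStore, document: bytes) -> str:
    """Application code that is unaware of which backend is in use."""
    return store.put(document, {"kind": "fireball"})


store = InMemoryStore()
object_id = archive_observation(store, b"<vmo>...</vmo>")
content, metadata = store.get(object_id)
```

Because `archive_observation` only relies on `put` and `get`, swapping the in-memory backend for a different one would not require changes to the calling code.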
Get out there tonight, find a safe piece of grass, and watch the show.
Steve
http://stevetodd.typepad.com
Twitter: @SteveTodd
EMC Intrapreneur