This is my third post about the digital archiving effort at the JFK Library. The effort is less than two years old, and already a large number of JFK's papers and photos have been digitally archived.
The software and hardware being used to process and store the information was donated by EMC, and the process of preserving the artifacts was designed by the archivists working at the JFK Library.
Part of their design process was a consideration of OAIS: an Open Archival Information System. OAIS is a standard that defines the creation of an archive. OAIS provides common terms and comparison points with other, similar archival systems. The team also considered a reference document known as the "Trusted Digital Repository" (see comments from JFK archivist James Roth below).
The attention given by JFK's archivists to OAIS allows me to write this article using terms that are familiar to archivists and digital curators throughout the world. As I've learned about OAIS, I've found myself focusing on what seems to be an acronym of primary importance: AIP.
The Archival Information Package.
One of the functions supported by an OAIS is the actual "preservation" of the information. The archival information package (AIP) models everything needed to preserve information over time. There are other types of packages that model submissions into the archive (SIPs) and dissemination out of the archive (DIPs). For this particular post I'll focus on the AIP. The diagram below breaks the AIP into two fundamental pieces:
At a high level I view "Content Information" as the information being preserved, and "Preservation Description Information" as meta-data added by the archivists.
For a detailed description of the diagram please refer to the Reference Model for OAIS.
Given this very brief overview it's an interesting exercise to describe the AIP implementation chosen by the JFK Library archivists. A review of my previous post on the JFK folder numbering scheme would be helpful.
AIP = Folder
The JFK Library chose to associate an Archival Information Package with each individual folder holding JFK's documents, pictures, and other items. All the content from the folder labeled "JFKPOF-001-001" is stored on EMC's infrastructure as an archival information package. This fact allows us to ultimately understand the JFK archiving process and how it maps onto the software and hardware provided by EMC.
The diagram breaks down the archival information package into "content information" and "preservation description information".
How is the JFK Library implementing the "content information" aspect of an AIP? Content information is made up of two things: data objects and Representation Information. The data objects are the "digital scans" that come from each physical object in the JFK folders. Think of data objects as the physical bits resulting from the scan. The Representation Information describes how to interpret (e.g. view) that information. For example, the representation information for JFK's scanned documents would be something along the lines of "image/tiff; 24 bits, uncompressed, 600 ppi".
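To make the split between data objects and Representation Information concrete, here is a minimal Python sketch. The class names, field names, file name, and placeholder bytes are my own assumptions for illustration; they are not taken from the OAIS documents or from the JFK Library's actual implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RepresentationInformation:
    """Describes how to interpret the bits of a data object."""
    media_type: str      # e.g. "image/tiff"
    bit_depth: int       # bits per pixel
    compression: str     # e.g. "uncompressed"
    resolution_ppi: int  # scan resolution in pixels per inch

    def describe(self) -> str:
        return (f"{self.media_type}; {self.bit_depth} bits, "
                f"{self.compression}, {self.resolution_ppi} ppi")


@dataclass(frozen=True)
class DataObject:
    """The raw bits produced by scanning one physical item."""
    filename: str   # hypothetical file name
    payload: bytes  # the scanned bits themselves


@dataclass(frozen=True)
class ContentInformation:
    """Content Information = data objects + Representation Information."""
    data_objects: Tuple[DataObject, ...]
    representation: RepresentationInformation


rep = RepresentationInformation("image/tiff", 24, "uncompressed", 600)
content = ContentInformation(
    # The payload is a stand-in, not a real scan.
    data_objects=(DataObject("page-0001.tif", b"II*\x00"),),
    representation=rep,
)
print(rep.describe())  # image/tiff; 24 bits, uncompressed, 600 ppi
```

The point of keeping the two pieces separate is that the bits alone are not enough: without the Representation Information, a future reader would not know how to turn them back into a viewable page.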
Preservation Description Information
The second part of the Archival Information Package is the PDI. Think of the PDI as extra meta-data that is added by the archivists. The JFK Library implementation of PDI is as follows:
- Reference Information: this field needs to unambiguously define a persistent identifier for the AIP. In the case of the JFK library this is the JFK reference number (e.g. JFKPOF-001-001).
- Provenance Information: this field documents the history of the AIP. Evelyn Lincoln (JFK's personal secretary), for example, could be listed in the provenance information for JFK's presidential office files. Other provenance information could include the reference number itself, the person who conducted the scan, and the scanning equipment and software used to digitize the information (e.g. Fujitsu Image Scanner, Documentum ApplicationXtender, etc.).
- Context Information: this information allows the archivist to add additional information about documents related to this AIP. The JFK Library uses collections (e.g. the President's office files or National Security documents) and series (e.g. speeches) to provide additional context.
- Fixity Information: this field is used to verify the authenticity of the preserved content. The JFK Library uses the JFK reference number as a cross-reference to validate the authenticity of scanned documents.
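The four PDI fields above can be sketched as a small Python structure attached to a folder-level AIP. This is only an illustration under stated assumptions: the class names, the scan file names, the file-naming convention used by the cross-reference check, and the pairing of this folder with the "Speeches" series are all hypothetical and do not describe the JFK Library's actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PreservationDescription:
    reference: str                                     # Reference Information
    provenance: List[str] = field(default_factory=list)   # Provenance Information
    context: Dict[str, str] = field(default_factory=dict)  # Context Information


@dataclass
class ArchivalInformationPackage:
    scan_files: List[str]  # hypothetical names of the scanned files in the folder
    pdi: PreservationDescription

    def cross_reference_ok(self) -> bool:
        """Fixity-style check: every scan must carry the folder's reference number.

        Assumes (hypothetically) that scan file names start with the
        reference number followed by an underscore.
        """
        prefix = self.pdi.reference + "_"
        return all(name.startswith(prefix) for name in self.scan_files)


aip = ArchivalInformationPackage(
    scan_files=["JFKPOF-001-001_p0001.tif", "JFKPOF-001-001_p0002.tif"],
    pdi=PreservationDescription(
        reference="JFKPOF-001-001",
        provenance=[
            "Evelyn Lincoln (JFK's personal secretary)",
            "Fujitsu Image Scanner",
            "Documentum ApplicationXtender",
        ],
        # Illustrative values only; the real series for this folder may differ.
        context={"collection": "President's Office Files", "series": "Speeches"},
    ),
)
print(aip.cross_reference_ok())  # True
```

A scan whose name carried a different folder number would fail the check, which is the sense in which the reference number acts as a cross-reference here.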
Best Effort & Dublin Core
The "Trusted Digital Repository" (TDR) document is not a standard, but the JFK archivists certainly gave it due consideration. They implemented its recommendations wherever that was realistic and attainable given the level of resources (people and equipment) available to do the archiving.
I haven't come close to describing the entirety of the OAIS standard, but I've covered just enough to map the AIP onto the EMC hardware and software at the JFK Library. I will catalogue this infrastructure in an upcoming post.
For those of you familiar with OAIS, I'd love to receive your comments.
Many thanks again to the JFK archivists for their time....