I was perusing the Captiva announcement today, and a quote from one of the leading archivists at the JFK Library caught my eye:
Erica C. Boudreau, Archivist for the John F. Kennedy Presidential Library in Boston, said, “This new version of Captiva enables us to compress our scanned images using a lossless method, saving storage space and costs while maintaining high quality images for the historical documents that the JFK Library is capturing. The ability to quickly manipulate images and to process them using Captiva’s latest optical character recognition (OCR) engine produces more accurate results and saves our staff significant time.”
Several weeks ago some of my co-workers and I visited Erica's team at the JFK Library and Museum to ask some questions about their digital archive, find out what was working well, and ask what we could do better. The archive holds a massive number of scanned documents from JFK's presidency. A key part of their ingest strategy relies on Captiva's ability to quickly scan documents and automatically (a) convert them to multiple image formats, and (b) perform optical character recognition (OCR).
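To make the "one scan, many images" idea concrete, here is a minimal sketch of how a set of renditions could be planned for a single scanned page. It is not Captiva's API or the JFK Library's actual configuration: the `Rendition` class, the `PROFILES` list, the `plan_renditions` function, and the pixel sizes are all hypothetical names and values chosen for illustration. The sketch only computes target dimensions; it does not read or write image files.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rendition:
    """One output image planned for a scanned page (hypothetical type)."""
    name: str
    fmt: str
    width: int
    height: int


# Hypothetical profiles: (name, format, longest edge in pixels).
# None means "keep the original scan size".
PROFILES: List[Tuple[str, str, Optional[int]]] = [
    ("master", "tiff", None),
    ("access", "jpeg", 1600),
    ("thumbnail", "jpeg", 200),
]


def plan_renditions(width: int, height: int,
                    profiles=PROFILES) -> List[Rendition]:
    """Compute the dimensions of each rendition, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    renditions = []
    for name, fmt, max_edge in profiles:
        if max_edge is None or longest <= max_edge:
            # Never upscale: keep the scan's own dimensions.
            w, h = width, height
        else:
            # Scale both edges by the same factor so the longest edge fits.
            w = max(1, round(width * max_edge / longest))
            h = max(1, round(height * max_edge / longest))
        renditions.append(Rendition(name, fmt, w, h))
    return renditions


if __name__ == "__main__":
    # A letter-size page scanned at 300 dpi is 2550 x 3300 pixels.
    for r in plan_renditions(2550, 3300):
        print(f"{r.name:10s} {r.fmt:5s} {r.width} x {r.height}")
```

Planning every size up front means each page only has to be scanned once, and all of the derived images can be handed to the back end together.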
The picture below highlights the steps that are taken before the multiple images are handed to the back-end content management system.
During our visit the JFK archivists were asked when they would be "finished" scanning documents. The answer, based on their current process, was measured in decades (well over a hundred years, in fact)! With only two scanners, there are only so many documents that can be pushed through the system.
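The arithmetic behind an estimate like that is simple, and a rough sketch shows why the scanner count dominates it. The function name `years_to_finish` and every number in the usage example are made up for illustration; they are not the JFK Library's actual backlog or throughput figures.

```python
def years_to_finish(remaining_pages: int,
                    scanners: int,
                    pages_per_scanner_per_day: int,
                    working_days_per_year: int = 250) -> float:
    """Estimate the years needed to scan a backlog at a constant rate.

    All inputs are assumptions supplied by the caller; the default of
    250 working days per year is itself an illustrative guess.
    """
    pages_per_year = (scanners
                      * pages_per_scanner_per_day
                      * working_days_per_year)
    if pages_per_year <= 0:
        raise ValueError("throughput must be positive")
    return remaining_pages / pages_per_year


if __name__ == "__main__":
    # Purely hypothetical inputs: 1,000,000 pages, 100 pages per scanner per day.
    print(years_to_finish(1_000_000, scanners=2, pages_per_scanner_per_day=100))
    print(years_to_finish(1_000_000, scanners=4, pages_per_scanner_per_day=100))
```

With the scanner count fixed, the only lever left in this model is how many pages each scanner gets through per day, which is where faster software processing helps.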
Their process was greatly accelerated, however, when they upgraded their Captiva software and leveraged the ability to generate multiple image sizes on the fly. These images are then sent to the back-end system (Documentum) along with the OCR metadata. From that point on, the images and metadata can be accessed using Documentum's Digital Asset Manager (DAM). The image below shows how DAM can be leveraged to manage and display these images. These images are photographs (not scanned documents), but they highlight the benefit of a Captiva integration:
To learn more about how Captiva transforms scanned documents so quickly, I read this benchmarking report by Wipro (registration required). The report is a fair comparison of Captiva's capabilities versus the competition. For some use cases Captiva comes out on top, and in others it doesn't. If you look at the performance comparisons, however, there is no contest. Captiva is built for speed.
Considering that more than 200,000 documents have been scanned into the archive to date, performance truly counts.