The Capital and the Bay Home Page

Building the Digital Collection

Scanning the Printed Material

Paper-based printed documents in The Capital and the Bay: Narratives of Washington and the Chesapeake Bay Region, 1600-1900 were digitized by Systems Integration Group (SIG) of Lanham, Maryland. Each item was reproduced as facsimile page images. The image capture took place at the Library of Congress. In order to preserve the originals, bound works were scanned face-up in their bindings, one page at a time. The master or archival version of the textual pages (containing typography and line art) is a 300-dots-per-inch (dpi) bitonal image in the TIFF format, with ITU Group IV compression. Pages with printed halftone illustrations, finely detailed line drawings, or pages with significant color, including book covers, were captured as 8-bit grayscale or 24-bit color images, as appropriate, and stored in the JFIF image format (with JPEG compression). Bitonal text pages were scanned using the Minolta PS3000. Grayscale illustrations were scanned using the Toyo 4x5-inch studio camera with a Phase One Photophase Plus digital camera back.
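As a rough illustration of why text pages were kept bitonal while illustrations were captured at greater bit depth, the sketch below computes the uncompressed size of one page at each capture mode. The 6 x 9 inch page size is hypothetical, and applying 300 dpi to the grayscale and color modes is an assumption; the text above states that resolution only for the bitonal masters.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (pre-compression) size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole number of bytes


# Hypothetical 6 x 9 inch page; 300 dpi assumed for every mode.
for label, depth in [("bitonal", 1), ("8-bit grayscale", 8), ("24-bit color", 24)]:
    print(label, uncompressed_bytes(6, 9, 300, depth))
```

Under these assumptions a grayscale page is eight times larger than a bitonal one before compression, and a color page twenty-four times larger.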

The browser-display images for all document pages are in the GIF format. Library staff produce these images by creating scripts in Image Alchemy for processing batches of the master or archival images. When bitonal images are processed, gray tones are added and the resulting image is blurred to mimic grayscale. Then the image is reduced in scale to fit the typical display monitor and sharpened to enhance legibility. When the source image is grayscale, only rescaling and sharpening are undertaken to create the GIF image.
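The processing order described above (add gray tones, blur, reduce, sharpen) can be sketched in plain Python on a small grid of pixels. This is not the Image Alchemy script used by Library staff, which is not reproduced here; the 3x3 box blur, block-average reduction, and unsharp-mask sharpening are stand-in filters chosen for illustration, and all function names are hypothetical.

```python
def to_gray(bitonal):
    """Map bitonal pixels (1 = ink, 0 = paper) to 8-bit gray (0 = black, 255 = white)."""
    return [[0 if px else 255 for px in row] for row in bitonal]


def box_blur(img):
    """3x3 mean filter; edge pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


def downscale(img, factor):
    """Shrink by an integer factor, averaging each factor x factor block."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) // (factor * factor)
             for x in range(w)]
            for y in range(h)]


def sharpen(img, amount=1.0):
    """Unsharp mask: push each pixel away from its blurred value, clamped to 0..255."""
    blurred = box_blur(img)
    return [[max(0, min(255, round(p + amount * (p - b))))
             for p, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]


def make_display_image(bitonal, factor=2):
    """Bitonal master -> gray tones -> blur -> reduce -> sharpen."""
    return sharpen(downscale(box_blur(to_gray(bitonal)), factor))
```

For a source image that is already grayscale, the same sketch would skip the first two steps and call only `sharpen(downscale(gray, factor))`, matching the shorter path described above.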

Scanning Oversized Foldouts

Some books contained foldouts that were scanned as uncompressed TIFF 8-bit grayscale or 24-bit color images, as appropriate, and then converted to MrSID images by Library staff. MrSID--Multiresolution Seamless Image Database--uses a wavelet compression technology made available to the Library of Congress by LizardTech of Seattle, Washington. This software for the storage and retrieval of large digital images is derived from the research efforts of Los Alamos National Laboratory, New Mexico. In contrast to other compression software that relies on tiling, MrSID delivers its full, sharp resolution from within a single compressed image and does not require any special hardware. File size is not a limiting factor: MrSID allows immediate access to any part of an image, of any size, at any resolution.

The unique feature of MrSID is its ability to decompress only that portion of the image requested by the user. The compression ratio is approximately 22:1 depending on image content and color depth. Because fast, easy access is provided through networks and the Internet to vast amounts of geographic information, MrSID is ideal for viewing maps, orthophotos, terrain models, and satellite data.
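To make the 22:1 figure and the benefit of partial decompression concrete, here is a small arithmetic sketch. It does not call any MrSID software; the foldout dimensions are hypothetical, and the idea that each resolution level halves the width and height is an assumption based on how wavelet pyramids are commonly organized, not something stated above.

```python
def estimated_compressed_bytes(width_px, height_px, bits_per_pixel, ratio=22.0):
    """Estimate compressed size from raw size and an approximate compression ratio."""
    raw_bytes = width_px * height_px * bits_per_pixel // 8
    return raw_bytes / ratio


def pixels_to_decode(region_w, region_h, level):
    """Pixels produced when viewing a region at a reduced resolution level.

    Level 0 is full resolution; each further level is assumed to halve
    the width and height (rounded up).
    """
    scale = 2 ** level
    return -(-region_w // scale) * -(-region_h // scale)


# Hypothetical 6000 x 4000 pixel, 24-bit color foldout.
print(round(estimated_compressed_bytes(6000, 4000, 24)))
# Hypothetical 2000 x 1000 pixel region viewed two levels below full resolution.
print(pixels_to_decode(2000, 1000, 2))
```

Under these assumptions, a viewer showing a reduced overview of one region handles a small fraction of the pixels in the full image, which is the practical effect of decompressing only the requested portion.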

Creating the Searchable Text

After the images were approved by National Digital Library Program staff, searchable texts were prepared offsite, where a subcontractor rekeyed the documents from the page images. These typescript materials were converted to machine-readable form at an accuracy rate of 99.95% and encoded with Standard Generalized Markup Language (SGML) according to the American Memory Document Type Definition (DTD). This DTD is a markup scheme that conforms to the guidelines of the Text Encoding Initiative (TEI), the work of a consortium of scholarly institutions. The online presentation of the texts also includes a version in HTML (HyperText Markup Language), produced by the Library in an automated process. Because it requires no special software, the HTML version is easier for most users to access.
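An accuracy rate of 99.95% leaves an error allowance of 0.05%, or 5 in every 10,000. The sketch below turns that into an expected error count, assuming the rate is measured per character (the text above does not say which unit was used) and using a hypothetical page length.

```python
def expected_errors(char_count, errors_per_10k=5):
    """Expected number of keying errors at 99.95% accuracy (5 per 10,000 characters)."""
    return char_count * errors_per_10k / 10_000


# Hypothetical page of 2,000 characters.
print(expected_errors(2_000))
```

On that assumption, a rekeyed page of about 2,000 characters would contain roughly one error on average.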
