
The George Washington Papers: Building the Digital Collection


Digitizing the Microfilm | Digitizing Original Materials | Digitizing Printed Material | Digitizing Text | Database Access | Online Release History


Digitizing the Microfilm

The George Washington Papers at the Library of Congress is the first manuscript collection to be digitized in its entirety from the Library's vast collection of microfilm produced by the Library of Congress Photoduplication Service. The George Washington Papers was microfilmed in 1964 as part of a larger project, the Presidential Papers Project, which was instituted by Congress in 1957. The goal of this program was to microfilm and disseminate the papers of presidents held by the Library of Congress. The 124-reel Washington collection, captured on 35 millimeter roll microfilm, was produced as a part of this program.

Microfilm collections of historical documents present a number of issues for digitization that result from the quality of the microfilm being scanned. In addition, there are the issues of original document condition, a wide range of tonal values, document sizes, and document orientation on the microfilm. For optimal capture of detail, the Washington Papers microfilm was raster scanned from a duplicate negative microfilm, which was generated for this purpose. Scanning from the negative can reduce the appearance in the digital images of flaws, such as dust, found on the microfilm being scanned. The negative was printed directly from the archival microfilm and produced for scanning by both the scanning contractor, Preservation Resources, and the Library of Congress Photoduplication Service. Great care was taken in the duplication process to compensate for the high density range of the master microfilm.

The scanning was performed offsite by Preservation Resources in Bethlehem, Pennsylvania, under contract to the National Digital Library Program.

The digital images were produced in JPEG File Interchange Format (JFIF), a compressed grayscale format often used in digitizing historical manuscript documents because of its ability to capture and display a wide range of tonal variations, from those in the document paper itself to diverse qualities of pencil and ink. This 8-bit grayscale capture was also found to suppress the bleedthrough typical of handwritten documents in the collection. Grayscale GIF images were then created for preview access online. The great majority of GIFs were created from grayscale TIFF images by Preservation Resources, digitizers of the George Washington Papers. National Digital Library Program staff created GIFs from delivered JPEGs for Series 2 and parts of Series 1 and 4. Because the JPEG archival image requires considerable time to download, four-bit grayscale GIF images are provided to give maximum legibility for preview access. All of the original capture "master" TIFF images, to which LZW lossless compression was applied, were transferred to the NDL via magneto-optical disks and now reside in the NDL digital file repository for American Memory.
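The step from 8-bit grayscale (256 levels) to a four-bit preview (16 levels) can be illustrated with a short sketch. It assumes simple uniform quantization, which is only one possible approach; the tools and palette choices actually used to create the GIFs are not described here, and the function names are hypothetical.

```python
def to_four_bit(value_8bit: int) -> int:
    """Map an 8-bit gray value (0-255) to one of 16 levels (0-15).

    Assumption: uniform quantization that keeps the four most
    significant bits. The real GIF conversion may have differed.
    """
    if not 0 <= value_8bit <= 255:
        raise ValueError("expected an 8-bit value in 0..255")
    return value_8bit >> 4


def to_display(level_4bit: int) -> int:
    """Expand a 4-bit level back to the 0-255 range (15 maps to 255)."""
    if not 0 <= level_4bit <= 15:
        raise ValueError("expected a 4-bit level in 0..15")
    return level_4bit * 17


if __name__ == "__main__":
    for gray in (0, 100, 128, 255):
        level = to_four_bit(gray)
        print(gray, "->", level, "->", to_display(level))
```

Sixteen levels are far fewer than the 256 of the archival capture, which is why the preview file is smaller and the archival JPEG remains the reference image.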

The total number of digital images that compose the George Washington Papers is approximately 456,000: that is, 152,000 each of JPEG, GIF, and TIFF files. The complete collection of digital files occupies approximately 300 GB of server space.
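The arithmetic behind these totals can be checked with a few lines. The file counts and the 300 GB figure come from the paragraph above; the average file size is a derived estimate only, and it assumes decimal units (1 GB = 1,000,000 KB), which the source does not specify.

```python
IMAGES_PER_FORMAT = 152_000          # stated above, approximate
FORMATS = ("JPEG", "GIF", "TIFF")    # one file per format per image
TOTAL_STORAGE_GB = 300               # stated above, approximate

total_files = IMAGES_PER_FORMAT * len(FORMATS)

# Derived estimate; assumes decimal units (1 GB = 1,000,000 KB).
avg_kb_per_file = TOTAL_STORAGE_GB * 1_000_000 / total_files

if __name__ == "__main__":
    print(f"total files: {total_files:,}")
    print(f"average size per file: about {avg_kb_per_file:.0f} KB")
```

The average blends three very different formats, so it says little about the size of any single JPEG, GIF, or TIFF file.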

In the George Washington Papers, the majority of booklike materials, such as letterbooks, account books, and the like, were originally filmed in open-book format with two pages to a frame. In digitization the frame was split into single-page images to improve visual access. In a few exceptions, such as in account books, in which loss of content meaning would result, the frame was not split. Splitting of two-page formats of booklike materials, which are uniform in presentation, does not compromise the viewer's sense of the original artifact. This is not the case in individual manuscript letters or memoranda. Individual manuscript leaves, originally folded to make two to four pages or writing surfaces, have not been split.
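The splitting rule described above can be sketched as a function that returns crop boxes for one microfilm frame. This is an illustration only: the function name, the pixel-box convention, and the `keep_whole` flag (standing in for the account-book exceptions and unsplit manuscript leaves) are hypothetical, and the contractor's actual procedure is not documented here.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_frame(width: int, height: int,
                gutter_x: Optional[int] = None,
                keep_whole: bool = False) -> List[Box]:
    """Return crop boxes for a two-page, open-book frame.

    keep_whole=True models the exceptions where the frame is left
    unsplit. gutter_x is the x position of the book gutter; it
    defaults to the middle of the frame (an assumption).
    """
    if keep_whole:
        return [(0, 0, width, height)]
    if gutter_x is None:
        gutter_x = width // 2
    if not 0 < gutter_x < width:
        raise ValueError("gutter must fall inside the frame")
    left_page = (0, 0, gutter_x, height)
    right_page = (gutter_x, 0, width, height)
    return [left_page, right_page]


if __name__ == "__main__":
    print(split_frame(4000, 3000))
    print(split_frame(4000, 3000, keep_whole=True))
```

Returning boxes rather than cropped pixels keeps the sketch independent of any imaging library; the boxes could be passed to whatever crop routine is in use.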

Custom cropping was applied to the varying formats in the Washington Papers, which range from journals, commonplace books, and account books to individual manuscripts mounted in bound volumes by conservators. Occasionally no cropping margin exists on the film, and the rule of leaving a 1-inch margin around the document in the digital image could not be met. All of the document and text captured on the microfilm appears in this digital collection.
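A 1-inch margin rule of this kind, including the case where the film leaves no room for the margin, can be sketched as follows. The function name and the resolution parameter are assumptions made for illustration; the scanning resolution of the microfilm is not stated in this section.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def crop_box(doc_box: Box, frame_size: Tuple[int, int],
             dpi: int, margin_inches: float = 1.0) -> Box:
    """Expand the document's bounding box by a margin, then clamp to the frame.

    dpi is pixels per inch at document scale (hypothetical parameter).
    Where the frame has no room for the full margin, the box stops at
    the frame edge, so all captured content is kept.
    """
    margin = round(margin_inches * dpi)
    left, top, right, bottom = doc_box
    frame_w, frame_h = frame_size
    return (max(0, left - margin),
            max(0, top - margin),
            min(frame_w, right + margin),
            min(frame_h, bottom + margin))


if __name__ == "__main__":
    # Full margin available on every side.
    print(crop_box((500, 400, 2500, 3400), (4000, 4000), dpi=300))
    # Document sits near the left and right frame edges.
    print(crop_box((100, 400, 2500, 3400), (2600, 4000), dpi=300))
```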

Book or manuscript pages containing text not oriented for reading in the microfilm were re-oriented for reading as digital images. Pages containing texts oriented in a variety of directions were left in their original orientation.

Preservation Resources staff used Photoshop's "unsharp mask filter" tool to enhance ink-to-background contrast in the images of the manuscript volume pages in Series 1b.
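The idea behind an unsharp mask is to add back the difference between an image and a blurred copy of it, which raises contrast at edges such as ink strokes. The sketch below applies that idea to a single row of gray values using a three-pixel box blur. It is a simplified illustration, not Photoshop's implementation, which uses its own blur and additional controls; the function names and the `amount` parameter are hypothetical.

```python
from typing import List


def box_blur(row: List[int]) -> List[float]:
    """Blur a row with a 3-pixel average, repeating the edge pixels."""
    n = len(row)
    blurred = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        blurred.append((left + row[i] + right) / 3)
    return blurred


def unsharp_mask(row: List[int], amount: float = 1.0) -> List[int]:
    """sharpened = original + amount * (original - blurred), clamped to 0..255."""
    blurred = box_blur(row)
    return [min(255, max(0, round(p + amount * (p - b))))
            for p, b in zip(row, blurred)]


if __name__ == "__main__":
    paper_to_ink = [200, 200, 200, 50, 50, 50]  # light paper, then dark ink
    print(unsharp_mask(paper_to_ink))
```

On the sample row, the paper pixel next to the edge becomes lighter and the ink pixel next to it becomes darker, while flat areas are unchanged.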

Preservation Resources also produced digital images from the National Archives and Records Administration's microfilm of letterbooks 28, 29, and 30 in Series 2 of the George Washington Papers, employing the same scanning, cropping, splitting, and orientation specifications described above. Negative-format photostatic copies of these letterbooks had originally been microfilmed with the George Washington Papers. Digital images of these copies were replaced with images from the National Archives' microfilm of these letterbooks.

Digitizing Original Materials

Most of the items from Series 9, the Addenda to the George Washington Papers, were scanned on an i2S Digibook scanner in the Information Technology Services Digital Scan Center at the Library of Congress. Oversize materials were scanned by an overhead Phase One camera. The original items were digitized as 300-dpi grayscale images, which were compressed using JPEG compression, producing images in the JPEG File Interchange Format (JFIF). GIF images were also created.

The digital images reflect the original physical condition of the Addenda items. Some of the manuscripts are discolored or have faded ink. Others may have tears, holes, and fold marks. Several documents received conservation treatment before digitization. The Digital Scan Center staff took great care in the handling of the manuscripts.

Digitizing Printed Material

This collection reproduces page images and searchable texts from Donald Jackson and Dorothy Twohig, eds., The Diaries of George Washington, 6 vols. (Charlottesville: University Press of Virginia, 1976-79), a series of The Papers of George Washington. Copyright is held by the Rector and Visitors of the University of Virginia and use is by permission of the publisher. The publisher is not responsible for the correctness and completeness of the images and texts as they appear in this online collection.

The printed volumes of the Diaries, The Writings of Washington, and Letters to Washington were digitized by Systems Integration Group (SIG) of Lanham, Maryland. Each volume was reproduced as facsimile page images. The image capture took place at the Library of Congress. The master or archival version of the textual pages (containing typography and line art) is a 300-dots-per-inch (dpi) bitonal image in the TIFF format, with ITU Group IV compression. For the Diaries, pages with printed halftone illustrations, finely detailed line drawings, and color frontispieces were captured as 8-bit grayscale or 24-bit color images, as appropriate, and stored in the JFIF image format, with JPEG compression. The browser-display images for all volume pages are in the GIF format. GIFs were created by National Digital Library Program staff from the master TIFFs and JPEGs. Searchable text for The Diaries of George Washington was created as described below.

Digitizing Text

Text transcriptions from The Writings of Washington from the Original Manuscript Sources, 1745-1799 (39 vols.; Washington, D.C.: Government Printing Office, 1931-44), Letters to Washington and Accompanying Papers (5 vols.; Boston; New York: Houghton Mifflin and Company; Cambridge: Riverside Press, 1898), and The Diaries of George Washington (6 vols.; Charlottesville: University Press of Virginia, 1976-79) were converted at an accuracy rate of 99.95 percent and encoded with Standard Generalized Markup Language (SGML) according to the American Memory DTD. All text was translated with an OmniMark program to HTML 3.2 for indexing and viewing with Web browsers.

Linking from text transcriptions in The Writings of Washington and Letters to Washington to individual manuscript documents in the Washington Papers was accomplished by the insertion of a unique identifier from the encoded text into the bibliographic database record for the document images.

Database Access

Access to the George Washington Papers is through a database created from the printed Index to the microfilm edition of the George Washington Papers and through the searchable text transcriptions noted above. Every record in the database contains the name of the author of the document, the associated date, and a link to the set of document images. In addition, three other fields capture information where appropriate: the correspondence recipient's name, brief explanatory notes, and a link to a transcription where available.
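The record layout described above, with three fields present in every record and three optional ones, might be modeled as follows. This is a sketch only: the class name, field names, and sample values are hypothetical and are not taken from the actual database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentRecord:
    # Present in every record, per the description above.
    author: str
    date: str
    images_link: str
    # Captured only where appropriate or available.
    recipient: Optional[str] = None
    notes: Optional[str] = None
    transcription_link: Optional[str] = None

    def has_transcription(self) -> bool:
        """True when the record links to a text transcription."""
        return self.transcription_link is not None


if __name__ == "__main__":
    # Made-up values for illustration; not real database content.
    record = DocumentRecord(
        author="Example Author",
        date="1776-01-01",
        images_link="images/example-0001",
        recipient="Example Recipient",
    )
    print(record.has_transcription())
```

Keeping the optional fields as `None` rather than empty strings makes it easy to tell "not recorded" apart from "recorded but blank".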

Online Release History

The George Washington Papers has been presented online in seven releases, the first six of which appeared from 1998 through 2000. The First Release, of Series 2, was in February 1998 (about 28,000 images); the Second Release, of Series 3 and 5, was in August 1998 (together, about 50,900 images); and the Third Release, of the first installment of Series 4 and all of Series 6, 7, and 8, was in February 1999 (together, about 46,000 images). The Fourth Release, an update of Series 4 in June 1999 (about 86,600 images), and the Fifth Release, of Series 1 in November 1999 (about 8,400 images), brought the total number of images online to approximately 219,900. The Sixth Release in September 2000 consisted of an update of Series 4 General Correspondence (about 84,200 images) and the release of The Diaries of George Washington (6 vols.; Charlottesville: University Press of Virginia, 1976-79). The six-volume Diaries, which is not part of the Library's Washington Papers proper, consists of 5,818 page images, with searchable text for each volume. The Seventh Release completed the online presentation of the George Washington Papers with the addition of a selection from Series 9, the Addenda to the George Washington Papers: forty-five items, totaling 224 images. The George Washington Papers online consists of approximately 65,000 items, which comprise 304,000 digital images, including both file formats, GIF and JPEG, and approximately 13,000 text transcriptions.
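The running totals quoted above can be reproduced from the per-release figures. All counts are the approximate numbers given in the text; the sketch leaves out the 5,818 Diaries page images and the 224 Addenda images, since the text does not say how those enter the final rounded total.

```python
# Approximate manuscript image counts per release, as stated above.
releases_one_to_five = {
    "First (February 1998)": 28_000,
    "Second (August 1998)": 50_900,
    "Third (February 1999)": 46_000,
    "Fourth (June 1999)": 86_600,
    "Fifth (November 1999)": 8_400,
}
sixth_release_series_4_update = 84_200

total_after_fifth = sum(releases_one_to_five.values())
total_after_sixth = total_after_fifth + sixth_release_series_4_update

if __name__ == "__main__":
    print(f"after Fifth Release: {total_after_fifth:,}")
    print(f"after Sixth Release: {total_after_sixth:,}")
```

The first sum matches the stated 219,900, and the second comes to 304,100, in line with the rounded figure of 304,000 digital images.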

