Library of Congress Digital Preservation Tools and Services Inventory
This is a list of software tools and utilities designed, developed or used by the Library of Congress in its digital preservation program. By making this list available, the Library encourages others in the preservation community to share in, and take advantage of, the work and resources of the Library.
Tool Listing
Tools are listed alphabetically by the name of the tool.
BagIt
A format for transferring digital content. Content is packaged (the bag) along with a small amount of machine-readable text (the tag) to help automate the content's receipt, storage and retrieval. There is no software to install. A bag consists of a base directory containing the tag and a subdirectory that holds the content files. The tag is a simple text-file manifest, like a packing slip, that consists of two elements:
1. An inventory of the content files in the bag
2. A checksum for each file.
A slightly more sophisticated bag lists URLs instead of simple directory paths. A script then consults the tag, detects the URLs and retrieves the files over the Internet, ten or more at a time. This type of simultaneous multiple transfer reduces the overall data-transfer time. In another optional file, users can add content metadata.
- Developer: Library of Congress, California Digital Library
- Written in: n/a
- OS and run-time environment: n/a
- Application: n/a
- Documentation: BagIt Specification (PDF, 83 KB)
- License: n/a
- Last tool update: 05/31/08
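The BagIt layout described above (a base directory, a content subdirectory, and a manifest that pairs each file with a checksum) can be illustrated with a minimal Python sketch. The function names `md5_of` and `make_bag` are hypothetical. The `data` directory name, the `manifest-md5.txt` file name, and the use of MD5 are assumptions taken from common BagIt practice, not details stated in this listing.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_files, bag_dir):
    """Copy source_files into bag_dir/data and write a checksum manifest."""
    bag = Path(bag_dir)
    payload = bag / "data"  # assumed name of the content subdirectory
    payload.mkdir(parents=True, exist_ok=True)

    lines = []
    for source in map(Path, source_files):
        target = payload / source.name
        shutil.copyfile(source, target)
        # One manifest line per file: checksum, then path relative to the bag.
        lines.append(f"{md5_of(target)}  data/{source.name}")

    manifest = bag / "manifest-md5.txt"  # assumed manifest file name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Calling `make_bag(["report.pdf"], "my-bag")` would copy the file to `my-bag/data/report.pdf` and record its checksum in the manifest. The sketch flattens all sources into one directory, so files with the same name would overwrite each other.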
Bag Validator
The Bag Validator tool is a small Python script that validates a bag. It checks for files listed in the manifest that are missing from the disk, files on the disk that are not listed in the manifest, and duplicate entries in the manifest.
- Developer: Library of Congress
- Written in: Python
- OS and run-time environment: Unix
- Application: n/a
- Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
- License: n/a
- Last tool update: 06/20/08
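The three checks that the Bag Validator performs can be sketched as follows. This is not the Library's script, which is not published here; it is a minimal illustration. The function name `check_bag`, the `manifest-md5.txt` file name, the `data` directory, and the "checksum, whitespace, path" line format are all assumptions.

```python
from collections import Counter
from pathlib import Path


def check_bag(bag_dir, manifest_name="manifest-md5.txt"):
    """Compare manifest entries against the files under bag_dir/data."""
    bag = Path(bag_dir)
    listed = []
    for line in (bag / manifest_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            # Assumed line format: "<checksum> <path>"; split on first whitespace run.
            _checksum, path = line.split(None, 1)
            listed.append(path.strip())

    counts = Counter(listed)
    on_disk = {
        p.relative_to(bag).as_posix()
        for p in (bag / "data").rglob("*")
        if p.is_file()
    }
    return {
        "missing": sorted(set(listed) - on_disk),    # in manifest, not on disk
        "unlisted": sorted(on_disk - set(listed)),   # on disk, not in manifest
        "duplicates": sorted(p for p, n in counts.items() if n > 1),
    }
```

A bag passes this sketch when all three lists are empty. Checksums are deliberately not recomputed here, since the description above mentions only inventory checks.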
Parallel Retriever
The Parallel Retriever is a simple Python-based wrapper around wget and rsync that produces a package conforming to the BagIt spec when given a "file manifest" and a "fetch.txt" file. It has been used to transfer content from several transfer partners hosting rsync and HTTP servers, at rates exceeding 200 Mbps over Internet2. It was initially built specifically for Internet Archive rsync transfers, but was later extended to support the BagIt spec and HTTP in addition to rsync.
- Developer: Library of Congress
- Written in: Python
- OS and run-time environment: Unix
- Application: n/a
- Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
- License: n/a
- Last tool update: 08/05/08
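The general idea behind the Parallel Retriever, reading a fetch.txt file and running several wget or rsync transfers at once, might look like the sketch below. It is an illustration, not the Library's code. The names `parse_fetch`, `build_command`, and `retrieve` are hypothetical, the "URL LENGTH PATH" line format is assumed from the BagIt specification, and the default of ten workers echoes the "ten or more at a time" figure in the BagIt entry.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_fetch(text):
    """Parse fetch.txt lines of the assumed form 'URL LENGTH PATH'."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            url, _length, path = line.split(None, 2)
            entries.append((url, path.strip()))
    return entries


def build_command(url, dest):
    """Choose rsync for rsync:// URLs and wget for everything else."""
    if url.startswith("rsync://"):
        return ["rsync", "-a", url, str(dest)]
    return ["wget", "-q", "-O", str(dest), url]


def retrieve(fetch_text, bag_dir, workers=10, run=subprocess.run):
    """Fetch every entry into bag_dir, running up to `workers` transfers at once."""
    bag = Path(bag_dir)

    def fetch_one(entry):
        url, rel_path = entry
        dest = bag / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if the transfer tool fails.
        return run(build_command(url, dest), check=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, parse_fetch(fetch_text)))
```

Threads are sufficient here because each worker only waits on an external process. The `run` parameter exists so that the transfer step can be replaced with a stub when wget or rsync is not available.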
VerifyIt
The VerifyIt tool is a script that verifies an MD5 bag manifest using 11 parallel md5sum processes.
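Parallel manifest verification of this kind can be approximated in Python. Note the substitution: the sketch below uses a pool of 11 threads calling `hashlib` rather than 11 md5sum processes, so it shows the approach and not the tool itself. The names `file_md5` and `verify_manifest`, the `manifest-md5.txt` file name, and the "checksum, whitespace, path" line format are assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, manifest_name="manifest-md5.txt", workers=11):
    """Return sorted paths that are missing or whose MD5 differs from the manifest."""
    bag = Path(bag_dir)
    entries = []
    for line in (bag / manifest_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            expected, rel_path = line.split(None, 1)
            entries.append((expected.lower(), rel_path.strip()))

    def check(entry):
        expected, rel_path = entry
        target = bag / rel_path
        ok = target.is_file() and file_md5(target) == expected
        return rel_path, ok

    # Hash up to `workers` files at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, entries))
    return sorted(path for path, ok in results if not ok)
```

An empty return value means every listed file was found and matched its recorded checksum.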