Main Page


Project Name: Classification of the End-of-Term Archive: Extending Collection Development to Web Archives

Project Acronym: eotcd (eot=End-of-Term; cd=Collection Development)

The End-of-Term Web Archive (EOT Archive) is the result of a collaborative project of the Library of Congress, the US Government Printing Office, the Internet Archive, the University of North Texas (UNT) Libraries, and the California Digital Library. That project captured the entirety of the federal government’s public Web presence before and after the 2009 change in presidential administrations (Library of Congress, 2008). The result is an approximately 16 terabyte Web archive of government information that is replicated in repositories at the collaborating organizations, including UNT.

As Web archives become more available and accessible, many libraries will be collecting materials from these important information repositories. Librarians will need the capability to identify and select materials in accord with collection development policies. Additionally, libraries will need to characterize these materials using common metrics; however, such metrics do not exist, making it difficult for librarians to communicate the scope and value of these materials to administrators.

The eotcd project will use the EOT Archive to investigate innovative solutions to these needs. Participants in this study will be 10 librarians who will serve as Subject Matter Experts in collection development for government information. Tools built for the project will use open source platforms and will be publicly available. Research will be conducted concurrently in four work areas:

1. EOT Archive Classification
The materials in the 2008-2009 End-of-Term (EOT) Web Archive will be classified according to the Superintendent of Documents (SuDocs) Classification Numbering System. Classifying government information in accordance with SuDocs will allow librarians to use their existing collection development policies to select materials from the EOT Archive.
2. Web Archive Metrics
A set of metrics for materials in Web archives will be identified. These will enable characterization of materials in Web archives in units of measurement more familiar to libraries and their administrations.
3. Improving Access to the EOT Archive
Servers will be acquired to enable experiments that integrate new functionality into existing digital library access tools. New functionality will directly relate to the integration of knowledge acquired from the cluster analysis and findings from the classification and tagging exercises.
4. Researcher Needs Assessment
Interviews will be conducted with about 8 to 12 researchers to determine the type and range of research questions they study and to identify how the materials in the EOT Archive would assist them in their investigations. The findings from these interviews will inform a set of anticipated use cases describing how researchers’ needs could be addressed.

Helpful Information

Project Partners: The University of North Texas Libraries and the Internet Archive

Project Abstract: Abstract.pdf

Project Narrative: Narrative.pdf

Project Team

Subject Matter Experts

Government Documents Information