Tools for a Preservation-Ready Web
Lead Partner: Old Dominion University, Department of Computer Science
Additional Partner: Los Alamos National Laboratory Digital Library Research and Prototyping Team
This project seeks to integrate preservation capabilities into standard Web practices. The project assumes that the core technologies for creating a “preservation-ready” web are in place; what is needed is a concerted, high-profile effort to instantiate the technologies in simple protocols, methodologies and software.
Objectives:
- Promote the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) as a mechanism for discovering resources in the deep web
- Continue to explore the benefits of using the mod_oai Apache module both to support normal web crawling, through efficient discovery of new and updated resources, and to prepare "preservation-ready" resources for harvest
- Apply the OAI-PMH to a variety of new applications, including the creation of an Apache module that allows for the harvesting of events directly from the Apache logs themselves
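As a rough illustration of the discovery mechanism named in the objectives above, the following sketch builds an OAI-PMH `ListIdentifiers` request and reads the identifiers and datestamps out of a response. The verb, the `metadataPrefix` and `from` arguments, and the XML namespace come from the OAI-PMH 2.0 specification; the base URL `http://www.example.org/modoai`, the helper names `build_request` and `parse_identifiers`, and the sample response are assumptions made up for this example and are not taken from the project's own software.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical base URL; a real server would publish its own.
BASE_URL = "http://www.example.org/modoai"

# Invented sample response, shaped like an OAI-PMH ListIdentifiers reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-03-01T00:00:00Z</responseDate>
  <request verb="ListIdentifiers">http://www.example.org/modoai</request>
  <ListIdentifiers>
    <header>
      <identifier>http://www.example.org/index.html</identifier>
      <datestamp>2008-02-14T09:30:00Z</datestamp>
    </header>
    <header>
      <identifier>http://www.example.org/papers/report.pdf</identifier>
      <datestamp>2008-02-20T17:05:00Z</datestamp>
    </header>
    <resumptionToken/>
  </ListIdentifiers>
</OAI-PMH>"""


def build_request(base_url, verb, **params):
    """Build an OAI-PMH request URL.

    Pass 'from_' for the protocol's 'from' argument, since 'from'
    is a reserved word in Python.
    """
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            query[key.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


def parse_identifiers(xml_text):
    """Return ([(identifier, datestamp), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    headers = root.findall(".//oai:ListIdentifiers/oai:header", OAI_NS)
    records = [
        (
            h.findtext("oai:identifier", namespaces=OAI_NS),
            h.findtext("oai:datestamp", namespaces=OAI_NS),
        )
        for h in headers
    ]
    # An empty or missing resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return records, token or None


if __name__ == "__main__":
    # Ask only for resources changed since a given date.
    print(build_request(BASE_URL, "ListIdentifiers",
                        metadataPrefix="oai_dc", from_="2008-01-01"))
    records, token = parse_identifiers(SAMPLE_RESPONSE)
    for identifier, datestamp in records:
        print(datestamp, identifier)
```

The `from` argument is what makes incremental discovery cheap: a harvester can ask only for resources whose datestamp is later than its last visit instead of re-crawling the whole site. A non-empty resumption token would be sent back in a follow-up request to fetch the next batch.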
Highlights:
- Webcast: Thinking Differently about Web Page Preservation
- Presentation: Tools for a Preservation-Ready Web (ppt, 1.43 MB) (2008)
- Paper: A Quantitative Evaluation of Dissemination-Time Preservation Metadata (Proceedings of ECDL 2008) (pdf, 322 KB)
- Paper: Site Design Impact on Robots: An Examination of Search Engine Crawler Behavior at Deep and Wide Websites (D-Lib Magazine, March/April 2008)
- Paper: Integrating Preservation Functions Into the Web Server (pdf, 4.4 MB) (2008)
- Paper: Lazy Preservation: Reconstructing Websites by Crawling the Crawlers (Proceedings of the Eighth ACM International Workshop on Web Information and Data Management, November 2006)
- Michael Nelson of the TPRW project is a digital preservation pioneer