
Program Web Archiving

For Site Owners

Prior to archiving, the Library notifies each site owner (with the exception of government websites) that it would like to include their content in the archive. In some cases, the email asks for permission to archive the site or to provide off-site access to researchers.

If a link to this page appeared in your web server logs, or in an email notification from the Library, it is because your website is being crawled by the Library of Congress. The Library has selected your site for inclusion in its historic collection of Internet materials. For general information about the web archiving program, visit our About page. To learn more about the web archiving of your site, visit our site owner FAQ.

The Library of Congress (or its agents) collects content from websites at regular intervals, primarily using Heritrix, an open-source archival web crawler. The crawler begins with a "seed URL" (for instance, a homepage) and follows links on that page and on subsequent pages, downloading copies of the content that makes up each page so that it can be preserved. Our crawler is instructed to bypass robots.txt in order to obtain the most complete and accurate representation of websites. Robots.txt instructions sometimes exclude site elements that are vital to reproducing a site’s look, feel, and functionality, such as images, CSS, and JavaScript.
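For site owners curious about what "begins with a seed URL and follows links" means in practice, the following is a minimal Python sketch of that general idea. It is not Heritrix (which is written in Java) and does not reflect the Library's actual configuration. The names `LinkExtractor`, `crawl`, and `SITE`, the `example.org` URLs, and the choice to stay on the seed's host are all assumptions made for illustration. To keep the example self-contained, pages come from an in-memory dictionary rather than the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect page links and embedded resources from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []       # <a href> targets: further pages to visit
        self.requisites = []  # images, CSS, JavaScript used to render the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("img", "script") and attrs.get("src"):
            self.requisites.append(urljoin(self.base_url, attrs["src"]))
        elif (tag == "link" and attrs.get("href")
              and (attrs.get("rel") or "").lower() == "stylesheet"):
            self.requisites.append(urljoin(self.base_url, attrs["href"]))


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url, staying on the seed's host.

    fetch is a caller-supplied function: URL -> HTML string, or None if
    the page is unavailable. Returns captured URLs in capture order.
    """
    host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    captured = []
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        captured.append(url)
        pages += 1
        parser = LinkExtractor(url)
        parser.feed(html)
        # Record the resources that help make up this page.
        for resource in parser.requisites:
            if resource not in seen:
                seen.add(resource)
                captured.append(resource)
        # Queue same-host pages that have not been seen yet.
        for link in parser.links:
            if link not in seen and urlparse(link).netloc == host:
                seen.add(link)
                queue.append(link)
    return captured


# Hypothetical two-page site used in place of real network requests.
SITE = {
    "https://example.org/": (
        '<a href="/about">About</a><img src="/logo.png">'
        '<link rel="stylesheet" href="/style.css">'
    ),
    "https://example.org/about": (
        '<a href="/">Home</a><a href="https://other.example/">Elsewhere</a>'
        '<script src="/app.js"></script>'
    ),
}

for captured_url in crawl("https://example.org/", SITE.get):
    print(captured_url)
```

In this sketch the homepage, its image and stylesheet, the About page, and its script are all captured, while the link to a different host is skipped. A production crawler adds much more, including politeness delays, scoping rules, and archival file output, none of which is shown here.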

If crawling is impacting the performance of your site or you have other concerns, please contact the Library of Congress Web Archiving Team immediately. The Library would prefer to refrain from crawling any site areas that are not intended for the general public, such as administrative sections. The Web Archiving Team is happy to discuss ways to facilitate capture of desired elements and content by the Library of Congress or its agents, or to limit crawling of such non-public sections.

The Library hopes that site owners share its vision of preserving web materials. If you have questions, comments, or recommendations concerning the collection of your website by the Library of Congress, please do not hesitate to contact the Web Archiving Team.
