The Library of Congress >> More Online Collections
TECHNICAL INFORMATION

More about current national and international partnerships and current efforts in web capture can be found at www.loc.gov/webcapture.

Harvesting

The Web sites were harvested by the Internet Archive. The harvesting depth varies according to the specifications of the curator. Information about the technical environment and tools used for harvesting Web sites is available at www.loc.gov/webcapture/technical.html.

Search Component and Record Contents

Archived Web sites were cataloged using the Metadata Object Description Schema (MODS). Keyword, title, and subject metadata were extracted from the archived Web sites to create preliminary MODS records. Catalogers then reviewed and/or enhanced these records, assigning controlled subjects from the Library of Congress Subject Headings (LCSH) or the Thesaurus for Graphic Materials (TGM). A Lucene search interface was developed to search the MODS records both within and across the archived collections.

Collection-level:

In addition, a MARC record for each collection is available in the Library of Congress Online Catalog so that the collection can be found along with other Library materials in the catalog.

Metadata included in collection-level records in the Library of Congress Online Catalog:

245 $a Collection title $h [electronic resource].
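
The 245 field shown above can be read as a MARC tag followed by subfields, each introduced by "$" and a one-character code. The following is a minimal sketch, assuming the field is available as a display string in exactly that form; the function name parse_marc_field is hypothetical and is not part of any Library of Congress tool.

```python
import re


def parse_marc_field(line: str):
    """Split a displayed MARC field into its tag and a dict of subfields.

    Assumes the display form 'TAG $c value $c value ...' used on this page.
    Repeated subfield codes would overwrite each other in this simple sketch.
    """
    tag, _, rest = line.strip().partition(" ")
    # Each subfield is '$', a one-character code, then text up to the next '$'.
    pairs = re.findall(r"\$(\w)\s*([^$]*)", rest)
    return tag, {code: value.strip() for code, value in pairs}


tag, subfields = parse_marc_field(
    "245 $a Collection title $h [electronic resource]."
)
print(tag, subfields)
```

Here subfield a carries the collection title and subfield h carries the medium designator.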
Web site level:

MODS data included in the record for each archived Web site:

TITLE INFO
NAME
TYPE OF RESOURCE
GENRE
ORIGIN INFO (A single site may have multiple captures; the first and last dates of capture are recorded)
  <dateCaptured encoding="iso8601" point="start"> - Date of first capture of site; extracted by system from site
LANGUAGE (languageTerm repeated for languages as needed)
  <languageTerm authority="iso639-2b" type="code"> - 3-letter code supplied by cataloger
PHYSICAL DESCRIPTION (internetMediaType repeated for types as needed)
  <internetMediaType> - MIME type; supplied by system
ABSTRACT
NOTE
SUBJECT (Subject repeated for subject headings and key words as needed)
RELATED ITEM (Contains the collection title and the persistent ID for the collection)
  <titleInfo><title> - Collection Title; supplied by system
IDENTIFIER (Contains the Resource ID for the Web site for single sites and for the resource page for a site with multiple captures)
LOCATION
ACCESS CONDITION
RECORD INFO
  <recordCreationDate encoding="iso8601"> - Record creation date; supplied by system
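
To make the element list above concrete, here is a minimal sketch that builds a skeletal MODS record containing only the elements whose tags are spelled out on this page. It is an illustration, not the Library's cataloging workflow. The helper names (el, build_record) and all sample values are hypothetical; the point="end" date is assumed to be how the last capture date is recorded, since only the point="start" form is shown above.

```python
import xml.etree.ElementTree as ET

# MODS version 3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def el(parent, name, text=None, **attrs):
    """Append a namespaced MODS child element and return it (hypothetical helper)."""
    child = ET.SubElement(parent, f"{{{MODS_NS}}}{name}", attrs)
    child.text = text
    return child


def build_record(title, first_capture, last_capture, lang_codes,
                 mime_types, collection_title, created):
    """Build a skeletal MODS record for one archived Web site."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    el(el(mods, "titleInfo"), "title", title)

    # ORIGIN INFO: first and last capture dates.
    origin = el(mods, "originInfo")
    el(origin, "dateCaptured", first_capture, encoding="iso8601", point="start")
    el(origin, "dateCaptured", last_capture, encoding="iso8601", point="end")

    # LANGUAGE: languageTerm repeated as needed.
    language = el(mods, "language")
    for code in lang_codes:
        el(language, "languageTerm", code, authority="iso639-2b", type="code")

    # PHYSICAL DESCRIPTION: internetMediaType repeated as needed.
    physical = el(mods, "physicalDescription")
    for mime in mime_types:
        el(physical, "internetMediaType", mime)

    # RELATED ITEM: the collection the site belongs to.
    related = el(mods, "relatedItem")
    el(el(related, "titleInfo"), "title", collection_title)

    # RECORD INFO: record creation date.
    el(el(mods, "recordInfo"), "recordCreationDate", created, encoding="iso8601")
    return mods


# Sample values are invented for illustration only.
record = build_record(
    title="Example Site",
    first_capture="20040101",
    last_capture="20041231",
    lang_codes=["eng"],
    mime_types=["text/html", "image/jpeg"],
    collection_title="Example Collection",
    created="20080306",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The remaining elements in the list (NAME, GENRE, SUBJECT, IDENTIFIER, and so on) are left out because their sub-elements are not specified on this page.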
March 6, 2008