Library of Congress

Authorities & Vocabularies

Dataset Descriptions

Available Datasets

The Linked Data Service provides access to commonly used standards and vocabularies promulgated by the Library of Congress. This includes data values and the controlled vocabularies that house them. Below are descriptions of each included vocabulary. The vocabularies can be searched individually or all at once.

Library of Congress Subject Headings

Library of Congress Subject Headings (LCSH) has been actively maintained since 1898 to catalog materials held at the Library of Congress. By virtue of cooperative cataloging other libraries around the United States also use LCSH to provide subject access to their collections. In addition LCSH is used internationally, often in translation. LCSH in this service includes all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children's (AC) headings, and validation strings* for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University, and geographic headings that are added to LCSH as they are needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms. This content is expanded beyond the print issue of LCSH (the "red books") with inclusion of validation strings.

*Validation strings: Some authority records are for headings that have been built by adding subdivisions. These records are the result of an ongoing project to programmatically create authority records for valid subject strings from subject heading strings found in bibliographic records. The authority records for these subject strings were created so the entire string could be machine-validated. The strings do not have broader, narrower, or related terms.

Library of Congress Name Authority File

The Library of Congress Name Authority File (NAF) provides authoritative data for names of persons, organizations, events, places, and titles. Its purpose is the identification of these entities and, through the use of such controlled vocabulary, to provide uniform access to bibliographic resources. Name descriptions also provide access to a controlled form of name through references from unused forms. For example, a search under "Snodgrass, Quintus Curtius, 1835-1910" will lead users to the authoritative name for Mark Twain, which is "Twain, Mark, 1835-1910." Names may also be used as subjects in bibliographic descriptions, so they may be combined with controlled values from subject heading schemes, such as LCSH.

Library of Congress Names includes over 8 million descriptions created over many decades and according to different cataloging policies. LC Names is officially called the NACO Authority File and is a cooperative effort in which participants follow a common set of standards and guidelines.

Library of Congress Classification

The Library of Congress Classification (LCC) is a classification system that was first developed in the late nineteenth and early twentieth centuries to organize and arrange the book collections of the Library of Congress. Over the course of the twentieth century, the system was adopted for use by other libraries as well, especially large academic libraries in the United States. It is currently one of the most widely used library classification systems in the world.

The system divides all knowledge into twenty-one basic classes, each identified by a single letter of the alphabet. Most of these alphabetical classes are further divided into more specific subclasses, identified by two-letter, or occasionally three-letter, combinations. For example, class N, Art, has subclasses NA, Architecture; NB, Sculpture; ND, Painting; as well as several other subclasses. Each subclass includes a loosely hierarchical arrangement of the topics pertinent to the subclass, going from the general to the more specific. Individual topics are often broken down by specific places, time periods, or bibliographic forms (such as periodicals, biographies, etc.). Each topic (often referred to as a caption) is assigned a single number or a span of numbers. Whole numbers used in LCC may range from one to four digits in length, and may be further extended by the use of decimal numbers. Some subtopics appear in alphabetical, rather than hierarchical, lists and are represented by decimal numbers that combine a letter of the alphabet with a numeral, e.g. .B72 or .K535. Relationships among topics in LCC are shown not by the numbers that are assigned to them, but by indenting subtopics under the larger topics that they are a part of, much like an outline. In this respect, it is different from more strictly hierarchical classification systems, such as the Dewey Decimal Classification, where hierarchical relationships among topics are shown by numbers that can be continuously subdivided.
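The notation described above can be illustrated with a small parsing sketch. The pattern below is a simplification assumed for illustration, not an official LCC grammar, and the sample numbers in the usage note are made up. It only splits a class number into its class letters, whole number, optional decimal extension, and optional alphabetical subdivision.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern (an assumption, not an official LCC grammar):
#   1-3 capital letters, a 1-4 digit whole number, an optional decimal
#   extension, and an optional alphabetical subdivision such as ".B72".
LCC_PATTERN = re.compile(
    r"^(?P<letters>[A-Z]{1,3})"
    r"(?P<whole>\d{1,4})"
    r"(?:\.(?P<decimal>\d+))?"
    r"(?:\s*\.(?P<alpha>[A-Z]\d+))?$"
)


class LccNumber(NamedTuple):
    main_class: str         # single-letter basic class, e.g. "N"
    subclass: str           # full letter combination, e.g. "ND"
    whole: int              # whole number, one to four digits
    decimal: Optional[str]  # digits after the decimal point, if any
    alpha: Optional[str]    # letter-plus-numeral subdivision, e.g. "B72"


def parse_lcc(text: str) -> Optional[LccNumber]:
    """Split a class number into parts, or return None if it does not fit."""
    match = LCC_PATTERN.match(text.strip())
    if match is None:
        return None
    letters = match.group("letters")
    return LccNumber(
        main_class=letters[0],
        subclass=letters,
        whole=int(match.group("whole")),
        decimal=match.group("decimal"),
        alpha=match.group("alpha"),
    )
```

For instance, `parse_lcc("ND1234.5 .B72")` yields subclass `ND`, whole number `1234`, decimal `5`, and alphabetical subdivision `B72`. Because hierarchy in LCC is shown by indentation in the schedules rather than by the numbers themselves, a parser like this cannot recover broader or narrower topics from the notation alone.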

Library of Congress Children's Subject Headings

The Library of Congress Subject Headings Supplemental Vocabularies: Children’s Headings (LCSHAC) is a thesaurus which is used in conjunction with LCSH. It is not a self-contained vocabulary, but is instead designed to complement LCSH and provide tailored subject access to children and young adults when LCSH does not provide suitable terminology, form, or scope for children. LCSHAC records can be identified by the LCCN prefix "sj".

Library of Congress Genre/Form Terms

The Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT) is a thesaurus that describes what a work is versus what it is about. For instance, the subject heading Horror films, with appropriate subdivisions, would be assigned to a book about horror films. A cataloger assigning headings to the movie The Texas Chainsaw Massacre would also use Horror films, but it would be a genre/form term since the movie is a horror film, not a movie about horror films.

The thesaurus combines both genres and forms. Form is defined as a characteristic of works with a particular format and/or purpose. A "short" is a particular form, for example, as is "animation." Genre refers to categories of works that are characterized by similar plots, themes, settings, situations, and characters. Examples of genres are westerns and thrillers. In the term Horror films "horror" is the genre and "films" is the form.

LCGFT assumed its title in June 2010 and in May 2011 the LCCN prefix "gf" was implemented to identify Genre/form terms as part of the change to separate LCGFT from LCSH. The "gf" prefix is one way by which a record can be identified as a genre/form authority record. Further information about the LCGFT thesaurus and its relationship to the LCSH data set may be found in Library of Congress to Reissue Genre/Form Authority Records (Revised May 9, 2011) and in a FAQ on the topic.
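As a rough sketch of how the LCCN prefixes mentioned in these descriptions ("sj" for LCSHAC and "gf" for LCGFT) might be used to sort records, the snippet below extracts the leading letters of an identifier and looks them up. The function names and the sample identifiers in the usage note are hypothetical, and only the two prefixes named in the text are mapped.

```python
import re

# Prefixes named in the text above; any other prefix is reported as "other".
PREFIX_LABELS = {
    "sj": "Children's Subject Headings (LCSHAC)",
    "gf": "Genre/Form Terms (LCGFT)",
}


def lccn_prefix(lccn: str) -> str:
    """Return the leading alphabetic prefix of an LCCN, lowercased."""
    # The pattern can match an empty string, so a match always exists.
    match = re.match(r"[A-Za-z]*", lccn.strip())
    return match.group(0).lower()


def vocabulary_for(lccn: str) -> str:
    """Map an LCCN to a vocabulary label based on its prefix alone."""
    return PREFIX_LABELS.get(lccn_prefix(lccn), "other")
```

With made-up identifiers, `vocabulary_for("gf2011000001")` returns the LCGFT label and `vocabulary_for("sj2011000001")` returns the LCSHAC label. As the text notes, the prefix is only one way of identifying a genre/form record, so a check like this should be treated as a heuristic.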

Thesaurus for Graphic Materials

The Thesaurus for Graphic Materials (TGM) is a tool for indexing visual materials by subject and genre/format. The thesaurus includes more than 7,000 subject terms to index topics shown or reflected in pictures, and 650 genre/format terms to index types of photographs, prints, design drawings, ephemera, and other categories. New terms are added regularly. TGM is searchable through the Prints and Photographs Online Catalog (PPOC).

Preservation Events

Preservation events are actions performed on digital objects within a preservation repository.

Preservation Level Role

Preservation level roles are values that specify in what context a set of preservation options is applicable.

Cryptographic Hash Functions

A cryptographic hash function is a transformation that takes an input of arbitrary size and returns a fixed-size string, called the hash value. Functions with this property are used for a variety of computational purposes, including cryptography. The hash value is a concise representation of the message or document from which it was computed. Cryptographic hash functions are used for message integrity checks and digital signatures in information security applications such as authentication. They may also be referred to as message digest algorithms or checksum algorithms.
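The fixed-size property and the integrity-check use can be sketched with Python's standard `hashlib` module. The helper names `digest_hex` and `verify` are hypothetical, and SHA-256 is chosen here only as an example algorithm.

```python
import hashlib
import hmac


def digest_hex(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hash value of `data` as a hexadecimal string."""
    return hashlib.new(algorithm, data).hexdigest()


def verify(data: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Check that `data` still produces the previously recorded hash value."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(digest_hex(data, algorithm), expected_hex.lower())


if __name__ == "__main__":
    short = digest_hex(b"a")
    long = digest_hex(b"a" * 10_000)
    # Both hash values have the same length regardless of input size.
    print(len(short), len(long))
    # A recorded hash value detects later changes to the content.
    print(verify(b"a", short), verify(b"b", short))
```

In a preservation setting, the recorded hash value would typically be stored alongside the digital object and recomputed later to confirm the object has not changed.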

MARC List of Relator Terms

Relator terms and their associated codes designate the relationship between a name and a bibliographic resource. The relator codes are three-character lowercase alphabetic strings that serve as identifiers. Either the term or the code may be used as controlled values.
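A minimal sketch of treating either the term or the code as a controlled value is shown below. The three entries are a small illustrative sample rather than the full list, and the function name `to_relator_code` is hypothetical.

```python
import re
from typing import Optional

# Shape described above: exactly three lowercase letters.
RELATOR_CODE = re.compile(r"[a-z]{3}")

# Illustrative sample only; the real list is much longer.
SAMPLE_RELATORS = {
    "aut": "author",
    "edt": "editor",
    "ill": "illustrator",
}
_TERM_TO_CODE = {term: code for code, term in SAMPLE_RELATORS.items()}


def to_relator_code(value: str) -> Optional[str]:
    """Accept either a relator code or a relator term and return the code.

    Returns None when the value is not in the sample table.
    """
    candidate = value.strip()
    if RELATOR_CODE.fullmatch(candidate) and candidate in SAMPLE_RELATORS:
        return candidate
    return _TERM_TO_CODE.get(candidate.lower())
```

Here `to_relator_code("ill")` and `to_relator_code("Illustrator")` both resolve to the same code. The shape check alone does not prove that a code is valid; membership in the list is what makes a value controlled.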

MARC List of Countries

MARC Countries list identifies current national entities, states of the United States, provinces and territories of Canada and Australia, divisions of the United Kingdom, and internationally recognized dependencies. The list's codes are two- or three-character lowercase alphabetic strings that serve as identifiers. The MARC country codes are not the same as the ISO 3166 country codes, although the lists are entity-compatible so that a simple translation could relate codes for the same entity. The records for the codes contain references to the equivalent ISO 3166 codes.

The list contains over 350 discrete codes. This list is also searchable at: MARC Code List for Countries.

MARC List of Geographic Areas

The MARC Geographic Areas list identifies separate countries, first-order political divisions of some countries, regions, geographic features, areas in outer space, and celestial bodies. The list's codes are one- to seven-character lowercase alphabetic strings that serve as identifiers.

The list contains over 550 discrete codes. This list is also available at: MARC Code List for Geographic Areas.

MARC List of Languages

MARC List for Languages provides three-character lowercase alphabetic strings that serve as the identifiers of languages and language groups. The codes are usually based on the first three letters of the English form or, in some cases, vernacular form of the corresponding language name. The codes are varied where necessary to resolve conflicts and are not intended to be abbreviations of a language name. When the name of a language is changed in the list, the original code is generally retained.

The codes in this list are equivalent to those of ISO 639-2 (Bibliographic) codes and some codes from ISO 639-5, although the language name labels may differ. They are linked to the equivalent codes in ISO 639-2 and ISO 639-5 and the corresponding two-character codes in ISO 639-1.

The list contains over 480 discrete codes. It is also searchable at: MARC Code List for Languages.

ISO 639-1: Codes for the Representation of Names of Languages - Part 1: Two-letter codes for languages

ISO 639-1 is the first part of the ISO 639 international-standard language-code family. ISO 639-1 provides two-character lowercase alphabetic strings that serve as identifiers of languages. The list contains approximately 180 discrete codes. All ISO 639-1 languages also have ISO 639-2 three-character code representations. These codes are linked to codes for the same languages in ISO 639-2 and the MARC Language Codes.

ISO 639-2: Codes for the Representation of Names of Languages - Part 2: Alpha-3 Code for the Names of Languages

ISO 639-2 is part of the ISO 639 language code family, which also provides a two-character code set (ISO 639-1) for the representation of names of languages. ISO 639-2 contains codes for all languages contained in ISO 639-1 and several hundred additional languages. The ISO 639-2 (Bibliographic) codes were devised for use in bibliographic metadata, e.g., for libraries, information services, and publishers, while the ISO 639-2 (Terminology) codes target terminology, lexicography, and linguistic applications. The two lists are the same except for 20 languages that have different Bibliographic and Terminology codes. The list contains over 500 discrete codes.

The ISO 639-2 (Bibliographic) codes are equivalent to the MARC Language Codes. The ISO 639-2 codes are linked to two-character codes for the same language in ISO 639-1, to the MARC Language Codes, and to equivalent codes for language groups in ISO 639-5.
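The links among these code sets can be pictured as a small lookup table. The sketch below uses four sample languages, three of which are among those with differing Bibliographic and Terminology codes. The type and function names are hypothetical, and the MARC equivalence relies on the statement above that the Bibliographic codes match the MARC Language Codes.

```python
from typing import NamedTuple, Optional


class LanguageCodes(NamedTuple):
    iso639_1: str    # two-character code
    iso639_2_b: str  # ISO 639-2 Bibliographic code
    iso639_2_t: str  # ISO 639-2 Terminology code


# Small illustrative sample, not the full code lists.
SAMPLE_LANGUAGES = [
    LanguageCodes("en", "eng", "eng"),
    LanguageCodes("fr", "fre", "fra"),
    LanguageCodes("de", "ger", "deu"),
    LanguageCodes("zh", "chi", "zho"),
]


def find_language(code: str) -> Optional[LanguageCodes]:
    """Look up a sample entry by any of its three codes."""
    wanted = code.strip().lower()
    for entry in SAMPLE_LANGUAGES:
        if wanted in entry:
            return entry
    return None


def marc_language_code(code: str) -> Optional[str]:
    """Return the MARC code, taken here to be the Bibliographic code."""
    entry = find_language(code)
    return entry.iso639_2_b if entry else None
```

For example, `marc_language_code("fra")` and `marc_language_code("fr")` both return `fre`. A fuller table would also need to handle languages that have an ISO 639-2 code but no two-character ISO 639-1 code.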

ISO 639-5 Codes for the Representation of Names of Languages - Part 5: Alpha-3 Code for Language Families and Groups

ISO 639-5 provides three-character lowercase alphabetic strings that serve as identifiers for the representation of names of living and extinct language families and language groups. The list contains over 100 discrete codes.

The codes on this list include all of the codes for language groups in the MARC Language Code scheme and over 40 additional groups. The codes are linked to their equivalent codes on the MARC Language Code list and ISO 639-2.

Extended Date/Time Format

The Extended Date/Time Format (EDTF) Datatypes Scheme collects three datatypes, one for each EDTF level.