MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media

RECORD STRUCTURE

January 2000

INTRODUCTION

MARC 21 is an implementation of the American national standard, Information Interchange Format (ANSI Z39.2) and its international counterpart, Format for Information Exchange (ISO 2709). These standards specify the requirements for a generalized interchange format that will accommodate data describing all forms of materials susceptible to bibliographic description, as well as related information such as authority, classification, community information, and holdings data. The standards present a generalized structure for records, but do not specify the content of the record and do not, in general, assign meaning to tags, indicators, or data element identifiers. Specifications of these elements are provided by particular implementations of the standards. The following description of the MARC 21 record structure indicates the specific choices made for the MARC 21 implementation of the standards.

DEFINITIONS

Italicized terms within definitions are terms for which definitions are also provided.

base address of data.
A data element in the leader which specifies the first octet of the first variable field in the record and is equal to the sum of the lengths of the leader and the directory, including the field terminator at the end of the directory.
bibliographic level.
A data element in the leader of bibliographic records which provides additional information about the characteristics and components of the record, and is used in conjunction with the type of record data element of the leader.
blank (SP).
ASCII character 20(hex) (represented graphically in MARC 21 documentation as ASCII character 20 (hex) or #), which is used in indicators and data elements containing coded values (and occurs in data content). Generally, blank stands for "undefined," but in some instances it has been assigned a meaning. ASCII name is space.
character.
A member of a set of elements used for organization, control, or representation of data. In MARC 21 a character may be encoded using one or more than one octet, depending on the character set. All ASCII characters are encoded using one octet in the ASCII encoding and the Unicode UTF-8 encoding; thus a character is equivalent in length to an octet when an element's values are restricted to ASCII.
content designation.
The codes and conventions established explicitly by MARC 21 to identify and further characterize the data elements within a record and to support the manipulation of that data.
control field.
A variable field containing information useful or required for the processing of the record. Control fields are assigned tags beginning with two zeroes. Control fields with fixed length data elements are restricted to ASCII graphics.
control number.
An ASCII graphic character string uniquely associated with a record by the organization transmitting the record and located in a specific variable field.
data element.
A defined unit of information.
data element identifier.
A one-character code used to identify individual data elements within a variable field. The data element identifier may be any ASCII lowercase alphabetic, numeric, or graphic symbol character except blank.
data field.
A variable field containing bibliographic or other data. Data fields are assigned tags beginning with characters other than two zeroes. Data fields contain data in any MARC 21 character set unless a field-specific restriction applies.
delimiter.
ASCII control character 1F(hex) (represented graphically in MARC 21 documentation as ASCII control character 1F (hex) or $), which is combined with a data element identifier to make up the subfield code which precedes each individual data element within a variable field. The ASCII name for the delimiter is unit separator (US).
directory.
An index to the location of the variable fields (control and data) within a record. The directory consists of entries.
encoding level.
A data element in the leader of authority, bibliographic, classification, and holdings records which provides information about the fullness of the information and/or content designation in the record.
entry.
A field within the directory which gives the tag, length and starting character position of a variable field.
entry map.
A data element in the leader which specifies the structure of the entries in the directory. Always set to 4500 in MARC 21 records.
field.
A defined character string that may contain one or more data elements.
field terminator (FT).
ASCII control character 1E(hex) (represented graphically in MARC 21 documentation as FT or ^), which is used to terminate the directory and each variable field within a record. ASCII name for the field terminator is record separator (RS).
fill character.
ASCII graphical vertical bar ( | )(7C(hex)), which fills a required character position and has the meaning "no attempt has been made to code".
fixed field.
A field whose length does not vary. The term is occasionally used to refer to variable control fields, especially those that contain coded data such as fields 007 or 008.
identifier length
see subfield code length
indicator.
A data element associated with a data field that supplies additional information about the field. An indicator may be any ASCII lowercase alphabetic, numeric, or blank character. Indicators are not used in control fields.
indicator count.
A data element in the leader which contains the number of character positions reserved for indicators in each variable data field. The indicator count is always set to 2 in MARC 21 records.
leader.
A fixed field that occurs at the beginning of each record and provides information for the processing of the record.
length.
A measure of the size of a data element, field, or record, expressed as a number of octets.
logical record length.
A data element in the leader which contains the length of the entire record, including itself and the record terminator.
octet.
A string of 8 bits. In some cases (e.g., ASCII) each octet represents a character; in other cases (e.g., Unicode) multiple octets may represent a character.
record.
A collection of data elements describing or identifying one or more units treated as one logical entity.
record length
see logical record length
record status.
A data element in the leader which indicates the relation of the record to a file (e.g., new, updated, etc.).
record terminator (RT).
ASCII control character 1D(hex) (represented graphically in MARC 21 documentation as RT or \), which is used as the final character of a record, following the field terminator of the last data field. ASCII name for the record terminator is "group separator" (GS).
space
see blank
starting character position.
The position, relative to the base address of data, of the first octet of the first character in the variable field referenced by the entry. The first character of the first field following the directory is numbered 0.
status
see record status
subfield code.
The two-character combination of a delimiter followed by a data element identifier. Subfield codes are not used in control fields.
subfield code length.
A data element in the leader which contains the sum of the lengths of the delimiter and the data element identifier used in the record. Always set to 2 in MARC 21 records.
tag.
A three character string used to identify or label an associated variable field. The tag may consist of ASCII numeric characters (decimal integers 0-9) and/or ASCII alphabetic characters (uppercase or lowercase, but not both).
type of record.
A data element in the leader that, together with bibliographic level, specifies the characteristics and defines the components of the record.
variable control field
see control field
variable data field
see data field
variable field.
A field whose length is determined for each occurrence by the length of data comprising that occurrence. There are two types of variable fields: control fields and data fields.

STANDARDS

CHARACTER SETS

MARC 21 records are character encoded, including all lengths. In this section on Record Structure, elements may be specified as ASCII numeric characters, ASCII lowercase alphabetic characters, ASCII uppercase alphabetic characters, ASCII graphic symbol characters, ASCII control characters, ASCII blank character, ASCII graphic characters, or MARC 21 characters. The section on Character Sets defines the repertoire and encoding of each of these subsets of characters.

GENERAL RECORD STRUCTURE

The general structure of a record is represented schematically below.

Structure of a MARC 21 Record

  LEADER  DIRECTORY  FT  CONTROL_NUMBER_FIELD  FT

      CONTROL_FIELD_1  FT   ...   CONTROL_FIELD_n  FT

          DATA_FIELD_1  FT   ...   DATA_FIELD_n  FT  RT 

Each record begins with a leader, which is a fixed field containing information for the processing of the record. Following the leader is the directory, which is an index to the location of the variable fields (control and data) within the record. The fields following the directory are all variable fields. The first variable field is the control number field, which contains an ASCII graphic character string uniquely associated with the record by the organization transmitting the record. Following the control number field are the rest of the control fields, which contain information useful or required for the processing of the record. Following the control fields are data fields, which contain general data. A field terminator (FT), ASCII control character 1E(hex), is used to terminate the directory and each variable field within the record. A record terminator (RT), ASCII control character 1D(hex), is used as the final character of the record, following the field terminator of the last data field. These elements of the record are described in more detail in the following sections.
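The layout described above can be illustrated with a minimal Python sketch that assembles a record from a list of fields. It is an illustration under stated assumptions, not a reference implementation: the function name build_record and the constants are hypothetical, and the leader values for record status, type of record, and the implementation-defined positions are placeholders chosen only so that the leader is 24 octets long.

```python
FT = b"\x1e"         # field terminator
RT = b"\x1d"         # record terminator
DELIMITER = b"\x1f"  # subfield delimiter

# Leader positions 05-11 and 17-23. The status, type, and
# implementation-defined values are illustrative placeholders only.
LEADER_05_11 = b"nam  22"
LEADER_17_23 = b"   4500"


def build_record(fields):
    """Assemble a record from (tag, body) pairs; bodies exclude the FT."""
    directory = b""
    data = b""
    for tag, body in fields:
        field = body + FT
        if len(field) > 9999:
            raise ValueError("field exceeds 9999 octets")
        # Entry: tag (3), length of field (4), starting position (5).
        entry = f"{tag}{len(field):04d}{len(data):05d}"
        directory += entry.encode("ascii")
        data += field
    base_address = 24 + len(directory) + 1        # leader + directory + FT
    record_length = base_address + len(data) + 1  # variable fields + RT
    if record_length > 99999:
        raise ValueError("record exceeds 99999 octets")
    leader = (
        f"{record_length:05d}".encode("ascii")
        + LEADER_05_11
        + f"{base_address:05d}".encode("ascii")
        + LEADER_17_23
    )
    return leader + directory + FT + data + RT


record = build_record([
    ("001", b"12345"),
    ("245", b"10" + DELIMITER + b"aTitle"),
])
```

With two fields the directory occupies 24 octets, so the base address of data is 24 + 24 + 1 = 49 and the record length is 66 octets.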

LEADER

The leader is the first field in the record and has a fixed length of 24 octets (character positions 0-23). Only ASCII graphic characters are allowed in the leader. The structure of the leader as defined in MARC 21 is represented schematically below. The numbers indicate the character positions occupied by each part of the leader.

Structure of the Leader in MARC 21 Records

  RECORD_LENGTH  RECORD_STATUS  TYPE_OF_RECORD  IMPLEMENTATION-DEFINED
  00-04          05             06              07-08

      CHARACTER_CODING_SCHEME  INDICATOR_COUNT SUBFIELD_CODE_LENGTH
      09                       10               11

          BASE_ADDRESS_OF_DATA  IMPLEMENTATION-DEFINED  ENTRY_MAP
          12-16                 17-19                   20-23

Record length (character positions 00-04), contains a five-character ASCII numeric string equal to the length of the entire record, including itself and the record terminator. The five-character numeric string is right justified and unused positions contain zeroes (zero fill). The maximum length of a record is 99999 octets.

Record status (character position 05), contains an ASCII graphic character which indicates the relation of the record to a file (e.g., new, updated, etc.).

Type of record (character position 06), contains an ASCII graphic character which specifies the characteristics and defines the components of the record.

Implementation-defined (character positions 07-08). ANSI Z39.2 and ISO 2709 reserve character positions 07-08 for definition by a particular implementation. The individual MARC 21 formats define these character positions if needed. Positions may contain only ASCII graphic characters. Any position not defined contains a blank.

Character coding scheme (character position 09), contains a code that identifies the character coding scheme used in a record.

Indicator count (character position 10), contains one ASCII numeric character specifying the number of indicators occurring in each variable data field. In MARC 21 records, the indicator count is always 2.

Subfield code length (character position 11), contains one ASCII numeric character specifying the sum of the lengths of the delimiter and the data element identifier used in the record. In MARC 21 records, the subfield code length is always 2. The ANSI Z39.2 and ISO 2709 name for this data element is identifier length.

Base address of data (character positions 12-16), contains five ASCII numeric characters that specify the first character position of the first variable field in the record. It is equal to the sum of the lengths of the leader and the directory, including the field terminator at the end of the directory. The number is right justified and unused positions contain zeroes (zero fill).

Implementation-defined (character positions 17-19). ANSI Z39.2 and ISO 2709 reserve character positions 17-19 for definition by a particular implementation. The individual MARC 21 formats define these character positions if needed. Positions may contain only ASCII graphic characters. Any position not defined contains a blank.

Entry map (character positions 20-23), contains four single digit ASCII numeric characters that specify the structure of the entries in the directory.

Structure of an Entry Map in MARC 21 Records

  LENGTH OF        LENGTH OF            LENGTH OF  
  LENGTH-OF-FIELD  STARTING-CHARACTER-  IMPLEMENTATION-
  PART             POSITION PART        DEFINED PART     UNDEFINED
  20               21                   22               23
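
The character positions listed above map directly onto slices of the first 24 octets of a record. The following Python sketch shows one way to split a leader into its parts; the names Leader and parse_leader are hypothetical, and the sample leader values are placeholders rather than values taken from any MARC 21 format.

```python
from typing import NamedTuple


class Leader(NamedTuple):
    record_length: int             # positions 00-04
    record_status: str             # position 05
    type_of_record: str            # position 06
    implementation_07_08: str      # positions 07-08
    character_coding_scheme: str   # position 09
    indicator_count: int           # position 10
    subfield_code_length: int      # position 11
    base_address_of_data: int      # positions 12-16
    implementation_17_19: str      # positions 17-19
    entry_map: str                 # positions 20-23


def parse_leader(record: bytes) -> Leader:
    """Split the 24-octet leader at the start of a record into its parts."""
    if len(record) < 24:
        raise ValueError("a leader is 24 octets long")
    text = record[:24].decode("ascii")  # only ASCII graphics are allowed
    return Leader(
        record_length=int(text[0:5]),
        record_status=text[5],
        type_of_record=text[6],
        implementation_07_08=text[7:9],
        character_coding_scheme=text[9],
        indicator_count=int(text[10]),
        subfield_code_length=int(text[11]),
        base_address_of_data=int(text[12:17]),
        implementation_17_19=text[17:20],
        entry_map=text[20:24],
    )


leader = parse_leader(b"00066nam  2200049   4500")
```

A reader of MARC 21 records could use the parsed values as sanity checks, since the indicator count and subfield code length are always 2 and the entry map is always 4500.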

DIRECTORY

A directory entry in MARC 21 is made up of a tag, length-of-field, and field starting position. The directory begins in character position 24 of the record and ends with a field terminator. It is of variable length and consists of a series of fixed fields, referred to as "entries." One entry is associated with each variable field (control or data) present in the record. Each directory entry is 12 characters in length; the structure of each entry as defined in MARC 21 is represented schematically below. The numbers indicate the character positions occupied by the parts of the entry.

Structure of a Directory Entry in MARC 21 Records

    TAG     LENGTH_OF_FIELD     STARTING_CHARACTER_POSITION
    00-02   03-06               07-11

Tag (character positions 00-02), consists of three ASCII numeric characters or ASCII alphabetic characters (uppercase or lowercase, but not both) used to identify or label an associated variable field. The MARC 21 formats have used only numeric tags. The tag is stored only in the directory entry for the field; it does not appear in the variable field itself.

Length of field (character positions 03-06), contains four ASCII numeric characters which give the length, expressed as a decimal number, of the variable field to which the entry corresponds. This length includes the indicators, subfield codes, data and field terminator associated with the field. A field length number of fewer than four digits is right justified and unused positions contain zeroes (zero fill). MARC 21 sets the length of the length-of-field portion of the entry at four characters; thus a field may contain a maximum of 9999 octets.

Starting character position (character positions 07-11), contains five ASCII numeric characters which give the starting character position, expressed as a decimal number, of the variable field to which the entry corresponds relative to the base address of data of the record. A starting character position of fewer than five digits is right justified and unused positions contain zeroes (zero fill).

Order of entries. Directory entries for control fields precede entries for data fields. Entries for control fields are sequenced by tag in increasing numerical order. Entries for data fields are arranged in ascending order according to the first character of the tag, with numeric characters preceding alphabetic characters. See Variable Fields below for order requirements for the fields to which the directory entries point.
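
Because every entry is exactly 12 characters long, the directory can be read by slicing from character position 24 up to the first field terminator. The sketch below is one possible reading routine; the function name parse_directory and the sample record are hypothetical.

```python
FT = b"\x1e"  # field terminator


def parse_directory(record: bytes):
    """Return a list of (tag, length_of_field, starting_position) tuples."""
    # The directory holds only tags and digits, so the first FT found
    # after the leader is the one that terminates the directory.
    end = record.index(FT, 24)
    raw = record[24:end]
    if len(raw) % 12 != 0:
        raise ValueError("directory length is not a multiple of 12")
    entries = []
    for offset in range(0, len(raw), 12):
        entry = raw[offset:offset + 12].decode("ascii")
        tag = entry[0:3]            # positions 00-02
        length = int(entry[3:7])    # positions 03-06
        start = int(entry[7:12])    # positions 07-11
        entries.append((tag, length, start))
    return entries


sample = (
    b"00066nam  2200049   4500"       # leader (placeholder values)
    b"001000600000" b"245001000006"   # two directory entries
    b"\x1e"                           # end of directory
    b"12345\x1e"                      # field 001
    b"10\x1faTitle\x1e"               # field 245
    b"\x1d"                           # record terminator
)
entries = parse_directory(sample)
```

For this sample the result is two entries, and the base address of data in the leader (49) equals 24 + 12 x 2 + 1, that is, the leader, two entries, and the field terminator of the directory.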

VARIABLE FIELDS

The variable fields follow the leader and the directory in the record and consist of control fields and data fields. Control fields precede data fields in the record and are arranged in the same sequence as the corresponding entries in the directory. The sequence in which data fields are stored in the record is not necessarily the same as the order of the corresponding directory entries.
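
Since the stored order of data fields need not match the order of the directory entries, a reader should locate each field through its entry rather than by reading fields sequentially. The following Python sketch shows the arithmetic, assuming the length and starting character position have already been taken from a directory entry; the function name field_bytes and the sample record are hypothetical.

```python
FT = b"\x1e"  # field terminator


def field_bytes(record: bytes, length: int, start: int) -> bytes:
    """Return the content of one variable field, without its terminator.

    length and start come from the directory entry for the field;
    start is relative to the base address of data (leader 12-16).
    """
    base_address = int(record[12:17].decode("ascii"))
    begin = base_address + start
    field = record[begin:begin + length]
    if len(field) != length or not field.endswith(FT):
        raise ValueError("directory entry does not match the field data")
    return field[:-1]


sample = (
    b"00066nam  2200049   4500"       # leader (placeholder values)
    b"001000600000" b"245001000006"   # two directory entries
    b"\x1e"                           # end of directory
    b"12345\x1e"                      # field 001
    b"10\x1faTitle\x1e"               # field 245
    b"\x1d"                           # record terminator
)
control_number = field_bytes(sample, 6, 0)
title_field = field_bytes(sample, 10, 6)
```

The length in the entry includes the field terminator, which is why the sketch checks for it and strips it before returning the content.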

Control fields in MARC 21 formats are assigned tags beginning with two zeroes. They consist of data and a field terminator; they do not contain indicators or subfield codes. The control number field is assigned tag 001 and contains the control number of the record. Each record contains only one control number field (with tag 001), which is to be located at the base address of data.

Data fields in MARC 21 formats are assigned tags beginning with ASCII numeric characters other than two zeroes. Such fields contain indicators and subfield codes, as well as data and a field terminator. There are no restrictions on the number, length, or content of data fields other than those already stated or implied, e.g., those resulting from the limitation of total record length. The structure of a data field is shown schematically below.

Structure of a Variable Data Field in MARC 21 Records

  INDICATOR_1  INDICATOR_2  DELIMITER  DATA_ELEMENT_IDENTIFIER_1

      DATA_ELEMENT_1  ...  DELIMITER  DATA_ELEMENT_IDENTIFIER_n  

          DATA_ELEMENT_n  FT

Indicators are the first two characters in every variable data field, preceding any subfield code (delimiter plus data element identifier) which may be present. Each indicator is one character and every data field in the record includes two indicators, even if values have not been defined for the indicators in a particular field. Indicators supply additional information about the field, and are defined individually for each field. Indicator values are interpreted independently; meaning is not ascribed to the two indicators taken together. Indicators may be any ASCII lowercase alphabetic, numeric, or blank character. A blank is used in an undefined indicator position, and may also have a defined meaning in a defined indicator position. The numeric character 9 is reserved for local definition as an indicator.

Subfield codes identify the individual data elements within the field, and precede the data elements they identify. Each data field contains at least one subfield code. The subfield code consists of a delimiter (ASCII 1F (hex)) followed by a data element identifier. Data element identifiers defined in MARC 21 may be any ASCII lowercase alphabetic or numeric character. In general, numeric identifiers are defined for data used to process the field, or coded data needed to interpret the field. Alphabetic identifiers are defined for the separate elements which constitute the data content of the field. The character 9 and the following ASCII graphic symbols are reserved for local definition as data element identifiers:

 ! " # $ % & ' ( ) * + , - . / : ; < = > ? { } _ ^ ` ~ [ ] \  

A data field may contain more than one data element, depending upon the definition of the field. The last character in a data field is the field terminator, which follows the last data element in the field.
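
The structure of a data field can be illustrated with a short Python sketch that separates the two indicators from the subfields. It assumes the field terminator has already been removed and the octets have already been decoded to text according to the character coding scheme of the record; the function name parse_data_field is hypothetical.

```python
DELIMITER = "\x1f"  # ASCII unit separator, first half of a subfield code


def parse_data_field(body: str):
    """Split a data field into (indicator_1, indicator_2, subfields).

    body is the decoded field content without the field terminator.
    subfields is a list of (data_element_identifier, data_element) pairs.
    """
    if len(body) < 2:
        raise ValueError("a data field starts with two indicators")
    indicator_1, indicator_2 = body[0], body[1]
    subfields = []
    # Text before the first delimiter would be empty in a well-formed
    # field, so the first piece of the split is skipped.
    for chunk in body[2:].split(DELIMITER)[1:]:
        if not chunk:
            raise ValueError("delimiter without a data element identifier")
        subfields.append((chunk[0], chunk[1:]))
    return indicator_1, indicator_2, subfields


parsed = parse_data_field("10\x1faTitle :\x1fbsubtitle")
```

Here parsed holds the indicators "1" and "0" followed by two subfields, identified by a and b.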

DESIGN PRINCIPLES FOR MARC 21

A MARC 21 format is a set of codes and content designators defined for encoding a particular type of machine-readable record. The MARC 21 formats as a group serve as a vehicle for authority, bibliographic, classification, community information, and holdings data of all types. These formats are intended to be communication formats and are primarily designed to provide specifications for the exchange of information between systems. The following description of design principles repeats, in some cases, information given above but is given again for completeness.

Content Designation

The purpose of content designation is to identify and characterize the data elements which comprise a MARC record with sufficient precision to support manipulation of the data for a variety of functions. The MARC 21 formats have attempted to preserve consistency of content designation across formats where this is appropriate.

The MARC 21 content designation supports the sorting of data only to a limited extent. In general, sorting must be accomplished through the application of external algorithms to the data.

The MARC 21 formats provide for using content designation, e.g., tag values or indicators, to specify recommended display constants. A display constant is a term, phrase, and/or spacing or punctuation convention that may be system generated under prescribed circumstances to make a visual presentation of data in a record more meaningful to a user. The display constant text is not carried in the data, but may be supplied for display by the processing system.

Variable Fields and Tags

The data in a MARC 21 record is organized into fields, each identified by a three-character tag. Although ANSI Z39.2 and ISO 2709 allow both alphabetic and numeric characters, MARC 21 formats use only numeric tags. The tag is stored in the directory entry for the field, not in the field itself. Variable field tags are defined in blocks according to the first character of the tag, which, with some exceptions, identifies the general function of the field's data within a record. The type of information in the field is identified by the remainder of the tag. The meaning of these blocks depends upon the type of record.

The bibliographic format blocks are:

The authority format blocks are:

The classification format blocks are:

The community information format blocks are:

The holdings format blocks are:

Within some blocks of variable fields, parallels of content designation are preserved, e.g., bibliographic records (1XX, 4XX, 6XX, 7XX, 8XX), authority records (1XX, 4XX, 5XX, 7XX), classification records (70X-75X), and community information records (1XX, 4XX, 6XX, 7XX). The following meanings are generally given to the final two characters of the tag of fields in these blocks:

Note Fields

Rules have been developed for the MARC 21 formats that guide when a separate field should be defined for note data and when the data should be included in a general note field. For the MARC 21 bibliographic format, a specific 5XX note field is defined when at least one of the following is true:

  1. Categorical indexing or retrieval is required on the data defined for the note. The note is used for structured access purposes but does not have the nature of a controlled access point.
  2. Special manipulation of that specific category of data is a routine requirement. Such manipulation includes special print or display formatting or selection or suppression from display or printed product.
  3. Specialized structuring of information for reasons other than those given above, e.g., to support particular standards of data content when they cannot be supported in existing fields.

For the MARC 21 authority format, the specifications for notes are covered in the following two conditions:

  1. A specific note field is needed when special manipulation of that specific category of data is a routine requirement. Such manipulation includes special print or display formatting or selection or suppression from display or printed product.
  2. Multiple notes are generally not established to accommodate the same type of information for different types of authorities. Notes are thus not differentiated by or limited to subject, name, or series if the same information applies to more than one type.

Local Fields

Certain tags have been reserved for local implementation. The MARC 21 formats specify no structure or meaning for local fields. Communication of such fields between systems is governed by mutual agreements on the content and content designation of the fields communicated.

In general, any tag containing the character 9 is reserved for local implementation within the block structure. Specifically the 9XX block is reserved for local implementation as indicated above. The historical development of the MARC 21 formats has left one exception to this general principle: field 490 (Series Statement) in the bibliographic format. There are several obsolete fields with tags containing the character 9 (e.g., 009 (Physical Description Fixed-Field for Archival Collection) and 039 (Level of Bibliographic Control and Coding Detail)). The indicator value 9 and subfield 9 are also reserved for local implementation.

Repeatability

Theoretically, all fields, except 001 (Control Number) and 005 (Date and Time of Latest Transaction), and subfields may be repeated. The nature of the data, however, often precludes repetition (e.g., a bibliographic or community information record may contain only one field 245 (Title Statement); an authority or classification record may contain only one 1XX heading field). The repeatability or nonrepeatability of each field and subfield is specified in the MARC 21 formats.

Coded Data

In addition to content designation, the MARC 21 formats include specifications for the content of certain data elements, particularly those that provide for the representation of data by coded values. Coded values consist of fixed-length ASCII character strings. Individual elements within a coded-data field or subfield are identified by relative character position. Although coded data occur most frequently in the leader, directory, and variable control fields, any field or subfield may be defined for coded data.

Certain common values for codes used in coded data have been defined:

blank (ASCII 20 (hex))
Undefined (element not defined)
n
Not applicable (element not applicable to the item)
u
Unknown (record creator was unable to determine value)
z
Other (value other than those defined for the element)
|
Fill character (record creator has chosen not to provide information)

Historical exceptions to these definitions may occur in the formats. In particular, the blank has been defined as not applicable, or has been assigned a meaning.


( 12/05/2007 )