PROPOSAL NO.: 2005-02

DATE: December 10, 2004
REVISED:

NAME: Definition of Subfield $y in Field 020 (International Standard Book Number) and Field 010 (Library of Congress Control Number) in the MARC 21 Formats

SOURCE: The MARC of Quality; Karen Anspach Consulting

SUMMARY: This paper proposes defining a new subfield $y for non-unique/non-applicable ISBNs and LCCNs in fields 020 and 010, respectively.

KEYWORDS: Field 020 (BD,AD,HD); Field 010 (all formats); International Standard Book Number (BD,AD,HD); Library of Congress Control Number (all formats)

RELATED: 2004-DP04 (June 2004)

STATUS/COMMENTS:

12/10/04 - Made available to the MARC 21 community for discussion.

01/15/05 - Results of the MARC Advisory Committee discussion - The proposer essentially withdrew the proposal. Procedurally, the proposal was rejected by the committee. The group decided to alter the definition of subfield $z in fields 010 and 020 to allow for non-unique/non-applicable ISBN/LCCNs. LC will work with the originator of the proposal to work out a revised definition of subfield $z. It was suggested that field 776 subfield $z could be used to include ISBNs for different manifestations.

03/02/05 - Results of LC/LAC/BL review - Agreed with the MARBI decisions.


Proposal No. 2005-02: Use of ISBNs and LCCNs in MARC 21 Bibliographic Records

1. BACKGROUND

The ISBN and LCCN are the most universally available identifiers for systems that manage and share bibliographic records. Because these numbers are ostensibly intended to uniquely identify specific manifestations of a work, they are used for a wide variety of automated processes to detect matching records in both bibliographic and non-bibliographic databases. When these numbers are not recorded accurately in records, they become undependable.

The MARC Advisory Committee discussed the issue of defining a new subfield for inappropriate ISBNs and LCCNs in fields 020/010 at its meeting in June 2004, on the basis of Discussion Paper No. 2004-DP04 (Use of ISBNs and LCCNs in MARC 21 Bibliographic Records). Some participants felt that a new subfield was not necessary; they did maintain, however, that the formats needed more specific instructions on how to code incorrect numbers. The consensus of the group was to consider Option 1 (Addition of a new MARC 21 subfield to identify LCCNs and ISBNs that do not relate to the manifestation being described) in a follow-up proposal.


2. DISCUSSION

2.1. Use of ISBNs and/or LCCNs in systems

Systems and processes using ISBNs and/or LCCNs as the primary identifier include:

Library automation system database creation and maintenance. The ISBN and LCCN are used during initial database loads, as well as for ongoing record import processes to determine whether a MARC 21 record is already present in a bibliographic database. In a typical scenario, the ISBN and LCCN are used to determine whether an incoming bibliographic record must be added to the database, or whether new holdings information from the incoming record can be added to an existing bibliographic record. In some cases no additional crosschecks are performed to catch differences between records that have the same ISBN or LCCN when loading records to library automation systems.

Virtual Catalogs. Virtual Union Catalogs must dedupe records 'on-the-fly' to eliminate duplicate titles held by multiple agencies from the search results. These systems must determine whether to display a single "virtual" bibliographic record linked to the many holdings of the participating libraries, or whether multiple records from the linked databases must be displayed. Because of their universality, the ISBN and LCCN are good choices for this deduping. Additional crosschecks may or may not be performed to catch differences between records that have the same ISBN or LCCN (such crosschecking could slow retrieval and display).

Union Catalogs. Like library automation systems, Union Catalogs use the LCCN and ISBN as key identifiers for both initial and ongoing record loading, to determine whether a record is already present in a union database. As above, the ISBN and LCCN are used to determine whether new holdings information can be added to an existing record, or whether a new bibliographic record must be added to the file. In this scenario, additional crosschecks are usually performed to catch differences between records that have the same ISBN or LCCN.

Record upgrading. OCLC and other record providers use the LCCN and ISBN to match a library’s records against their own database in order to return more complete records to the library, e.g., during non-MARC to MARC conversions or MARC upgrades. Additional crosschecks are usually done to catch differences between records that have the same ISBN or LCCN.

Copy cataloging and searching. Many institutions rely on the LCCN and ISBN to search databases for MARC 21 records to use for copy cataloging. This process may be manual or automated, and is often performed with little or no crosschecking to ensure that the record retrieved correctly matches the record in-hand.

Record enhancement. Some vendors provide services that "enrich" the data in a library’s database. For example, Summary notes (520), Contents notes (505), cover graphics, additional information about the author, etc., can be added to bibliographic records to enhance the display in a library's public access catalog. These vendors use the ISBN in the library’s bibliographic record as the match point against their own enhanced data.

Ordering processes. Acquisitions departments use ISBNs to place orders and to locate corresponding MARC 21 records on copy cataloging databases.

FRBR. The ISBN may be used in the FRBR structure to differentiate between various expressions/manifestations of a work.

In the absence of a single unique identifier for bibliographic records, ISBNs and LCCNs are relied upon in a wide variety of situations for the matching of bibliographic records. The fact that this matching depends more and more on automated processes that have few crosschecks and/or little human intervention makes the accuracy and use of these numbers critical.


2.2. Problems with the current handling of ISBNs and LCCNs

If ISBNs and LCCNs are to perform the above tasks properly, they must be entered accurately in records. Unfortunately, neither ISBNs nor LCCNs are completely dependable as unique identifiers, and they are becoming increasingly undependable. We are all aware of existing problems resulting from ISBNs and LCCNs inappropriately assigned to different resources by publishers, who mistakenly use a number supplied to them for one resource for another resource that is different enough to merit a separate bibliographic record. There may be little we can do to change publisher practice in this regard. Other common problems, however, are caused by inconsistent handling of LCCNs and ISBNs when they are entered in MARC 21 records. This issue is something we can address and resolve with the addition of a new subfield to the fields used for these numbers.

Problems with ISBN and LCCN assignment are described in the sections that follow.


2.3. ISBN

The ISBN.ORG website says that "The ISBN is a unique machine-readable identification number, which marks any [resource] unmistakably." AACR 1.8B says we are to include an ISBN that "applies to the item being described" in the bibliographic record for the item.

The cataloging rule also provides for optionally including other ISBNs that are found on the item, with their appropriate qualifications. Usually these other ISBNs also apply to the item (e.g., a paperback ISBN found on a hardcover book, when the paperback is exactly the same as the hardcover, except for its binding). However, this provision has also been extended by many cataloging agencies to allow the inclusion of ISBNs that appear on an item but do not apply to the item being described (e.g., an ISBN for an audiobook appearing on a book).

Currently, the MARC 21 Format for Bibliographic Data does not provide a means of distinguishing by subfield coding between ISBNs that uniquely apply to the item being described and those that do not, when this is known.

If a non-unique ISBN is entered in the 020$a in a bibliographic record, automated processes may consider that record identical to other, quite different records that also contain the same number in their 020$a. This can result in improperly matched records and the unfortunate overlay of one bibliographic record with a record for a different resource. In such a case, one of the records is lost from the database, and its holdings are incorrectly linked to whatever record is retained. Such improper overlaying affects cataloging, circulation, and public access.


2.4. Addition of a new subfield to identify an ISBN that is a valid number but does not uniquely apply to the manifestation being described

A new subfield $y could be defined for field 020 to be used for a valid ISBN that is found on an item, but is known to be not unique or not applicable to the item being described. The definition for 020$a could be clarified to restrict the use of the 020$a subfield to only those ISBNs that are valid and are known or assumed to be uniquely applicable to the item being described. Note that subfield $y is already defined in field 022 (International Standard Serial Number) as "Incorrect ISSN", described as "an incorrect ISSN that has been associated with the continuing resource."

If a cataloging agency would use a single record for the manifestation(s) described, then subfield $a is used. Thus, in the case of the hardbound vs. paperbound manifestation, if the cataloging agency uses one record for both, subfield $a would be used. Where separate records would be required (e.g., audiobook and regular print), subfield $y is used for the ISBN that is not applicable.

Definition of a separate subfield would allow library automation systems to index and search on 020$y data for access and retrieval during searching and display functions, while excluding this subfield from record-matching processes (i.e., an ISBN recorded in 020$y should be disregarded by automated matching processes). This would help prevent one record from overlaying another on the basis of an inappropriate $a.
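
The distinction between indexing 020$y and excluding it from matching can be sketched as follows. This is only an illustration under stated assumptions: the in-memory record layout, the function names, and the normalization rule are hypothetical and are not part of MARC 21 or of any particular library system.

```python
import re

# Hypothetical in-memory record: a list of (tag, [(subfield_code, value), ...]) pairs.
REGULAR_PRINT = [
    ("020", [("a", "0416728901")]),
    ("020", [("y", "042350780X (large print)")]),
]
LARGE_PRINT = [
    ("020", [("a", "042350780X (large print)")]),
]


def normalize_isbn(value):
    """Drop parenthetical qualifiers, hyphens and spaces; uppercase a final x."""
    number = re.sub(r"\(.*?\)", "", value)
    return re.sub(r"[\s-]", "", number).upper()


def isbn_keys(record, codes):
    """Collect normalized ISBNs from the given 020 subfield codes."""
    return {
        normalize_isbn(value)
        for tag, subfields in record
        if tag == "020"
        for code, value in subfields
        if code in codes
    }


def match_keys(record):
    # Record matching looks at 020$a only; an ISBN in $y is disregarded.
    return isbn_keys(record, {"a"})


def search_keys(record):
    # Indexing for search and display may include $y as well as $a.
    return isbn_keys(record, {"a", "y"})


def records_match(first, second):
    """True when the two records share at least one 020$a ISBN."""
    return bool(match_keys(first) & match_keys(second))
```

With these two sample records, a search on 042350780X would retrieve both, but `records_match(REGULAR_PRINT, LARGE_PRINT)` is false, so neither record would overlay the other.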

Field 020 is currently defined as follows:
  This field contains the International Standard Book Number (ISBN), the terms of availability, and any canceled or invalid ISBN. Each field 020 contains all the information relevant to one ISBN, or if no ISBN exists, relevant to one item. Field 020 is repeated for multiple numbers that refer to different manifestations of a work (e.g., ISBNs for the hard bound and paperback manifestations).
           
The following subfields are defined:
  $a   International Standard Book Number (NR)  
  $c   Terms of availability (NR)  
  $z   Cancelled/invalid ISBN (R)  
  $6   Linkage (NR)  
  $8   Field link and sequence number (R)  
           
Field 020 is defined in the MARC 21 Bibliographic, Authority and Holdings Formats.

2.4.1. Proposed changes to current 020 field

Field definition and scope. The field definition and scope could be revised as follows:
  This field contains an International Standard Book Number (ISBN), the terms of availability, and any canceled, invalid, non-unique and/or non-applicable ISBNs. Each field 020 contains all the information relevant to one ISBN, or if no ISBN exists, relevant to one item. Field 020 is repeated for multiple numbers that refer to different editions of a work (e.g., ISBNs for hard bound and paperback editions).
       
Definition of 020 $a. The subfield is currently defined as follows:
  Subfield $a contains a valid ISBN for the item. Parenthetical qualifying information, such as the publisher/distributor, binding/format, and volume numbers, is not separately subfielded.
       
Proposed change to the definition of 020$a. The subfield could be revised as follows:
  Subfield $a contains a valid ISBN found on an item when that ISBN uniquely applies to the item being described. Parenthetical qualifying information, such as the publisher/distributor, binding/format, and volume numbers, is not separately subfielded.
       
  Each valid, uniquely applicable ISBN found on an item is given in a subfield $a in a separate 020 field (e.g., an ISBN for a hardcover binding and another for a soft-cover binding of exactly the same version of a work, when both are found on the same item). The same ISBN cannot be given as unique and applicable (in subfield $a) in more than one, different, bibliographic record.
       
Definition of subfield $z. Subfield $z is currently defined as follows:
  Subfield $z contains a canceled or invalid ISBN and any parenthetical qualifying information. Each canceled/invalid ISBN is contained in a separate subfield $z. If no valid ISBN exists, subfield $z may be used alone in the record.
       
It could be revised to include guidance as to what constitutes a canceled or invalid ISBN:
  An ISBN is considered to be canceled when a publisher designates it as such. An ISBN is considered to be invalid when its length or structure is incorrect or its check digit does not agree with the value produced by the check-digit formula.
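
The length, structure, and check-digit test mentioned above can be sketched for the ten-character ISBN, whose final character is computed modulo 11. The function name is hypothetical, and the sketch tests only structural validity; it cannot tell whether a publisher has canceled a number.

```python
import re


def is_valid_isbn10(isbn):
    """Return True if the string is a structurally valid ten-character ISBN.

    Hyphens and spaces are ignored. The first nine characters must be digits;
    the tenth is a digit or X (standing for 10). The weighted sum, using
    weights 10 down to 1, must be divisible by 11.
    """
    compact = re.sub(r"[\s-]", "", isbn).upper()
    if not re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        return False  # wrong length or structure
    total = sum(
        (10 if ch == "X" else int(ch)) * weight
        for ch, weight in zip(compact, range(10, 0, -1))
    )
    return total % 11 == 0
```

A number that fails this test would be a candidate for subfield $z; a number that passes it may still belong in subfield $y if it does not apply uniquely to the item being described.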
       

2.4.2. Definition of a new subfield $y

A new subfield $y (Non-unique/non-applicable ISBN) could be defined as follows:
  Subfield $y contains a valid ISBN found on an item when that ISBN is not unique or does not apply to the manifestation being described. Parenthetical qualifying information may be included in this subfield. Each non-unique/non-applicable ISBN found on an item is given in a separate subfield $y. If no valid, uniquely applicable ISBN is available, subfield $y may be used alone in the 020 field.
   
  An ISBN is considered to be non-unique to an item when it is known that it also appears on another item that is a different manifestation (e.g., different publisher), expression (e.g., large-print vs. regular print) or edition (e.g., earlier or later edition).
   
  An ISBN is considered to be non-applicable to an item when it really applies to another item that is a different manifestation (e.g., previous publisher), expression (e.g., a large-print ISBN appearing on a regular print book), or edition (e.g., one that is no longer in print). For example, the first of the following ISBNs is considered applicable in a record because the manifestation to which it belongs is described by the rest of the record, but the second ISBN is for a different manifestation of the work which needs a separate record:
     
    020 ## $a0416728901
    020 ## $y042350780X (large print)
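
Read together, the proposed definitions of $a, $z, and $y amount to a short decision procedure. The sketch below is one hedged reading of it; the function name and its two boolean inputs are hypothetical, and deciding whether an ISBN "applies uniquely" remains the cataloger's judgment rather than something code can determine.

```python
def choose_020_subfield(valid, applies_uniquely):
    """Pick the 020 subfield code for an ISBN found on an item.

    valid: the ISBN has not been canceled and its length, structure and
        check digit are correct.
    applies_uniquely: the ISBN is known or assumed to apply uniquely to
        the item being described.
    """
    if not valid:
        return "z"  # canceled/invalid ISBN
    if applies_uniquely:
        return "a"  # valid and uniquely applicable
    return "y"  # valid, but non-unique or non-applicable
```

For the example above, the first ISBN corresponds to `choose_020_subfield(True, True)` and the large-print ISBN to `choose_020_subfield(True, False)`.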

2.5. LCCN

LC assigns different LCCNs to different manifestations when they are judged to need separate bibliographic records according to the guidelines developed for making this determination. However, when publishers secure an LCCN through LC's Preassigned Control Number program for a particular manifestation, they sometimes print the same LCCN in a subsequent manifestation that actually requires a separate record, and therefore a new, unique number. (The Preassigned Control Number Program enables LC to assign LCCNs prior to publication in order to facilitate cataloging and other book processing activities when the publisher prints the control number in the book.)

The MARC 21 Format for Bibliographic Data does not provide a means of distinguishing between LCCNs that are applicable to the manifestation being described and those that are not (i.e., LCCNs appearing on resources that do not match the records to which the LCCNs were originally assigned). Currently the 010$z is listed as used for invalid LCCN, but the instructions provided do not make it clear whether or not an LCCN that is found on a resource can be considered 'invalid' when that resource does not match the record to which LC originally assigned that LCCN. Because of the lack of clear instructions, current practice is inconsistent and some catalogers use 010$z for an LCCN that is found on a resource that does not match the record to which LC originally assigned that LCCN, while others use 010$a for these LCCNs.

The situation with recording inappropriate LCCNs is similar to that described above for ISBNs. If an LCCN that was originally assigned by LC to another manifestation is entered in the 010$a in a bibliographic record, automated processes may consider that record identical to other, quite different records that also contain the same number in an 010$a. This can result in improperly matched records and the unfortunate overlay of one bibliographic record with a record for a different resource. In such a case, one of the records is lost from the database, and its holdings are incorrectly linked to whatever record is retained. Such improper overlaying affects cataloging, circulation, and public access.

2.5.1. Addition of a new subfield to identify an LCCN that is not applicable to the manifestation being described

A new subfield $y (Non-applicable LCCN) could be defined in field 010 to be used for an LCCN found on a resource but originally assigned by LC to a manifestation other than the one being described. The definition for 010$a could be clarified to restrict the use of the 010$a subfield to only those LCCNs that are valid and uniquely assigned by LC to that single manifestation of a work. Definition of a separate subfield would allow library automation systems to index and search on 010$y data for access and retrieval during searching and display functions, while excluding this subfield from record-matching processes (i.e., an LCCN recorded in 010$y should be disregarded by automated matching processes).
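
The effect on a record load can be sketched as follows. This is a simplified illustration, not a description of any real system: the record layout, the function names, and the overlay policy are hypothetical, and the LCCN value is a made-up placeholder rather than a real assignment.

```python
PLACEHOLDER_LCCN = "LCCN-EXAMPLE-1"  # made-up placeholder, not a real LCCN

# Hypothetical records: lists of (tag, [(subfield_code, value), ...]) pairs.
ORIGINAL = [("010", [("a", PLACEHOLDER_LCCN)]), ("245", [("a", "Original manifestation")])]
LATER_CODED_Y = [("010", [("y", PLACEHOLDER_LCCN)]), ("245", [("a", "Later manifestation")])]
LATER_CODED_A = [("010", [("a", PLACEHOLDER_LCCN)]), ("245", [("a", "Later manifestation")])]


def lccn_match_key(record):
    """Return the LCCN from 010$a, or None. 010$y is never used for matching."""
    for tag, subfields in record:
        if tag == "010":
            for code, value in subfields:
                if code == "a":
                    return value.strip()
    return None


def load(database, incoming):
    """Overlay an existing record with the same 010$a; otherwise add the record."""
    key = lccn_match_key(incoming)
    for position, existing in enumerate(database):
        if key is not None and lccn_match_key(existing) == key:
            database[position] = incoming  # the existing record is lost
            return "overlaid"
    database.append(incoming)
    return "added"
```

Loading `LATER_CODED_Y` into a database that holds `ORIGINAL` adds a second record, whereas loading `LATER_CODED_A` overlays `ORIGINAL`, which is the loss described in section 2.5.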

Field 010 is currently defined as follows:
  This field contains unique numbers that have been assigned to a bibliographic record by the Library of Congress. The control number for MARC records distributed by LC is an LC control number (LCCN).
   
  The LC control number is carried in field 001 (Control Number) in records distributed by LC's Cataloging Distribution Service and in field 010$a. An organization using LC records may remove the LC control number from field 001 and use field 001 for its own system control number.
   
  An LC record may contain field 010 with a cancelled or invalid control number of a previously-distributed record. A record may be cancelled because it is a duplicate record for the same item. The structure of the canceled/invalid control number is the same as that used by LC in field 001.

The following subfields are available:
  $a LC control number (NR)
  $b NUCMC control number (R)
  $z Cancelled/invalid LC control number (R)
  $8 Field link and sequence number (R)
   
Field 010 is defined in all MARC 21 formats.

2.5.2. Proposed changes to current 010 field

Field definition and scope. The field definition and scope could be revised to include a statement about assignment of LCCNs:
  This field contains unique numbers that have been assigned to a bibliographic record by the Library of Congress. The control number for MARC records distributed by LC is an LC control number (LCCN). LC assigns different LCCNs to different manifestations when they are judged to need separate bibliographic records according to the guidelines developed for making this determination.
   
Definition of 010$a. Subfield $a is currently defined as follows:
  Subfield $a contains a valid LC control number (see explanation of structure of this number given below).
   
Proposed change to the definition of 010$a. The subfield could be revised as follows:
  Subfield $a contains a valid LC control number that is applicable to the manifestation being described (see explanation of structure of this number given below).
   
  The same LCCN cannot be given as applicable (in subfield $a) in more than one, different, bibliographic record. An LCCN is considered to be applicable to the manifestation being described when it is known, or can safely be assumed, that the manifestation matches the record to which the LCCN was originally assigned.
   
Proposed change to subfield $z. Subfield $z is currently defined as follows:
  Subfield $z contains a canceled or invalid LC control number, including invalid NUCMC numbers. Each canceled/invalid LCCN is given in a separate subfield $z in a single 010 field.
   
It could be revised to include guidance as to what constitutes a canceled or invalid LCCN:
  An LCCN is considered to be canceled when LC designates it as such. An LCCN is considered to be invalid when its length or structure is incorrect.
   
Proposed 010 $y definition:
  Subfield $y contains an LCCN found on an item or inadvertently assigned to it when that LCCN is not applicable to the manifestation being described.
   
  Each non-applicable LCCN is given in a separate (repeatable) subfield $y in a single 010 field. If no valid or applicable LCCN is available for a resource, subfield $y may be used alone in the 010 field.
   
  An LCCN is considered to be not applicable to the manifestation being described when it is known that the manifestation does not match the record to which the LCCN was originally assigned.

3. CONCLUSIONS

Including a subfield $y in a record assumes that the cataloger will be aware that ISBNs and LCCNs are not always applicable to the manifestation in hand. For example, if a cataloger knows that a publisher has assigned the same ISBN to the first and second editions of a work or reuses an ISBN on a different work or manifestation of a work, then the cataloger will record the ISBN(s) from the items differently than if this fact is not known.

The fact that publishers do not always assign ISBNs correctly is not a problem that a revision to the MARC 21 formats can address. Providing a new subfield for the handling of ISBNs and LCCNs for manifestations other than the one represented by that record would, however, be a significant step forward in improving the accuracy of record-matching processes, whether automated or manual, in any system utilizing MARC 21 records.

The changes proposed will not "fix" ISBNs or LCCNs for other manifestations that are already present in the 020$a or 010$a of existing records. They will, however, provide catalogers and system vendors with a consistent method for handling this problem for new records, and for any database cleanup efforts.


4. PROPOSED CHANGES

4.1. ISBN

• Add subfield $y (Non-unique/non-applicable ISBN) as defined above.

• Revise field definition and scope, $a and $z as above.

4.2. LCCN

• Add subfield $y (Non-applicable LCCN) as defined above.

• Revise field definition and scope, $a and $z as above.

