DISCUSSION PAPER NO. 91

DATE: December 1, 1995
REVISED:

NAME: Machine Generation Flag in USMARC Authority Records

SOURCE: Cooperative Cataloging Council, Series Authority Record Task Group

SUMMARY: This paper discusses options for flagging USMARC authority records that have been created or modified by machine.

KEYWORDS: Authority format; Field 008/08 (Authority); Field 008/33 (Authority); Level of establishment; Machine-generation flag

STATUS/COMMENTS:

12/01/95 - Forwarded to the USMARC Advisory Group for discussion at the January 1996 MARBI meetings.

1/21/96 - Results of USMARC Advisory Group discussion - Several vendors and networks that did machine generation of authority records indicated that they marked the records with local content designation. It was asked whether such a flag is useful for internal systems but unnecessary in the communications environment. Would the value mean different things in different systems, since generation mechanisms may vary, producing authority records with various levels of completeness and standardization? It was indicated that the PCC may establish three levels of machine-generated records, but it did not request that the format reflect these. The USMARC Advisory Group requested more information from the PCC concerning:


DISCUSSION PAPER NO. 91: Flagging Machine Generation

1.  BACKGROUND

This paper discusses flagging USMARC authority records that have
been created or modified by machine.  It presents the rationale
for indicating that an authority record was machine generated and
suggests several options for providing such a flag.  Options
suggested include ones that make use of new and existing
authority record data elements.  The need for such a flag is an
outgrowth of a national effort to increase the amount of
authority control provided in national data bases.


2.  DISCUSSION

In the spring of 1994 the Cooperative Cataloging Council (CCC)
established the Series Authority Record Task Group to define the
content and functional uses of series authority records.  The
creation of this group followed a two-year period (1993 to 1994)
during which the Library of Congress considered making changes to
the amount of authority work done for monographic series titles. 
A September 1994 report from the Task Group made suggestions and
recommendations to the CCC about changes to the series authority
record which it said were needed to support the goals of the
Program for Cooperative Cataloging (PCC).  As a result of PCC
Executive Council review of the report at the ALA Midwinter
Conference in 1995, one of the recommendations from the Task
Group was forwarded to MARBI for consideration.

The recommendation forwarded to MARBI was that a data element
should be made available in the USMARC Format for Authority Data
for indicating that an authority record (for a series title or
any other heading type) was initially generated by machine.  The
Task Group suggested this because it believed that this
information was important in the context of computers being used
to generate some records in the National Authority File (NAF) so
that all headings used in access points could be under authority
control.  In its proposal it suggested that a new code in an
existing fixed-length data element, 008/33 (Level of
establishment), could be used.


Current State of Machine Generation

Many library systems already provide for the automatic generation
of authority records for headings in authority controlled fields
in bibliographic records.  In most systems with this
functionality, authority records are created for any heading not
already covered by an existing authority record.  The content of
machine-generated authority records varies but some systems are
able to create records which contain as much information as a
human would supply when simple headings are involved.  Examples
of the kinds of data elements supplied by machine include the 1XX
(Heading) field, 670 (Sources Found Note), and certain control
information.  It is even possible for systems to provide some
cross references, although in most cases this is left for humans
to provide.  Unfortunately, the USMARC Authority format does not
include any data element designed to indicate creation or
manipulation of a MARC record by machine.
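
The generation step described above can be sketched roughly as
follows.  This is a minimal illustration only, not any particular
system's implementation.  The record layout (plain dictionaries
rather than real MARC records), the function names, the mapping
from bibliographic tags to authority 1XX tags, and the matching
rule are all assumptions made for the example.

```python
# Hypothetical sketch: records are plain dicts, not real MARC records.

# Assumed (simplified) mapping from a bibliographic heading field
# to the authority 1XX tag that would carry the same heading.
BIB_TO_AUTH_TAG = {
    "100": "100", "700": "100",
    "110": "110", "710": "110",
    "440": "130", "830": "130",
}


def normalize(heading):
    """Crude match key: lowercase, collapse spaces, drop trailing punctuation."""
    return " ".join(heading.lower().split()).rstrip(" .,;:")


def generate_missing(bib_records, authority_file):
    """Create a brief authority record for each heading with no existing match."""
    known = {normalize(rec["1XX"][1]) for rec in authority_file}
    created = []
    for bib in bib_records:
        for tag, heading in bib["headings"]:
            key = normalize(heading)
            if key in known or tag not in BIB_TO_AUTH_TAG:
                continue
            created.append({
                "1XX": (BIB_TO_AUTH_TAG[tag], heading),
                "670": bib["title"],  # cites the work being cataloged
                "4XX": [],            # cross references left for a human
            })
            known.add(key)  # avoid creating the same heading twice
    return created
```

Nothing in the records created this way records that they came
from a program rather than a cataloger, which is the gap this
paper addresses.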

Machine generation of authority records offers a means for
libraries to provide full authority control while reducing
individual effort.  Both time and cataloging resources can be
saved.  Even if an authority record is later updated by a human
to add references and other information, creation of a brief
record by machine from data already keyed in a bibliographic
record avoids rekeying and the cost connected to it.  When
multiplied by thousands of headings, the savings can be
significant.  Machine generation of an authority record from a
heading in a bibliographic record also guarantees a match between
the two.  System validation of headings in a bibliographic file
against an authority file is often part of the process.  With the
functionality of library systems expanding, machine generation
and manipulation of authority records are already widely
available.


Task Group Requirements

The flagging of machine-generated records could meet several
requirements.  The Task Group suggests that a machine generation
flag is needed for analytical purposes.  It would facilitate the
assessment of the effects of machine generation on the overall
character of authority files.  If defined adequately, it could
also help to improve software that generates authority records
automatically.  A data element to signal machine generation is
essential in order to identify records that have not been
reviewed and updated by a human.  In an environment where
authority data is shared, it would allow systems to prioritize
authority records, giving, perhaps, greater value to records
created by humans than to those created by machine.
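
As a rough illustration of the review and prioritization uses
mentioned above, the sketch below assumes each record carries a
boolean "machine_generated" value.  That key, the dictionary
layout, and both function names are hypothetical; no such data
element exists in the format.

```python
# Hypothetical sketch: "machine_generated" stands in for whatever
# flag might eventually be defined; it is not a real USMARC element.

def needs_review(records):
    """List the records that no human has yet reviewed and updated."""
    return [rec for rec in records if rec["machine_generated"]]


def prefer_human(candidates):
    """Keep one record per heading, preferring human-created records."""
    best = {}
    for rec in candidates:
        current = best.get(rec["heading"])
        if current is None:
            best[rec["heading"]] = rec
        elif current["machine_generated"] and not rec["machine_generated"]:
            best[rec["heading"]] = rec  # human record displaces machine record
    return list(best.values())
```

A system loading records from several sources could apply
prefer_human when two records exist for the same heading, and
use needs_review to build a work queue for catalogers.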

The SAR Task Group is of the opinion that it is important to
identify machine-generated authority records as a distinctive
group.  They believe that in the future machine-generated
authority records will reach such a level of sophistication in
production that they will coexist with human-generated records in
resource files including the National Authority File.


3.  POSSIBLE OPTIONS

     a)    Make use of an existing fixed-length USMARC authority
           data element by validating a new value.  The CCC
           recommended using field 008/33 (Level of establishment)
           to indicate that the record was machine generated. 
           This would have the advantage of making use of an
           existing data element that could be easily and reliably
           coded by machine.  The disadvantage of using 008/33 is
           that the data element as currently defined relates to
           the heading in a 1XX field, not necessarily the entire
           record.  Even though a record may be machine generated,
           the heading might be "fully established" (one of the
           other currently-valid codes defined for 008/33).  The
           use of a new code would eliminate the possibility of
           also coding one of the other aspects that is handled by
           008/33.  Field 008/29 (Reference evaluation) might be a
           more appropriate data element for which to define a new
           code.  It is assumed that in the case of
           machine-generated records, the need/evaluation of
           references is the area where catalogers would be likely
           to have the most concern.  In most cases, particularly
           if the heading field were generated from a
           bibliographic record, the 1XX field would be reliably
           authoritative.

     b)    Make use of an existing variable length USMARC data
           element.  Field 042 (Authentication Code) might be
           ideal for this purpose.  Since the data in this field
           is not often validated, it would result in the least
           change needed to implementations of the USMARC
           authority format.  A special code or codes could be
           used to identify the lack of human authentication for
           the record.  Field 040 (Cataloging Source) could also
           be used, although since none of the currently-defined
           subfields would be appropriate for a machine-generation
           flag, a new subfield would be needed.

     c)    Define one of the available (undefined) field 008
           positions (e.g., position 08) for a machine generation
           flag.  The advantage of this option is that it does
           not confuse or eliminate the coding possible in other
           fixed-length data elements or variable fields.  As a
           separate data element, several values could be defined
           to allow the quality/complexity of the machine
           generation to be specified more accurately (e.g.,
           machine generated 1XX only, or 1XX and 670, or 1XX,
           670, and obvious 4XX references based on computer
           algorithms).  If field 008/08 were undesirable for some
           reason, field 008 positions 18-27 and 30 are also
           currently undefined.
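
The difference between options (a) and (c) can be seen by
treating field 008 as a 40-character string (positions 00-39).
The sketch below is illustrative only.  Code "a" (fully
established) is an existing 008/33 value, but the code "m" and
the values "1" through "3" are invented for this example and are
not proposed by this paper or defined in the format.

```python
FIELD_008_LENGTH = 40  # authority field 008 has positions 00-39


def set_position(field_008, position, code):
    """Return a copy of the 008 string with one character position replaced."""
    if len(field_008) != FIELD_008_LENGTH:
        raise ValueError("authority 008 must be 40 characters long")
    if len(code) != 1:
        raise ValueError("each 008 position holds exactly one character")
    return field_008[:position] + code + field_008[position + 1:]


# Hypothetical codes, invented for this example only.
MACHINE_GENERATED = "m"      # option (a): a new value in 008/33
GENERATION_LEVELS = {        # option (c): possible values for 008/08
    "1XX only": "1",
    "1XX and 670": "2",
    "1XX, 670, and 4XX": "3",
}

# An otherwise blank 008 with 008/33 coded "a" (fully established).
original = set_position(" " * FIELD_008_LENGTH, 33, "a")

# Option (a): the new code replaces the level of establishment.
option_a = set_position(original, 33, MACHINE_GENERATED)

# Option (c): the flag sits in its own position; 008/33 is untouched.
option_c = set_position(original, 8, GENERATION_LEVELS["1XX and 670"])

print(repr(option_a[33]))
print(repr(option_c[8]), repr(option_c[33]))
```

Under option (a) the record can no longer say that its heading
is fully established, while under option (c) both pieces of
information are carried.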


4.  QUESTIONS

The suggestion of defining or identifying an existing USMARC
Authorities data element to flag machine generation raises
several questions.

     1)    What function would the flag actually serve?  Would
           USMARC users be likely to really use the information
           about machine-generation to some end?  Some people
           worry that users would be doing a lot of coding that
           nobody would make much use of.

     2)    Would the machine-generation flag be permanent?  If
           not, changing the flag to some other value would
           further burden catalogers who must already update
           authority records for other purposes.

     3)    What assumptions are there behind machine-generated
           records?  Would a flag such as the one suggested in
           this paper imply certain characteristics in the record,
           for example, certain fields present, others lacking?

     4)    Is machine-generation a concern if quality is not
           affected?  Some have suggested that as many as 50% of
           authority records could be machine generated with equal
           content and quality because references are not
           involved.  If this is true, would such records be better
           off without the machine-generation flag?

     5)    What is the analysis design behind the CCC request for
           a flag for machine generation?  What kinds of analyses
           are likely to depend on it?

     6)    Is there a need to identify what pieces of an authority
           record were generated by machine, i.e., at the
           field/subfield level?  (NOTE: Some cataloging agencies
           use a locally defined subfield to indicate machine
           manipulation of access points.)

     7)    What are the implications of the existence of a
           machine-generation flag for existing authority files
           that contain machine-generated records?  None of the options
           can deal with the perhaps large number of
           machine-generated records that already exist.

     8)    How would a machine-generation flag relate to other
           record-level flags in the Authority format (record
           completeness in Leader/17; how the heading was
           constructed in field 008/10 and /11; reference
           evaluation in field 008/29; level of establishment in
           field 008/33)?


Library of Congress
Library of Congress Help Desk (09/03/98)