NAME: Changes to FTP File Label Specifications for Electronic Files of USMARC Records
SOURCE: Library of Congress
SUMMARY: This paper proposes changes, originally put forward by the participants in the European CoBRA FLEX Project 10164, to the file label used for files of USMARC records transferred via the File Transfer Protocol (FTP). It also proposes additional fields that have been deemed necessary for the exchange of records in a variety of MARC formats.
KEYWORDS: FTP Label; File Transfer
RELATED: DP61 (Jan. 1993); 93-9 (June 1993); DP94 (Jan. 1996)
STATUS/COMMENTS:
5/6/96 - Forwarded to USMARC Advisory Group for discussion at the July 1996 MARBI meetings.
7/6/96 - Accepted with the following change: Proposal 2. The end-of-field marker may be either carriage return (X'0D') or carriage return followed by line feed (X'0D''0A'). Do not use number sign or the current X'1E'.
8/6/96 - Result of final LC review - Agreed with MARBI decision.
PROPOSAL NO. 96-7: Changes to FTP File Label Specifications for Electronic Files

1. INTRODUCTION

The European library community has been investigating the use of the Internet File Transfer Protocol (FTP) for the electronic exchange of bibliographic data. The European Commission's Libraries Programme through CoBRA (Computerized Bibliographic Record Actions) has funded the FLEX (File Label EXchange) Project 10164 to investigate the need for standards in this area, and "to suggest a suitable file labelling and naming format". The participants in the FLEX Project understand that without standardization in the way files are described within the label file, it would become increasingly difficult to exchange bibliographic information internationally.

Because the USMARC specification for electronic file transfer has been widely reviewed by the USMARC community and is now in use by many exchange partners of bibliographic records, the FLEX Project participants have proposed that the USMARC specification be used as the base specification. However, they have proposed some enhancements to that specification to take into account a European dimension for exchanging and processing bibliographic data. In addition, the FLEX Project participants have suggested a file naming convention for use when certain operating system constraints apply.

2. PROPOSED CHANGES

See Attachment A for an FTP file label example. See Attachment B for revised definitions of the fields.

Proposal 1. Change label file character set

It is proposed that the character set of the label file conform to ISO 646-IRV or ASCII. (There are two differences between ISO 646-IRV and ASCII: 1) ISO 646 character position "24" is the universal currency symbol whereas this character is the "$" symbol in ASCII; 2) ISO 646-IRV character position "7E" is an overline or tilde whereas this character is the tilde in ASCII. These differences should not be problematic.)

Proposal 2. Change the end-of-field character symbol from the current end-of-field marker (X'1E') to the number sign "#" (X'23'), followed by a carriage return (X'0D') or carriage return/line feed (X'0D''0A') depending on operating systems used

There was objection to using the USMARC end-of-field character (X'1E') in what was felt should be a text file. It is, therefore, proposed that the same end-of-field characters that are currently used in the diskette FTP file label specification be used in this file label specification. These characters can be supplied by any operating system.

Proposal 3. Add optional field CID (Country Identifier)

Field ORS (Originating System ID) is, in some cases, insufficient to identify the originating system. When necessary, the CID (Country Identifier) field would be used with the ORS field, but its use would remain optional. The country identifier would be the two-character alpha code defined by ISO 3166 (Codes for the Representation of Names of Countries).

Proposal 4. Make the FOR field (Format) mandatory

It is proposed that the existing FOR field (Format) be made mandatory to identify the structural format standard used for records in the file. For example, "M" = Z39.2 (or its equivalent ISO 2709), and "S" = SGML (ISO 8879).

Proposal 5. Add optional field FQF (Format Qualifier)

Field FOR (Format) is insufficient in itself to completely describe the format of the record file (e.g., for identifying a particular tag set/specification for Z39.2 records or a particular DTD for SGML records). The FQF (Format Qualifier) field would be used in conjunction with the FOR (Format) field, but its use would remain optional. It is proposed that the FQF field follow immediately after the FOR field in field sequence.

The content of the FQF field would be taken from a list of formats (e.g., similar to the list of MARC format types in the Z39.50 Registered Record Syntaxes**) and DTDs. For SGML files the DTD is indicated by the highest level tag in the document instance (or in the tag DOCTYPE in the DTD itself).

**(http://lcweb.loc.gov/z3950/agency/objects/syntax.html)

Examples:

    FQF USMARC
    FQF BOOK SYSTEM "iso12083-book.dtd" (DTD specified in ISO 12083)

Proposal 6. Add optional fields CS<0-n> (Character Set<0-n>)

To assist in specifying character sets and character set variations, it is proposed that two sets of fields be added. The first set is the CS<0-n> (Character Set <0-n>) fields, which specify the character sets found in the file. CS0 would specify the initial character set needed for processing the records in the file. This indicates, at least, the G0 set needed. It may indicate an 8-bit set, in which case it is more than the G0 set. For USMARC, it can be specified as either ASCII (the G0 part of the USMARC character set) or as USMARC. CS1 indicates an additional set needed in the file; CS2 indicates another character set used in the file; etc.

The content of each CS<0-n> would equate to a particular international standard character set identifier (e.g., extended Latin ISO 5426 - 1983), an ISO registration number (e.g., Registry #37), text (e.g., USMARC), or a reference to a private character set. If the field content represents a private character set, then the reader should be pointed to the NOT field (Notes) for further information on processing requirements or to the REP field (Reply To) for a person to contact. An occurrence could specify an additional control set such as ISO 6630.

The use of the CS0 field is redundant for USMARC records. Once the USMARC format is defined (in FOR and FQF), the initial character set is implied. In the USMARC context, the use of the CS1-n fields is also redundant, as character sets are specified in each record. (The absence of an 066 implies USMARC Roman is used.) The USMARC 066 field of a record identifies (implicitly or explicitly) all character sets used in the record. This may be different for other MARC formats, however. Likewise, an SGML DTD indicates the character sets internally in the CHARSET tag (although it is not carried in a document instance that does not have the DTD attached).

Example:

    CS0 USMARC Roman

Proposal 7. Add optional fields CV<0-n> (Character Variation<0-n>)

It is proposed that an additional field be used to provide information on variations to the character sets specified in CS<0-n>, if the sets noted 1) are not used strictly according to the standard, 2) have options for some positions that need to be specified, or 3) have additional characters in positions that are undefined in the standard.

Example:

    CS0 ISO 646-Basic
    CV0 2/3=number sign; 7/14=umlaut
    CS1 ISO 5426
    CV1 4/9 not used

Proposal 8. Add optional field FDI (Final Destination Identification)

The FDI field is intended to assist those organizations that exchange bibliographic information with a large recipient community in identifying the intended customer. The field would contain the name or identifier of the final-destination database. An example requiring this method of identification would be a PUT transfer to a central customer point, where additional information is required by this central point to determine the final destination for the records. It is proposed that this field follow the ISS field (Issue).

ATTACHMENT A

FTP LABEL EXAMPLE

    DAT##19951221211236.0#
    RBF##1564#
    DSN##LOC.BOOKS.DIST.DATA.D951221#
    ORS##DLC#
    CID##US#
    DTS##19951222013000.0#
    DTR##1995122119951221#
    FOR##M#
    FQF##USMARC#
    DES##MUMS Books Daily DQ#
    CS0##USMARC#
    CS1##USMARC Hebrew#
    VOL##V21#
    ISS##I50#
    FDI##Hebraic Resource File--RS10#
    REP##NDMSO@LOC.GOV#
    NOT##Test set of Hebrew records#

The "#" at the end of each field in the above example is not a space, but is a graphic character ("#").

ATTACHMENT B

PROPOSED CHANGES TO THE FTP FILE LABEL

Below is a summary of the enhanced file label specification with changes indicated. [] indicates text to be deleted; <> indicates text to be added.

Tag     Element Name        Description         M/O       F/V   R/NR
DAT     Date Compiled       YYYYMMDDHHMMSS.F    M         F     NR
RBF     Number of Records   Numeric             M         V     NR
DSN     Data Set Name       Alphanumeric        M         V     NR
ORS     Origin. System ID   Alphanumeric        M         V     NR
<CID    Country ID          Alphanumeric        O         F     NR>
DTS     Date Sent           YYYYMMDDHHMMSS.F    O         F     NR
DTR     Dates of Records    YYYYMMDDYYYYMMDD    O         F     NR
FOR     Format              Alphanumeric        [O] <M>   F     NR
<FQF    Format Qualifier    Alphanumeric        O         V     NR>
DES     Description         Alphanumeric        O         V     R
<CS0-n  Character Set 0-n   Alphanumeric        O         V     NR>
<CV0-n  Char. Var. 0-n      Alphanumeric        O         V     NR>
VOL     Volume              Alphanumeric        O         V     R
ISS     Issue               Alphanumeric        O         V     R
<FDI    Final Dest. ID      Alphanumeric        O         V     NR>
REP     Reply to            Alphanumeric        O         V     R
NOT     Note                Alphanumeric        O         V     R

DAT (Date compiled): Mandatory; Fixed length; Not repeatable. This is the date the originating system completed the compilation of the file of records. This is not the date of the creation of the records contained in the bibliographic file. The field is recorded according to Representation for Calendar Date and Ordinal Date for Information Interchange (ANSI X3.30) and Representations of Local Time of the Day for Information Interchange (ANSI X3.43). The date requires 8 numeric characters in the pattern yyyymmdd (4 for the year, 2 for the month, and 2 for the day; right justified and zero filled). The time requires 8 characters in the pattern hhmmss.f (2 for the hour, 2 for the minute, 2 for the second, and 2 for a decimal fraction of the second, including the decimal point). The 24-hour clock is used.

RBF (Number of records in file): Mandatory; Variable length; Not repeatable. This element gives the number of logical records contained in the file of USMARC records.

DSN (Data Set Name): Mandatory; Variable length; Not repeatable. The filename of the file of USMARC records (which is sent separately) for which this is a file label.

ORS (Originating system ID): Mandatory; Variable length; Not repeatable. The name of the system that compiled the files of records. This could be a symbol (e.g., OCLC or NUC) or text.

<CID (Country ID): Optional; Fixed length; Not repeatable. The country identifier of the system that compiled the files of records. The identifier would be taken from Codes for the Representation of Names of Countries (ISO 3166).>

DTS (Date sent): Optional; Fixed length; Not repeatable. This is the date of transmission of the file of USMARC records. The field is recorded according to Representation for Calendar Date and Ordinal Date for Information Interchange (ANSI X3.30) and Representations of Local Time of the Day for Information Interchange (ANSI X3.43). The date requires 8 numeric characters in the pattern yyyymmdd (4 for the year, 2 for the month, and 2 for the day; right justified and zero filled). The time requires 8 characters in the pattern hhmmss.f (2 for the hour, 2 for the minute, 2 for the second, and 2 for a decimal fraction of the second, including the decimal point). The 24-hour clock is used.

DTR (Dates of records): Optional; Fixed length; Not repeatable. This contains the inclusive dates of last transaction of the records in the file, i.e., the first and last dates recorded in the 005 fields of the file of records. The field is recorded according to Representation for Calendar Date and Ordinal Date for Information Interchange (ANSI X3.30). The dates require 16 numeric characters in the pattern yyyymmddyyyymmdd (4 for the year, 2 for the month, and 2 for the day for each date; right justified and zero filled).

FOR (Format): <Mandatory>; Fixed length; Not repeatable. This element designates the format of the records, generally M for <Z39.2 or ISO 2709> (MARC) <, S for ISO 8879 (SGML)>.

<FQF (Format qualifier): Optional; Variable length; Not repeatable. This element provides additional description of the format of the record file. For example, it may identify a particular tag set/specification for MARC records or a particular DTD for SGML records. For MARC formats, the content of the FQF field may be text or a code from the list: Z39.50 Registered Record Syntaxes. For DTDs, the content is the identifier in the DTD DOCTYPE field.>

DES (Description of records): Optional; Variable length; Repeatable. This element describes the records. The data could be coded or describe a product name. (For example, OCLC uses B for Bibliographic describing a data type; CDS may use a product name, such as MDS-Books All.)

<CS0-n (Character set <0-n>): Optional; Variable length; Not repeatable. These fields specify the character sets (control and/or graphic) needed for processing the record data file. The field content is text indicating a particular set (e.g., ISO 646-IRV, ISO Registry #37, USMARC, or a reference to a private character set). CS0 indicates at least the G0 set and CS<1-n> indicate other sets in the file.>

<CV0-n (Character variation <0-n>): Optional; Variable length; Repeatable. These fields are used in conjunction with the CS fields and contain a textual description of the variations from the set specified in the corresponding CS field. Variations may arise because the set noted 1) is not used strictly according to the standard, 2) has options for some positions that need to be specified, or 3) has additional characters in positions that are undefined in the standard.>

VOL (Volume): Optional; Variable length; Repeatable. This may be used if it is desirable to assign a volume number when distribution of records is by subscription. Each file within a subscription year may be given a volume and issue number.

ISS (Issue): Optional; Variable length; Repeatable. This may be used if it is desirable to assign a volume and issue number when distribution of records is by subscription. Each file within a subscription year may be given a volume and issue number. It may be combined with Volume (e.g., V1402).

<FDI (Final destination ID): Optional; Variable length; Not repeatable. This field would contain the name or identifier of the final-destination database.>

REP (Reply to): Optional; Variable length; Repeatable. This field contains an address given as a contact for problems/questions in transmission. It may include an Internet or postal address.

NOT (Note): Optional; Variable length; Repeatable. This field contains textual information or messages about the file.
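
ILLUSTRATIVE SKETCH (not part of the specification)

The field definitions above could be exercised with a small label reader such as the following Python sketch. It is offered only as an illustration under stated assumptions, not as a reference implementation. The function names parse_label and check_label are hypothetical. The sketch assumes that each field occupies one line ended by carriage return or carriage return/line feed, that the three-character tag is followed by two filler positions (printed as "##" in Attachment A), and that a trailing "#" (the marker originally proposed in Proposal 2) should be tolerated if present. The format checks cover only the patterns stated in Attachment B.

```python
import re

# Mandatory tags per Attachment B, with FOR made mandatory as proposed.
MANDATORY = ("DAT", "RBF", "DSN", "ORS", "FOR")

# Per-tag patterns taken from the field definitions in Attachment B.
PATTERNS = {
    "DAT": re.compile(r"[0-9]{14}\.[0-9]"),  # YYYYMMDDHHMMSS.F
    "DTS": re.compile(r"[0-9]{14}\.[0-9]"),  # YYYYMMDDHHMMSS.F
    "DTR": re.compile(r"[0-9]{16}"),         # YYYYMMDDYYYYMMDD
    "RBF": re.compile(r"[0-9]+"),            # numeric record count
    "CID": re.compile(r"[A-Za-z]{2}"),       # ISO 3166 two-character alpha code
}


def parse_label(text):
    """Return a dict mapping each tag to the list of its values, in file order."""
    fields = {}
    for line in text.splitlines():  # splits on CR, LF, and CR/LF
        if not line.strip():
            continue
        # Assumption: 3-character tag, then 2 filler positions, then the data.
        tag, value = line[:3], line[5:].rstrip()
        # Assumption: drop a single trailing "#" end-of-field marker if present.
        if value.endswith("#"):
            value = value[:-1]
        fields.setdefault(tag, []).append(value)
    return fields


def check_label(fields):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing mandatory field {tag}" for tag in MANDATORY if tag not in fields
    ]
    for tag, pattern in PATTERNS.items():
        for value in fields.get(tag, []):
            if not pattern.fullmatch(value):
                problems.append(f"{tag} value {value!r} does not match its pattern")
    return problems


# The label from Attachment A, joined with carriage return/line feed.
SAMPLE = "\r\n".join([
    "DAT##19951221211236.0#",
    "RBF##1564#",
    "DSN##LOC.BOOKS.DIST.DATA.D951221#",
    "ORS##DLC#",
    "CID##US#",
    "DTS##19951222013000.0#",
    "DTR##1995122119951221#",
    "FOR##M#",
    "FQF##USMARC#",
    "DES##MUMS Books Daily DQ#",
    "CS0##USMARC#",
    "CS1##USMARC Hebrew#",
    "VOL##V21#",
    "ISS##I50#",
    "FDI##Hebraic Resource File--RS10#",
    "REP##NDMSO@LOC.GOV#",
    "NOT##Test set of Hebrew records#",
])

label = parse_label(SAMPLE)
print(label["FQF"])        # ['USMARC']
print(check_label(label))  # []
```

Values are kept as lists because several fields (DES, VOL, ISS, REP, NOT) are repeatable. The sketch does not enforce repeatability or field order, and a label written with the carriage return alone as end-of-field marker, as accepted at the July 1996 meetings, is read the same way.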