DISCUSSION PAPER NO. 109

DATE: May 6, 1998
REVISED:

NAME: Identifying transliteration schemes in USMARC formats

SOURCE: Library of Congress

SUMMARY: The paper discusses the need to identify transliteration schemes in MARC records at the field level and proposes a possible technique.

KEYWORDS: Field 008/07 (Transliteration scheme)(AD); Transliteration; Non-roman data

RELATED: DP100 (June 1997); DP111 (June 1998)

STATUS/COMMENTS:

5/6/98 - Forwarded to the USMARC Advisory Group for discussion at the June 1998 MARBI meetings.

7/29/98 - Results of USMARC Advisory Group discussion - There were only a few minutes left in the meeting to discuss this paper. It appears that the main identified use of the transliteration marker is for Wade-Giles, which some participants felt did not need an explicit solution. With respect to linking transliteration-related fields, while for the end user there is little need, systems use the links to pair fields in cataloger displays, and to maintain records, e.g., if a field is deleted, its linked transliteration needs to be considered for deletion. Also catalogers often need the links for maintenance work as they are sometimes not very familiar with the languages. The specificity of indicating transliteration schemes on an international basis was considered a problem. There was not much consensus on the need, but because of the time constraints, discussion should continue at the next meeting.

DISCUSSION PAPER NO. 109: Transliteration identification

1 INTRODUCTION

Some agencies need to provide transliterations for certain data in MARC records -- titles, descriptive data from the item, author names, etc. Often it is cataloging practice to provide transliterations of some (or most) descriptive data in a record because (1) the transliterated form of a name or data element appears on the item being cataloged, (2) a complete alphabetic sorting of multilingual entries for displays in a single alphabet is needed, and (3) there are systems and communications clients/servers in which only the roman script is supported. In countries where the roman script is the vernacular, for example, many systems and programs and much equipment may not accommodate non-roman scripts, so romanization is used.

1.1 Notes

The following discussion is from a roman script perspective, to simplify the presentation. The same analysis pertains to records constructed from, for example, a cyrillic perspective, where the basic script is cyrillic and the transliteration included may be into cyrillic from roman or other scripts.

In field 066 and subfield $6, the examples in this paper give the names of scripts rather than the MARC codes for the scripts that the format specifies. Also, diacritics and special characters have been normalized to ASCII or omitted.

2 DISCUSSION

2.1 Current Practice

Currently, data in the roman script is directly tagged in the format in the regular fields and the data in non-roman scripts is embedded in 880 fields with indirect tagging. The non-roman script data is sometimes unique data in the record (i.e., it is NOT the non-roman form of roman data) and sometimes it has a romanized correspondent in the regular fields, and the two are linked. Without the 880 field (see DP 111), non-roman data would appear in directly tagged fields, and when there is corresponding roman data there would be a repetition of the field with the same tag and content designation. Identification of the romanization scheme could be useful under either model: 880 fields or directly tagged fields.

In Authority records the romanized forms of headings are recorded in the regular data fields as 1XX, 4XX, and 5XX headings. The romanization scheme for the 1XX uniform heading can be indicated at a general level (international standard, national standard, local standard, conventional, etc.) in field 008/07. The particular transliteration (e.g., Wade-Giles or Pinyin) cannot be indicated for the 1XX, nor can transliteration be indicated for the 4XX and 5XX headings.

Transliteration schemes cannot be indicated in any other formats, although non-roman and romanized data may be encoded in the Bibliographic and Community Information formats, since the 880 field is defined for both. At this time it is less crucial to identify the romanization scheme for romanized data in the Bibliographic and Community formats, but a technique that can be generalized across all the formats is desirable.

2.2 Need

Different cataloging agencies may prefer one transliteration over another as the standard for their catalog or a decision might be made to change from one scheme to another. Both of these situations would benefit from designation of transliteration scheme at the field level. For example, changing from Wade-Giles to Pinyin has been under discussion for many years. In the name authority file, for Chinese headings the 1XX uniform heading is transliterated using Wade-Giles and one of the 4XX fields contains the Pinyin transliteration of the heading, but the Pinyin heading is not identified among the 4XX fields. In the future, there is a need to identify the Pinyin 4XX so it can be switched with the Wade-Giles heading form. Also as more international exchange of data takes place, catalogers will have available records to copy that use various schemes, and they could be more efficiently used if the schemes were identified. In some cases it might be possible to achieve automatic machine conversion of transliteration, or switching of headings and references to the heading with the preferred transliteration if the transliteration is identified by machine.

2.3 Subfield $6 Technique

One technique for identifying transliteration would be through the $6 subfield. The $6 subfield is currently defined only in the Authority, Bibliographic, and Community Information formats, for use with the 880 field, and is structured as follows:

    $6<linking tag>-<occurrence number>/<script
          code>/<field orientation code>

It could be extended to include transliteration:

    $6<linking tag>-<occurrence number>/<script
          code>/<field orientation code>/<transliteration
          scheme>

In the absence of a need for the first, second, or third items, the subfield would contain only the transliteration scheme:

    $6///<transliteration scheme>

Authority record example (without field 880):
    008/7  b       [national standard]
    040 $beng
    110  10$aSoviet Union.$bPosolstvo (Egypt).$bMaktab al-sihafal
    410  10$aRussia (1923-  U.S.S.R.).$bPosolstvo (Egypt).$bMaktab
           al-sihafal.$6///<translit. scheme for Cyrillic>
           $6///<translit. scheme for Arabic>
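
The extended subfield structure above can be sketched in code. The following is a minimal illustration, assuming the four slash-separated elements proposed in this section; the names Subfield6, parse_subfield_6, and format_subfield_6 are hypothetical, and the value "Pinyin" stands in for a scheme code that this paper does not define.

```python
from typing import NamedTuple, Optional


class Subfield6(NamedTuple):
    """Parts of the extended $6 proposed in this paper (hypothetical model)."""
    linking_tag: Optional[str]
    occurrence: Optional[str]
    script: Optional[str]
    orientation: Optional[str]
    transliteration: Optional[str]


def parse_subfield_6(value: str) -> Subfield6:
    """Split a $6 value into its slash-separated elements.

    Missing trailing elements (as in current 880 practice) become None.
    """
    parts = value.split("/", 3)
    parts += [""] * (4 - len(parts))
    link, script, orientation, translit = parts
    tag, _, occurrence = link.partition("-")
    return Subfield6(
        tag or None,
        occurrence or None,
        script or None,
        orientation or None,
        translit or None,
    )


def format_subfield_6(field: Subfield6) -> str:
    """Rebuild the $6 value, keeping empty leading elements as bare slashes."""
    link = f"{field.linking_tag}-{field.occurrence}" if field.linking_tag else ""
    parts = [
        link,
        field.script or "",
        field.orientation or "",
        field.transliteration or "",
    ]
    # Drop empty trailing elements so existing values round-trip unchanged.
    while parts and parts[-1] == "":
        parts.pop()
    return "/".join(parts)


# A scheme-only value, as in the authority example above.
scheme_only = parse_subfield_6("///Pinyin")
# An 880-style link with a script name and no transliteration scheme.
link_only = parse_subfield_6("880-01/chinese")
```

Keeping the empty leading elements as bare slashes preserves the positional meaning of each element, which is why a scheme-only value needs three slashes.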

DP 111 discusses ceasing to use the 880 field, making it less important to link roman and non-roman corresponding pairs. It also indicates that explicitly marking one script in a field, which is all the subfield $6 accommodates, is not useful for processing. Scripts are "self" identifying by the codes used for the characters in the field and the escape sequences that change character sets in a field. Thus the script coding will probably become "obsolete" in the future. The field direction might still be needed in some cases. Subfield $6 might primarily be used in the future for the directional flag and the transliteration scheme.

2.4 Linking Transliterated to Vernacular Data

If there is still a need to link the non-roman and transliterated roman data, the now-generalized field linking subfield $8 should be used if the 880 ceases to be used. It would be formatted:

    $8<linking number>\g

where code "g" is the field link type, indicating the reason for the link is alternate graphic representation (or transliteration).

Authority record example (without field 880):
    066  ##$alatin$bextended latin$ccyrillic
    100  1#$81\g$aZemtsovskii, I. I.$q(Izalii Iosifovich)
    100  1#$81\g$a<name in Cyrillic with initials>$q(<qualifier in
           Cyrillic>)
    400  1#$82\g$aZemtsovskii, Izalii Iosifovich
    400  1#$82\g$a<name in Cyrillic>
    400  1#$aZemtsovskiy, I.

The 1XX link is unnecessary but could be included. An alternative would be to code the alternate graphic form of the 1XX as a 4XX, in which case the link between the 1XX and the 4XX might be desirable, but the 4XX linking might not be necessary:

    066  ##$alatin$bextended latin$ccyrillic
    100  1#$81\g$aZemtsovskii, I. I.$q(Izalii Iosifovich)
    400  1#$81\g$a<name in Cyrillic with initials>$q(<qualifier in
           Cyrillic>)
    400  1#$aZemtsovskii, Izalii Iosifovich
    400  1#$a<name in Cyrillic>
    400  1#$aZemtsovskiy, I.
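
The pairing that subfield $8 would support can be sketched as follows. This is an illustration only, assuming the `<linking number>\g` form given above and a simplified record model (a list of tag and subfield pairs) that is not part of any MARC specification; the function names are hypothetical.

```python
from collections import defaultdict


def parse_subfield_8(value):
    """Split '<linking number>\\<link type>' into its two parts."""
    number, _, link_type = value.partition("\\")
    return number, link_type or None


def group_by_graphic_link(fields):
    """Map each linking number to the indexes of fields linked with type 'g'."""
    groups = defaultdict(list)
    for index, (tag, subfields) in enumerate(fields):
        for code, value in subfields:
            if code == "8":
                number, link_type = parse_subfield_8(value)
                if link_type == "g":
                    groups[number].append(index)
    return dict(groups)


def linked_partners(fields, index):
    """Indexes of fields to review when the field at 'index' is deleted."""
    for members in group_by_graphic_link(fields).values():
        if index in members:
            return [i for i in members if i != index]
    return []


# The first authority example above, with placeholders for Cyrillic data.
record = [
    ("100", [("8", "1\\g"), ("a", "Zemtsovskii, I. I."), ("q", "(Izalii Iosifovich)")]),
    ("100", [("8", "1\\g"), ("a", "<name in Cyrillic with initials>")]),
    ("400", [("8", "2\\g"), ("a", "Zemtsovskii, Izalii Iosifovich")]),
    ("400", [("8", "2\\g"), ("a", "<name in Cyrillic>")]),
    ("400", [("a", "Zemtsovskiy, I.")]),
]
groups = group_by_graphic_link(record)
```

The linked_partners helper reflects the maintenance use of links: when one field of a pair is deleted, its partner is a candidate for deletion as well.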

2.5 Detail on Transliteration Scheme

The level at which the transliteration scheme should be identified needs to be studied. There are many transliteration schemes, perhaps hundreds, besides the ALA tables standardly used in American libraries. Going to the specific level would be difficult, but the potential usefulness of the general level, as it is now specified in the 008, is questionable and needs to be examined.

Authority record example (without field 880):
    008/7  b? or g?     [national standard? conventional?]
    040 $beng
    151  ##$aMoscow (Russia)
    451  ##$aMoskva (Russia)$6///<translit. scheme A for Cyrillic>
    451  ##$aMoscova (Russia)$6///<translit. scheme B for Cyrillic>
    451  ##$aMo-ssu-k'o (Russia)$6///<translit. scheme C for Cyrillic>
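
If schemes were identified at the specific level, the switching of headings and references described in section 2.2 might look like the sketch below. It assumes the scheme-only `$6///<transliteration scheme>` form from section 2.3 and a simplified record model; promote_preferred and the label "scheme-A" are hypothetical. Note that the 1XX in the example carries no scheme of its own, so the demoted heading remains unidentified, and 008/07 would need separate review.

```python
def scheme_of(subfields):
    """Return the transliteration scheme from an extended $6, if present."""
    for code, value in subfields:
        if code == "6":
            parts = value.split("/", 3)
            if len(parts) == 4 and parts[3]:
                return parts[3]
    return None


def promote_preferred(fields, preferred_scheme):
    """Swap the 1XX content with the 4XX marked with the preferred scheme.

    Returns a new list; the input is returned as a copy when no 1XX
    or no matching 4XX is found.
    """
    heading = next(
        (i for i, (tag, _) in enumerate(fields) if tag.startswith("1")), None
    )
    reference = next(
        (
            i
            for i, (tag, subfields) in enumerate(fields)
            if tag.startswith("4") and scheme_of(subfields) == preferred_scheme
        ),
        None,
    )
    result = list(fields)
    if heading is None or reference is None:
        return result
    heading_tag, heading_data = fields[heading]
    reference_tag, reference_data = fields[reference]
    # Tags stay in place; only the subfield content changes position.
    result[heading] = (heading_tag, reference_data)
    result[reference] = (reference_tag, heading_data)
    return result


# Based on the example above, with hypothetical scheme labels.
record = [
    ("151", [("a", "Moscow (Russia)")]),
    ("451", [("a", "Moskva (Russia)"), ("6", "///scheme-A")]),
    ("451", [("a", "Moscova (Russia)"), ("6", "///scheme-B")]),
]
switched = promote_preferred(record, "scheme-A")
```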

2.6 Bibliographic Records

In Bibliographic records, the transliteration scheme can be indicated for fields as needed. The inclusion of transliterated fields for much of the non-roman data in records has been common since many systems are not able to display the non-roman fields. As systems are enabled to handle multiple scripts, the number of transliterated fields will probably decrease. A number of factors affect the decision, but the need to identify the transliteration scheme would be related to the type of field and whether the field data was also represented by an authority file record with transliteration information. The technique for identifying transliteration could be used for any field but would be useful in only a small number of cases. An example from the Bibliographic format of field 880 illustrates the variety of possible transliterated fields.

Bibliographic format example (with field 880):
    066  ##$cchinese
    100  1#$6880-01//Wade-Giles$aShen, Wei-pin.
    245  10$6880-02//Wade-Giles$aHung Jen-kan /$cShen Wei-pin chu.
    250  ##$6880-03//Wade-Giles$aTi 1 pan.
    260  ##$6880-04//Wade-Giles$aShang-hai :$bShang-hai jen min 
           ch`u pan she :$bHsin hua shu tien Shang-hai fa hsing 
           so fa hsing,$c1982.
    300  ##$a136 p., [1] leaf of plates :$bill. ;$c19 cm.
    490  1#$6880-05//Wade-Giles$aChung-kuo chin tai shih ts`ung shu
    504  ##$aIncludes bibliographical references.
    600  10$6880-06//Wade-Giles$aHung, Jen-Kan,$d1822-1864.
    651  #0$aChina$xHistory$yTaiping Rebellion, 1850-1864.
    650  #0$aRevolutionists$zChina$xBiography.
    830  #0$6880-07//Wade-Giles$aChung-kuo chin tai shih ts`ung 
           (Shanghai, China)
    880  1#$6100-01/chinese$a<Chinese characters>
    880  10$6245-02/chinese$a<Chinese characters>
    880  ##$6250-03/chinese$a<Chinese characters>
    880  ##$6260-04/chinese$a<Chinese characters>
    880  1#$6490-05/chinese$a<Chinese characters>
    880  10$6600-06/chinese$a<Chinese characters>
    880  #0$6830-07/chinese$a<Chinese characters>
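
The way systems pair romanized fields with their 880 counterparts can be sketched as below. This is a simplified illustration, assuming only the linking tag and occurrence number at the start of subfield $6 are needed for matching; the record model and the names link_of and pair_880 are hypothetical.

```python
def link_of(subfields):
    """Return (linking tag, occurrence number) from the first $6, if any."""
    for code, value in subfields:
        if code == "6":
            link = value.split("/", 1)[0]
            tag, _, occurrence = link.partition("-")
            return tag, occurrence
    return None


def pair_880(fields):
    """Return (regular field index, 880 field index) for each linked pair."""
    vernacular = {}
    for index, (tag, subfields) in enumerate(fields):
        if tag == "880":
            link = link_of(subfields)
            if link:
                # Key is the tag and occurrence of the associated regular field.
                vernacular[link] = index
    pairs = []
    for index, (tag, subfields) in enumerate(fields):
        if tag == "880":
            continue
        link = link_of(subfields)
        if link and link[0] == "880":
            partner = vernacular.get((tag, link[1]))
            if partner is not None:
                pairs.append((index, partner))
    return pairs


# A reduced version of the bibliographic example above.
record = [
    ("100", [("6", "880-01//Wade-Giles"), ("a", "Shen, Wei-pin.")]),
    ("300", [("a", "136 p., [1] leaf of plates :")]),
    ("250", [("6", "880-03//Wade-Giles"), ("a", "Ti 1 pan.")]),
    ("880", [("6", "100-01/chinese"), ("a", "<Chinese characters>")]),
    ("880", [("6", "250-03/chinese"), ("a", "<Chinese characters>")]),
]
pairs = pair_880(record)
```

Because matching uses only the leading link element, any transliteration scheme appended later in the subfield does not disturb existing pairing logic.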

2.7 Summary

The following subfield changes might be defined across all formats to enable specification of the transliteration scheme and to link transliterated data to the corresponding vernacular where needed.

- Extend subfield $6 to include the transliteration scheme:

    $6<linking tag>-<occurrence number>/<script
      code>/<field orientation code>/<transliteration scheme>

- If use of field 880 is discontinued, use subfield $8 for linking alternate script in directly tagged fields, with code "g" defined as alternate graphic representation (transliteration), when needed:

    $8<linking number>\g

- What are the uses for explicitly identified transliteration schemes? Are the ones identified above valid?

- Is there a need to link corresponding transliteration-related fields, such as non-roman and transliterated roman data?

- Could the $6 subfield be reformatted to not require the slashes for no longer used data?

- At what level should schemes be identified for the uniform heading in authority records (in 008/07)? for other fields (in subfield $6)? in other formats?

- What is the impact:

Library of Congress Help Desk (09/03/98)