CJK Compatibility Database

Use the CJK Compatibility Database to quickly replace non-MARC21 characters with their MARC21 equivalents or, where no equivalent exists, with the missing character symbol.

Non-MARC21 characters and their MARC21 equivalents

The Library of Congress database will soon be upgraded to support Unicode. RLG’s Union Catalog and OCLC’s WorldCat are already Unicode compatible. Chinese, Japanese and Korean (CJK) scripts are entered into these systems with Microsoft input method editors (IMEs).

The Unicode character set includes several hundred duplicate CJK characters (for example, 路, F937, and 路, 8DEF), as well as many close variants (for example, 步, 6B65, and 歩, 6B69). Generally, one member of each pair is a MARC21 character and the other is not.
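The difference between duplicates and variants shows up under Unicode normalization. The sketch below uses Python's standard unicodedata module purely as an illustration (it is not how the database works, and the helper name nfc_code_point is hypothetical): NFC normalization folds the compatibility duplicate F937 into 8DEF, but leaves the variant 6B69 unchanged, because 歩 and 步 are two separate characters rather than duplicates.

```python
import unicodedata

def nfc_code_point(ch: str) -> str:
    """Return the Unicode value of ch after NFC normalization, as hex."""
    return f"{ord(unicodedata.normalize('NFC', ch)):04X}"

# F937 is a compatibility duplicate of 8DEF, so NFC replaces it.
print(nfc_code_point("\uF937"))  # 8DEF
# 6B69 and 6B65 are distinct ideographs, so NFC leaves 6B69 alone.
print(nfc_code_point("\u6B69"))  # 6B69
```

Duplicates can therefore, in principle, be mapped mechanically, while close variants need an explicit lookup table.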

The most natural way to create a character with a Microsoft IME sometimes produces the non-MARC21 form. For example, keying 이 in the Korean IME to create the common character 李 yields a non-MARC21 character (F9E1). One must key 리 instead to create the valid MARC21 form, 李, 674E.

Similarly, the Japanese IME produces the character 歩, 6B69, which is not a valid MARC21 form. Its valid MARC21 equivalent, 步, 6B65, can be created only with the Korean or Chinese IME.
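Because variants such as 歩 and 步 can look almost identical on screen, checking the Unicode value is a reliable way to tell which one a record contains. A minimal Python sketch (code_points is a hypothetical helper, not part of any cataloging system):

```python
def code_points(text: str) -> list[str]:
    """Return the hexadecimal Unicode value of each character in text."""
    return [f"{ord(ch):04X}" for ch in text]

# 歩 as produced by the Japanese IME, followed by 步 from the Korean or Chinese IME
print(code_points("\u6B69\u6B65"))  # ['6B69', '6B65']
```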

Only MARC21 characters display properly in a MARC21 bibliographic record. Therefore, any non-MARC21 character in a bibliographic record must be replaced by its MARC21 equivalent.

The CJK Compatibility Database

The CJK Compatibility Database matches more than 450 non-MARC21 Chinese, Japanese and Korean characters, Hangul syllables and diacritic marks with their MARC21 equivalents. LC staff compiled the initial list, which was then supplemented with entries from a similar database at Yale University. Characters that have no MARC21 equivalent are matched with the missing character symbol 〓.
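Conceptually, each entry pairs one non-MARC21 character with a replacement. The Python sketch below assumes the database can be modeled as a simple dictionary; COMPAT_TABLE and replace_non_marc21 are hypothetical names, the table holds only the examples mentioned on this page, and this is not how the LC database is actually implemented.

```python
MISSING = "\u3013"  # 〓, the missing character symbol

# Hypothetical excerpt, limited to the pairs cited on this page.
# Keys are non-MARC21 characters; values are MARC21 equivalents.
# An entry with no MARC21 equivalent would map to MISSING instead.
COMPAT_TABLE = {
    "\uF937": "\u8DEF",  # duplicate of 路
    "\uF9E1": "\u674E",  # duplicate of 李
    "\u6B69": "\u6B65",  # 歩 -> 步
}

def replace_non_marc21(text: str, table: dict[str, str] = COMPAT_TABLE) -> str:
    """Replace each character found in table; leave all other characters as-is."""
    return "".join(table.get(ch, ch) for ch in text)

fixed = replace_non_marc21("\uF9E1\u6B69")
print([f"{ord(ch):04X}" for ch in fixed])  # ['674E', '6B65']
```

Replacing character by character leaves every character not listed in the table untouched, just as the manual copy-and-paste procedure changes only the invalid character.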

The database is intended to enable catalogers to quickly and conveniently replace a non-MARC21 character with its MARC21 equivalent. Directions are given below.

To view the entire list, click the Browse all entries tab. For each character the list gives its Unicode value, along with other information that may help identify the character and describe how to input its MARC21 form.

Updating This Database

The database is a cooperative undertaking, and is intended for the use of all CJK catalogers. If you encounter a non-MARC21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at ylee@loc.gov.

Directions

Replace a non-MARC21 character with its valid MARC21 equivalent by following these steps:

1) Copy the invalid character from your bibliographic record

2) Open the CJK Compatibility Page

3) Paste the invalid character into the white box and select the index "Invalid character"

4) Click "Submit"

A new screen then appears, showing the valid alternative.

5) Copy the valid alternative character or the missing character symbol

6) Paste the valid alternative into your bibliographic record

Note: Entries can also be found by entering the Unicode value of either the valid MARC21 character or the non-MARC21 variant.
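Searching by Unicode value works in either direction: a value may match the non-MARC21 side or the MARC21 side of an entry. A minimal Python sketch of that behavior, assuming a hypothetical table of hex-value pairs (COMPAT_PAIRS and find_entries are illustrative names; the pairs are only the examples cited on this page):

```python
# Hypothetical excerpt: non-MARC21 Unicode value -> MARC21 Unicode value.
COMPAT_PAIRS = {
    "F937": "8DEF",
    "F9E1": "674E",
    "6B69": "6B65",
}

def find_entries(value: str) -> list[tuple[str, str]]:
    """Return (non-MARC21, MARC21) pairs in which either side matches value."""
    key = value.strip().upper()
    if key.startswith("U+"):  # accept both "6B65" and "U+6B65"
        key = key[2:]
    return [(bad, good) for bad, good in COMPAT_PAIRS.items() if key in (bad, good)]

print(find_entries("6b65"))    # [('6B69', '6B65')]
print(find_entries("U+F9E1"))  # [('F9E1', '674E')]
```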
