
The MeSH Translation Maintenance System:
Structure, Interface Design, and Implementation

Stuart J. Nelson, MD [1], Michael Schopen, MD [2], Allan G. Savage [1],
Jacque-Lynne Schulman [1], and Natalie Arluk [1]

1. National Library of Medicine, Bethesda, MD, USA
2. Deutsches Institut für Medizinische Dokumentation und Information, Köln,
Deutschland

ABSTRACT

The National Library of Medicine (NLM) produces annual editions of the Medical Subject Headings (MeSH®). Translations of MeSH are often done to make the vocabulary useful for non-English users. However, MeSH translators have encountered difficulties with entry vocabulary as they maintain and update their translations. Tracking MeSH changes and updating their translations in a reasonable time frame is cumbersome. NLM has developed and implemented a concept-centered vocabulary maintenance system for MeSH. This system has been extended to create an interlingual database of translations, the MeSH Translation Maintenance System (MTMS). This database allows continual updating of the translations, as well as facilitating tracking of the changes within MeSH from one year to another. The MTMS interface uses a Web-based design with multiple colors and fonts to indicate concepts needing translation or review. Concepts for which there is no exact English equivalent can be added. The system software encourages compliance with the Unicode standard in order to ensure that character sets with native alphabets and full orthography are used consistently.

BACKGROUND

The National Library of Medicine (NLM) has produced the MEDLINE database since 1966. The MEDLINE database includes over 10 million literature citations of articles written in 41 languages. Each article is indexed with Medical Subject Headings by an individual who, after reading the article in its original language, assigns the descriptors to indicate what the article is about.

International MEDLARS Centers, including those in Germany, Japan, Brazil, and France, as well as other national medical information centers have long produced translations of MeSH to make the vocabulary useful for non-English users. These translations were generally not in machine-readable form and varied in frequency of appearance. Some translations were issued annually and others irregularly. Translations of MeSH have been made into German, Japanese, Portuguese, Spanish, French, Russian, Polish, Romanian, Arabic, Greek, Dutch, Thai, Turkish, Swedish, Italian, Chinese, and other languages. The translations enable users not facile in English to identify articles that are of sufficient potential interest to warrant reading.

CONCEPT STRUCTURE OF MeSH

In the year 2000, MeSH changed from a database of descriptors and terms to one consisting of descriptors, concepts, and terms. A descriptor is now viewed as a class of concepts, and a concept as a class of synonymous terms within a descriptor class. By using the concept as a key object in the new structure, appropriate non-synonymous relationships could be represented separately, and differences between usages and meanings clarified and disambiguated. The descriptor class consists of one or more concepts closely related to each other in meaning, or of non-synonymous concepts best lumped together in one class for the purposes of indexing, retrieval, and organization of the literature. Putting these non-synonymous concepts together into one descriptor class does not alter the traditional function of entry vocabulary, that of pointing the user (whether individual or system) to the appropriate main heading (descriptor). Instead, it makes explicit that the grouping was a deliberate choice for the intended purpose of the vocabulary, rather than a confusion about the meaning of a term.

For example, under the old term-based system, the descriptor CYTOPLASMIC GRANULES had a non-synonymous (narrower) entry term, Secretory Granules. After establishment of the new concept-oriented system, it happened that a new descriptor SECRETORY VESICLES was created. Secretory Granules was moved to this new heading as a (related) subordinate concept, better represented by the new descriptor. Other non-synonymous subordinate concepts were also created under this new descriptor class. The multiple, non-synonymous concepts represent slight differences in meaning, yet they are all grouped together under SECRETORY VESICLES for purposes of retrieval.

Old Main Heading: CYTOPLASMIC GRANULES
  Old Entry Term: Secretory Granules (narrower)
New Descriptor: SECRETORY VESICLES [Descriptor Class]
  Preferred Concept: Secretory Vesicles [preferred term]
  Subordinate Concept: Secretory Granules (related)
  Subordinate Concept: Condensing Vacuoles (related)
  Subordinate Concept: Zymogen Granules (narrower)
  Subordinate Concept: Dense Core Vesicles (narrower)

The change in data structure allows a greater degree of organization. Each descriptor class has a preferred concept. The term that names the preferred concept (the preferred term of the preferred concept) is referred to as the descriptor or as the main heading. Each of the subordinate concepts also has a preferred term, as well as a labeled (broader, narrower, related) relationship to the preferred concept. Terms meaning the same (naming the same concept) are grouped together in the concept record.
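The descriptor, concept, and term hierarchy described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names (Term, Concept, DescriptorClass, relation) are hypothetical and are not taken from the actual MeSH database schema. The data reproduces the SECRETORY VESICLES example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    text: str
    preferred: bool = False  # True for the preferred term of its concept


@dataclass
class Concept:
    terms: List[Term]  # synonymous terms naming this concept
    # Relationship to the preferred concept: "broader", "narrower", or
    # "related". None marks the preferred concept itself.
    relation: Optional[str] = None

    def preferred_term(self) -> str:
        return next(t.text for t in self.terms if t.preferred)


@dataclass
class DescriptorClass:
    preferred_concept: Concept
    subordinate_concepts: List[Concept] = field(default_factory=list)

    def main_heading(self) -> str:
        # The descriptor (main heading) is the preferred term
        # of the preferred concept.
        return self.preferred_concept.preferred_term()


secretory_vesicles = DescriptorClass(
    preferred_concept=Concept([Term("Secretory Vesicles", preferred=True)]),
    subordinate_concepts=[
        Concept([Term("Secretory Granules", preferred=True)], "related"),
        Concept([Term("Condensing Vacuoles", preferred=True)], "related"),
        Concept([Term("Zymogen Granules", preferred=True)], "narrower"),
        Concept([Term("Dense Core Vesicles", preferred=True)], "narrower"),
    ],
)

print(secretory_vesicles.main_heading())
```

Keeping the relationship label on the subordinate concept, rather than on individual terms, mirrors the statement that synonymous terms are grouped in the concept record while non-synonymous relationships are recorded between concepts.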

TRANSLATION DATABASE

To address these maintenance difficulties, the MeSH maintenance system needed to be extended to encompass an interlingual database of translations. The MeSH Translation Maintenance System allows continual updating of the translations, as well as facilitating tracking of the changes within MeSH from one year to another. In this way translations can be made for new headings as they are created, rather than waiting until after they are published once each year, which greatly improves currency. Given the new MeSH structure, it was relatively easy to conceive of a method for supporting the work of translators. Translated terms can be included in the MeSH maintenance environment as an extension of the current MeSH database. Translators can use the MTMS to manage their translations.

Translated terms are provided as synonyms to existing concepts. For non-synonymous entry terms that are not present in the English version, but useful in the language of the translation, the translator creates a new concept. The concept would, of course, belong to a descriptor class, that of the main heading for which it was an appropriate entry term. When a concept has no English synonym, a definition of the concept is required, so that translators working in other languages can add their own terms to that concept.
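The two cases above, adding a translated synonym to an existing concept and creating a new concept that has no English term, might look roughly like the following. This is a minimal sketch and not the MTMS implementation: the class names, the method add_translator_concept, and the language codes "eng" and "ger" are assumptions made for illustration, and the German strings are illustrative placeholders rather than entries from an actual translation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    definition: Optional[str] = None
    # language code -> synonymous terms in that language
    terms: Dict[str, List[str]] = field(default_factory=dict)

    def add_term(self, lang: str, text: str) -> None:
        self.terms.setdefault(lang, []).append(text)


@dataclass
class DescriptorClass:
    heading: str
    concepts: List[Concept] = field(default_factory=list)

    def add_translator_concept(self, lang: str, text: str,
                               definition: str) -> Concept:
        # A concept with no English synonym must carry a definition so
        # that translators in other languages can attach their terms.
        if not definition.strip():
            raise ValueError("a definition is required for a new concept")
        concept = Concept(definition=definition)
        concept.add_term(lang, text)
        self.concepts.append(concept)
        return concept


descriptor = DescriptorClass(
    "SECRETORY VESICLES",
    [Concept(terms={"eng": ["Secretory Vesicles"]})],
)

# Case 1: a translated term becomes a synonym in an existing concept.
descriptor.concepts[0].add_term("ger", "Sekretorische Vesikel")

# Case 2: a term with no English equivalent gets its own concept.
new_concept = descriptor.add_translator_concept(
    "ger", "Beispielbegriff",
    "Placeholder English definition of the new concept.",
)
```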

DESIGN OF THE INTERFACE

In order to avoid the difficulties of trying to maintain multiple disparate clients, the interface was designed to be Web-based, and it contains a variety of security measures to limit participation to authorized individuals. Java servlets, running on the Web server, enable the transmission of the submitted information to the database server.

Privileges for translators are limited to insertion of terms in their own language, and to creation of new subordinate concepts. When a new subordinate concept is created, the required submission of a definition (in English) both supports the translation of that term into other non-English languages and enables proper maintenance when that descriptor class is edited by the MeSH staff.
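The privilege rule stated above can be expressed as a simple check. The sketch below is an assumption-laden illustration, not the MTMS security code: the action names, the Translator class, and the language codes are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action names for the two privileges translators hold.
TRANSLATOR_ACTIONS = {"insert_term", "create_subordinate_concept"}


@dataclass(frozen=True)
class Translator:
    name: str
    language: str  # the one language this translator works in


def is_permitted(user: Translator, action: str, term_language: str) -> bool:
    """Allow only the translator actions, and only in the user's language."""
    return action in TRANSLATOR_ACTIONS and term_language == user.language


translator = Translator("example_user", "ger")

print(is_permitted(translator, "insert_term", "ger"))
print(is_permitted(translator, "insert_term", "eng"))
print(is_permitted(translator, "edit_descriptor", "ger"))
```

Under these assumptions the first call is allowed, while the second (wrong language) and third (an action reserved for MeSH staff) are refused.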

While the translator has the ability to browse MeSH descriptors, the translation interface has been designed for direct editing of concepts and terms only. There are two different ways that the user can access the concepts and terms from the interface: (1) by searching the MeSH Tree Structures for descriptor names or (2) by searching for term names. For each method, there are two different language modes available: an English version, and a translated version that appears in each user's own language.

The interface uses color, boldface, and italic fonts in the display to convey the current status of the various descriptors, concepts, and terms. In this way the user can determine at a glance which MeSH terms are new, which still need to be translated, and which translated terms are awaiting supervisor review and final approval.
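One way to derive the three statuses named above from a term record is sketched here. The function name, its parameters, and in particular the mapping from status to display style are hypothetical; the text does not say which color or font corresponds to which status.

```python
from typing import List, Optional

# Hypothetical mapping only: the actual MTMS style assignments
# are not given in the text.
HYPOTHETICAL_STYLES = {
    "new": "bold",
    "needs translation": "color",
    "awaiting review": "italic",
}


def status_flags(is_new: bool, translation: Optional[str],
                 approved: bool) -> List[str]:
    """Return the statuses that apply to one term in one language."""
    flags = []
    if is_new:
        flags.append("new")
    if translation is None:
        flags.append("needs translation")
    elif not approved:
        flags.append("awaiting review")
    return flags


flags = status_flags(is_new=True, translation=None, approved=False)
styles = [HYPOTHETICAL_STYLES[f] for f in flags]
```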

A special module of the interface was designed for translators’ supervisors, to enable them to review and authorize terms and concepts for each translator in their group.

MANAGING THE MAINTENANCE ENVIRONMENT

For each language to be incorporated into the MeSH maintenance environment, there will be a team of translators and a supervisor. The supervisor will coordinate, review, and authorize the work of that group of translators for that language. Once the supervisor authorizes the work, a member of the MeSH Section at the National Library of Medicine will conduct a final review and quality control of all changes before they are approved to become an official part of MeSH.
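The review chain described above (translator, then supervisor, then the MeSH Section) can be modeled as a small state machine. This is a sketch under stated assumptions: the status names, role strings, and the advance function are hypothetical and do not come from the MTMS itself.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted by translator"
    AUTHORIZED = "authorized by supervisor"
    OFFICIAL = "approved by MeSH Section"


# Each step may be taken only by the role responsible for it.
TRANSITIONS = {
    (Status.SUBMITTED, "supervisor"): Status.AUTHORIZED,
    (Status.AUTHORIZED, "mesh_staff"): Status.OFFICIAL,
}


def advance(status: Status, role: str) -> Status:
    """Move a change to its next status, or refuse if the role may not."""
    try:
        return TRANSITIONS[(status, role)]
    except KeyError:
        raise PermissionError(
            f"{role} cannot advance a change that is {status.value}"
        ) from None


status = Status.SUBMITTED
status = advance(status, "supervisor")
status = advance(status, "mesh_staff")
print(status.name)
```

Encoding the transitions as a table keyed by status and role makes it impossible for a change to become official without passing through supervisor authorization first.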

Instituting the translation database requires the agreement and cooperation of the translators. Once that is obtained, previous translations can be loaded into the database from the UMLS Metathesaurus. Translations that have not previously been included in the UMLS Metathesaurus will be dealt with on an individual basis. After the translations are loaded, translators will be able to review areas in which the mapping from one term to another might be problematic, and to find the descriptors in MeSH for which there is no translated term. The display of translated terms in the concept structure allows finer shades of meaning to be fully represented.

CHARACTER SET ISSUES

The character set used depends on the operating system and the coding scheme it uses for the language. Knowing the coding scheme, it is often possible to find a set of fonts (or glyphs) to make the character appear as it should in the specific written language. However, the coding schemes are not unique, and far from universal. Unless the scheme is understood properly, sorting and presenting material in an orthographic manner becomes quite difficult.

The best long-term solution to the character set problem is one that correctly represents languages with their native alphabets and full orthography. Unicode appears to be one means of achieving that goal. It provides a unique number for every character, no matter what the platform, program, or language. The MeSH database has been converted to Oracle 8i, a database management system that supports the use of Unicode. Java, which supports the servlet and the MeSH client used at the NLM, is fully Unicode compliant. We encourage translators to submit their terms in Unicode, though we must make provisions for those who are unable to do so.

When the source file of terms in another language is loaded into the MTMS, Oracle 8i converts the coding (Unicode or otherwise) for each character to UTF-8 (Unicode Transformation Format-8), which is how they are stored in memory. The Web server, in conjunction with the MTMS application and the IE Browser, is also configured for UTF-8 encoding. In this way, a consistent character set with native alphabets and full orthography is used and conforms to a universal standard.
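The conversion step can be illustrated outside Oracle with a few lines of Python. This sketch is not how the MTMS or Oracle performs the conversion; it only shows the general technique of decoding bytes from a known source encoding and re-encoding them as UTF-8. The helper name to_utf8 is hypothetical, and ISO 8859-1 is assumed as the example source encoding.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# "Köln" as it would arrive in an ISO 8859-1 (Latin-1) source file.
latin1_bytes = "Köln".encode("iso-8859-1")   # b'K\xf6ln'
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")  # b'K\xc3\xb6ln'

# The same character has a different byte representation in each scheme,
# which is why the source coding scheme must be known before loading.
try:
    latin1_bytes.decode("utf-8")
    decoded_without_conversion = True
except UnicodeDecodeError:
    decoded_without_conversion = False
```

The final check shows the failure mode that motivates a single target encoding: bytes written under one coding scheme are not valid, or are silently misread, when interpreted under another.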

CONCLUSION

When searching for information about a potentially relevant topic, it is often easier to use the language with which one has the most facility. Translations of MeSH are valuable to persons not facile in English. The creation of the MeSH Translation Maintenance System enables correct mappings from one language to another to be maintained and enables translators to stay current with MeSH as it continues to be enhanced. The Web-based interface, closely managed maintenance environment, and adherence to modern standards together provide a robust platform for an interlingual database of translations.

As part of the UMLS® project, the UMLS Metathesaurus® was created. It is updated on a regular basis. It is a large database of naming information encompassing terms and concepts from more than 50 biomedical vocabularies and classifications, including MeSH. A number of translations of MeSH are included in the UMLS Metathesaurus. The translations into German, provided by DIMDI, French (INSERM), Portuguese (BIREME), Spanish (BIREME), Russian (Central Medical Library, Moscow), Italian (Istituto Superiore di Sanita), and Finnish (the Finnish Medical Society) are included in the 2003 Metathesaurus. Without specific links between the translated terms and English terms in MeSH, maintaining appropriate representation in the Metathesaurus required considerable effort on the part of the translators or bilingual editors. While an original translation of all of MeSH might link a term with the appropriate subject heading, with modifications of MeSH the linkage of that translated term might no longer be appropriate. This situation arose because of non-synonymous entry terms.

Last updated: 17 September 2004
First published: 17 September 2004
Permanence level: Permanence Not Guaranteed