Lister Hill National Center for Biomedical Communications Home Page National Library of Medicine

Journal Descriptor Indexing

The Journal Descriptor Indexing (JDI) project is part of NLM's Indexing Initiative (http://ii.nlm.nih.gov), whose objective is to investigate methods by which automatic indexing can partially or completely substitute for current indexing practices.

JDI is a novel approach to fully automated indexing. It builds on NLM's practice of maintaining a subject index to journal titles using a set of 122 MeSH terms, known as journal descriptors (JDs), which correspond to biomedical specialties. For example, the Journal of Pediatric Surgery is indexed by the JDs Pediatrics and Surgery. The JDI system associates JDs with the words in titles and abstracts in a three-year training set of about 1.3 million MEDLINE records. Each record "inherits" the JDs of the journal in which it was published. A word in the training set can then be described by a list of JDs ranked according to the number of co-occurrences between the word and each JD. Input text is indexed by averaging, for each JD, the word-JD co-occurrences of the words in the text that also appear in the training set, and then ranking the JDs in decreasing order of these averages. For example, JDI of the phrase "appendectomy in children" would result in Surgery and Pediatrics as the top two JDs indexing this text. Normally, JDI is used to index documents that are MEDLINE citations (titles and abstracts of journal articles).
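The training and indexing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the JDI system's actual implementation: the toy records, the whitespace tokenizer, the function names, and the choice to normalize each word-JD co-occurrence count by the number of records containing the word are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Invented toy training set: (title text, JDs inherited from the journal).
TOY_RECORDS = [
    ("appendectomy outcomes in adults", ["Surgery"]),
    ("laparoscopic appendectomy technique", ["Surgery"]),
    ("appendectomy in children with appendicitis", ["Pediatrics", "Surgery"]),
    ("asthma in children", ["Pediatrics"]),
    ("vaccination schedules for children", ["Pediatrics"]),
    ("heart failure in adults", ["Cardiology"]),
]


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train(records):
    """Return {word: {JD: score}} built from word-JD co-occurrences.

    Assumption: the score is the fraction of records containing the word
    whose journal carries the JD (a normalized co-occurrence count).
    """
    word_count = Counter()
    word_jd_count = defaultdict(Counter)
    for text, jds in records:
        for word in set(tokenize(text)):  # count each word once per record
            word_count[word] += 1
            for jd in jds:
                word_jd_count[word][jd] += 1
    return {
        word: {jd: n / word_count[word] for jd, n in jd_counts.items()}
        for word, jd_counts in word_jd_count.items()
    }


def index_text(text, word_jd_scores):
    """Rank JDs by their average score over the text's known words."""
    known = [w for w in tokenize(text) if w in word_jd_scores]
    if not known:
        return []
    totals = Counter()
    for word in known:
        for jd, score in word_jd_scores[word].items():
            totals[jd] += score
    averaged = [(jd, total / len(known)) for jd, total in totals.items()]
    # Decreasing score; ties broken alphabetically for a stable order.
    return sorted(averaged, key=lambda pair: (-pair[1], pair[0]))


scores = train(TOY_RECORDS)
ranking = index_text("appendectomy in children", scores)
for jd, score in ranking:
    print(f"{jd}: {score:.3f}")
```

On this toy data, Surgery and Pediatrics occupy the top two positions and Cardiology trails well behind, which mirrors the example in the text. With real MEDLINE data the scoring and tokenization would need more care (stop words, word frequency thresholds, and so on).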

JDI is used by SemRep, an in-house natural language processing (NLP) tool in the Semantic Knowledge Representation project (http://skr.nlm.nih.gov), and specifically by SemGen, an adaptation of SemRep that identifies gene interaction predications from MEDLINE citations. JDI increases accuracy by identifying citations in the molecular genetics domain before NLP begins.
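The text does not say how a citation is judged to belong to the molecular genetics domain. One plausible reading, sketched below, is a simple gate on the JD ranking: a citation passes to NLP only if a domain JD appears among its top-ranked JDs. The function name, the JD names in MOLECULAR_GENETICS_JDS, the top-3 cutoff, and the scores are all hypothetical.

```python
def in_domain(ranked_jds, domain_jds, top_k=3):
    """Return True if any of the top_k ranked JDs belongs to domain_jds.

    ranked_jds is a list of (JD, score) pairs in decreasing score order.
    The top_k cutoff is an assumed rule, not one stated in the source.
    """
    top = [jd for jd, _score in ranked_jds[:top_k]]
    return any(jd in domain_jds for jd in top)


# Hypothetical set of JDs standing in for the molecular genetics domain.
MOLECULAR_GENETICS_JDS = {"Genetics", "Molecular Biology"}

# Invented JD rankings for two citations.
ranking_a = [("Molecular Biology", 0.42), ("Biochemistry", 0.31),
             ("Genetics", 0.27), ("Surgery", 0.01)]
ranking_b = [("Surgery", 0.50), ("Pediatrics", 0.40),
             ("Anesthesiology", 0.20), ("Genetics", 0.05)]

for name, ranking in [("a", ranking_a), ("b", ranking_b)]:
    verdict = "process" if in_domain(ranking, MOLECULAR_GENETICS_JDS) else "skip"
    print(f"citation {name}: {verdict}")
```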

JDI has been extended to perform Semantic Type (ST) indexing. STs are a set of 135 categories in the Semantic Network in NLM's Unified Medical Language System (http://www.nlm.nih.gov/research/umls). Each concept in the UMLS Metathesaurus is assigned one or more STs, and each assignment forms an "isa" link from the concept to the ST. The set of UMLS Metathesaurus concepts assigned to an ST can be regarded as an "ST document" and can therefore itself undergo JD indexing. Similarity can then be measured between the JD indexing of the text to be indexed and the JD indexing of each ST document. Accordingly, the STs (corresponding to the ST documents) can be ranked in decreasing order of similarity to the text, resulting in a ranked list of ST indexing terms for this text.
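The ranking step can be illustrated with a short sketch. The text does not name the similarity measure, so cosine similarity between JD score vectors is used here as an assumption; the JD vectors, their values, and the function names are likewise invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {JD: score}."""
    dot = sum(value * b.get(jd, 0.0) for jd, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_semantic_types(text_jd_vector, st_jd_vectors):
    """Rank STs by similarity of their ST document's JD vector to the text's."""
    scored = [(st, cosine(text_jd_vector, vec))
              for st, vec in st_jd_vectors.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented JD indexing of two "ST documents".
ST_JD_VECTORS = {
    "Cell Function": {"Cell Biology": 0.8, "Biochemistry": 0.5, "Nursing": 0.05},
    "Health Care Activity": {"Nursing": 0.7, "Emergency Medicine": 0.6,
                             "Cell Biology": 0.05},
}

# Invented JD indexing of some input text.
text_vector = {"Cell Biology": 0.6, "Biochemistry": 0.4, "Nursing": 0.1}

st_ranking = rank_semantic_types(text_vector, ST_JD_VECTORS)
for st, similarity in st_ranking:
    print(f"{st}: {similarity:.3f}")
```

In practice the ST document vectors would be computed once, since they depend only on the Metathesaurus and the training set, and then reused for every text to be indexed.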

ST indexing has potential application in Word Sense Disambiguation (WSD). If the senses of an ambiguous word are expressed by STs, ST indexing can be performed on the context surrounding the word (phrase, sentence, or abstract), in the expectation that the correct STs for the word will rank higher in the ST indexing of the context than the word's other candidate STs. For example, the ambiguous word "transport" has two meanings: "Biological Transport", assigned the ST Cell Function, and "Patient transport", assigned the ST Health Care Activity. JDI-based ST indexing can index text containing "transport" and determine which of these two STs receives the higher score for that text. The meaning associated with the higher-scoring ST is then returned and presumed to be the sense of the ambiguous word. JDI-based ST indexing was used in WSD experiments to automatically disambiguate forty-five ambiguous strings from NLM's WSD Test Collection (http://wsd.nlm.nih.gov).
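The selection rule for the "transport" example can be sketched as follows. This is an illustration only: the function name, the data layout, and the ST scores for the context are invented, and the scores are assumed to come from some prior ST indexing of the surrounding text.

```python
def disambiguate(candidate_senses, st_scores):
    """Pick the sense whose best-scoring ST ranks highest for the context.

    candidate_senses maps each sense to the list of STs assigned to it.
    st_scores maps each ST to its score from ST indexing of the context.
    STs missing from st_scores are treated as scoring lowest.
    """
    def best_score(sense):
        return max(
            (st_scores.get(st, float("-inf")) for st in candidate_senses[sense]),
            default=float("-inf"),
        )

    return max(candidate_senses, key=best_score)


# Senses of "transport" and their STs, as given in the text.
TRANSPORT_SENSES = {
    "Biological Transport": ["Cell Function"],
    "Patient transport": ["Health Care Activity"],
}

# Invented ST scores for two different contexts.
cell_context_scores = {"Cell Function": 0.91, "Health Care Activity": 0.12}
clinic_context_scores = {"Cell Function": 0.08, "Health Care Activity": 0.77}

print(disambiguate(TRANSPORT_SENSES, cell_context_scores))
print(disambiguate(TRANSPORT_SENSES, clinic_context_scores))
```

Only the candidate STs of the ambiguous word are compared; other STs that happen to score highly for the context play no part in the decision.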

JDI was reviewed by the Board of Scientific Counselors in October 1999 as part of their review of the Indexing Initiative, and was cited for its role in SemGen as part of the review of the Semantic Knowledge Representation project in September 2003.