
Image Indexing Initiative (III)

Principal Investigator: Charles Sneiderman MD PhD, OHPCC

The Image Indexing Initiative (III) is exploring the value of controlled vocabularies in describing biomedical images and other non-textual objects. Multimedia objects are essential to the following four areas of research identified in the NLM Long Range Plan for the 21st Century: Resources and Infrastructure; Health Information for the Underserved and Diverse Populations; Support for Clinical and Public Health Systems; and Support for Genomic Science. NLM’s Large Scale Vocabulary Test (LSVT), conducted a decade ago to assess the adequacy of the UMLS Metathesaurus in health information systems, did not specifically address issues of non-textual data (4). Studies by both intramural investigators and extramural medical informaticians suggest that multiple component vocabularies of the UMLS may be needed to provide the scope and depth necessary to describe multimedia objects used in biology and medicine (1-3).

The development of online multimedia biomedical databases for patient care, teaching, and research and the development of multimedia scientific publication confront the Library of the future with issues of classification and retrieval of these objects. Current LHC progress in automated and semi-automated indexing in biomedicine depends on either the syntactic and semantic characteristics of the text itself or physical characteristics of electronic images such as color or texture. To date, we have not attempted any substantial evaluation of retrieval of multimedia objects based on associated text, physical characteristics, or combinations of these methods. Extramural investigators are exploring methods for caption searches in online full-text biomedical articles (5).

III has developed a web-based tool for biomedical content experts to index their own images and other multimedia objects. The tool (I3) presents the user with the UMLS concepts automatically matched from the user's own text descriptions, which are pre-processed by interactive SKR MetaMap. Using the method of the LSVT and our previous research, the user is asked to rate these matches and to suggest any additional terms not produced by the automated matching. The tool interfaces with an open source relational database (MySQL) so that the data can be readily analyzed and matches can be linked to the appropriate image for comparison with other methods of retrieval. It has now been integrated with HEALLocal, a multimedia database system developed by the Health Education Assets Library project supported with NLM funds (6).
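The data flow described above (an image with a text description, automatically matched concepts, user ratings, and user-suggested terms, all linked in a relational database) can be illustrated with a minimal sketch. This is not the I3 schema, which is not given here. The table names, column names, rating labels, and placeholder concept identifier are all hypothetical, and Python's built-in sqlite3 module stands in for MySQL so that the example runs on its own.

```python
import sqlite3

# Hypothetical schema: one row per image, one row per concept attached to it.
# 'source' records whether a concept came from automated matching or was
# suggested by the user; 'rating' stays NULL until the user rates the match.
SCHEMA = """
CREATE TABLE image (
    image_id    INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE concept_match (
    match_id     INTEGER PRIMARY KEY,
    image_id     INTEGER NOT NULL REFERENCES image(image_id),
    concept_id   TEXT,
    concept_name TEXT NOT NULL,
    source       TEXT NOT NULL CHECK (source IN ('automatic', 'user')),
    rating       TEXT CHECK (rating IN ('exact', 'partial', 'wrong'))
);
"""


def open_db():
    """Create an in-memory database (SQLite used here in place of MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_image(conn, description):
    """Store an image's text description and return its id."""
    cur = conn.execute(
        "INSERT INTO image (description) VALUES (?)", (description,)
    )
    return cur.lastrowid


def add_match(conn, image_id, concept_name, source, concept_id=None, rating=None):
    """Attach a concept (automatic or user-suggested) to an image."""
    cur = conn.execute(
        "INSERT INTO concept_match "
        "(image_id, concept_id, concept_name, source, rating) "
        "VALUES (?, ?, ?, ?, ?)",
        (image_id, concept_id, concept_name, source, rating),
    )
    return cur.lastrowid


def rate_match(conn, match_id, rating):
    """Record the user's rating of a concept match."""
    conn.execute(
        "UPDATE concept_match SET rating = ? WHERE match_id = ?",
        (rating, match_id),
    )


def matches_for_image(conn, image_id):
    """Return (concept_name, source, rating) rows linked to one image."""
    rows = conn.execute(
        "SELECT concept_name, source, rating FROM concept_match "
        "WHERE image_id = ? ORDER BY match_id",
        (image_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_db()
    img = add_image(conn, "erythematous plaque on the forearm")
    # 'PLACEHOLDER-1' is an illustrative identifier, not a real UMLS CUI.
    m = add_match(conn, img, "Plaque", "automatic", concept_id="PLACEHOLDER-1")
    add_match(conn, img, "Forearm", "user")
    rate_match(conn, m, "exact")
    print(matches_for_image(conn, img))
```

Keeping each match in its own row tied to an image id is what allows the ratings to be analyzed in aggregate and compared with other retrieval methods for the same image.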

Current activities involve collaboration with investigators in other branches of the Lister Hill Center (Cognitive Science, Computer Science, and Communications Engineering), and in the future may extend to inter-institute, interagency, and private sector partners as we search for suitable test collections and evaluators. I3 has been modified and included in another web tool (ITI) for the Image Text Integration Project of CEB. ITI is a tool for subject matter experts to classify biomedical images.

These tools are still in development. This year, accompanying text from the captions and mentions in selected clinical specialty journals in the PubMed Open Access collection was used for a trial of classification and indexing by a group of volunteer physicians from the LHNCBC staff (1). The tools are also being used to evaluate the adequacy of current indexing of a collection of cardiology teaching images. These trials, and others with ITI and I3 in focused subject areas with appropriate subject matter experts, will be used to create a standard by which future algorithms for image indexing can be compared.
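One common way to compare an indexing algorithm against an expert-derived standard is to compute precision and recall over the sets of terms assigned to each image. The sketch below illustrates that general approach only. The project does not state which measures it will use, and the function name and term lists are hypothetical.

```python
def score_against_reference(proposed, reference):
    """Return (precision, recall) of proposed terms against expert terms.

    precision = share of proposed terms that the experts also assigned
    recall    = share of expert-assigned terms that the algorithm proposed
    Empty inputs yield 0.0 rather than raising a division error.
    """
    proposed, reference = set(proposed), set(reference)
    true_positives = len(proposed & reference)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical terms for a single image.
    algorithm_terms = ["heart", "left ventricle", "x-ray"]
    expert_terms = ["left ventricle", "x-ray", "cardiomegaly", "thorax"]
    precision, recall = score_against_reference(algorithm_terms, expert_terms)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Averaging these per-image scores over a test collection would give one figure per algorithm, though exact term matching is a simplification: it gives no credit for near-synonyms or broader and narrower concepts.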

In the future these datasets will be used to refine automated image content analysis algorithms (Dr. Antani, CEB) and statistical and semantic natural language processing algorithms (Dr. Demner-Fushman, CEB and Dr. Rindflesch, CGSB).

  1. Sneiderman CA, Demner-Fushman D, Fung KW, Bray B. UMLS-based Automatic Image Indexing. AMIA Annu Symp Proc. 2008 Nov 6:1141.
  2. Woods JW, Sneiderman CA, Hameed K, Ackerman MJ, Hatton CW. Using UMLS metathesaurus concepts to describe medical images: dermatology vocabulary. Comput Biol Med. 2006 Jan;36(1):89-100.
  3. Lowe HJ, Antipov I, Hersh W, Smith CA, Mailhot M. Automated semantic indexing of imaging reports to support retrieval of medical images in the multimedia electronic medical record. Methods Inf Med. 1999 Dec;38(4-5):303-7.
  4. Humphreys BL, McCray AT, Cheh ML. Evaluating the coverage of controlled health data terminologies: report on the results of the NLM/AHCPR large scale vocabulary test. J Am Med Inform Assoc. 1997 Nov-Dec;4(6):484-500.
  5. Hearst MA, Divoli A, Guturu H, Ksikes A, Nakov P, Wooldridge MA, Ye J. BioText Search Engine: beyond abstract search. Bioinformatics. 2007 Aug 15;23(16):2196-7.
  6. Candler CS, Uijtdehaage SH, Dennis SE. Sharing digital teaching resources: breaking down barriers by addressing the concerns of faculty members. Acad Med. 2003 Mar;78(3):286-94.