Semi-Automatic Indexing

The Indexing Initiative team has been working on various elements of semi-automatic indexing. In semi-automatic indexing, the Medical Text Indexer (MTI) indexes a document and provides recommended indexing terms to the human indexer working on the article.

The goal is for MTI to become a "black box" application invoked as soon as new articles arrive electronically in the Office of Computer and Communications Systems (OCCS) online database, the Data Creation and Maintenance System (DCMS). The electronic articles are stored in the database as bibliographic citations. Each night, MTI processes the titles and abstracts of the approximately 4,000 newly arrived citations. The resulting MTI-recommended MeSH terms are then sent to DCMS, where they are made available to the Indexers in a special pane of their indexing window. There the Indexers may click on the terms they would like to include in the indexing for the article. This is why it is called "semi-automatic" indexing; MTI is only an advisor.
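The nightly flow described above can be sketched as a simple batch loop. This is an illustrative sketch only: the record layout and the function names (run_mti, nightly_batch) are assumptions, not the actual OCCS/DCMS interfaces, and the toy trigger rules stand in for the real Medical Text Indexer.

```python
# Illustrative sketch of the nightly MTI batch flow. run_mti() stands in
# for the real Medical Text Indexer; field and function names are assumed.

def run_mti(title, abstract):
    """Placeholder for MTI: return recommended MeSH headings for a citation."""
    text = (title + " " + abstract).lower()
    # Toy rule: recommend a heading when its trigger word appears in the text.
    triggers = {"heart": "Heart Diseases", "liver": "Liver Neoplasms"}
    return sorted({mesh for word, mesh in triggers.items() if word in text})

def nightly_batch(citations):
    """Process newly arrived citations and collect recommendations for DCMS."""
    recommendations = {}
    for cit in citations:
        recommendations[cit["pmid"]] = run_mti(cit["title"], cit["abstract"])
    return recommendations

citations = [
    {"pmid": "100", "title": "Heart failure outcomes",
     "abstract": "A study of the heart."},
    {"pmid": "101", "title": "Liver imaging",
     "abstract": "MRI of liver lesions."},
]
recs = nightly_batch(citations)
print(recs)  # each PMID maps to the MeSH terms an indexer could accept
```

In the real system these recommendations would be written back to DCMS, where the indexer sees them as clickable terms; here they are simply returned as a dictionary keyed by PMID.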

[Figure: a possible interface between the Indexing Initiative and the DCMS]


Activities:
Monitoring of MTI Production and Usage

To support evaluation of MTI's utility for the library's indexers, indexer accesses have been logged since shortly after MTI recommendations were first made available through DCMS (September 2002). When maintenance of the recommendation database moved to DCMS (September 2004), new logs were started that recorded which indexers requested MTI recommendations and when.
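A minimal sketch of the kind of access record such logs might capture, and how it could be summarized. The field names and CSV layout are assumptions for illustration; the actual DCMS log schema is not documented here.

```python
# Hypothetical access-log records: who requested MTI recommendations,
# for which citation, and when. Field names are illustrative only.
import csv
import io
from datetime import datetime

log_text = """indexer,pmid,requested_at
jsmith,12345,2004-10-01T09:14:00
adoe,12346,2004-10-01T09:20:30
jsmith,12400,2004-10-02T11:05:12
"""

def summarize(log):
    """Count recommendation requests per indexer from a CSV access log."""
    counts = {}
    for row in csv.DictReader(io.StringIO(log)):
        datetime.fromisoformat(row["requested_at"])  # validate the timestamp
        counts[row["indexer"]] = counts.get(row["indexer"], 0) + 1
    return counts

print(summarize(log_text))
```

Aggregates of this kind (requests per indexer, per day) are the raw material for utility reports like the one linked below.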

The following report summarizes the information gleaned from those logs:
MTI Utility, November 1, 2005 (PDF, 469 KB)

DCMS Indexer Recommendation Assessment Experiment

Recent advances in the quality of indexing achieved by the Medical Text Indexer (MTI), and the work toward integrating that system with DCMS, have suggested the need for an evaluation of that indexing by Medline indexers. The experiment described here assesses whether indexing terms suggested by MTI facilitate the work of the indexers. We sometimes refer to the list of subject headings found by MTI as the MTI results.

The Experiment

This experiment was conducted by the core team of the NLM's Indexing Initiative with the assistance of volunteer indexers from the Bibliographic Services Division (BSD) and technical support from the DCMS development team in OCCS. DCMS is the Data Creation and Maintenance System used daily by the Medline indexers. We measured the usefulness of the MTI-suggested indexing terms by presenting them to indexers actually performing indexing. As a pre-test, we ran a trial of the objective analysis on the MTI results and the indexer-selected subject headings from a previous issue of journals recently indexed by some of our volunteer indexers; this analysis was used to select the parameters and level of filtering for the experiment. Following a kick-off meeting on March 14, 2002, the experiment ran from March 19 through April 11, 2002.

Details and Results

The following paper describes the experiment in great detail and identifies the results of the completed test:
A MEDLINE Indexing Experiment Using Terms Suggested by MTI, June 2002 (PDF, 510 KB)

MTI User-Centered Evaluation (TBD)

The primary objective of the Indexing Initiative project at the National Library of Medicine is to investigate methods whereby automated subject-heading indexing partially or completely substitutes for current manual indexing practices. This effort will design and test a method for evaluating whether that objective has been met by the Medical Text Indexer (MTI). A subsequent project will conduct the evaluation and perform the actual measurement.

Purpose

The Indexing Initiative will be considered a success if methods can be designed and implemented that result in retrieval performance that is equal to or better than the retrieval performance of systems based principally on humanly assigned index terms.

Recent research efforts for evaluating information retrieval (IR) systems have focused on user reactions to the process and the results. They attempt to measure user satisfaction with a system's results rather than just the technical ability of the system to retrieve documents.

To better judge the efficacy of the Medical Text Indexer in supporting and enhancing users' access to national databases of biomedical information, this project will design a process for evaluating retrieval performance from the point of view of end users. The resulting evaluation will be used in assessing the feasibility of adopting the system in a production environment.

User-centered Evaluation

The ultimate goal of any IR system is user satisfaction, regardless of the underlying technology. Such satisfaction is determined by numerous factors beyond the technical ability of a system to deliver topically relevant documents. The conclusion reached by many investigators is that a more user-oriented notion of retrieval system evaluation is needed in order to address these issues (Harman, 1992; Su, 1992; and Gluck, 1996), and recent system development in IR is often assessed with the user in mind (Jose, Furner, and Harper, 1998, for example).

The Indexing Initiative has considered possible approaches to the design of a user-oriented evaluation study. Several studies serve as a guide in this regard. Hersh, Pentecost, and Hickam (1996) report on an interesting, task-oriented evaluation strategy in a biomedical setting, which focuses on the user's information need. Methodologies are being developed in the context of the TREC experiments (Beaulieu, Robertson, and Rasmussen 1996) which provide a means of accommodating the user in formal IR experiments. Surveys of the type reported in Lindberg, et al. 1993b can provide valuable insight into the impact that an IR system has on the professional activities of users.

Because the Indexing Initiative System is a subject indexing system and not strictly an information retrieval system, the documented techniques for retrieval system evaluation are not directly applicable to the proposed evaluation. In developing an evaluation design, this project will establish a methodology for evaluating user satisfaction with retrieval results and comparing that satisfaction when the search facilitating resources used by the retrieval engine, such as subject headings, are different.

Operational Assumptions

The evaluation will be conducted with retrieval against test sets with indexing produced by several schemes. These schemes include, but are not limited to,
  1. humanly assigned MeSH terms,
  2. automatically assigned MeSH terms produced by the Medical Text Indexer, and
  3. inverted indexes of text words.
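Scheme 3 above, an inverted index of text words, can be sketched in a few lines. This is a toy version for illustration only, with no stopword removal, stemming, or MeSH mapping, none of which the real test sets would omit.

```python
# Toy inverted index of text words: maps each word to the set of
# document ids whose text contains it.
from collections import defaultdict

def build_inverted_index(docs):
    """Build a word -> {doc ids} mapping from a dict of id -> text."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {
    "d1": "aspirin reduces fever",
    "d2": "fever in children",
}
index = build_inverted_index(docs)
print(sorted(index["fever"]))  # both documents mention "fever"
```

Retrieval against such an index simply intersects the posting sets for the query words, which is the baseline the human- and MTI-assigned MeSH schemes are compared against.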

The platform for the evaluation will be the NLM Gateway. It will serve as the base retrieval system and provide an interface for the evaluation subjects.


MARS & MTI Test Bed

The purpose of this Test Bed was to develop a test environment where the Indexing Initiative team could test "real-time" processing and handling of the Office of Computer and Communications Systems (OCCS) XML (eXtensible Markup Language) formatted data files. The idea was to use the existing Medical Article Record System (MARS) automated data flow to OCCS as a platform to simulate interaction with OCCS, allowing the Indexing Initiative team to finalize development of a system that will eventually interface directly with OCCS.
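Handling such XML-formatted citation files might look like the following. The element names here (CitationSet, Citation, PMID, Title, Abstract) are a simplified stand-in, not the actual OCCS/MARS record schema.

```python
# Parsing a simplified XML citation record; element names are assumed,
# not the actual OCCS/MARS schema.
import xml.etree.ElementTree as ET

xml_data = """<CitationSet>
  <Citation>
    <PMID>100</PMID>
    <Title>Heart failure outcomes</Title>
    <Abstract>A study of the heart.</Abstract>
  </Citation>
</CitationSet>"""

def parse_citations(xml_text):
    """Yield (pmid, title, abstract) tuples from an XML citation file."""
    root = ET.fromstring(xml_text)
    for cit in root.findall("Citation"):
        yield (cit.findtext("PMID"),
               cit.findtext("Title"),
               cit.findtext("Abstract"))

records = list(parse_citations(xml_data))
print(records[0])
```

In the Test Bed, parsed title and abstract fields of this kind would be handed to MTI for processing, simulating the eventual direct interface with OCCS.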

Link to MARS & MTI Test Bed Page

Last Modified: October 09, 2007