
Last updated: June 23, 2008

CEB Projects


Medical Article Record System (MARS)

Project Member(s): Daniel Le, Jong Woo Kim, Susan Hauser, Chan Moon, Joseph Chow, Loc Tran, In Cheol Kim, George Thoma

Entering bibliographic data from journal articles into the thousands of bibliographic databases around the world remains a heavily manual task. At the National Library of Medicine (NLM) we are automating the production of bibliographic records for MEDLINE, NLM's premier database used by clinicians and researchers worldwide. As a first step we developed the Medical Article Record System (MARS), in which the abstracts that appear in journal articles are scanned and converted by optical character recognition (OCR), while the remaining fields (e.g., article title, authors, affiliations) are keyboarded. This system has been in production since 1996 and employs a team of professionals to process 600 articles daily.

Figure: An example of three different journal article layout types.

A second-generation system is now being designed to extract the remaining fields automatically. Like the first, it employs scanning and OCR, but it adds modules that automatically zone the scanned pages, identify the zones as particular fields, and reformat the field syntax to adhere to MEDLINE conventions. Developing it therefore consists of developing algorithms for three tasks: detecting page zones (page segmentation), labeling these zones by field name (article title, author, affiliation, abstract), and reformatting the zone text syntax. The system relies on a database both to keep track of the workflow and to serve as a repository for data extracted from the scanned page for use by subsequent processes.
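The labeling and reformatting stages can be illustrated with a minimal Python sketch. It is not the MARS implementation: the `Zone` class, its `top` and `font_size` attributes, the keyword rules in `label_zones`, and the `reformat_author` helper are all hypothetical, and page segmentation is assumed to have already produced the zones. The author format (surname followed by initials) is assumed from the citation style used on this page, e.g. "Thoma GR".

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """One zone found by page segmentation (all attributes are assumptions)."""
    text: str            # OCR text of the zone
    top: int             # vertical position on the page, in pixels
    font_size: float     # dominant font size reported by OCR
    label: str = "unknown"


def label_zones(zones):
    """Toy rule-based labeling by field name; illustrative rules only."""
    ordered = sorted(zones, key=lambda z: z.top)
    largest = max(ordered, key=lambda z: z.font_size)
    for z in ordered:
        if z is largest:
            # Assume the zone with the largest font is the article title.
            z.label = "article title"
        elif z.text.lower().startswith("abstract"):
            z.label = "abstract"
        elif any(k in z.text for k in ("University", "Institute", "Department")):
            z.label = "affiliation"
        elif z.top > largest.top and len(z.text.split()) <= 12:
            # Short zone below the title: guess that it is the author line.
            z.label = "author"
    return ordered


def reformat_author(name):
    """Rewrite 'Given M. Surname' as 'Surname GM' (surname, then initials)."""
    parts = name.replace(".", " ").split()
    surname, given = parts[-1], parts[:-1]
    initials = "".join(p[0].upper() for p in given)
    return f"{surname} {initials}".strip()


page = [
    Zone("Automating bibliographic record production", top=100, font_size=18.0),
    Zone("George R. Thoma, Daniel Le", top=150, font_size=11.0),
    Zone("Department of Example Studies, Example Institute", top=190, font_size=9.0),
    Zone("Abstract. Data entry continues to be heavily manual.", top=260, font_size=10.0),
]
labeled = label_zones(page)
authors = [reformat_author(a) for a in labeled[1].text.split(",")]
```

A production system would likely replace the keyword rules with features learned per journal layout, since layouts differ widely between journals; the sketch only shows how labeled zones feed the reformatting step.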

Thoma GR. Automating the production of bibliographic records for MEDLINE.
Internal R&D report, CEB, LHNCBC, NLM; September 2001; 92.



 
