Year: 2003
LHNCBC-2003-013
Correcting OCR Text by Association with Historic Datasets
Hauser SE, Schlaifer J, Sabir TF, Demner-Fushman D, Thoma GR
Proc. SPIE Electronic Imaging. 2003 Jan;5010: 84-93.
… able to select a correct affiliation for the author 43% of the time, with a false positive rate of 6%, a true negative rate of 44% and a false negative rate of 7%. These four percentages sum to 100%, so each is a share of all cases tested. MEDLINE citations with United States affiliations typically include a zip code. In addition to using author names as clues for correcting affiliations, we are investigating the value of the OCR text of zip codes as clues for correcting USA affiliations. Current work includes generating an author/affiliation/zipcode table from the entire MEDLINE database and developing a daemon module that implements affiliation selection and matching for the MARS system using both author names and zip codes. Preliminary results from the initial version of the daemon module and the partially filled author/affiliation/zipcode table are encouraging.
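The abstract does not spell out how the daemon combines author names and zip codes, so the following is only a minimal sketch of one way an author/affiliation/zipcode lookup could work. It assumes a hypothetical in-memory table (`AUTHOR_TABLE`, with invented authors and affiliations), uses `difflib` string similarity as the matching score, adds a fixed bonus when a zip code read from the OCR text matches a candidate's zip code, and returns no selection below a cutoff. The names `select_affiliation` and `ocr_zip_codes`, the OCR character repairs, and the `threshold` and `zip_bonus` values are all illustrative assumptions, not the MARS module's actual design.

```python
import re
from difflib import SequenceMatcher

# Hypothetical author -> [(affiliation, zip code)] table. All entries are
# invented for illustration; the paper builds its table from MEDLINE.
AUTHOR_TABLE = {
    "Doe J": [
        ("Department of Radiology, Example University, Springfield, IL 62701, USA", "62701"),
    ],
    "Roe R": [
        ("Department of Surgery, Sample Medical Center, Rochester, MN 55905, USA", "55905"),
        ("Department of Biology, Example Institute, Cambridge, MA 02139, USA", "02139"),
    ],
}

# Assumed OCR confusions inside numeric fields: letter O read for zero, l/I for one.
OCR_DIGIT_FIX = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})
ZIP_CANDIDATE = re.compile(r"\b[0-9OolI]{5}\b")


def ocr_zip_codes(text: str) -> set[str]:
    """Extract five-character tokens that look like zip codes, repairing assumed OCR confusions."""
    return {
        tok.translate(OCR_DIGIT_FIX)
        for tok in ZIP_CANDIDATE.findall(text)
        if any(c.isdigit() for c in tok)  # require at least one real digit
    }


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_affiliation(authors, ocr_text, threshold=0.6, zip_bonus=0.2):
    """Return the stored affiliation (of any listed author) that best matches the OCR text.

    Returns None (no selection) when no candidate scores at least `threshold`.
    """
    ocr_zips = ocr_zip_codes(ocr_text)
    best, best_score = None, 0.0
    for author in authors:
        for affiliation, zipcode in AUTHOR_TABLE.get(author, []):
            score = similarity(ocr_text, affiliation)
            if zipcode in ocr_zips:
                score += zip_bonus  # a matching zip code is extra evidence for this candidate
            if score > best_score:
                best, best_score = affiliation, score
    return best if best_score >= threshold else None
```

For example, `select_affiliation(["Roe R"], "Dept of Surgery, Samp1e Medical Center, Rochester, MN 55905, USA")` picks the Sample Medical Center entry despite the OCR errors, while an author absent from the table yields `None`. A flat zip code bonus is just one simple way to combine the two clues; in any such scheme, raising the cutoff trades false positives for false negatives.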