Year: 2003 |
LHNCBC-2003-013 |
Correcting OCR Text by Association with Historic Datasets |
Hauser SE, Schlaifer J, Sabir TF, Demner-Fushman D, Thoma GR |
Proc. SPIE Electronic Imaging. 2003 Jan;5010: 84-93. |
… able to select a correct affiliation for the author 43% of the time, with a false positive rate of 6%, a true negative rate of 44%, and a false negative rate of 7%. MEDLINE citations with United States affiliations typically include the zip code. In addition to using author names as clues for correcting affiliations, we are investigating the OCR text of zip codes as clues for correcting US affiliations. Current work includes generating an author/affiliation/zip code table from the entire MEDLINE database and developing a daemon module that implements affiliation selection and matching for the MARS system using both author names and zip codes. Preliminary results from the initial version of the daemon module and the partially filled author/affiliation/zip code table are encouraging. |
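The abstract does not spell out the matching algorithm. As a rough illustration only, the Python sketch below assumes the author/affiliation/zip code table is available as in-memory rows. It gathers candidate affiliations through the OCR'd author names and zip code, then picks the candidate closest to the OCR'd affiliation text. The function names, the sample rows, the 0.8 acceptance threshold, and the use of `difflib.SequenceMatcher` as the string-similarity measure are all hypothetical stand-ins, not the method used in MARS.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def build_index(rows):
    """Index hypothetical (author, affiliation, zip_code) rows by author and by zip code."""
    by_author = defaultdict(set)
    by_zip = defaultdict(set)
    for author, affiliation, zip_code in rows:
        by_author[author.lower()].add(affiliation)
        if zip_code:  # non-US affiliations may have no zip code
            by_zip[zip_code].add(affiliation)
    return by_author, by_zip

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def select_affiliation(ocr_text, ocr_authors, ocr_zip, by_author, by_zip, threshold=0.8):
    """Return the stored affiliation closest to the OCR text, or None to reject.

    Candidates come from the OCR'd author names and zip code (exact lookups);
    the 0.8 threshold is an arbitrary illustrative value.
    """
    candidates = set()
    for author in ocr_authors:
        candidates |= by_author.get(author.lower(), set())
    if ocr_zip:
        candidates |= by_zip.get(ocr_zip, set())
    best, best_score = None, 0.0
    for cand in sorted(candidates):  # sorted so ties break deterministically
        score = similarity(ocr_text, cand)
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None

# Made-up example rows; a real table would be generated from MEDLINE citations.
rows = [
    ("Smith J", "Dept of Medicine, Example University, Springfield, IL 62701", "62701"),
    ("Smith J", "School of Nursing, Other College, Portland, OR 97201", "97201"),
    ("Lee K", "Institute of Genetics, Sample Hospital, London, UK", None),
]
by_author, by_zip = build_index(rows)

# OCR output with typical character confusions ("l" for "i", "1" for "l").
ocr = "Dept of Medlcine, Examp1e Universlty, Springfield, IL 62701"
print(select_affiliation(ocr, ["Smith J"], "62701", by_author, by_zip))
```

Returning `None` models a rejection. Comparing each selection or rejection against ground truth would yield the four outcome rates reported above (true/false positives and negatives). This sketch also assumes the OCR'd author names and zip code are themselves correct, which real OCR output does not guarantee.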