LHNCBC: Document Abstract
Year: 2007
LHNCBC-2007-043
Combining Static Classifiers and Class Syntax Models for Logical Entity Recognition in Scanned Historical Documents
Mao S, Mansukhani P, Thoma GR
Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, Minnesota, June 2007, pp. 1-8.
Class syntax can be used to 1) model the temporal or spatial evolution of class labels over a feature observation sequence, 2) correct classification errors of static classifiers when feature observations from different classes overlap in feature space, and 3) eliminate redundant features whose discriminative information is already captured by the class syntax. In this paper, we describe a novel method that combines static classifiers with class syntax models for supervised feature subset selection and classification within unified algorithms. Posterior class probabilities given feature observations are first estimated from the output of static classifiers, and then integrated into a parsing algorithm to find an optimal class label sequence for the given feature observation sequence. Finally, both the static classifiers and the class syntax models are used to search for an optimal subset of features. The optimal feature subset, the associated static classifiers, and the class syntax models are all learned from training data. We apply this method to logical entity recognition in scanned historical U.S. Food and Drug Administration (FDA) documents containing court case Notices of Judgment (NJs) of different layout styles, and show that the use of class syntax models not only corrects most classification errors of static classifiers, but also significantly reduces the dimensionality of the feature observations with negligible impact on classification performance.
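The two-stage decoding idea in the abstract — per-observation class posteriors from a static classifier, then a sequence model over class labels to pick the best label sequence — can be illustrated with a Viterbi-style search. This is a hedged sketch, not the authors' algorithm: the paper uses a parsing algorithm over a class syntax model, whereas the stand-in below encodes the "syntax" as a simple class-transition matrix; all function and variable names are illustrative.

```python
import numpy as np

def best_label_sequence(posteriors, transitions, priors):
    """Viterbi-style decoding: combine per-observation class posteriors
    (e.g. from a static classifier) with a class-transition model to
    find the most probable class label sequence.

    posteriors  : (T, K) array, P(class | observation_t) at each step t
    transitions : (K, K) array, P(class j at t | class i at t-1)
    priors      : (K,)  array, P(class) at the first step
    """
    T, K = posteriors.shape
    eps = 1e-12                                  # avoid log(0)
    log_post = np.log(posteriors + eps)
    log_trans = np.log(transitions + eps)

    # score[j] = best log-probability of any label sequence ending in class j
    score = np.log(priors + eps) + log_post[0]
    back = np.zeros((T, K), dtype=int)           # backpointers for recovery

    for t in range(1, T):
        cand = score[:, None] + log_trans        # (K, K): prev class -> next class
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_post[t]

    # Backtrack from the best final class to recover the full sequence
    labels = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]
```

With overlapping classes, the static classifier's per-step argmax can be wrong at an ambiguous observation; a transition model that favors label continuity can override it, which mirrors the abstract's claim that the sequence-level model corrects most static-classifier errors.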