Skip Navigation
Lister Hill Center Logo  

Search Tips
About the Lister Hill Center
Blue Arrow
Blue Arrow
Blue Arrow
Blue Arrow
Blue Arrow
Innovative Research
Blue Arrow
Blue Arrow
Blue Arrow
Blue Arrow
Blue Arrow
Publications and Lectures
Blue Arrow
Blue Arrow
Blue Arrow
Training and Employment
Blue Arrow
Blue Arrow
LHNCBC: Document Abstract
Year: 2004Adobe Acrobat Reader
Download Free Adobe Acrobat Reader
LHNCBC-2004-034
Using Symbolic Knowledge in the UMLS to Disambiguate Words in Small Datasets with a Naive Bayes Classifier
Leroy G, Rindflesch TC
Medinfo. 2004 Sept.;2004: 381-385.
Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naive Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in eight experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. In a follow-up evaluation, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators.
PDF