LHNCBC: Document Abstract
Year: 2004
LHNCBC-2004-071
Gene Terms and English Words: An Ambiguous Mix
Sehgal AK, Srinivasan P, Bodenreider O
Proceedings of the SIGIR 2004 Workshop on Search and Discovery in Bioinformatics 2004.
Continuing technical advances have made large-scale genetic analysis possible, with experiments producing data for thousands of genes at a time. Recognizing gene terms in biomedical text is crucial for higher-level information applications. There are, however, many challenges associated with this task. One difficult aspect is negotiating the various kinds of ambiguity in gene and protein nomenclature. In this research we look at one of the most challenging kinds, in which gene terms are also common English words. For example, TRAP, ART, and ACT are all gene symbols that also have English meanings. This kind of ambiguity makes retrieval of relevant information more difficult. We describe IR-based ranking methods applied to document sets retrieved for ambiguous gene terms in LocusLink and present our results. We find that using summary and product information from LocusLink records in addition to the gene term performs best for re-ranking the retrieved documents.
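The re-ranking idea in the final sentence can be illustrated with a small sketch. It builds a query from the gene term plus the summary and product text of its LocusLink record, then orders the retrieved documents by cosine similarity to that query. This assumes a TF-IDF vector-space model with log-scaled term frequency and smoothed IDF. The abstract does not specify the actual weighting scheme, and the function names (`rerank`, `tfidf_vectors`) and the toy record and documents below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of token lists.

    Assumption: log-scaled TF and smoothed IDF; not taken from the paper.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    vecs = []
    for tokens in docs:
        tf = Counter(tokens)
        vecs.append({t: (1 + math.log(c)) * idf[t] for t, c in tf.items()})
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(gene_term, summary, product, documents):
    """Return document indices, best first, ranked against an expanded query
    made of the gene term plus the record's summary and product text."""
    doc_tokens = [tokenize(d) for d in documents]
    vecs, idf = tfidf_vectors(doc_tokens)
    qtf = Counter(tokenize(" ".join([gene_term, summary, product])))
    # Query terms that never occur in the retrieved set carry no weight.
    qvec = {t: (1 + math.log(c)) * idf[t] for t, c in qtf.items() if t in idf}
    scores = [cosine(qvec, v) for v in vecs]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical retrieved set for the ambiguous symbol "ART".
    docs = [
        "The art exhibition opened in the museum gallery.",
        "ART ADP-ribosyltransferase activity was measured in muscle cells.",
        "Modern art and painting techniques.",
    ]
    print(rerank("ART", "mono-ADP-ribosyltransferase enzyme",
                 "ADP-ribosyltransferase", docs))
```

With the gene term alone, every document above matches "art" equally. The summary and product terms are what push the biologically relevant document to the top, which mirrors the abstract's finding that the extra record fields help re-ranking.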