LHNCBC: Document Abstract
Year: 2004
LHNCBC-2004-075
Knowledge-intensive and Statistical Approaches to the Retrieval and Annotation of Genomics MEDLINE Citations
Aronson AR, Demner-Fushman D, Humphrey SM, Ide NC, Kim W, Liu H, Loanne RR, Mork JG, Smith LH, Tanabe LK, Wilbur WJ, Xie N
Proc TREC 2004, 503-11
Retrieving and annotating relevant information sources in the genomics literature are difficult but common tasks undertaken by biologists. The research presented here addresses these issues by exploring methods for retrieving MEDLINE citations that answer real biologists' information needs and by addressing the initial tasks required to annotate MEDLINE citations having genomic content with terms from the Gene Ontology (GO). We approached the retrieval task using two methods: aggressive, knowledge-intensive query expansion and text neighboring. Our approaches to the triage subtask for annotation consisted of traditional machine learning (ML) methods as well as a novel ML algorithm for thematic analysis. Finally, we used a statistical, n-gram heuristic to decide which of the GO hierarchies should be used to annotate a given MEDLINE citation.