CRISP Version 2.5.2.0

Abstract

Grant Number: 1R01LM006649-01A1
Project Title: AUTOMATED KNOWLEDGE EXTRACTION FOR BIOMEDICAL LITERATURE
PI Information:
Name: PUSTEJOVSKY, JAMES
Email: jamesp@cs.brandeis.edu

Abstract: It is becoming increasingly difficult for biologists to keep pace with the information being published in their own fields, let alone in biology as a whole. Rapid access to specific, current biomedical information, and the ability to quickly gain an overview of current knowledge in a given field, are becoming more difficult to achieve and, at the same time, more important. Traditional methods of keeping up with advances are therefore becoming inadequate. This project will involve a unique collaboration between a computational linguist at Brandeis University and two biologists at Tufts University School of Medicine. We propose to use recent advances in the computational analysis of text to organize and summarize the biological literature. Building on our previous language technology research at Brandeis, we propose to integrate the domain knowledge of the National Library of Medicine's Unified Medical Language System (UMLS) with Brandeis' semantic lexicon, CoreLex, to develop normalized structured representations of the semantic content of abstracts in the Medline database. These data structures, called lexical webs, will make information available more quickly through a richly hyperlinked index that supports rapid navigation and information access. Automated analysis of biological abstracts will be combined with information derived from sequence databases to provide an up-to-date, comprehensive database of information on known genes and proteins. The results of this analysis will be used to construct a web-accessible database organized on a gene-by-gene basis. Other distinctive features of this database will be the visualization of motifs and features extracted from Medline abstracts, through the generation of annotated structure-function maps of proteins and genes, and the construction of gene-specific semantic indexes to the relevant biological literature.
This system, called MedStract, will reduce the time required for biomedical researchers to find information of interest and should facilitate the development of new research insights.

Public Health Relevance:
A Public Health Relevance statement is not available.

Thesaurus Terms:
abstracting/text searching, artificial intelligence, informatics, information retrieval, semantics
Internet, computer assisted sequence analysis, computer system design/evaluation, information system analysis, molecular biology information system, nucleic acid sequence, protein sequence, syntax, vocabulary development for information system

Institution: BRANDEIS UNIVERSITY
415 SOUTH ST, MS #116
WALTHAM, MA 02454-9110
Fiscal Year: 1999
Department: COMPUTER SCIENCE
Project Start: 01-MAR-1999
Project End: 28-FEB-2002
ICD: NATIONAL LIBRARY OF MEDICINE
IRG: BLR

