LHNCBC: Document Abstract
Year: 2001
LHNCBC-2001-007
Web Page Downloading and Classification
Tran LQ, Moon CW, Le DX, Thoma GR
Proc. 14th IEEE Symposium on Computer-Based Medical Systems: IEEE Computer Society. 2001 Jul:321-6.
This paper describes the processes of downloading and classifying Web-based articles in online medical journals as a preliminary step to extracting bibliographic data to populate MEDLINE®, the widely used database of the National Library of Medicine (NLM). The two processes are combined in an automated system named "Web Page Downloading and Classification". The system downloads Web pages using WinInet, Microsoft's Windows Internet API, and applies a combination of Artificial Intelligence (AI) techniques, including the Breadth-First search algorithm and the Constraint Satisfaction method. These two techniques are used to traverse a Web page's links, identify the linked pages as abstract, full-text, PDF, or image files, and recognize and generate the successors of the downloaded pages.
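The traversal and classification steps can be pictured with a small sketch. It is not the authors' implementation. It assumes a hypothetical in-memory dictionary, `WEB`, stands in for pages fetched with WinInet, and the page-type rules in `CONSTRAINTS` are illustrative assumptions. Classification here is a simplified constraint check (a page receives the first type whose constraints all hold), not a general constraint-satisfaction solver. The traversal is a standard breadth-first search bounded by a hypothetical `max_depth` parameter.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# A page is (text, outgoing links).
Page = Tuple[str, List[str]]

# Hypothetical stand-in for downloaded pages: URL -> Page.
# A real system would fetch these over the network (the paper uses WinInet); no I/O here.
WEB: Dict[str, Page] = {
    "toc.html": ("Table of contents", ["a1.html", "a2.html"]),
    "a1.html": ("Abstract: we study ...", ["f1.html", "f1.pdf"]),
    "a2.html": ("Abstract: another study ...", ["fig1.gif"]),
    "f1.html": ("Introduction ... Methods ... References ...", []),
    "f1.pdf": ("", []),
    "fig1.gif": ("", []),
}

# A constraint is a predicate over (url, text). These rules are illustrative
# assumptions, not the constraints used by the authors.
Constraint = Callable[[str, str], bool]

CONSTRAINTS: Dict[str, List[Constraint]] = {
    "pdf": [lambda url, text: url.lower().endswith(".pdf")],
    "image": [lambda url, text: url.lower().endswith((".gif", ".jpg", ".jpeg", ".png"))],
    "abstract": [
        lambda url, text: "abstract" in text.lower(),
        lambda url, text: "references" not in text.lower(),
    ],
    "full text": [
        lambda url, text: "introduction" in text.lower(),
        lambda url, text: "references" in text.lower(),
    ],
}


def classify(url: str, text: str) -> str:
    """Return the first page type whose constraints are all satisfied, else 'other'."""
    for label, constraints in CONSTRAINTS.items():
        if all(check(url, text) for check in constraints):
            return label
    return "other"


def bfs_crawl(web: Dict[str, Page], start: str, max_depth: int = 2) -> List[Tuple[str, str, int]]:
    """Breadth-first traversal from `start`, classifying each page it reaches.

    Returns (url, page type, depth) tuples in visiting order. Pages at
    `max_depth` are classified, but their links are not followed.
    """
    seen = {start}
    queue = deque([(start, 0)])
    results: List[Tuple[str, str, int]] = []
    while queue:
        url, depth = queue.popleft()
        text, links = web.get(url, ("", []))  # unknown URLs are treated as empty pages
        results.append((url, classify(url, text), depth))
        if depth >= max_depth:
            continue
        for link in links:  # generate the successors of the current page
            if link not in seen:  # skip pages already queued, so cycles terminate
                seen.add(link)
                queue.append((link, depth + 1))
    return results


if __name__ == "__main__":
    for url, page_type, depth in bfs_crawl(WEB, "toc.html"):
        print(f"{depth} {url}: {page_type}")
```

Breadth-first order means every page one link away from the starting page (here, a table of contents) is classified before any page two links away, which suits journal sites where each abstract links onward to its full-text, PDF, and image files.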