National Science Foundation Home National Science Foundation - Computer & Information Science & Engineering (CISE)
 
Event
How to Crawl the Web

December 12, 2000, 12:00 PM to 1:00 PM
NSF, Room 110, Arlington, VA

Lecturer: Hector Garcia-Molina

A crawler collects large numbers of web pages, to be used for building an index or for data mining. Crawlers consume significant network and computing resources, both at the visited web servers and at the site(s) collecting the pages, and thus it is critical to make them efficient and well behaved. In this talk I will discuss how to build a "good" crawler, addressing questions such as:

How can a crawler gather "important" pages only?

How can a crawler efficiently keep its collection "fresh"?

How can a crawler be parallelized?
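
The abstract poses these questions without giving answers. As a rough illustration of the first one only, the sketch below orders a crawl by how many links to each page have been seen so far. Using the in-link count as the measure of "importance", and the names Frontier and crawl_order, are assumptions made for this example and are not taken from the talk. It runs on a small in-memory link graph rather than the live web.

```python
import heapq
from collections import defaultdict


class Frontier:
    """Crawl frontier that hands out the URL with the most known in-links first."""

    def __init__(self):
        self._inlinks = defaultdict(int)  # url -> in-links seen so far
        self._heap = []                   # entries: (-inlinks, order, url)
        self._done = set()                # URLs already handed out
        self._order = 0                   # tie-breaker: earlier discovery wins

    def add_link(self, url):
        """Record one discovered link pointing to `url`."""
        if url in self._done:
            return
        self._inlinks[url] += 1
        heapq.heappush(self._heap, (-self._inlinks[url], self._order, url))
        self._order += 1

    def pop(self):
        """Return the most-linked uncrawled URL, or None if none remain."""
        while self._heap:
            neg_count, _, url = heapq.heappop(self._heap)
            # Skip stale entries: already crawled, or superseded by a newer count.
            if url in self._done or -neg_count != self._inlinks[url]:
                continue
            self._done.add(url)
            return url
        return None


def crawl_order(link_graph, seed, limit):
    """Simulate a crawl over an in-memory graph {url: [outgoing urls]}."""
    frontier = Frontier()
    frontier.add_link(seed)
    visited = []
    while len(visited) < limit:
        url = frontier.pop()
        if url is None:
            break
        visited.append(url)
        for target in link_graph.get(url, []):
            frontier.add_link(target)
    return visited
```

With a page budget smaller than the graph, pages that have accumulated more in-links are fetched before pages discovered earlier but linked less often.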

I will also summarize results from an experiment conducted on more than half a million web pages over 4 months, to estimate how web pages evolve over time.
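
The abstract does not say how page evolution was estimated. As one hedged illustration, suppose a page is revisited at a fixed interval and each visit records only whether the page changed since the previous visit. If changes are assumed to arrive as a Poisson process with rate r (an assumption of this sketch, not a claim from the abstract), the probability of seeing at least one change in an interval of length I is 1 - exp(-r * I), which can be inverted to estimate r. The function name and parameters below are hypothetical.

```python
import math


def estimate_change_rate(changes_detected, visits, interval_days):
    """Estimate changes per day from periodic change/no-change observations.

    Assumes changes follow a Poisson process (an assumption of this sketch).
    """
    if visits <= 0 or interval_days <= 0:
        raise ValueError("visits and interval_days must be positive")
    if not 0 <= changes_detected <= visits:
        raise ValueError("changes_detected must lie between 0 and visits")
    fraction = changes_detected / visits
    if fraction == 1.0:
        # Changed at every visit: the data give no upper bound on the rate.
        return math.inf
    # Invert P(change in interval) = 1 - exp(-rate * interval).
    return -math.log(1.0 - fraction) / interval_days
```

Simply dividing detected changes by elapsed time would undercount, because several changes between two visits are observed as a single change; the logarithm corrects for that under the stated assumption.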

View the presentation slides.

Hector Garcia-Molina (http://www-db.stanford.edu/people/hector.html) is the Leonard Bosack and Sandra Lerner Professor in the Departments of Computer Science and Electrical Engineering at Stanford University, Stanford, California. From August 1994 to December 1997 he was the Director of the Computer Systems Laboratory at Stanford. From 1979 to 1991 he was on the faculty of the Computer Science Department at Princeton University, Princeton, New Jersey. His research interests include distributed computing systems and database systems. He received a BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico, in 1974. From Stanford University, Stanford, California, he received an MS in electrical engineering in 1975 and a PhD in computer science in 1979. Garcia-Molina is a Fellow of the ACM, received the 1999 ACM SIGMOD Innovations Award, and is a member of the President's Information Technology Advisory Committee (PITAC).

This event is part of the Distinguished Lecture Series.

Meeting Type
Lecture

Contacts
Michael J. Pazzani, mpazzani@nsf.gov

NSF Related Organizations
Directorate for Computer & Information Science & Engineering

 



Last Updated: July 27, 2005