Search Architecture: What Is Under the Hood

Science.gov has two search options. The basic search takes a single term or phrase and sends that one query to over 50 authoritative databases from 13 federal agencies. For more targeted results, the advanced search lets users select one or more specific databases and restrict the query by title, author, or date range.
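The difference between the two options can be pictured as a single query record whose optional fields are left empty for a basic search. The following is a minimal sketch only, not Science.gov's actual data model: the class name `SearchQuery`, its fields, and the source name `agency_a` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SearchQuery:
    """Hypothetical query record; every field name here is an assumption."""
    text: str
    sources: Tuple[str, ...] = ()      # empty = search every database (basic search)
    field: Optional[str] = None        # "title" or "author"; None = no field restriction
    year_from: Optional[int] = None    # inclusive lower bound of the date range
    year_to: Optional[int] = None      # inclusive upper bound of the date range

    def is_advanced(self) -> bool:
        """True when any advanced-search restriction has been set."""
        return bool(
            self.sources
            or self.field
            or self.year_from is not None
            or self.year_to is not None
        )

    def accepts_year(self, year: int) -> bool:
        """Check a record's year against the optional date range."""
        if self.year_from is not None and year < self.year_from:
            return False
        if self.year_to is not None and year > self.year_to:
            return False
        return True


basic = SearchQuery("solar energy")
advanced = SearchQuery(
    "solar energy",
    sources=("agency_a",),
    field="title",
    year_from=2005,
    year_to=2010,
)
```

Here `basic` carries only the query text, while `advanced` narrows the same text to one source, the title field, and the years 2005 through 2010.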

Science.gov automatically includes one additional source in its searches. This source, called Science.gov Websites, is an index of over 2,100 websites selected and submitted by the federal agencies. Re-indexed weekly, it serves as an "interagency database" that adds significant breadth and currency to the content.

Both the basic and the advanced search are federated: the query is sent, in parallel, to every database in Science.gov or, in an advanced search, to the databases the user selected. The top 100-200 results from each target database are returned to Science.gov in real time; Science.gov then applies its relevancy algorithm to the combined results and displays them in ranked order.

Once returned, the results can be sliced in multiple ways. A column on the left lists topic clusters created on the fly during the search, and users may drill down into a specific topic related to their query to narrow the original results. There are also author and date clusters. Drop-down boxes let users limit the results to a single source, or sort them by date, title, or author instead of by relevance.
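The fan-out, merge, and rank flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not Science.gov's implementation: the two in-memory sources, the `Result` record, and every function name are hypothetical, and because the actual relevancy algorithm is not described here, a simple count of matching query terms stands in for it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    source: str
    year: int


# Hypothetical in-memory stand-ins for the remote agency databases.
SOURCES = {
    "agency_a": [
        Result("Solar flare forecasting", "agency_a", 2009),
        Result("Wind turbine noise", "agency_a", 2007),
    ],
    "agency_b": [
        Result("Efficiency of solar panels in cold climates", "agency_b", 2010),
        Result("Soil chemistry", "agency_b", 2008),
    ],
}


def search_source(name, query, limit):
    """Return up to `limit` records from one source matching any query term."""
    terms = query.lower().split()
    hits = [r for r in SOURCES[name] if any(t in r.title.lower() for t in terms)]
    return hits[:limit]


def score(result, query):
    """Toy relevance (an assumption): number of query terms found in the title."""
    title = result.title.lower()
    return sum(1 for t in query.lower().split() if t in title)


def federated_search(query, sources=None, per_source_limit=100):
    """Query the chosen sources in parallel, merge the batches, rank by score."""
    names = list(sources or SOURCES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        batches = pool.map(lambda n: search_source(n, query, per_source_limit), names)
        merged = [r for batch in batches for r in batch]
    return sorted(merged, key=lambda r: score(r, query), reverse=True)


def date_clusters(results):
    """Group a result set by year, a stand-in for the date clusters."""
    return Counter(r.year for r in results)


results = federated_search("solar efficiency")
only_a = federated_search("solar efficiency", sources=["agency_a"])
by_title = sorted(results, key=lambda r: r.title)
```

In this sketch `results` holds the ranked, merged list from both sources, `only_a` shows the effect of limiting the search to one source, and `by_title` re-sorts the same result set by title instead of relevance.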

To view a result, users may click on the linked title, which takes them directly to the record or document at the target source. They may also mark the checkboxes next to individual titles to narrow the results down to those of interest. This smaller result set can then be printed, emailed, or downloaded into citation management software. Science.gov also offers Alerts on the user's topic(s) of interest. Alerts are emailed at a frequency the user selects (daily, weekly, or monthly) and highlight database updates on the selected topics.

Science.gov's federated search architecture offers some key advantages over crawler-based search engines. Because it queries each database directly, Science.gov has full access to the database contents, and because every search runs in real time, the results are inherently as current as the individual data sources.