The THOMAS system uses InQuery search software developed at the University of Massachusetts. InQuery employs a relevance-ranking algorithm for searching and displays the most relevant items first on the results list. Documents whose content matches the search term(s) are retrieved and assigned a "weight" based on an InQuery algorithm.
In general, InQuery calculates the weight of each term for a given document by dividing the number of times the term appears in the document (term frequency) by the number of documents in which the term appears (document frequency; its reciprocal is the "inverse document frequency") -- that is, the "uniqueness" of the term in the entire database is considered. In a legislative database, the word "Act" is not a unique term and would be accorded a lower weighting factor than words occurring fewer times in the database but more times in an individual document. A factor is also added to compensate for the size of the document. Thus, a short document containing 10 instances of a term will be given a higher weight than a much longer document with 10 instances of the same search term. The weights of all the terms in the search statement are then averaged to give an overall weight to the document and rank it in the search results. The 250 or so stopwords maintained in a THOMAS stopword list are not weighted and are not reflected in search results.
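The weighting described above can be sketched roughly as follows. This is an illustration only, not InQuery's actual formula, which this page does not give. The log-scaled inverse document frequency, the length normalization by token count, the tiny stopword sample, and all function names are assumptions made for the example.

```python
import math

# Hypothetical sample; the real THOMAS stopword list has about 250 entries.
STOPWORDS = {"the", "of", "a", "and", "for", "to", "in"}

def term_weight(term, doc_tokens, doc_freq, num_docs):
    """Weight of one term in one document: length-normalized tf times idf."""
    if term in STOPWORDS or not doc_tokens:
        return 0.0  # stopwords are never weighted
    # Dividing by document length compensates for the size of the document.
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = doc_freq.get(term, 0)  # number of documents containing the term
    if df == 0:
        return 0.0
    # Assumed textbook idf: terms that are rare in the database score higher.
    idf = math.log(1 + num_docs / df)
    return tf * idf

def document_weight(query_terms, doc_tokens, doc_freq, num_docs):
    """Average the weights of the non-stopword terms in the search statement."""
    terms = [t for t in query_terms if t not in STOPWORDS]
    if not terms:
        return 0.0
    total = sum(term_weight(t, doc_tokens, doc_freq, num_docs) for t in terms)
    return total / len(terms)
```

With these assumptions, a 20-word document containing a term 10 times outweighs a 1,000-word document containing it 10 times, and a term found in most documents (such as "act") contributes far less than a rare one.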
Words entered in the search box are weighted according to the following criteria (in every case, the uniqueness of the word in the database -- inverse document frequency -- is also factored in, as is the number of times the word(s) occur relative to the length of the document):
Single-Word Searches
If only one search term is entered, the more instances of that word in the document, the more relevant the document will be considered. Documents in which the search term occurs in the title will be considered most relevant.
Multiple-Word Searches
Ranked in Order of Relevance
- If more than one word is entered for the search, documents containing instances of those words as a phrase--that is, adjacent to each other (discounting "stopwords" or "noisewords") in the order entered--are considered most relevant. Documents having the occurrence of the search phrase in the title are given additional weight.
- When more than one word is entered, and the words occur near, but not next to, each other (for example, within a "window" of 30 words) and not necessarily in the same order as entered, the document is ranked less relevant than if the words occur as an exact phrase but more relevant than if the words occur singly, with no proximity.
- Documents of yet lesser relevance are those in which all words entered appear singly and not in proximity to each other.
- Documents of least relevance are those that, when more than one word is entered, contain some but not all of the words.
- Documents which InQuery considers of NO relevance -- containing NO instances of any form of the search words -- will not appear on the InQuery results list, even though there might be bills that the searcher considers germane. For example, if the searcher enters the query "capital punishment" and the document speaks not of capital punishment but ONLY of the death penalty, that document will not be returned, even though it is well within the scope of the user's intended search. Future refinements of the InQuery search algorithms that employ a legislative thesaurus may overcome this problem.
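The ordering in the list above can be sketched as a function that assigns each document to a tier. This is a simplified illustration under stated assumptions, not the InQuery implementation: it matches exact word forms only (InQuery also matches variants of a word), it ignores the title boost, it counts the 30-word window after stopwords are removed, and the stopword sample and function names are hypothetical.

```python
import re

STOPWORDS = {"the", "of", "a", "and", "for", "to", "in"}  # hypothetical sample
WINDOW = 30  # the example window size given in the list above

def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def relevance_tier(query, text, window=WINDOW):
    """4 = exact phrase, 3 = words near each other, 2 = all words present,
    1 = only some words present, 0 = no words present (not returned)."""
    terms = tokenize(query)
    tokens = tokenize(text)
    if not terms:
        return 0
    wanted = set(terms)
    found = wanted & set(tokens)
    if not found:
        return 0
    if found != wanted:
        return 1
    n = len(terms)
    # Phrase: the words are adjacent, in the order entered, stopwords discounted.
    if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
        return 4
    # Proximity: all words fall inside one window, in any order.
    if any(wanted <= set(tokens[i:i + window]) for i in range(len(tokens))):
        return 3
    return 2
```

In this sketch, a document that mentions only "the death penalty" lands in tier 0 for the query "capital punishment", which mirrors the limitation described in the last list item.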
The searcher who wishes to use official subject terms (index terms) to overcome this problem may often identify a more complete set of relevant bills by searching in the THOMAS Bill Summary & Status files, using the search-by-subject-term option. Records for bills returned in that search will have links back to the full text of the bill in the Bill Text files.
Ranked Results of a Sample Search
Consider the search:
defense appropriations
Documents with the exact phrase "defense appropriations" appearing in the title or numerous times in the text will appear at the top of the results list -- i.e., they are considered most relevant.
Documents that DO NOT contain the phrase "defense appropriations" but do contain a phrase such as "monies appropriated for the staff of the Department of Defense" -- which includes the words, or variants of them (e.g., "appropriated"), near each other but not as an exact phrase -- will rank lower on the list.
Documents that contain any form of the word "defense" and any form of the word "appropriations" (but not near each other, as explained above) will rank even lower on the results list.
Finally, not all documents in the results set will necessarily contain BOTH words. A document having occurrences of EITHER a form of the word "defense" OR a form of the word "appropriations" will appear lowest on the list. Documents with neither any form of the word "defense" nor any form of the word "appropriations" will not appear on the list, even though the searcher might consider them relevant. For example, a document containing numerous instances of the phrases "monies set aside for the military purposes" or "funds supporting the Army, Navy and Marines," but neither the word "defense" nor "appropriations," will not appear on the InQuery results list, even though it may be within the scope of the user's intended search. The searcher may improve search results by using synonyms in another search query or by using a subject (index) term search in the THOMAS Bill Summary & Status files.
Since the default number of results (the "hit" list) is set at 100, not ALL of the documents which meet the above criteria may be displayed. (The searcher may adjust the maximum number of bills to be retrieved as high as 2000.) However, those displayed will be more germane to the search query than those not displayed.
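The effect of the hit-list limit can be sketched as follows. The function name and the (document, weight) pair layout are assumptions for illustration; only the default of 100 and the ceiling of 2000 come from the text above.

```python
DEFAULT_MAX_HITS = 100   # default size of the hit list
MAX_ALLOWED_HITS = 2000  # highest value the searcher may request

def hit_list(scored_docs, max_hits=DEFAULT_MAX_HITS):
    """Return document ids ordered by descending weight, cut off at the limit.

    scored_docs is a list of (doc_id, weight) pairs. Documents with zero
    weight are treated as having no relevance and are never listed.
    """
    limit = max(1, min(max_hits, MAX_ALLOWED_HITS))
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, weight in ranked if weight > 0][:limit]
```

Because the list is sorted before it is cut, any document dropped by the limit has a weight no higher than the lowest-ranked document that is displayed.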
For best results, the searcher should use the most distinctive search words possible to express his/her concept, avoiding common words that are likely to appear frequently in the database -- e.g., "bill" or "act" or "Congress."
For example, the search
brady handgun
is preferable to the search
brady bill
or simply
brady
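To see why "handgun" is a better companion word than "bill", the words of a query can be ordered by how rare they are in the database. The sketch below is illustrative only: the document counts are invented for the example, the log-scaled rarity score is an assumed textbook formula rather than InQuery's own, and the function name is hypothetical.

```python
import math

# Invented counts for illustration: number of documents containing each word.
DOC_FREQ = {"brady": 40, "handgun": 60, "bill": 9500}
NUM_DOCS = 10000

def rank_terms_by_rarity(query, doc_freq, num_docs):
    """Return (term, rarity) pairs for the query words, rarest first.

    A word that occurs in no document scores 0.0 because it cannot match.
    """
    scores = {}
    for term in query.lower().split():
        df = doc_freq.get(term, 0)
        scores[term] = math.log(1 + num_docs / df) if df else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With these invented counts, "bill" appears in nearly every document and adds almost nothing to the query "brady bill", while "handgun" in "brady handgun" is nearly as selective as "brady" itself.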